Week 5: DTD

Posted by Stephanie Vella

Monday, 7 November 2011

QUICK QUESTIONS

This is a smiley. Is it also a well-formed XML document? Say why.
<:-/>
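Yes. In XML 1.0 an element name may start with a colon ":", and a hyphen is allowed after the first character, so ":-" is a legal name and the smiley is a well-formed empty-element tag that also serves as the root element. However, it is better to avoid the colon because it is reserved for namespaces, and a namespace-aware parser may reject such a name. Element names should preferably start with a letter (A-Z or a-z).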
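The naming rule can be tried out with a simplified check. This is only a sketch: `is_xml_name` is a hypothetical helper, and it covers just the ASCII part of the XML 1.0 name rule, leaving out the many non-ASCII characters the full rule also allows.

```python
import re

# ASCII-only subset of the XML 1.0 Name rule (assumption: the non-ASCII
# ranges that the specification also allows are deliberately left out).
# First character: colon, underscore or letter.
# Later characters: the same, plus hyphen, full stop and digits.
_NAME = re.compile(r"[:A-Z_a-z][:A-Z_a-z\-.0-9]*")


def is_xml_name(name: str) -> bool:
    """Return True if name is a legal XML name under the ASCII-only subset."""
    return _NAME.fullmatch(name) is not None


print(is_xml_name(":-"))    # the smiley's element name
print(is_xml_name("-:"))    # a hyphen may not come first
print(is_xml_name("1abc"))  # nor may a digit
```

Under this subset the smiley's name ":-" is accepted, while names that begin with a hyphen or a digit are not.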
What is the difference between well-formed and valid XML?
An XML document is well-formed when it follows the XML syntax rules, such as: ^[1]
- The document has exactly one root element;
- Elements are properly nested;
- Tag names are case-sensitive, so an opening tag and its closing tag must match exactly;
- Every element has a closing tag, or is written as an empty-element tag;
- Attribute values are quoted.
A valid document is well-formed and also conforms to the restrictions specified in its Document Type Definition (DTD). ^[2]
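The well-formedness rules above can be tried out with Python's standard library parser. This is a minimal sketch: `is_well_formed` is a hypothetical helper and the sample strings are made up. Note that `xml.etree.ElementTree` only checks well-formedness; it does not validate against a DTD, so checking validity needs a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "properly nested": "<note><to>Ann</to></note>",
    "badly nested": "<note><to>Ann</note></to>",
    "case mismatch": "<note><to>Ann</To></note>",
    "unquoted attribute": "<note id=1/>",
    "two root elements": "<a/><b/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks one of the listed rules.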
Is it a good idea to start an XML document with a comment, explaining what the document is and what it’s for? Say why.
No, not as the very first thing. If the document has an XML declaration, the declaration must come first: nothing, not even a comment or white space, may precede it. A comment explaining what the document is and what it's for can instead be placed immediately after the XML declaration, before the root element.
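The ordering rule can be checked with Python's standard library parser. This is a sketch with made-up sample documents; `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Same content, different order of declaration and comment.
DECLARATION_FIRST = '<?xml version="1.0"?><!-- What this file is for --><root/>'
COMMENT_FIRST = '<!-- What this file is for --><?xml version="1.0"?><root/>'
# A leading comment is acceptable when there is no XML declaration at all.
NO_DECLARATION = '<!-- What this file is for --><root/>'


def parses(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


for name, text in [
    ("declaration first", DECLARATION_FIRST),
    ("comment first", COMMENT_FIRST),
    ("no declaration", NO_DECLARATION),
]:
    print(name, parses(text))
```

The parser rejects only the version in which the comment comes before the XML declaration.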

LONGER QUESTIONS

A set of documents is to be constructed as follows. The type of document is a college textbook. Every college textbook has a title page, on which is a title and an author and the publisher; optionally, there may be an aphorism. Every college textbook has a title page verso, on which is a publisher’s address, a copyright notice, an ISBN; there may be a dedication, or there may be more than one. Every college textbook has several chapters, and each chapter has several sections, and each section has several bodies of text. A chapter is identified by a chapter number and a chapter title. A section is identified by a section number and a section title. The name of the publisher will always be Excellent Books Ltd. The address of the publisher will always be 21 Cemetry Lane, SE1 1AA, UK. The application that will process the documents can accept Unicode.
Write a .dtd file for this specification.

<?xml version="1.0" encoding="UTF-8"?>
<!-- External DTD: a document refers to this file from its own DOCTYPE declaration -->
<!ELEMENT collegeTextbooks (textbook+)>
<!ELEMENT textbook (titlePage, titlePageVerso, chapters)>
<!ELEMENT titlePage (title, author, publisher, aphorism?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT aphorism (#PCDATA)>
<!ELEMENT titlePageVerso (publisherAddress, copyright, ISBN, dedication*)>
<!ELEMENT publisherAddress (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT dedication (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (ctitle, section+)>
<!ELEMENT ctitle (#PCDATA)>
<!ELEMENT section (stitle, body+)>
<!ELEMENT stitle (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!-- NMTOKEN rather than ID: an ID value must be an XML name, so it cannot start with a digit -->
<!ATTLIST chapter cnum NMTOKEN #REQUIRED>
<!ATTLIST section snum NMTOKEN #REQUIRED>
<!-- Fixed publisher details, used in documents as &publisher; and &publisherAddress; -->
<!ENTITY publisher "Excellent Books Ltd.">
<!ENTITY publisherAddress "21 Cemetry Lane, SE1 1AA, UK">
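The publisher entity declared above would be used in a document as `&publisher;`. The following sketch shows the parser expanding it. The sample title and author are made up, and the entity is repeated in an internal subset only so that the example is self-contained, since Python's standard library parser does not load an external .dtd file.

```python
import xml.etree.ElementTree as ET

# Hypothetical title page. The entity declaration is copied into the
# internal subset here so that the example does not need an external file.
DOCUMENT = """<!DOCTYPE titlePage [
  <!ENTITY publisher "Excellent Books Ltd.">
]>
<titlePage>
  <title>An Example Title</title>
  <author>A. N. Author</author>
  <publisher>&publisher;</publisher>
</titlePage>"""

title_page = ET.fromstring(DOCUMENT)

# The parser replaces the entity reference with the declared text.
print(title_page.find("publisher").text)
```

Declaring the name once as an entity means every textbook carries exactly the same publisher text.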

Write an XML document that contains the following information: the name of a London tourist attraction. The name of the district it is in. The type of attraction it is (official building, art gallery, park etc). Whether it is in-doors or out-doors. The year it was built or founded [Feel free to make this up if you don’t know]. Choose appropriate tags. Use attributes for the type of attraction and in-doors or out-doors status.
```
<?xml version="1.0" encoding="UTF-8"?>
<attractions>
  <attraction type="castle" status="in-doors">
    <name>Tower of London</name>
    <district>Central London</district>
    <founded>1066</founded>
  </attraction>
</attractions>
```
The following (found in the question sheet below) is the document element (root element) of an XML document.
1. It’s clear that it’s concerned with English phrases and their Russian translations. One of the start tags is <targLangPhrase> with </targLangPhrase> as its end tag. Why do you suppose this isn’t <russianPhrase> with </russianPhrase>?
  So that the same markup can be reused for other target languages: only the attribute targLang="Russian" in <phraseBook> needs to change, for example to "French", while the element names stay the same.
2. Write a suitable prolog for this document.
  <?xml version="1.0" encoding="UTF-8"?>
3. Write a .dtd file to act as the Document Type Description for this document
```
<!-- External DTD: the document's DOCTYPE declaration refers to this file -->
<!ELEMENT phraseBook (section+)>
<!ELEMENT section (sectionTitle, phaseGroup+)>
<!ELEMENT sectionTitle (#PCDATA)>
<!ELEMENT phaseGroup (engPhrase, translitPhrase, targLangPhrase)>
<!ELEMENT engPhrase (gloss?)>
<!ELEMENT translitPhrase (gloss?)>
<!ELEMENT targLangPhrase (#PCDATA)>
<!ELEMENT gloss (#PCDATA)>
<!ATTLIST phraseBook targLang CDATA #REQUIRED>
```
4. The application that is to use this document runs on a Unix system, and was written some years ago. Is that likely to make any difference to the XML declaration?
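  Possibly. Cyrillic text, and therefore Russian, can be represented on a Unix or Linux system in several encodings, including KOI8-R, ISO 8859-5, Windows-1251 and UTF-8. ^[3] If the application reads UTF-8, the declaration can stay as it is. An older application may only handle one of the legacy encodings, in which case the file would be saved in that encoding and the encoding attribute of the XML declaration changed to match, for example encoding="KOI8-R".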
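The link between the declared encoding and the bytes in the file can be sketched as follows. The sample phrase and the `build` helper are made up for illustration, and the example assumes that Python's bundled parser accepts these single-byte encodings.

```python
import xml.etree.ElementTree as ET

PHRASE = "Привет"  # sample Russian text, chosen only for illustration


def build(encoding: str) -> bytes:
    """Serialise a tiny document in the given encoding, declared to match."""
    text = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        f"<targLangPhrase>{PHRASE}</targLangPhrase>"
    )
    return text.encode(encoding)


# For each encoding, record whether the phrase survives the round trip.
results = {}
for encoding in ("UTF-8", "KOI8-R", "ISO-8859-5"):
    raw = build(encoding)
    results[encoding] = ET.fromstring(raw).text == PHRASE
    print(encoding, len(raw), "bytes", results[encoding])
```

The bytes on disk differ from one encoding to the next, so the declaration has to name the encoding the file was actually saved in.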

Question Sheet: Lab 5

Sources:

[1] http://www.w3schools.com/xml/xml_dtd.asp
[2] http://www.informit.com/guides/content.aspx?g=xml&seqNum=223
[3] https://www.ibm.com/developerworks/linux/library/l-u-cyr/

Advanced Web Technologies

Blog posts as part of the BSc Internet Application Development programme to discuss my experience while working on the tasks that will be given during the Advanced Web Technologies module.
