0:00 In the previous video, we learned the basics of XML. In this video, we're going to learn about Document Type Definitions, also known as DTDs, and also about ID and IDREF attributes.
0:11 We learned that well-formed XML is XML that adheres to basic structural requirements: a single root element, matched tags with proper nesting, and unique attributes within each element.
0:23 Now we're going to learn about what's known as valid XML. Valid XML has to adhere to the same basic structural requirements as well-formed XML, but it also adheres to content-specific specifications. And we're going to learn two languages for those specifications. One of them is Document Type Definitions, or DTDs, and the other, a more powerful language, is XML Schema. Specifications in XML Schema are known as XSDs, for XML Schema Definitions.
0:50 So as a reminder, here's how things worked with well-formed XML documents. We sent the document to a parser, and the parser would either report that the document was not well-formed or return parsed XML.
1:02 Now let's consider what happens with valid XML. Now we use a validating XML parser, and we have an additional input to the process, which is a specification, either a DTD or an XSD. So that's also fed to the parser, along with the document. The parser can again say the document is not well-formed if it doesn't meet the basic structural requirements. It could also say that the document is not valid, meaning the structure of the document doesn't match the content-specific specification. If everything is good, then once again parsed XML is returned.
1:33 Now let's talk about Document Type Definitions, or DTDs.
1:36 We see a DTD in the lower-left corner of the video, but we won't look at it in any detail, because we'll be doing demos of DTDs a little later on.
1:44 A DTD is a language that's kind of like a grammar. What you can specify in that language, for a particular document, is what elements you want that document to contain, the tags of the elements, what attributes can be in the elements, and how the different types of elements can be nested. Sometimes you might want to specify the ordering of the elements, and sometimes the number of occurrences of different elements.
2:06 DTDs also allow the introduction of special types of attributes, called ID and IDREFs. Effectively, what these allow you to do is specify pointers within a document, although these pointers are untyped.
2:19 Before moving to the demo, let's talk a little bit about the positives and negatives of choosing to use a DTD or an XSD for one's XML data. After all, if you're building an application that encodes its data in XML, you'll have to decide whether you want the XML to just be well-formed, or whether you want to have specifications and require the XML to be valid, that is, to satisfy those specifications.
2:40 So, let's list a few positives of choosing to require a DTD or an XSD. First of all, when you write your program, you can assume that the data adheres to a specific structure. So programs can assume a structure, and the programs themselves are simpler because they don't have to do a lot of error checking on the data. They'll know that before the data reaches the program, it has been run through a validator and it does satisfy a particular structure.
3:07 Second of all, we talked some time ago about the cascading style sheet language and the extensible stylesheet languages. These are languages that take XML and run rules on it to process it into a different form, often HTML. When you write those rules, if you know that the data has a certain structure, then those rules can be simpler. So, like the programs, they can also assume a particular structure, and that makes them simpler.
3:33 Now, another use for DTDs or XSDs is as a specification language for conveying what XML might need to look like. As an example, if you're performing data exchange using XML, maybe a company is going to receive purchase orders in XML, the company can use the DTD as a specification for what the XML needs to look like when it arrives at the program that's going to operate on it.
3:59 Also documentation: it can be useful to use one of the specifications just to document what the data itself looks like.
4:06 In general, what we really have here is the benefits of typing. We're talking about strongly typed data versus loosely typed data, if you want to think of it that way.
4:17 Now let's look at when we might prefer not to use a DTD. What I'm going to describe here is the benefits of not using a DTD. The biggest benefit is flexibility. A DTD makes your XML data conform to a specification. If you want more flexibility, or you want ease of change in the way the data is formatted without running into a lot of errors, then the DTD can be constraining.
4:45 Another fact is that DTDs can be fairly messy. This is not going to be obvious to you until we get into the demo, but if the data is very irregular, then specifying its structure can be hard.
5:00 Actually, when we see the schema language, we'll discover that XSDs can be, I would say, really messy, and they can get very large. It's possible to have a document where the specification of the structure of the document is much, much larger than the document itself, which seems not entirely intuitive, but when we get to learn about XSDs, I think you'll see how that can happen.
5:22 So, overall, these are the benefits of no typing. It's really quite similar to the analogous choice in programming languages.
5:31 The remainder of this video will teach about the DTDs themselves through a set of examples. We'll have a separate video for learning about XML Schema and XSDs.
5:39 So, here we are with the first document that we're going to look at with a Document Type Definition. We have on the left the document itself. We have on the right the DTD, and then we have in the lower right a command-line shell that we're going to use to validate the document.
5:55 This is similar data to what we saw in the last video, but let's go through it just to see what we have. We have an outermost element called bookstore, and we have two books in our bookstore. The first book has an ISBN number, a price, and an edition as attributes, and then it has a subelement called title and another subelement called authors, with two authors underneath, each with a first name and a last name. The second book element is similar, except it doesn't have an edition. It also has, as we see, a remark.
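For readers following without the video, the document and DTD on screen can be approximated by the sketch below. It is a reconstruction, not the file used in the demo: the lowercase tag names, the first_name and last_name spellings, the ISBN, price, and edition values, the remark text, and the author names other than Jeffrey Ullman are all assumptions, and placing the DTD inside the document as an internal subset is also an assumption.

```xml
<?xml version="1.0"?>
<!-- Sketch only: names and attribute values below are placeholders. -->
<!DOCTYPE bookstore [
  <!ELEMENT bookstore (book | magazine)*>
  <!ELEMENT book (title, authors, remark?)>
  <!ATTLIST book ISBN    CDATA #REQUIRED
                 price   CDATA #REQUIRED
                 edition CDATA #IMPLIED>
  <!ELEMENT magazine (title)>
  <!ATTLIST magazine month CDATA #REQUIRED
                     year  CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT authors (author+)>
  <!ELEMENT remark (#PCDATA)>
  <!ELEMENT author (first_name, last_name)>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
]>
<bookstore>
  <!-- First book: all three attributes, two authors, no remark. -->
  <book ISBN="ISBN-0001" price="85" edition="3rd">
    <title>A First Course in Database Systems</title>
    <authors>
      <author><first_name>Jeffrey</first_name><last_name>Ullman</last_name></author>
      <author><first_name>Jennifer</first_name><last_name>Widom</last_name></author>
    </authors>
  </book>
  <!-- Second book: no edition (allowed by #IMPLIED), plus a remark. -->
  <book ISBN="ISBN-0002" price="100">
    <title>Database Systems: The Complete Book</title>
    <authors>
      <author><first_name>Hector</first_name><last_name>Garcia-Molina</last_name></author>
      <author><first_name>Jeffrey</first_name><last_name>Ullman</last_name></author>
      <author><first_name>Jennifer</first_name><last_name>Widom</last_name></author>
    </authors>
    <remark>Placeholder remark text.</remark>
  </book>
</bookstore>
```

Saved under a hypothetical name such as bookstore.xml, a file like this could be checked with xmllint --valid --noout bookstore.xml, which validates against the DTD named in the document and suppresses the parsed output. The transcript does not say which options were used in the demo, so those two are an assumption.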
6:23 Now let's take a look at the DTD. I'm just going to walk through the DTD, not too slowly, not too fast, and explain exactly what it's doing.
6:30 The start of the DTD says this is a DTD named bookstore and the root element is called bookstore, and now we have the first grammar-like construct. These constructs, in fact, are a little bit like regular expressions, if you know them. What this says is that a bookstore element has as its subelements any number of elements that are called book or magazine. We have book or magazine. We don't have any magazines yet, but we'll add one. And then this star says zero or more instances. It's the Kleene star, for those of you familiar with regular expressions.
7:02 Now let's talk about what the book element has, so that's our next specification. The book element has a title, followed by authors, followed by an optional remark. So now we don't have an "or", we have a comma, and that says these are going to be in that order: title, authors, and remark. The question mark says that the remark is optional.
7:22 Next we have the attributes of our book elements. This !ATTLIST declaration says we're going to describe the attributes, and we're going to have three of them: the ISBN, the price, and the edition. CDATA is the type of the attribute. It's just a string. And then #REQUIRED says that the attribute must be present, whereas #IMPLIED says it doesn't have to be present. As you may remember, we have one book that doesn't have an edition.
7:45 Our magazines are simply going to have titles, and they're going to have attributes that are month and year. Again, we don't have any magazines yet.
7:51 A title is going to consist of string data. So here we see our title, A First Course in Database Systems.
7:58 You can think of that as the leaf data in the XML tree. When you have a leaf that consists of text data, this is what you put in the DTD, just take my word for it: #PCDATA in parentheses.
8:10 Now, our authors element still has structure. It has author subelements, and we're going to specify here that the authors element must have one or more author subelements. That's what the plus is saying here, again taken from regular expressions: plus means one or more instances.
8:32 We have the remark, which is just going to be #PCDATA, or string data. We have our author elements, which consist of a first name subelement and a last name subelement, in that order. And then finally, our first names and last names are also strings.
8:46 So, this is the entire DTD, and it describes in detail the structure of our document.
8:53 Now we have a command, we're using something called xmllint, that will check to see if the document meets the structure. We'll just run that command here with a couple of options, and it doesn't give us any output, which actually means that our document is correct.
9:09 We'll be making some edits and seeing what happens when we run the command on a document that is not correct.
9:13 So let's make our first edit. Let's say that we decide we want the edition attribute of our books to be required rather than implied. So we'll change the DTD, save the file, and now we run our command.
9:27 As expected, we got an error, and the error said that one of our book elements does not have the attribute edition. Now that edition is required, every book element ought to have it. So let's add an edition to our second book.
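The failing state just described can be reproduced with a cut-down file like the sketch below. It is deliberately not valid: the second book lacks the now-required edition attribute. The lowercase names and the attribute values are placeholders, not the ones used in the demo, and the DTD is trimmed to the parts needed to show the error.

```xml
<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ELEMENT bookstore (book)*>
  <!ELEMENT book (title)>
  <!-- edition switched from #IMPLIED to #REQUIRED -->
  <!ATTLIST book ISBN    CDATA #REQUIRED
                 edition CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
]>
<bookstore>
  <book ISBN="ISBN-0001" edition="3rd">
    <title>A First Course in Database Systems</title>
  </book>
  <!-- Well-formed but not valid: the edition attribute is missing here. -->
  <book ISBN="ISBN-0002">
    <title>Database Systems: The Complete Book</title>
  </book>
</bookstore>
```

A validating parser should still accept this file as well-formed and report only a validity error for the second book. Adding an edition attribute to that element makes the file valid again.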
9:39 Let's say that it's the second edition. We save the file, we validate our document again, and now everything is good.
9:48 Let's do an edit to the document this time, to see what happens when we change the order of the first name and the last name. So we've swapped Jeffrey Ullman to be Ullman Jeffrey. We validate our document, and now we see we got an error, because the elements are not in the correct order. In this case, let's undo that change, rather than change our DTD.
10:09 Let's try another edit to our document. Let's add a remark to our first book. But what we'll do is leave the remark empty, so we'll add an opening tag and then directly a closing tag, and let's see if that validates.
10:24 So, it did validate. In fact, when we have #PCDATA as the type of an element, it's perfectly acceptable to have an empty element.
10:32 As a final change, let's add a magazine to our database. You'll have to bear with me as I type. I'm always a little bit slow. We see over here that when we have a magazine, there are two required attributes, the month and the year. So let's say the month is January, and the year, let's make that 2011, and then we have a title for our magazine. We'll go down here. Our title, let's make it National Geographic. We'll close the title tag. And then, sorry again about my typing, let's go ahead and validate the document.
11:08 We saw premature end of something or other. We forgot our closing tag for magazine, so let's put that in. My terrible typing, and here we go. Let's validate, and we're done.
11:23 Now we're going to learn about ID and IDREF attributes. The document on the left side contains the same data as our previous document, but completely restructured.
11:32 Instead of having authors as subelements of book elements, we're going to have our authors listed separately, and then effectively point from the books to the authors of the book. We'll take a look at the data first, and then we'll look at the DTD that describes the data.
11:47 Let's actually start with the authors. Our bookstore element here has two subelements that are books and three that are authors. Looking at the authors, we have the first name and last name as subelements as usual, but we've added what we call the ident attribute. That's not a keyword; we've just called the attribute ident. For each of the three authors, we've given a string value to that attribute that we're going to use, effectively, for the pointers in the book.
12:12 So we have our three authors. Now let's take a look at the books. Our book has the ISBN number and price. I've taken the edition out for now. But now we have a special attribute called authors. Authors is an IDREFS attribute, and its value can refer to one or more strings that are ID attributes in other elements. So that's what we're doing here. We're referring to the two author elements here. And in our second book, we're referring to the three author elements.
12:40 We still have the title subelement and we still have the remark subelement. And furthermore, we have one other cute thing here, which is that instead of referring to the book by name within the remark, when we're talking about the other book, we have another type of pointer. So we'll specify that the ISBN is an ID for books, and then this is an IDREF attribute that's referring to the ID of the other book.
13:07 The DTD on the right describes the structure of this document.
13:11 This time our bookstore is going to contain zero or more books followed by zero or more authors. Our books contain a title and an optional remark as subelements, and now they have three attributes: the ISBN, which is now a special type of attribute called an ID; the price, which is a string value as usual; and the authors, which is the special type called IDREFS.
13:34 Let's keep going. Our title is just a string value as usual.
13:37 A remark, here, is actually an interesting construct. A remark consists of #PCDATA, which is string data, or a book reference, and then zero or more instances of those. This is the type of construct that can be used to mix strings and subelements within an element. So any time you want an element that might have some strings, and then another element, and then more string values, that's how it's done: #PCDATA or the element type, zero or more.
14:05 Then we have our book reference, which is actually an empty element. It's only interesting because it has an attribute. So let's go back here. We see our book reference here; it actually doesn't have any data or subelements, but it has an attribute called book, and that is an IDREF. That means it refers to an ID attribute of another element.
14:27 Now we have our authors, with the first name and the last name, and our author elements have again an ID attribute, and we're calling it ident. And finally, the first name and last name are string values.
14:39 This may seem overwhelming, but the key points in this DTD are the ID attributes.
14:44 So the ID attributes, the ISBN attribute in the book and the ident attribute in the author, are special attributes. By the way, the values of those attributes do need to be unique. They're special in that IDREFS attributes can refer to them, and that will be checked as well.
15:03 Now, I did want to point out that the book reference here says IDREF, singular. When you have a singular IDREF, the string has to be exactly one ID value. When you have the plural IDREFS, the string value of the attribute is one or more ID values separated by spaces. So it's a little bit clunky, but it does seem to work.
15:27 Now let's go to our command line, and let's validate the document. The document is in fact valid. That's what it means when we get nothing back. Let's make some changes, as we did before, to explore what structure is imposed and what's checked with this DTD in the presence of IDs and IDREFs.
15:44 As a first change, let's change this ID, this identifier HG, to JU. That should actually cause a couple of problems. Let's validate the document and see what happens.
15:56 And we do in fact get two different errors. The first error says that we have two instances of JU. As you can see here, we now have JU twice, whereas ID values do have to be unique. They have to be globally unique throughout the document.
16:10 The second error that occurred when we changed HG to JU is that we effectively have a dangling pointer. We refer to HG here in this IDREFS attribute, but there's no longer an element whose ID value is HG. So that's an error as well. Let's change it back to HG, just so our document is valid again.
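The restructured document and its DTD, as described above, can be sketched roughly as follows. This is a reconstruction under assumptions: the lowercase tag names, the bookref element name, the first_name and last_name spellings, the ISBN and price values, the remark text, the JW ident, and the author names other than Jeffrey Ullman are placeholders. Only the idents HG and JU, the attribute names ident and book, and the overall shape come from the transcript.

```xml
<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ELEMENT bookstore (book*, author*)>
  <!ELEMENT book (title, remark?)>
  <!-- ISBN is now an ID; authors is a space-separated list of IDs. -->
  <!ATTLIST book ISBN    ID     #REQUIRED
                 price   CDATA  #REQUIRED
                 authors IDREFS #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!-- Mixed content: text and bookref elements in any order. -->
  <!ELEMENT remark (#PCDATA | bookref)*>
  <!ELEMENT bookref EMPTY>
  <!ATTLIST bookref book IDREF #REQUIRED>
  <!ELEMENT author (first_name, last_name)>
  <!ATTLIST author ident ID #REQUIRED>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
]>
<bookstore>
  <!-- ID values must be XML names, so they cannot start with a digit. -->
  <book ISBN="ISBN-0001" price="85" authors="JU JW">
    <title>A First Course in Database Systems</title>
  </book>
  <book ISBN="ISBN-0002" price="100" authors="HG JU JW">
    <title>Database Systems: The Complete Book</title>
    <remark>Placeholder text that mentions <bookref book="ISBN-0001"/> in passing.</remark>
  </book>
  <author ident="HG"><first_name>Hector</first_name><last_name>Garcia-Molina</last_name></author>
  <author ident="JU"><first_name>Jeffrey</first_name><last_name>Ullman</last_name></author>
  <author ident="JW"><first_name>Jennifer</first_name><last_name>Widom</last_name></author>
</bookstore>
```

Changing ident="HG" to ident="JU" in a file like this should produce the two errors described above: a duplicate ID value, and an IDREFS entry (HG) that no longer matches any ID in the document.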
16:31 Now let's make another change. Let's take our book reference. We can see that our book reference is referring to the other book. We're in the complete book here, and the remark is referring to the first course through the ISBN number. But let's change this string to refer to HG instead. So now we're actually referring to an author rather than another book. Let's check if the document validates.
16:54 In fact it does. And that shows that the pointers, when you have a DTD, are untyped. It does check to make sure that this is an ID of another element, but we weren't able to specify in our DTD that it should be a book element, and since we're not able to specify it, of course it's not possible to check it. We will see that in XML Schema we can have typed pointers, but it's not possible to have them in DTDs.
17:17 The last change I'm going to show is to add a second book reference within our remark. As I pointed out over here, when we write #PCDATA or an element type with the Kleene star, the zero-or-more star, that means we can freely mix text and subelements. So just right in the middle here, let's put a book reference. We can put, let's say, book equals JU, and that will be the end of our reference there. Now we see that we have text followed by a subelement, followed by more text, and so on. That should validate fine, and in fact it does.
17:56 That completes our demonstration of XML documents with DTDs.
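As a recap of the last two edits, the cut-down sketch below shows a remark that mixes text with two book references, both of which point at author idents rather than at a book. The names and text are placeholders, and the DTD is trimmed to the declarations needed for this point (the author element is simplified to plain text here, which differs from the demo).

```xml
<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ELEMENT bookstore (book*, author*)>
  <!ELEMENT book (title, remark?)>
  <!ATTLIST book ISBN ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT remark (#PCDATA | bookref)*>
  <!ELEMENT bookref EMPTY>
  <!-- IDREF only requires that the value is some ID in the document. -->
  <!ATTLIST bookref book IDREF #REQUIRED>
  <!ELEMENT author (#PCDATA)>
  <!ATTLIST author ident ID #REQUIRED>
]>
<bookstore>
  <book ISBN="ISBN-0002">
    <title>Database Systems: The Complete Book</title>
    <!-- Text, subelement, text, subelement, text: all allowed by the star. -->
    <remark>Some text, then <bookref book="HG"/> a reference, then
      <bookref book="JU"/> another, then more text.</remark>
  </book>
  <author ident="HG">Hector Garcia-Molina</author>
  <author ident="JU">Jeffrey Ullman</author>
</bookstore>
```

This file should validate even though both references name authors, because a DTD cannot say which element type an IDREF must point to.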