0:00 In this video we'll be learning about XML Schema. Like Document Type Definitions (DTDs), XML Schema gives us a way to write content-specific specifications for our XML data. As you may remember, we send a validating XML parser an XML document as well as a description. We talked about DTDs in the last video; we'll talk about XSDs in this one. The validating XML parser will check that the document is well formed, and it will also check that it matches its specification. If it does, XML comes out. If it doesn't, we get an error that the document is not valid.
0:34 XML Schema is an extensive, very powerful language. Like DTDs, it lets us specify the elements we want in our XML data, the attributes, the nesting of the elements, how elements need to be ordered, and the number of occurrences of elements. In addition, we can specify data types, we can specify keys, the pointers we specify are now typed, unlike in DTDs, and much, much more.
0:57 Now, one difference between XML Schema and DTDs is that the specifications in XML Schema, called XSDs, are actually written in the XML language itself. That can be useful, for example, if we have a browser that nicely renders XML. As I said, the language is vast. In this video we're going to show one, quote, "easy" example, but that example will give very much the flavor of XML Schema, and we'll try to highlight the differences between XML Schema and DTDs.
1:27 OK, here we are with our XML document on the left. On the right we have our XML Schema Definition, or XSD, and we have a little command line that we're going to use for our validation command.
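Since an XSD is itself an XML document, its overall shape can be sketched as below. This is a minimal, hypothetical skeleton, not the file used in the video: the root element name `Bookstore` is an assumption, while `http://www.w3.org/2001/XMLSchema` is the standard XML Schema namespace. The `xsd:` prefix is only a convention; any prefix bound to that URI works.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical skeleton: an XSD is ordinary XML whose elements come
     from the XML Schema namespace (bound here to the prefix "xsd"). -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A valid instance document must have a root element named Bookstore
       (the name is an assumption for illustration). -->
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Element and attribute declarations go here. -->
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```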
1:39 Now let me say up front that we're not going to go through the XSD line by line in detail the way we did with DTDs. As you can see, it's rather long, and that would take far too long and be rather boring. So I highly suggest that you download the XSD file so you can look at the entire file yourself, as well as the XML, and give validating a try. What I'm going to do in this demo is focus primarily on those aspects of the XSD that are different from, and more powerful than, what we had in DTDs.
2:12 First, let's take a look at the data itself. We have our bookstore data as usual, with two books and three authors. It's slightly restructured from any of the versions we've used before. It looks closest to the last one we used, because the books and authors are separate, and the authors are exactly the same: they have an identifier and first-name and last-name subelements. The primary difference is in the books. Instead of using IDREFS attributes to refer from books to authors, we're now back to having an authors subelement with the two authors underneath, and those authors themselves have what are effectively pointers to the identifiers of the authors. We'll see how that meshes with the XML Schema Definition we're using for this file.
2:55 The other thing I want to mention is that right now we have the XSD in one file and the XML in another. You might remember that for the DTD we simply placed the DTD specification at the top of the file with the XML. For DTDs you can do it either way, in the same file or in a separate file. For XSDs, we always put them in a separate file.
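A hypothetical sketch of data shaped as described above may help when reading schema declarations for it. The element and attribute names (`Book`, `ISBN`, `Price`, `Title`, `Authors`, `Auth`, `authIdent`, `Remark`, `BookRef`, `book`, `Author`, `Ident`, `First_Name`, `Last_Name`) and all values other than the author identifiers HG, JU, JW and the price 100 are assumptions made for illustration, not the contents of the video's file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical bookstore data: two books, three authors.
     Names and most values are illustrative assumptions. -->
<Bookstore>
  <Book ISBN="ISBN-1" Price="100">
    <Title>First Book Title</Title>
    <Authors>
      <!-- authIdent values point at Author/@Ident values below -->
      <Auth authIdent="JU"/>
      <Auth authIdent="JW"/>
    </Authors>
  </Book>
  <Book ISBN="ISBN-2" Price="85">
    <Title>Second Book Title</Title>
    <Authors>
      <Auth authIdent="HG"/>
      <Auth authIdent="JU"/>
    </Authors>
    <!-- Mixed content: text plus a pointer to another book by its ISBN -->
    <Remark>Often bought together with <BookRef book="ISBN-1"/>.</Remark>
  </Book>
  <Author Ident="HG">
    <First_Name>FirstName-HG</First_Name>
    <Last_Name>LastName-HG</Last_Name>
  </Author>
  <Author Ident="JU">
    <First_Name>FirstName-JU</First_Name>
    <Last_Name>LastName-JU</Last_Name>
  </Author>
  <Author Ident="JW">
    <First_Name>FirstName-JW</First_Name>
    <Last_Name>LastName-JW</Last_Name>
  </Author>
</Bookstore>
```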
3:15 Also notice that the XSD itself is in XML. It uses special tags, tags that are part of the XSD language, but we are still expressing it in XML. So we have two XML files: the data file and the schema file. To validate the data file against the schema file, we can again use xmllint. We specify the schema file and the data file, and when we execute the command we can see that the file validates correctly.
3:46 I'm now going to highlight four features of XML Schema that aren't present in DTDs. One is typed values. One is key declarations, which are similar to IDs but a little more powerful. One is references, which are again similar to pointers but a little more powerful. And finally, occurrence constraints.
4:06 So let's start with types. In our data we see that the price attribute is written as a string, and when we had DTDs, all attribute values were in fact strings. In XSDs we can check that values, which still look like strings, actually conform to specific types. For example, we can say that the price must be an integer. Again, I'm not going to belabor the syntactic details; rather, I'm just going to highlight the places in the XSD where we're declaring things of interest. Specifically, here's where we declare the attribute price, and we say that the type of price must be an integer.
4:43 Our document validated correctly, but what if we change this 100 to be foo instead? Of course, with a DTD this would be fine, because all attributes are treated as strings. But if we try to validate now, we see an error: specifically, foo is not a value of the correct type. So let's change that foo back to 100 so that we validate correctly.
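The typed-value idea can be sketched as a pair of attribute declarations. This is a hypothetical fragment, not the video's XSD: the type name `BookType` is invented, the `xsd` prefix is assumed to be bound to `http://www.w3.org/2001/XMLSchema` by the enclosing `xsd:schema` element, and the child-element content is left out. With xmllint, a check of this kind is typically run as `xmllint --noout --schema Bookstore.xsd Bookstore.xml`, where both file names are assumptions.

```xml
<!-- Hypothetical fragment, placed inside an xsd:schema element. -->
<xsd:complexType name="BookType">
  <xsd:sequence>
    <!-- Book's child elements omitted here -->
  </xsd:sequence>
  <!-- Any string value is accepted for ISBN -->
  <xsd:attribute name="ISBN" type="xsd:string" use="required"/>
  <!-- Price="100" is accepted; Price="foo" is rejected because
       "foo" is not a legal xsd:integer value -->
  <xsd:attribute name="Price" type="xsd:integer" use="required"/>
</xsd:complexType>
```

A `Book` element would then be declared with `type="BookType"` so that every book is checked against these attribute types.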
5:04 Next, let's talk about keys. In DTDs we were able to specify IDs. IDs were globally unique values that could be used to identify specific elements, for example when we wanted to point to those elements using IDREFs. Keys are a little more powerful, or more specific, I should say. If you think about the relational model, a key is an attribute or set of attributes that must be unique for each tuple in a table. We don't have tables or tuples right now, but we do have elements, and we often have repeated elements. So, similarly, we can specify that a particular attribute or component must be unique across all elements of the same type.
5:44 We have two keys in our specification: one, which we can see here, for books, and one for authors. Specifically, we say for books that the ISBN attribute must be a key, and we say for authors that the ident attribute must be a key.
6:00 So let's go over to our data and start by looking at the authors. If we change, for example, JU to HG, then we should get a key violation, because we'll have two authors with the same ident attribute. Let's try to validate. In fact, we do correctly get a key violation. We also get a couple of other errors, and those have to do with the fact that we are using these values as the destination of what are effectively typed pointers, or references. So let's change that back to JU and make sure everything validates fine, and it does.
6:31 Now let's make another change. We have the ident key here, and we have the ISBN number as the key for books. What if we changed the ISBN number to one of the values we used as a key for an author, say HG?
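Key declarations like these can be sketched with `xsd:key`, which pairs an XPath `xsd:selector` (which elements the key covers) with an `xsd:field` (which value must be unique). This hypothetical fragment assumes it sits inside the `xsd:element` declaration for `Bookstore`, after its type, and that the attributes are spelled `ISBN` and `Ident`; the key names `BookKey` and `AuthorKey` are invented. Because each key is checked on its own, a book ISBN may coincide with an author identifier, unlike DTD IDs.

```xml
<!-- Hypothetical fragment: identity constraints inside the Bookstore
     element declaration (selector paths are relative to Bookstore) -->
<xsd:key name="BookKey">
  <!-- Which elements the key covers -->
  <xsd:selector xpath="Book"/>
  <!-- The value that must be present and unique across those elements -->
  <xsd:field xpath="@ISBN"/>
</xsd:key>
<xsd:key name="AuthorKey">
  <xsd:selector xpath="Author"/>
  <xsd:field xpath="@Ident"/>
</xsd:key>
```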
6:47 When we did something similar with DTDs, we got an error, because in DTDs, IDs have to be globally unique. Here we should not get an error. HG should be a perfectly reasonable key for books, because no other book has that value. And in fact, it does validate. Now let's undo that change.
7:05 Next, let's talk about references. References allow us to have what are effectively typed pointers, unlike in the DTD. They're called keyrefs, and here we have an example; let me just scroll to the middle of the document. One of the reference types that we've defined in our XSD is a pointer to authors that we're using in our books. Specifically, we want to specify that this attribute here, the auth ident, has a value that is a key for the author elements, and we want to make sure it's author elements that it's pointing to and not other types of elements.
7:37 Now, the syntax for doing this in XML Schema is rather detailed. It's all right here, and just to give you a flavor, this selector here is actually using the XPath language, which we'll be learning later. What it says is that when we navigate in the document down to one of these auth elements, within that auth element, the auth ident attribute is a reference to what we have already defined as author keys. We've done something similar with books. We have our book/remark/bookref path that brings us down to this element here, and there we specified that the book attribute must be a reference to a book key, and the book key was earlier defined to be the ISBN number. Again, I know this is all complicated and the syntax is very clunky, so I urge you to download the specification and spend time looking at it on your own.
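The two references described above can be sketched with `xsd:keyref`, whose `refer` attribute names the key that the selected values must match. This is a hypothetical fragment: `AuthorKey` and `BookKey` are assumed to be keys declared on the same `Bookstore` element, the element and attribute spellings are assumptions, and the schema is assumed to have no target namespace (otherwise the `refer` value would need a namespace prefix).

```xml
<!-- Hypothetical fragment: keyrefs declared on the Bookstore element,
     alongside keys named AuthorKey and BookKey -->
<!-- Each Auth/@authIdent must match the Ident of some Author -->
<xsd:keyref name="AuthorKeyRef" refer="AuthorKey">
  <xsd:selector xpath="Book/Authors/Auth"/>
  <xsd:field xpath="@authIdent"/>
</xsd:keyref>
<!-- Each BookRef/@book must match the ISBN of some Book, so a value
     that only matches an author identifier (such as JW) is rejected -->
<xsd:keyref name="BookKeyRef" refer="BookKey">
  <xsd:selector xpath="Book/Remark/BookRef"/>
  <xsd:field xpath="@book"/>
</xsd:keyref>
```

Nothing here forbids two references with the same value, so two `Auth` elements pointing at the same author still validate.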
8:33 Now let's make a couple of changes to our document to demonstrate how the checking of these typed pointers works. For example, let's change our first reference here to foo. Let's validate the document; we should get an error, and indeed we do: the author keyref is incorrect. Now let's change that foo to JW. Originally it was JU, but now we're going to have two authors, both of whom refer to JW. This should not be a problem; it's simply two pointers to the same author, and we did not prohibit that in our XML Schema specification. Indeed, our document validates.
9:13 We'll change that one back. As a last change, we'll change our book reference here to refer to JW. This should not validate because this time, unlike with DTDs, we've actually specified typed pointers. In other words, we've specified that this reference must be to a book element and not to an author element. So we'll validate, and indeed it fails.
9:39 I've undone that change, and now let's move to the last feature we're going to look at in this demonstration, which is occurrence constraints. Let me just bring up the first instance of it. In XML Schema we can specify how many times an element type is allowed to occur: specifically, the minimum number of occurrences and the maximum number of occurrences. By default, if we don't specify minOccurs or maxOccurs for an element, both of them are one. So here, for books, we've said that we can have zero books and we can have any number; this is the maximum flexibility, any number of elements. For authors we've also said we can have any number of authors in the actual database itself.
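The bookstore-level occurrence constraints might look like this hypothetical fragment, the anonymous type inside the `Bookstore` element declaration. The type names `BookType` and `AuthorType` are invented and assumed to be defined elsewhere in the schema. Note that `xsd:sequence` also fixes the order, so in this sketch all books must come before all authors.

```xml
<!-- Hypothetical fragment: content model of Bookstore -->
<xsd:complexType>
  <xsd:sequence>
    <!-- Zero or more books (maxOccurs="unbounded" means no upper limit) -->
    <xsd:element name="Book" type="BookType"
                 minOccurs="0" maxOccurs="unbounded"/>
    <!-- Zero or more authors; without these attributes,
         both minOccurs and maxOccurs would default to 1 -->
    <xsd:element name="Author" type="AuthorType"
                 minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
```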
10:22 Remember that our bookstore consists of a set of books and a set of authors. But we are going to specify something a little different for how many authors we have within a specific book. So let's continue to look at other cases where we've specified occurrence constraints. Here is the case where we're specifying how many authors we have within a book, and again, boy, this is a lot of XML here, so take your time when looking at it, or for now just take my word for it. What we're specifying here is how many auth subelements we have within each authors element. Here we have no minOccurs specification, only a maxOccurs, which means that minOccurs is one by default. So what this says, specifically, is that every book has in its authors subelement at least one auth, but we can have any number of them; that's what the string "unbounded" means.
11:14 Looking at the remaining occurrence constraints, for remarks, the minimum number of occurrences is zero; in other words, we don't have to have a remark. And we haven't specified maxOccurs, so the default maxOccurs is one. So what we're saying here is that every book may have either no remark or exactly one remark, but it may not have more than that. There are a few more occurrence constraints that you can take a look at as you browse the XML Schema description on your own.
11:40 Now let's make some changes in the document to test these occurrence constraints. First, let's remove the authors from our first book. We won't remove the whole authors subelement, just its two auth subelements. We attempt to validate, and we see that it doesn't validate.
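The per-book constraints can be sketched as the child-element part of a hypothetical `Book` type. Element and attribute names, the `Title` element, and the content of `Remark` (text that may contain a `BookRef`) are assumptions; the `xsd` prefix is assumed to be bound to the XML Schema namespace by the enclosing schema. Leaving out `minOccurs` on `Auth` gives the default of 1, and leaving out `maxOccurs` on `Remark` gives the default of 1.

```xml
<!-- Hypothetical fragment: child elements of each Book -->
<xsd:sequence>
  <xsd:element name="Title" type="xsd:string"/>
  <!-- No occurrence attributes: exactly one Authors element -->
  <xsd:element name="Authors">
    <xsd:complexType>
      <xsd:sequence>
        <!-- minOccurs defaults to 1: at least one Auth, no upper limit -->
        <xsd:element name="Auth" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="authIdent" type="xsd:string"
                           use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- minOccurs="0", maxOccurs defaults to 1: zero or one Remark -->
  <xsd:element name="Remark" minOccurs="0">
    <!-- mixed="true" allows text, so an empty or text-only remark is fine -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="BookRef" minOccurs="0">
          <xsd:complexType>
            <xsd:attribute name="book" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:sequence>
```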
11:56 We're missing some child elements, specifically the auth child elements, because we expected there to be at least one of them. Incidentally, if we took the entire authors subelement out, we'd also get an error, since we've specified that books must have an authors subelement. So now we're missing the entire authors structure in that book, and again we don't validate.
12:17 Let's put the authors back, and now let's look at the remark occurrence constraint. We said that every book can have zero or one remarks, so let's just add another remark to this book. Oh, actually, remarks are allowed to be empty, but in any case, we've added a small remark. We validate, and we see that we have too many remarks, again because we specified that every book can have at most one remark.
12:48 So that concludes our demonstration of XML Schema. Again, it's been rather cursory; we've only covered a few of the constructs, but I did focus on the constructs in XML Schema that cannot be specified in DTDs. Finally, one more time, I urge you to download the XSD and the document and play around with them yourself.