0:00
In this video we'll be learning about XML schema.
0:03
Like document type descriptors, XML
0:05
schema offers us a way
0:07
to give content-specific specifications for our XML data.
0:11
As you may remember, we send
0:12
to a validating XML parser our
0:15
XML document as well as a description.
0:17
We talked about DTDs in the last video.
0:19
We'll talk about XSDs in this one.
0:21
The validating XML parser will
0:23
check that the document is well
0:24
formed, and it will also check that it matches its specification.
0:27
If it does, XML comes out.
0:30
If it doesn't, we get
0:31
an error that the document is not valid.
0:34
XML schema is an extensive language, very powerful.
0:38
Like document type descriptors we
0:39
can specify the elements we
0:40
want in our XML data,
0:42
the attributes, the nesting of
0:43
the elements, how elements need
0:45
to be ordered, and the number of occurrences of elements.
0:47
In addition, we can
0:49
specify data types, we can
0:50
specify keys, the pointers
0:52
that we can specify are now
0:54
typed, unlike in DTDs, and much, much more.
0:57
Now, one difference between XML
1:00
schema and DTDs is that
1:01
the specifications in XML schema,
1:03
called XSDs, are actually written in the XML language itself.
1:06
That can be useful for example
1:08
if we have a browser that nicely renders the XML.
1:12
The language, as I said, is vast.
1:14
In this video, we're going
1:15
to show one sort of "easy" example.
1:18
But that example will give
1:19
very much the flavor of XML schema.
1:21
And we'll try to highlight the
1:23
differences between XML schema and using document type descriptors.
1:27
OK, here we are with our XML document on the left.
1:31
On the right we have our
1:32
XML schema descriptor or XSD
1:34
and we have a
1:36
little command line that we're gonna use for our validation command.
1:39
Now let me just say up front
1:40
that we're not going to be
1:41
going through the XSD line
1:43
by line in detail the way we did with DTDs.
1:45
As you can see it's
1:46
rather long and that
1:48
would take us far too long and be rather boring.
1:52
So what I highly suggest is
1:53
that you download the file for
1:55
the XSD so you can
1:56
look at it yourself and look
1:57
at the entire file as well
1:59
as the XML and give it a try with validating.
2:02
What I'm gonna do in this
2:03
demo primarily is focus
2:05
on those aspects of
2:07
the XSD that are different,
2:09
are more powerful than we had in document type descriptors.
2:12
First, let's take a look at the data itself.
2:15
So we have our bookstore data as usual with two books and three authors.
2:18
It's slightly restructured from any of the versions we've used before.
2:22
It looks closest to the
2:23
last one we used because the
2:24
books and authors are separate
2:26
and the authors are actually exactly the same.
2:28
They have an identifier and
2:30
first name and last name sub-elements.
2:31
But the primary difference is in
2:33
the books, instead of using
2:35
IDREFS attributes to refer
2:37
from books to authors,
2:38
we're now back to having
2:40
an authors sub-element with the
2:42
two authors underneath and then
2:44
those authors themselves have what
2:46
are effectively the pointers to the identifiers for the authors.
2:49
And we'll see how that's
2:50
going to mesh with the XML
2:52
schema descriptor that we're using for this file.
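To make that structure concrete, here is a rough sketch of what the data file might look like. The element and attribute names (Bookstore, Book, Authors, Auth, authIdent, Remark, BookRef, Author, Ident, First_Name, Last_Name), the titles, the ISBN values, the second price, and the placement of the remark are all assumptions for illustration; only the overall shape, the idents HG, JU and JW, and the price of 100 come from the demo.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: names and values are placeholders. -->
<Bookstore>
  <Book ISBN="ISBN-1" Price="100">
    <Title>Title of the first book</Title>
    <Authors>
      <!-- Each Auth points at an Author by its identifier. -->
      <Auth authIdent="JU"/>
      <Auth authIdent="JW"/>
    </Authors>
  </Book>
  <Book ISBN="ISBN-2" Price="85">
    <Title>Title of the second book</Title>
    <Authors>
      <Auth authIdent="HG"/>
      <Auth authIdent="JU"/>
      <Auth authIdent="JW"/>
    </Authors>
    <!-- A remark may point at another book by its ISBN. -->
    <Remark>A remark that mentions <BookRef book="ISBN-1"/>.</Remark>
  </Book>
  <Author Ident="HG">
    <First_Name>First name</First_Name>
    <Last_Name>Last name</Last_Name>
  </Author>
  <Author Ident="JU">
    <First_Name>First name</First_Name>
    <Last_Name>Last name</Last_Name>
  </Author>
  <Author Ident="JW">
    <First_Name>First name</First_Name>
    <Last_Name>Last name</Last_Name>
  </Author>
</Bookstore>
```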
2:55
So, the other thing I want
2:56
to mention is that right now
2:57
we have the XML schema
2:59
descriptor in one file and the XML in another.
3:02
You might remember for the DTD,
3:04
we simply placed the DTD
3:05
specification at the top
3:07
of the file with the XML.
3:08
For DTDs you can do it either way: in the same file or in a separate file.
3:12
For XSDs, we always put those in a separate file.
3:15
Also notice that the XSD
3:17
itself is in XML.
3:20
It is using special tags.
3:21
These are tags that are part
3:23
of the XSD language, but
3:25
we are still expressing it in XML.
3:28
So we have two XML files, the data file and the schema file.
3:31
To validate the data file
3:33
against the schema file, we
3:34
can again use the xmllint command.
3:37
We specify the schema file,
3:39
the data file and when
3:41
we execute the command
3:43
we can see that the file validates correctly.
3:46
So I'm now going
3:47
to highlight four features of
3:49
XML schema that aren't present in DTDs.
3:52
One of them is typed values.
3:54
One of them is key declarations,
3:57
similar to IDs but a little bit more powerful.
3:59
One is references, which are again
4:01
similar to pointers but a little
4:02
more powerful, and finally occurrence constraints.
4:06
So let's start with types.
4:08
In our data we see
4:10
that the price attribute is
4:12
denoted with a string and
4:14
when we had DTDs, all attribute
4:16
values were in fact strings.
4:18
In XSDs we can
4:20
say that we want to check
4:21
that the values, which
4:22
still look like strings, actually conform to specific types.
4:25
For example, we can say that the price must be an integer.
4:29
Again I'm not going to
4:30
belabor the syntactic details, but rather
4:32
I'm just going to highlight the
4:34
places in the XSD where
4:35
we're declaring things of interest.
4:37
So specifically here's where we
4:38
declare the attribute price and
4:41
we say that the type of price must be an integer.
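As a minimal sketch of such a declaration, assuming the attribute is named Price and sits on a Book element (both names are assumptions here, and the real schema declares much more around it):

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical, pared-down Book element with only the Price attribute. -->
  <xsd:element name="Book">
    <xsd:complexType>
      <!-- The value is still written as text in the document,
           but the validator checks that it parses as an integer. -->
      <xsd:attribute name="Price" type="xsd:integer" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under this sketch, a Book with Price="100" is accepted, while a Book with Price="foo" is rejected because foo is not a valid xsd:integer.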
4:43
So our document validated correctly. What
4:45
if we change this one hundred to be foo instead?
4:49
Of course with a DTD this
4:50
would be fine because all attributes are treated as strings.
4:53
But if we try to validate
4:54
now we see an error,
4:56
specifically foo is not a value of the correct type.
4:59
So let's change that foo back
5:00
to a hundred so that we validate correctly.
5:04
Next, let's talk about keys.
5:06
In DTDs, we were able to specify IDs.
5:09
IDs were globally unique
5:11
values that could be
5:12
used to identify specific elements,
5:15
for example, when we wanted
5:16
to point to those elements using IDREFs.
5:18
Keys are a little
5:20
bit more powerful or more specific I should say.
5:23
If you think about the relational model
5:24
a key in the relational model
5:26
is an attribute or set of
5:27
attributes that must be
5:28
unique for each tuple in a table.
5:31
So, we don't have tables or
5:32
tuples right now but, we
5:33
do have elements and we often have repeated elements.
5:36
So similarly we can specify
5:38
that a particular attribute or component
5:41
must be unique within every element of the same type.
5:44
And we have two keys
5:46
in our specification, one key
5:48
which we can see here for books and one for authors.
5:51
Specifically we say for books
5:53
that the ISBN attribute must be a key.
5:55
And we say for authors
5:57
that the ident attribute must be a key.
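A hedged sketch of what two such key declarations can look like. The element names (Bookstore, Book, Author), the attribute names (ISBN, Ident) and the key names (BookKey, AuthorKey) are assumptions for illustration, and the element contents are stripped down to just the key attributes.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Book" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="ISBN" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Author" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="Ident" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- ISBN must be present and unique among the Book elements. -->
    <xsd:key name="BookKey">
      <xsd:selector xpath="Book"/>
      <xsd:field xpath="@ISBN"/>
    </xsd:key>
    <!-- Ident must be present and unique among the Author elements. -->
    <xsd:key name="AuthorKey">
      <xsd:selector xpath="Author"/>
      <xsd:field xpath="@Ident"/>
    </xsd:key>
  </xsd:element>
</xsd:schema>
```

Because each key is scoped by its own selector, a value only has to be unique among the elements that selector picks out, not across the whole document.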
6:00
So let's go over to
6:01
our data and let's start by looking at the authors.
6:04
So if we change, for
6:05
example, JU to HG
6:08
then we should get a key
6:09
violation because we'll have two
6:11
authors that have the same ident attribute.
6:14
Let's try to validate.
6:16
In fact, we do correctly get
6:17
a key violation. We also get
6:18
a couple of other errors, and
6:19
those have to do with the fact
6:20
that we are using these idents as
6:22
the destinations of what are effectively pointers or references.
6:25
So let's change that back
6:27
to JU, make sure everything
6:28
now validates fine, and it does.
6:31
Now let's make another change.
6:33
So we have the ident
6:35
key here and we have
6:37
the ISBN number being the
6:38
key for books. What if we changed
6:41
the ISBN number to one
6:43
of the values we used as a key for the author, say to HG?
6:47
When we did something similar with
6:49
DTDs we got an error
6:50
because in DTDs, IDs have to be globally unique.
6:53
Here we should not get an error.
6:54
HG should be a perfectly
6:56
reasonable key for books because
6:57
we don't have another value that's the same.
7:01
And in fact it does validate.
7:03
Now let's undo that change.
7:05
Next, let's talk about references.
7:07
So, references allow us to
7:08
have what are effectively typed
7:10
pointers, unlike in DTDs.
7:12
So, they are called
7:13
keyrefs, and here we
7:15
have an example; let me just move this to the middle of the document.
7:18
So, one of the reference types
7:20
that we've defined in our XSD
7:21
is a pointer to authors
7:23
that we're using in our books.
7:26
Specifically, we want to specify that this
7:27
attribute here, the auth ident,
7:29
has a value that is
7:30
a key for the author elements.
7:33
And we want to make sure it's
7:34
author elements that it's pointing
7:35
to and not other types of elements.
7:37
Now the syntax for doing
7:39
this in XML schema is rather detailed.
7:43
It's all right here, and
7:45
just to give you a flavor,
7:46
this middle selector here is
7:48
actually using the XPath language,
7:50
which we'll
7:52
be learning later. What it
7:53
says is that when we navigate
7:54
in the document down to one of these auth elements,
7:58
within that auth element, the
8:00
auth ident attribute is
8:01
a reference to what we
8:03
have already defined as author keys.
8:06
We've done something similar with books.
8:08
We have our book/remark/bookref
8:12
that brings us down to this element here.
8:15
And there we specified that the
8:17
book attribute must be
8:19
a reference to a book key,
8:21
and the book key was earlier
8:22
defined to be the ISBN number.
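To give a flavor of the syntax, here is a pared-down sketch of the author reference. All names (Bookstore, Book, Authors, Auth, authIdent, Author, Ident, AuthorKey, AuthorKeyRef) are assumptions for illustration, and most of the real schema is omitted.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Book" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Authors">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="Auth" maxOccurs="unbounded">
                      <xsd:complexType>
                        <xsd:attribute name="authIdent" type="xsd:string" use="required"/>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Author" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="Ident" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- The key being pointed to: Ident identifies Author elements. -->
    <xsd:key name="AuthorKey">
      <xsd:selector xpath="Author"/>
      <xsd:field xpath="@Ident"/>
    </xsd:key>
    <!-- The typed pointer: the selector is an XPath expression that
         navigates down to each Auth element, and the field says its
         authIdent value must match some AuthorKey value. -->
    <xsd:keyref name="AuthorKeyRef" refer="AuthorKey">
      <xsd:selector xpath="Book/Authors/Auth"/>
      <xsd:field xpath="@authIdent"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```

A book reference would presumably follow the same pattern, with a selector along the lines of Book/Remark/BookRef, a field for the book attribute, and refer pointing at the book key.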
8:25
Again, I know this is all
8:25
complicated, and the syntax is
8:27
very clunky, so I urge
8:28
you to download the specification and spend time looking at it on your own.
8:33
Now let's make a couple of
8:34
changes to our document to
8:35
demonstrate how the checking of these typed pointers works.
8:39
For example, let's change
8:41
our first reference here to FU.
8:46
Let's validate the document, and
8:48
we should get an error, and indeed
8:49
we do: the author keyref is incorrect.
8:55
Now let's change that FU to JW.
8:56
Originally it was JU,
8:59
but now we're going to have two
9:00
authors, both of whom refer to JW.
9:02
Now this should not be a problem.
9:04
It's simply two pointers to
9:05
the same author, and we did
9:06
not prohibit that in our
9:08
XML schema specification, and indeed our document validates.
9:13
We'll change that one back.
9:14
And as a last
9:16
change, we'll change our book
9:17
reference here to refer
9:19
to JW.
9:22
This should not validate because this
9:24
time, unlike with DTDs,
9:27
we've actually specified typed pointers.
9:28
In other words, we've specified that
9:30
this pointer or this
9:31
reference must be to
9:34
a book element and not to an author element.
9:36
So we'll validate and indeed it fails.
9:39
I've undone that change and
9:41
now let's move to the last
9:42
feature that we're gonna look at
9:43
in this demonstration, which is occurrence constraints.
9:47
So in, let me just
9:48
bring up the first instance of
9:50
it, in XML schema,
9:51
we can specify how many
9:53
times an element type is allowed to occur.
9:55
Specifically we can specify the
9:57
minimum number of occurrences and the maximum number of occurrences.
10:00
As a default if
10:02
we don't specify for an
10:03
element the minOccurs or maxOccurs, the default for both of them is one.
10:08
So here for books we've
10:09
said that we can have
10:10
zero books and we can have any number.
10:12
So this is the maximum flexibility, any number of elements.
10:16
For authors we've also said
10:17
we can have any number of authors;
10:19
that's in the actual database itself.
10:22
Remember that our book store consists of a set of books and a set of authors.
10:25
But we are going to specify
10:26
something a little different for
10:27
how many authors we have within a specific book.
10:31
So let's continue to look
10:32
at other cases where we've specified
10:35
occurrence constraints.
10:36
Here is the case where we're
10:37
specifying how many authors we
10:39
have within a book and
10:40
again, phew, boy, this
10:43
is a lot of XML here
10:44
so take your time when looking
10:46
at it; or for now just take my word for it.
10:47
What we're specifying here is
10:49
how many sub elements, how
10:51
many auth sub elements we
10:53
have within each authors element.
10:55
And here we have no minOccurs
10:57
specification only a maxOccurs.
10:59
That means by default minOccurs is one.
11:02
So what this is saying specifically, is
11:04
that every book has in
11:06
its authors sub-element at least
11:09
one auth, but we can have
11:10
any number of them; that's the string unbounded.
11:14
Looking at the remaining occurrence constraints,
11:16
for remarks, we have the
11:18
minimum number of occurrences is zero.
11:20
In other words, we don't have to have a remark.
11:22
And we haven't specified
11:23
maxOccurs, so the default maxOccurs is one.
11:27
So what we're saying here is that
11:28
every book may have either
11:29
no remark or exactly one
11:31
remark but it may not
11:32
have more than that.
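These two constraints can be sketched roughly as follows. The names (Book, Authors, Auth, authIdent, Remark) are assumptions for illustration, and Remark is simplified to a plain string here; in the real schema it can also contain book references.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Neither minOccurs nor maxOccurs given: exactly one Authors. -->
        <xsd:element name="Authors">
          <xsd:complexType>
            <xsd:sequence>
              <!-- minOccurs defaults to 1, maxOccurs is unbounded:
                   at least one Auth, with no upper limit. -->
              <xsd:element name="Auth" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:attribute name="authIdent" type="xsd:string" use="required"/>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <!-- minOccurs is 0, maxOccurs defaults to 1:
             either no Remark or exactly one. -->
        <xsd:element name="Remark" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```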
11:34
And there are a few more occurrence constraints that
11:36
you can take a look at again
11:37
as you browse the XML schema description on your own.
11:40
Now let's make some changes in
11:42
the document to test these occurrence constraints.
11:45
So first let's remove the authors from our first book.
11:47
We won't remove the whole authors
11:49
sub-element but just the two
11:50
auth sub-elements of authors.
11:53
We attempt to validate and we see that it doesn't validate.
11:56
We're missing some child elements,
11:58
specifically the auth child elements,
12:00
because we expected there to be at least one of them.
12:03
Incidentally, if we took
12:05
the entire authors sub-element out,
12:07
we'd also get an error, since
12:08
we've specified that books must have an authors sub-element.
12:11
So now we're missing the entire
12:13
authors structure in that book, and again we don't validate.
12:17
Let's put authors back; and
12:20
now let's look at the
12:21
remark occurrence constraint.
12:23
We said that every book can
12:25
have zero or one
12:26
remarks, so let's just add another remark to this book.
12:34
Oh, actually, remarks are allowed to be empty.
12:38
In any case, we have added a small remark.
12:40
We validate and we see
12:42
that we have too many remarks,
12:44
again because we specified that
12:46
every book can have at most one remark.
12:48
So that concludes our
12:50
demonstration of XML schema. Again,
12:52
it's been rather cursory; we've
12:53
only covered a few of
12:54
the constructs but I did
12:56
focus on the constructs that
12:57
we have in XML schema
12:59
that are not specifiable in DTDs.
13:02
Finally, one more time, I
13:03
urge you to download the XSD
13:05
and the document and play around with it yourself.