0:00 This video introduces the basics of XML. 0:04 XML can be thought of as 0:05 a data model, an alternative to 0:06 the relational model, for structuring data. 0:09 In addition to introducing XML, 0:10 we will compare it to the 0:11 relational model, although it 0:13 is not critical to have watched 0:15 the relational model videos in order to get something out of this one. 0:19 The full name of XML is the Extensible Markup Language. 0:23 XML is a standard for 0:24 data representation and exchange, and 0:26 it was designed initially for exchanging 0:28 information on the Internet. 0:30 Now don't worry if you 0:31 can't read the little snippet in the corner of the video here. 0:33 You're not expected to at this point. 0:36 XML can be thought 0:38 of as a document format similar 0:40 to HTML, if you're familiar with HTML. 0:42 Most people are. 0:43 The big difference is that 0:45 the tags in an XML document 0:47 describe the content of the 0:49 data rather than how to 0:50 format the data, which is 0:51 what the tags in HTML tend to represent. 0:55 XML also has a streaming 0:56 format or a streaming standard, 0:58 and that's typically for the use 1:00 of XML in programs, for 1:01 emitting XML and consuming XML. 1:04 So now let's take a look at the XML data itself. 1:07 You see on the left side of the video a portion of an XML document. 1:11 The entire document is available 1:13 from the website for the course. 1:16 XML has three basic components. 1:18 Again, fairly similar to HTML. 1:20 The first is tagged elements. 1:22 So, for example, let's take a look at this element here. 1:24 This is an element saying 1:26 that the data here is a first name. 1:28 So we have an opening tag and we have a matching closing tag. 1:31 We also have nesting of elements. 1:33 So for example here we have an element that's Author. 1:37 We have the opening tag here, the 1:38 closing tag here, and we 1:40 have a nesting of the first name and last name elements. 
1:42 Even larger, we have 1:44 a book element here with opening 1:46 and closing tags with a 1:47 nesting of numerous elements inside, 1:50 and the entire document actually is 1:52 one element whose opening tag 1:53 is Bookstore and whose closing tag 1:55 isn't visible on the video here. 1:58 So that's what elements consist 2:00 of: an opening tag, text or 2:02 other sub-elements, and a closing tag. 2:06 In addition we have 2:07 attributes. Each element 2:09 may have, within its opening 2:11 tag (let's take a look at the book element here), 2:13 a set of attributes, and 2:15 an attribute consists of 2:16 an attribute name, the equal 2:18 sign and then an attribute value. 2:20 So, our book element 2:22 right here has three attributes. 2:24 One called ISBN, one called 2:26 Price and one called 2:27 Edition. And any element 2:29 can have any number of 2:31 attributes as long as the attribute names are unique. 2:34 And finally, the third component of 2:36 XML is the text itself, 2:38 which is depicted here in black. 2:39 So, within elements, we can have strings. 2:42 We have a string right 2:43 here, we have a title 2:45 here, here we have a remark. 2:47 And so, generally, 2:49 if you think of XML as 2:50 a tree, the strings, or 2:52 the text, form the leaves of the tree. 2:55 So, again, those are the three major components of XML. 2:57 Looks a lot like HTML, except 3:00 the tags are describing the 3:01 content of the data, and not how to format it. 3:05 Now let's spend some time comparing the relational model against XML. 3:09 Again, it's not critical that you 3:10 learn about the relational model and 3:12 you can skip this material if 3:13 you're not interested, but in many 3:15 cases when designing an application that's dealing 3:17 with data you might have to 3:18 make a decision whether you want 3:20 to use a relational database or whether 3:22 you want to store the data in XML. 
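As a rough illustration of the three components described above (tagged elements, attributes, and text), here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element and attribute names follow the spoken description of the bookstore document, but their exact spelling and every value in the fragment are placeholder assumptions, since the actual course file is not reproduced in this transcript. The helper describe_book is hypothetical and exists only for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: names follow the spoken description of the
# bookstore document; spellings and all values are placeholder assumptions.
DOC = """<Bookstore>
  <Book ISBN="isbn-placeholder" Price="85" Edition="3rd">
    <Title>Placeholder Title</Title>
    <Authors>
      <Author><First_Name>Ann</First_Name><Last_Name>Example</Last_Name></Author>
      <Author><First_Name>Bob</First_Name><Last_Name>Sample</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>"""


def describe_book(book):
    """Collect the three kinds of content found in one Book element."""
    return {
        # 1. Tagged element: the name between the angle brackets.
        "tag": book.tag,
        # 2. Attributes: name="value" pairs inside the opening tag.
        "attributes": dict(book.attrib),
        # 3. Text: the strings at the leaves of the tree.
        "title_text": book.findtext("Title"),
        "authors": [
            (a.findtext("First_Name"), a.findtext("Last_Name"))
            for a in book.iter("Author")
        ],
    }


root = ET.fromstring(DOC)  # the whole document is a single root element
summary = describe_book(root.find("Book"))
print(root.tag)
print(summary)
```

Note that attribute values such as the price come back as strings; XML itself attaches no types to them unless a schema is layered on top.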
3:23 So let's look at a 3:24 few different aspects of the 3:26 data and how it's used and 3:27 how it compares between relational and XML. 3:30 Let's start with the structure of the data itself. 3:32 So as we learned, the structure 3:34 in a relational model is basically a set of tables. 3:38 So we define the set of columns and we have a set of rows. 3:41 XML is generally, again it's 3:44 usually in a document or 3:45 a string format, but if you 3:46 think about the structure itself, the structure is hierarchical. 3:50 The nested elements induce a hierarchy or a tree. 3:55 There are constructs that actually allow 3:57 us to have links within 3:59 documents and so, you can 4:00 also have XML representing 4:03 a graph, though in general it's 4:04 mostly thought of as a tree structure. 4:08 Next, let's talk about schemas. 4:10 In the relational model the schema is very important. 4:12 You fix your schema in 4:14 advance, when you design your database, 4:16 and then you add the data to conform to the schema. 4:19 Now, in XML, you have a lot more flexibility. 4:22 So the schema is flexible. 4:24 In fact, a lot of 4:26 people refer to XML as self-describing. 4:28 In other words, the 4:29 schema and the data are kind of mixed together. 4:32 The tags on elements are 4:34 telling you the kind of data 4:35 you'll have, and you can have a lot of irregularity. 4:38 Now I will say that 4:39 there are many mechanisms for introducing 4:41 schemas into XML but they're not required. 4:44 In the relational model schemas are absolutely required. 4:47 In XML they're more optional. 4:50 In particular, let's go 4:52 back and take a look 4:53 at our example, and we'll 4:55 see that we have sort of 4:56 some structure in our example, 4:58 but not everything is perfectly structured, 5:00 as it would be in the relational model. 5:02 So, coming back here and taking a look, 5:04 first of all, we have 5:05 the situation where in this 5:07 first book, we have an attribute called edition, the third edition. 
5:10 Whereas in the second book 5:12 we only have two attributes, so there's no edition in this book. 5:15 Now in the relational model, 5:17 we would have to have a column 5:18 for edition, and we would have one for every book. 5:20 Although of course we could have null editions for some books. 5:24 In XML, it's perfectly acceptable 5:26 to have some attributes for some 5:27 elements and those attributes don't appear in other elements. 5:31 Here's another example where we 5:32 have a component in one book 5:34 that's not in another and it's this remark component. 5:36 So here we have a book 5:37 where we happen to have a 5:38 remark and incidentally, you can 5:41 see that this 5:43 remark suggests that we buy 5:44 the complete book together with the first course. 5:46 The first course is a subset, 5:47 so it's not a very 5:48 good suggestion, although Amazon actually did make that one. 5:51 Anyway, enough of the asides. 5:53 We do see that we have 5:54 a remark for the first book 5:55 and we have no remark for the 5:56 second book and that's not 5:58 a problem whatsoever in XML. 6:00 In the relational model, we 6:02 would again have to use null values for that case. 6:04 And the third example I 6:06 just wanted to give is the number of authors. 6:08 So this first book has two authors. 6:10 The second book - you can't see them all, but it has three authors. 6:13 Not a problem in XML. 6:15 Having different numbers of things 6:18 is perfectly standard. 6:19 So the main point being 6:21 that there's a lot of flexibility 6:23 in XML in terms of the schema. 6:25 You can create your database with 6:27 certain types of elements, later 6:28 add more elements, remove elements, 6:30 introduce inconsistencies in 6:33 the structure, and it's not a problem. 6:35 And again, I'll mention one more 6:37 time that there are mechanisms for 6:39 adding schema-like elements to 6:42 XML or schema-like specifications to XML. 
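The irregularities just described (an Edition attribute and a Remark on one book but not the other, and a different number of authors per book) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fragment below is a made-up stand-in for the course document with placeholder values, and to_row is a hypothetical helper that flattens each book into the kind of fixed-column row a relational table would need, using None where a relational database would store NULL.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the bookstore document; all values are placeholders.
# The first book has an Edition attribute, a Remark, and two authors;
# the second has neither Edition nor Remark, and three authors.
DOC = """<Bookstore>
  <Book ISBN="isbn-1" Price="85" Edition="3rd">
    <Title>Book One</Title>
    <Authors><Author>Author A</Author><Author>Author B</Author></Authors>
    <Remark>Placeholder remark</Remark>
  </Book>
  <Book ISBN="isbn-2" Price="100">
    <Title>Book Two</Title>
    <Authors>
      <Author>Author A</Author><Author>Author B</Author><Author>Author C</Author>
    </Authors>
  </Book>
</Bookstore>"""


def to_row(book):
    """Flatten one Book element into a fixed set of columns (hypothetical helper)."""
    return {
        "isbn": book.get("ISBN"),
        "price": book.get("Price"),
        "edition": book.get("Edition"),      # None when the attribute is absent
        "remark": book.findtext("Remark"),   # None when the element is absent
        "author_count": len(book.findall("Authors/Author")),
    }


rows = [to_row(b) for b in ET.fromstring(DOC).findall("Book")]
for row in rows:
    print(row)
```

The XML parses without complaint even though the two books differ in shape; the None entries only appear once the data is forced into one fixed set of columns.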
6:44 We will be covering those in the next two videos actually. 6:49 Next, let's talk about how this data is queried. 6:51 So for the relational model, we have relational algebra. 6:53 We have SQL. 6:55 These are pretty simple, nice languages, I would say. 6:58 It's a little bit of a 6:59 matter of opinion, but I'm going to give them a smiley face. 7:03 XML querying is a little trickier. 7:06 Now, one of the 7:07 factors here is that XML 7:09 is a lot newer than the 7:10 relational model and querying XML 7:12 is still settling down to some extent. 7:14 But I'm just gonna say, it's a little 7:16 less, so I'm gonna give 7:17 it a neutral face here, in 7:19 terms of how simple and 7:20 nice the languages are for querying 7:22 XML, and we'll be spending some 7:23 time in later videos learning some of those languages. 7:27 Next in our chart is the aspect of ordering. 7:30 So the relational model is 7:32 fundamentally an unordered model, 7:34 and that can actually be considered a bad thing to some extent. 7:35 Sometimes in data applications it's nice to have ordering. 7:36 We learned the order by clause in SQL and that's a way to get order in query results. 7:38 But fundamentally, the data in our table, in our relational database, is a set of data, without an ordering within that set. 7:39 Now, in XML we do have, I would say, an implied ordering. 7:39 So XML, as I said, can be thought of as either a document model or a stream model. 7:48 And in either case, 8:03 just the nature of the 8:05 XML being laid out in 8:06 a document as we have here 8:08 or being in a stream induces an order. 8:10 Very specifically, let's take a look at the authors here. 8:13 So here we have two authors, 8:14 and these authors are in an order in the document. 8:17 If we put those authors in a relational database, there would be no order. 
8:20 They could come out in either 8:21 order unless we used 8:23 an order-by clause in our 8:25 query, whereas in XML, 8:27 implied by the document structure is an order. 8:30 And there's an order between these two books as well. 8:32 Sometimes that order is meaningful; sometimes it's not. 8:34 But it is available to be used in an application. 8:38 Lastly, let's talk about implementation. 8:40 As I mentioned in earlier 8:42 videos, the relational model has 8:43 been around for at least 8:45 35 years, and the systems 8:47 that implement it have been around almost as long. 8:50 They're very mature systems. 8:52 They implement the relational model 8:54 as the native model of the 8:56 systems and they're widely used. 8:59 Things with XML are 9:00 a little bit different, partly again because 9:02 XML hasn't been around as long. 9:04 But what's happening right now 9:06 in terms of XML and conventional 9:08 database systems is XML is typically an add-on. 9:11 So in most systems, XML 9:13 will be a layer over the relational database system. 9:16 You can enter data in 9:17 XML; you can query data in XML. 9:19 It will be translated to a relational implementation. 9:23 That's not necessarily a bad thing. 9:24 And it does allow you to 9:26 combine relational data and 9:28 XML in a single system, sometimes 9:30 even in a single query, but 9:31 it's not the native model of the system itself.
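To make the ordering and layering points concrete, here is a small Python sketch that reads authors from an XML fragment in document order and stores them in a relational table through the standard sqlite3 module. It is a toy illustration only, not a description of how any actual database system translates XML. The table name author, the position column, and the author names are all assumptions made for this example; the position column is one simple way to carry the document's implied order into a model that has none.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Placeholder fragment; names are chosen so document order differs from
# alphabetical order.
DOC = "<Authors><Author>Author Z</Author><Author>Author A</Author><Author>Author M</Author></Authors>"

# In XML the order comes for free: findall returns elements in document order.
authors = [a.text for a in ET.fromstring(DOC).findall("Author")]

# In a relational table, rows form an unordered set, so the position must be
# stored explicitly (hypothetical schema) if the order matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (name TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO author (name, position) VALUES (?, ?)",
    [(name, i) for i, name in enumerate(authors)],
)

# Only an ORDER BY clause guarantees the order of the query result.
in_order = [row[0] for row in conn.execute("SELECT name FROM author ORDER BY position")]
by_name = [row[0] for row in conn.execute("SELECT name FROM author ORDER BY name")]
conn.close()

print(authors)
print(in_order)
print(by_name)
```

Without the position column, the only orders available from SQL would be ones derived from the data itself, such as sorting by name, which need not match the order in the document.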