0:00 In this video, we'll see a demonstration of JSON data. 0:04 As a reminder, JSON stands for 0:05 JavaScript Object Notation, and 0:08 it's a standard for writing 0:09 data objects in a human-readable format, typically in a file. 0:13 It's useful for exchanging data 0:16 between programs, and generally, 0:18 because it's quite flexible, it's useful 0:20 for representing and storing data that's semi-structured. 0:23 As a reminder of the 0:24 basic constructs in JSON, we 0:26 have atomic values, such 0:28 as integers and strings and so on. 0:30 And then we have two types of 0:31 composite things: we have 0:33 objects that are sets of 0:34 label-value pairs, and then we have arrays that are lists of values. 0:38 In the demonstration, we'll go through 0:40 in more detail the basic constructs 0:41 of JSON and we'll look at 0:44 syntactic correctness, we'll demonstrate 0:46 the flexibility of the data 0:47 model, and then we'll 0:49 look briefly at JSON schema, 0:50 not widely used yet but 0:52 still fairly interesting to look at, 0:54 and we'll look at some validation 0:55 of JSON data against a particular schema. 0:59 So, here's the JSON 1:00 data that we're gonna be working with during this demo. 1:03 It's the same data that appeared 1:04 in the slides, in the introduction 1:07 to JSON, but now we're going 1:08 to look into the components of the data. 1:11 It's also, by the way, the 1:13 same example, pretty much, that we 1:14 used for XML; it's reformatted, 1:17 of course, to meet the JSON 1:18 data model, but you can compare the two directly. 1:22 Lastly, we do have 1:23 the file for the data on 1:25 the website, and I do 1:26 suggest that you download the 1:28 file so that you can 1:29 take a look at it closely on your own computer. 1:31 All right. 1:32 So, let's see what we have. 1:33 Right now we're in 1:34 an editor for JSON data.
1:36 It happens to be the Eclipse 1:38 editor and we're going to 1:38 make some edits to the 1:39 file after we look through 1:41 the constructs of the file. 1:43 So, this is JSON 1:45 data representing books and 1:48 magazines, and we have 1:49 a little more information about our books and our magazines. 1:52 So, at the outermost, the 1:53 curly brace indicates that this is a JSON object. 1:57 And as a reminder, an object 1:59 is a set of label-value 2:01 pairs, separated by commas. 2:03 So, our first label is "books". So 2:07 our first element in 2:09 the object is the label books 2:11 and this big value, and the 2:14 second, so there's only two label-value 2:16 pairs here, is the 2:17 label magazines and this big value here. 2:21 And let's take a look first at magazines. 2:23 So magazines, again, is the 2:25 label and the value, as we 2:26 can see with the square 2:27 brackets here, is an array. 2:30 An array is a list of 2:31 values and here we 2:33 have two values in our array. 2:35 They're still composite values. 2:37 So, we have two values, each 2:38 of which is an object, 2:40 a set of label-value pairs. 2:42 Let me mention, sometimes people call these labels 'properties', by the way. 2:46 Okay. So, now we are inside 2:48 our 2 objects that are 2:49 the 2 elements in the array that's the value of magazines. 2:53 And each one of those has 2:54 3 labels and 3 values. 2:56 And now we're finally down to the base values. 2:58 So, we have the title being "National 3:00 Geographic", a string, the 3:02 month being January, a string, 3:04 and the year 2009, where 2009 is an integer. 3:06 And again, we have 3:08 another object here that's a different magazine 3:12 with a different name and month, and it happens to be the same year. 3:15 Now, these two have exactly the 3:16 same structure but they don't 3:18 have to, and we will 3:19 see that as we start editing the file. 3:21 But before we edit the file, 3:23 let's go and look at 3:24 our books here.
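As a rough sketch of the structure just described, the snippet below parses a small JSON text with Python's standard json module and checks which parts are objects and which are arrays. Only the first magazine's title, month and year come from the demo; the second magazine's title and month, the lowercase label names, and the empty books array are placeholders assumed for illustration.

```python
import json

# Assumed sample: only "National Geographic", "January" and 2009 come from the demo.
text = """
{
  "books": [],
  "magazines": [
    {"title": "National Geographic", "month": "January", "year": 2009},
    {"title": "Placeholder Magazine", "month": "February", "year": 2009}
  ]
}
"""

data = json.loads(text)

# The outermost curly braces give an object (a Python dict).
print(type(data).__name__)                 # dict
# Square brackets give an array (a Python list).
print(type(data["magazines"]).__name__)    # list
# Each array element is again an object with three label-value pairs.
first = data["magazines"][0]
print(sorted(first))                       # ['month', 'title', 'year']
# Base values keep their types: the title is a string, 2009 is an integer.
print(type(first["title"]).__name__, type(first["year"]).__name__)  # str int
```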
3:26 The value of our other 3:28 label-value pair inside the 3:30 outermost object, "books", is 3:32 also an array, and 3:34 the array in this case also 3:35 has just two elements, so we've represented two books here. 3:38 It's a little more complicated than the 3:40 magazines, but those elements 3:42 are still objects that are sets of label-value pairs. 3:45 So, we have now the ISBN, 3:47 the price, the edition, the title, 3:49 all either integers or strings, 3:51 and then we have one nested composite 3:54 value, which is the authors, 3:56 and that's an array again. 3:57 So, the array, again, is indicated by the square brackets. 4:02 And inside this array, we 4:04 have two authors and each 4:06 of the authors has a first 4:07 name and a last name, 4:08 but again, that uniformity is 4:10 not required by the model itself, as we'll see. 4:13 So, as I mentioned, 4:15 this is actually an editor for 4:16 JSON data and we're going to come back to this editor in a moment. 4:19 But what I wanted to do is 4:20 show the same data 4:22 in a browser because browsers 4:23 actually offer some nice features 4:25 for navigating in JSON. 4:27 So here we are in the 4:28 Chrome browser, which has nice 4:30 features for navigating JSON, 4:32 and other browsers do as well. 4:34 We can see here again that we 4:35 have an object in 4:37 our JSON data that consists 4:39 of two label-value pairs, 4:40 books and magazines, which are 4:42 currently closed, and then 4:43 this plus allows us to open them up and see the structure. 4:47 For example, we open magazines 4:48 and we see that magazines is an array containing two objects. 4:52 We can open one of those 4:53 objects and see the three label-value pairs. 4:55 Now we're at the lowest levels, and similarly for the other object. 4:59 We can see here that books 5:00 is also an array, and we go ahead and open it up. 5:03 It's an array of two objects.
5:05 We open one of those 5:06 objects and we see again 5:07 the set of label-value pairs, 5:09 where one of the values 5:10 is a further nesting. 5:12 It's an array and we open 5:14 that array, and we see 5:15 two objects, and we open 5:16 them and finally see the data at the lowest levels. 5:19 So again, the browser 5:21 here gives us a nice way 5:22 to navigate the JSON data and see its structure. 5:26 So now we're back to our JSON editor. 5:28 By the way, this editor, Eclipse, does 5:30 also have some features for 5:32 opening and closing the structure 5:34 of the data, but it's 5:35 not quite as nice as the browser that we used. 5:38 So we decided to use the browser instead. 5:39 What we are going to 5:40 use the editor for is to 5:42 make some changes to the 5:43 JSON data and see which 5:44 changes are legal and which aren't. 5:47 So, let's take a look at the first change, a very simple one. 5:50 What if we forgot a comma? 5:52 Well, when we try to 5:53 save that file, we get a 5:54 little notice that we have an 5:55 error, we expected an 5:56 N value, so that's a 5:58 pretty straightforward mistake. Let's put that comma back. 6:02 Let's say we insert an 6:04 extra brace somewhere here, for whatever reason. 6:07 We accidentally put in an extra brace. 6:09 Again we see that that's marked as an error. 6:13 Now, an error that can 6:13 be fairly common to make is 6:15 to forget to put quotes around strings. 6:18 So, for example, this ISBN 6:20 number here, if we don't quote it, we're gonna get an error. 6:23 As we'll see, the only things that can 6:24 be unquoted are numbers and 6:27 the values null, true and false. 6:29 So, let's put our quotes back there. 6:31 Now, actually, even more 6:33 common is to forget to 6:34 put quotes around the labels in label-value pairs. 6:37 But if we forget to quote that, that's going to be an error as well.
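The syntax mistakes just tried in the editor can be reproduced, as a minimal sketch, with Python's standard json parser instead of Eclipse. The helper name is_valid_json and the sample label names and values are made up for illustration, and the error messages Python reports will differ from the editor's.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Hypothetical one-line samples, one per mistake from the demo.
samples = {
    "well formed":     '{"isbn": "ISBN-1", "price": 100}',
    "missing comma":   '{"isbn": "ISBN-1" "price": 100}',
    "extra brace":     '{"isbn": "ISBN-1", "price": 100}}',
    "unquoted string": '{"isbn": ISBN-1, "price": 100}',
    "unquoted label":  '{isbn: "ISBN-1", "price": 100}',
}

for name, text in samples.items():
    print(name, "->", is_valid_json(text))
```

Only the first sample should be reported as valid; the other four are rejected by the parser before any schema comes into play.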
6:40 You might have noticed, by the 6:41 way, when we used the browser, 6:43 that the browser didn't even show 6:44 us the quotes in the labels. 6:46 But when you make 6:47 the raw JSON data, you do need to include those quotes. 6:51 Speaking of quotes, what if we quoted our price here? 6:56 Well, that's actually not an 6:57 error, because now we've simply turned 6:58 price into a string, and 7:00 string values are perfectly well allowed anywhere. 7:03 Now we'll see when we use 7:04 JSON schema that we 7:05 can make restrictions that don't allow 7:07 strings in certain places, but 7:08 just for syntactic correctness of 7:10 JSON data, any of our values can be strings. 7:15 Now, as I mentioned, there are 7:16 a few values that are 7:17 sort of reserved words in JSON. 7:20 For example, true is a 7:22 reserved word for a Boolean value. 7:24 That means we don't need to 7:25 quote it because it's actually 7:27 its own special type of value. 7:28 And so is false. 7:30 And the third one is null, 7:32 so there's a built-in concept of null. 7:35 Now, if we wanted to 7:36 use nil for whatever reason 7:38 instead of null, well, now 7:39 we're going to get an error because 7:40 nil is not a reserved word, 7:42 and if we really wanted nil 7:43 then we would need to actually make it a quoted string. 7:47 Now, let's take a look inside our author list. 7:50 And I'm going to show you 7:51 that arrays do not have 7:52 to have the same type of 7:54 value for every element in the array. 7:56 So here we have a homogeneous 7:58 list of authors. Both of them 7:59 are objects with a first 8:01 name and a last name as 8:02 separate label-value pairs, 8:04 but if I change that 8:05 first one, the entire value, 8:07 to be, instead of a 8:09 composite one, simply the string 8:11 Jeffrey Ullman (oops, sorry 8:13 about my typing there), that 8:15 is not an error. It 8:17 is allowed to have a string 8:18 and then a composite object.
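A small sketch of the points above, again using Python's json module as a stand-in for the editor: the reserved words map to native values, a quoted price is still accepted, nil is rejected unless quoted, and an array may mix strings and objects. The rating label, the first_name and last_name spellings, and the second author's name are placeholders assumed for illustration.

```python
import json

# true, false and null are unquoted literals with their own types.
print(json.loads('[true, false, null]'))   # [True, False, None]

# A quoted price is still valid JSON; it is simply a string now.
print(json.loads('{"price": "100"}'))      # {'price': '100'}

# nil is not a JSON literal, so it must be quoted to be accepted.
try:
    json.loads('{"rating": nil}')
    nil_accepted = True
except json.JSONDecodeError:
    nil_accepted = False
print(nil_accepted)                         # False
print(json.loads('{"rating": "nil"}'))     # {'rating': 'nil'}

# Arrays may mix value types: a string next to an object is fine.
authors = json.loads(
    '["Jeffrey Ullman", {"first_name": "Placeholder", "last_name": "Author"}]'
)
print([type(a).__name__ for a in authors])  # ['str', 'dict']
```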
8:20 And we could even have an array, and anything we want. 8:22 In an array, when you 8:24 have a list of values, all 8:25 you need is for each one 8:26 to be syntactically a correct value in JSON. 8:30 Now let's go visit our magazines 8:32 for a moment here and let 8:33 me show that empty objects are okay. 8:35 So a list of label-value 8:37 pairs, comprising an object, can be the empty list. 8:41 And so now I've turned this magazine 8:42 into having no information about 8:44 it, but that is legal in JSON. 8:46 And similarly, arrays are allowed to be of zero length. 8:50 So I can take these authors 8:52 here and I can just take 8:53 out all of the authors, and 8:54 make that an empty list, but that's still valid JSON. 8:58 Now, what if I took this array out altogether? 9:01 In that case, now we 9:02 have an error because this is 9:04 an object where we have 9:05 label-value pairs and every 9:08 label-value pair has to 9:09 have both a label and a value. 9:12 So let's put our array back 9:13 and we can have anything in 9:15 there so let's just make it 9:16 "foo" and that corrects the error. 9:19 What if we didn't want an 9:20 array here and instead we 9:21 tried to make it, say, an object? 9:24 Well, we're going to see an 9:26 error there, because an object, 9:28 as a reminder, and this is an 9:29 easy mistake to make, is 9:30 always made of label-value pairs. 9:33 So if you want just a value, 9:34 that should be an array. If 9:36 you want an object, then we're 9:37 talking about a label-value pair, so 9:39 we can just add "foo" as 9:40 our value, and then we're all set. 9:42 So what we've seen so far is syntactic correctness. 9:46 Again, there's no required 9:48 uniformity across values in 9:50 arrays or in the 9:52 label-value pairs in objects. We 9:55 just need to ensure that 9:56 all of our values, our basic 9:57 values, are of the right types, 9:59 and things like our commas and 10:00 curly braces are all in place.
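The edge cases in this part of the demo can be checked the same way. This is only a sketch with Python's json module; the helper name parses and the one-line samples are assumptions that mirror the edits described above.

```python
import json

def parses(text):
    """Return True if text is syntactically valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses('{"magazines": [{}]}'))          # True: an empty object is allowed
print(parses('{"authors": []}'))              # True: an empty array is allowed
print(parses('{"authors": }'))                # False: a label needs a value
print(parses('{"authors": "foo"}'))           # True: any value will do
print(parses('{"authors": {"foo"}}'))         # False: objects hold label-value pairs
print(parses('{"authors": {"foo": "foo"}}'))  # True
```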
10:02 What we're gonna do next is look 10:04 at JSON schema, where we 10:05 have a mechanism for enforcing certain 10:08 constraints beyond simple syntactic correctness. 10:11 If you've been very observant, you 10:13 might even have noticed that we 10:14 have a second tab up 10:15 here in our editor for a 10:17 second JSON file, and this file 10:18 is going to be the schema 10:20 for our bookstore data. We're using 10:22 JSON schema, and JSON 10:25 schema, like XML schema, 10:27 is expressed in the data model itself. 10:29 So, our schema description for 10:31 this JSON data is itself 10:33 JSON data, and here it is. 10:35 And it's going to take a bit of time to explain. 10:37 Now the first thing that you might 10:37 notice is, wow, the schema 10:39 looks more complicated and in 10:41 fact longer than the data itself. 10:43 Well, that is true, but that's mostly because our data file is tiny. 10:47 So, if we had thousands, you know, tens 10:49 of thousands of books and magazines, 10:51 our schema file wouldn't 10:53 change, but our data file would 10:54 be much longer, and that's the typical case in reality. 10:57 Now, this video is not a 10:58 complete tutorial about JSON schema. 11:01 There are many constructs in JSON 11:02 schema that weren't needed to 11:04 describe the bookstore data, for example. 11:06 And even this file here, 11:08 I'm not gonna go through every detail of it right here. 11:11 You can download the file and 11:12 take a look, and read a little more about JSON schema. 11:15 I'm just going to give the 11:16 flavor of the schema 11:17 specification and then we're 11:19 going to work with validating the data 11:20 itself to see how the schema and data work together. 11:24 But to give you the flavor here, let's go through at least some portions of the schema. 11:28 So, in some sense, 11:29 the structure of the schema file 11:31 reflects the structure of the data file that it's describing.
11:34 So, the outermost constructs in 11:37 the schema file are the 11:38 outermost in the data file, and 11:39 as we nest, it parallels the nesting. 11:42 Let me just show a little 11:43 bit here; we'll probably look at most of it in the context of validation. 11:48 So, we see here that our outermost construct in our data file is an object. 11:52 And that's told to us 11:53 because we have "type" as 11:55 one of our built-in labels for the schema. 11:57 So we have an 11:58 object with two properties, as 12:00 we can see here, the books property 12:02 and the magazines property. 12:04 And I use the word 12:05 "labels" frequently for label-value 12:07 pairs; that's synonymous with property-value pairs. 12:11 Then inside the books property, 12:13 for example, we see that 12:15 the type of that is array, 12:16 so we've got a label-value pair where the value is an array. 12:19 And then we follow the nesting and see that it's an array of objects. 12:22 And we go further down and we 12:24 see the different label-value pairs 12:26 of the object that make up 12:27 the books and nesting further into the authors and so on. 12:31 We see similarly for magazines 12:32 that the value of the 12:34 label-value pair for 12:36 magazines is an array, and 12:37 that array consists of objects with further nesting. 12:41 So what we're looking at here is 12:42 an online JSON schema validator. We have two windows. 12:45 On the left we have our 12:46 schema and on the 12:47 right we have our data, and 12:49 this is exactly the same data 12:50 file and schema file that we were looking at earlier. 12:54 If we hit the validate button, 12:55 hopefully everything should work, and it does. 12:58 This tells us that the 12:59 JSON data is valid with respect to the schema.
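To make the parallel nesting described above concrete, here is a hypothetical skeleton of such a schema written as a Python dict, with a small helper that walks it. This is not the schema file from the demo: the dict keeps only the type, properties and items keywords, and the helper name outline is made up for this sketch.

```python
# Hypothetical skeleton of a schema for the bookstore data (not the demo's file).
schema = {
    "type": "object",
    "properties": {
        "books": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "authors": {"type": "array", "items": {"type": "object"}},
                },
            },
        },
        "magazines": {
            "type": "array",
            "items": {"type": "object"},
        },
    },
}

def outline(node, name="(root)", depth=0):
    """Collect one indented line per schema node, following properties and items."""
    lines = ["  " * depth + f"{name}: {node.get('type')}"]
    for label, child in node.get("properties", {}).items():
        lines += outline(child, label, depth + 1)
    if "items" in node:
        lines += outline(node["items"], "[items]", depth + 1)
    return lines

print("\n".join(outline(schema)))
```

The printed outline shows an object at the root, arrays for books and magazines, and objects inside those arrays, which is the same shape as the data file.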
13:03 Now, this system will of 13:04 course find basic syntactic errors, 13:06 so I can take away a comma 13:07 just like I did before and 13:09 when I validate I'll get a 13:10 parsing error that really has nothing to do with the schema. 13:13 What I'm going to focus on 13:14 now is actually validating 13:16 semantic correctness of the JSON 13:18 with respect to the constructs 13:19 that we've specified in this schema. 13:21 Let me first put that comma back so we start with a valid file. 13:25 So, the first thing I'll show is 13:26 the ability to constrain basic 13:28 types, and then the ability 13:29 to constrain the range of values of those basic types. 13:32 And let's focus on price. 13:34 So here we're talking about the 13:35 price property inside books and 13:37 we specify in our schema 13:39 that the type of the price must be an integer. 13:42 So, for example, if our 13:44 price were instead a string 13:46 and we went ahead and tried 13:47 to validate that, we would get an error. 13:49 Let's make it back into an 13:51 integer but let's make 13:53 it into the integer 300 now instead of 100. 13:56 And why am I doing that? 13:58 Because JSON schema also 14:00 lets me constrain the range of 14:01 values that are allowed if we have a numeric value. 14:05 So, not only in price did I 14:06 say that it's an integer but 14:08 I also said that it 14:09 has a minimum and maximum value: 14:11 the integer for prices must 14:13 be between 0 and 200. 14:15 So, if I try to make 14:16 the price 300, and I 14:18 validate, I'm again getting an error. 14:20 Now it's not a type error, 14:21 but it's an error that my 14:23 integer was outside of the allowed range. 14:26 I've put the price back to 14:27 a hundred, and now let's 14:28 look at constraints on string values. 14:32 JSON schema actually has 14:33 a little pattern matching language that 14:35 can be used to constrain the 14:36 allowable strings for a specific type of value.
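The price checks described above (integer type, then a range of 0 to 200) can be imitated with a few lines of hand-written Python. This is a sketch of just those three keywords, not a JSON schema implementation and not the online validator from the demo; the function name check_integer and the problem messages are assumptions.

```python
def check_integer(value, schema):
    """Return a list of problems for one value, using type, minimum and maximum only."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if schema.get("type") == "integer" and not is_integer:
        return ["expected an integer"]
    problems = []
    if "minimum" in schema and value < schema["minimum"]:
        problems.append("below minimum")
    if "maximum" in schema and value > schema["maximum"]:
        problems.append("above maximum")
    return problems

# The constraint from the demo: an integer between 0 and 200.
price_schema = {"type": "integer", "minimum": 0, "maximum": 200}

print(check_integer(100, price_schema))    # []
print(check_integer("100", price_schema))  # ['expected an integer']
print(check_integer(300, price_schema))    # ['above maximum']
```

The two failures are of different kinds, as in the demo: the quoted price fails the type check, while 300 passes the type check and fails the range check.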
14:40 We'll look at ISBN number here as an example of that. 14:43 We've said that ISBN is 14:45 of type string, and then 14:47 we've further constrained in the 14:48 schema that the string values for 14:50 ISBN must satisfy a certain pattern. 14:52 I'm not gonna go into the details of this pattern-matching language. 14:56 I'm just gonna give an example. 14:57 And in fact, this entire demo is 14:59 really just an example; there are lots of 15:00 things in JSON schema that we're not seeing. 15:03 What this pattern here says is 15:05 that the string value for 15:06 ISBN must start with 15:08 the four characters ISBN and then can be followed by anything else. 15:13 So, if we go over to our 15:14 data and we look at 15:15 the ISBN number here and 15:17 say we have a typo, we 15:18 forgot the "I", and we try to validate, 15:20 then we'll see that our data 15:22 no longer matches our schema specification. 15:25 Now let's look at some other constraints we can specify in JSON schema. 15:29 We can constrain the number of elements in an array. 15:32 We can give a minimum or maximum or both. 15:35 And I've done that here in the context of the authors array. 15:38 Remember the authors are 15:39 an array that's a list of 15:40 objects and here I've said that 15:42 we have a minimum number of 15:44 items of 1 and a 15:45 maximum number of items of 10. 15:46 In other words, every book 15:48 has to have between one and ten authors. 15:51 So let's try, for example, 15:53 taking out all of our authors here in our first book. 15:56 We actually looked at this before in terms 15:57 of syntactic validity, and it 15:59 was perfectly valid to have an empty array. 16:01 But when we try to validate 16:02 now we do get an 16:03 error, and the reason is 16:05 that we said that we needed 16:06 between one and ten array elements in the case of authors.
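The string pattern and the array-size limits can be sketched in the same hand-rolled style. The exact pattern in the demo's schema is not shown in the transcript, so "^ISBN" is an assumed regular expression that matches the spoken description (starts with ISBN, then anything); the ISBN values and helper names are placeholders too.

```python
import re

# Assumed fragments; the pattern text itself is a guess at what the demo used.
isbn_schema = {"type": "string", "pattern": "^ISBN"}
authors_schema = {"type": "array", "minItems": 1, "maxItems": 10}

def matches_pattern(value, schema):
    """True if value is a string that the schema's pattern matches somewhere."""
    # Patterns are not implicitly anchored, hence re.search rather than re.fullmatch.
    return isinstance(value, str) and re.search(schema["pattern"], value) is not None

def item_count_ok(items, schema):
    """True if the array length lies within minItems and maxItems, when given."""
    low = schema.get("minItems", 0)
    high = schema.get("maxItems", float("inf"))
    return low <= len(items) <= high

print(matches_pattern("ISBN-0-00-000000-0", isbn_schema))  # True
print(matches_pattern("SBN-0-00-000000-0", isbn_schema))   # False: the "I" is missing
print(item_count_ok([], authors_schema))                   # False: fewer than one author
print(item_count_ok([{"last_name": "Ullman"}], authors_schema))  # True
```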
16:10 Now let's fix that, 16:12 not by putting our authors back, 16:13 but let's say we actually decide 16:14 we would like to be able to have books that have no authors. 16:17 So, we can simply fix 16:19 that by changing that minimum 16:21 items to zero and that 16:23 makes our data valid again, and 16:24 in fact, we could actually take that 16:26 minimum constraint out altogether, 16:28 and if we do that our data is still going to be valid. 16:32 Now let's see what happens when we 16:33 add something to our data that isn't mentioned in the schema. 16:36 If you look carefully you'll see 16:38 that everything that we have 16:39 in the data so far has been specified in the schema. 16:42 Let's say we come along 16:43 and decide we're gonna also have ratings for our books. 16:46 So let's add here a 16:47 rating property with the value 5. 16:51 We go ahead and validate; you 16:53 probably think it's not going to 16:54 validate properly, but actually it did. 16:57 The definition of JSON 16:59 schema is that it can constrain things by 17:00 describing them, but you 17:02 can also have components in 17:04 the data that aren't present in the schema. 17:06 If we want to insist 17:08 that every property that is 17:10 present in the data is 17:11 also described in the 17:12 schema, then we can 17:14 actually add a constraint to the schema that tells us that. 17:17 Specifically, under the object 17:20 here, we can put in 17:22 a special flag, which itself 17:24 is specified as a label called additional properties. 17:27 And this flag, if we 17:29 set it to false, and remember 17:31 false is actually a keyword 17:32 in JSON, tells us 17:34 that in our data we're not 17:36 allowed to have any properties 17:37 beyond those that are specified in the schema. 17:40 So now we validate and we 17:41 get an error, because the property 17:43 rating hasn't been defined in the schema.
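The additional-properties flag just described can be sketched as follows. The keyword spelling additionalProperties is the one JSON schema uses; the helper name extra_properties and the cut-down book object and schema are assumptions for illustration, and only this one keyword is handled.

```python
def extra_properties(obj, schema):
    """List labels in obj that the schema does not describe, if extras are forbidden."""
    if schema.get("additionalProperties", True) is not False:
        return []  # extras are allowed when the flag is missing or true
    return sorted(set(obj) - set(schema.get("properties", {})))

# Hypothetical, cut-down book object with the added rating property.
book = {"title": "Placeholder Title", "price": 100, "rating": 5}

open_schema = {"properties": {"title": {}, "price": {}}}
closed_schema = {"properties": {"title": {}, "price": {}},
                 "additionalProperties": False}

print(extra_properties(book, open_schema))    # []
print(extra_properties(book, closed_schema))  # ['rating']
```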
17:46 If additional properties is missing, 17:48 or has the default value 17:50 of true, then the validation goes through. 17:53 Now let's take a look at our authors that are still here. 17:56 Let's suppose that we don't 17:58 have a first name for our middle author here. 18:01 If we take that away and 18:02 we try to validate, we do 18:04 get an error, because we specified 18:06 in our schema, and it's right 18:08 down here, that author objects must 18:11 have both a first name and a last name. 18:14 It turns out that we can 18:16 specify for every property that the property is optional. 18:20 So, we can add to the 18:21 description of the first 18:23 name not only that the 18:24 type is a string but that the 18:26 property is optional, so we 18:27 say optional, true. 18:31 Now let's validate, and now we're in good shape. 18:34 Now, let's take a look 18:35 at what happens when we have an 18:36 object that has more than 18:37 one instance of the same label or same property. 18:41 So let's suppose, for example, in 18:43 our magazine, the magazine 18:45 has two different years, 2009 and 2011. 18:46 This is syntactically valid JSON; 18:52 it meets the structure of having a list of label-value pairs. 18:55 When we validate it, we 18:57 see that we can't add a second property, year. 19:00 So this validator doesn't permit 19:02 two copies of the same 19:04 property, and it's actually kind 19:05 of a parsing thing and not 19:07 so much related to JSON schema. 19:09 Many parsers actually do enforce 19:12 that labels or properties need 19:14 to be unique within objects, even 19:15 though technically, syntactically correct 19:18 JSON does allow multiple copies. 19:20 So that's just something to remember: 19:22 the typical use of objects is 19:23 to have unique labels; sometimes they 19:26 are even called keys, which evokes the concept of them being unique. 19:30 So typically they are unique. 19:32 They don't have to be for syntactic validity.
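How a parser treats a repeated label varies, as noted above. As one illustration, Python's json module accepts the repeat and silently keeps the last value, and a stricter behavior can be sketched with its object_pairs_hook parameter. The hook name reject_duplicates and the sample text are assumptions for this sketch.

```python
import json

text = '{"title": "National Geographic", "year": 2009, "year": 2011}'

# By default Python's json module accepts the repeat and keeps the last value.
print(json.loads(text)["year"])   # 2011

def reject_duplicates(pairs):
    """object_pairs_hook that raises if a label repeats within one object."""
    seen = {}
    for label, value in pairs:
        if label in seen:
            raise ValueError(f"duplicate label: {label}")
        seen[label] = value
    return seen

try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    duplicate_found = False
except ValueError as err:
    duplicate_found = True
    print(err)                    # duplicate label: year
```

If repeated values are really wanted, an array under a single label, such as a list of years, avoids the ambiguity altogether.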
19:34 Usually when you wanna have 19:35 repeated values, it actually makes more sense to create an array. 19:39 I've taken away the second year in order to make the JSON valid again. 19:41 Now let's take a look at months. 19:44 I've used months to illustrate 19:46 the enumeration constraint. So we 19:48 saw that we could constrain the 19:50 values of integers, and we 19:52 saw that we can constrain strings 19:54 using a pattern, but we can 19:55 also constrain any type by 19:57 enumerating the values that are allowed. 19:59 So, for the month, we've set 20:00 it to a string type, which it 20:02 is, but we've further constrained it 20:03 by saying that the string must be 20:05 either January or February. 20:08 So, if we try to, say, 20:09 put in the string March, we 20:14 validate and we get the obvious error here. 20:17 We can fix that by changing the 20:18 month back, but maybe it 20:19 makes more sense that March 20:21 would be part of our enumeration type, 20:23 so we'll add March to 20:24 the possible values for months, and now we're good. 20:27 As a next example, let's take 20:28 a look at something that we 20:30 saw was syntactically correct but 20:31 isn't going to be semantically 20:33 correct, which is when 20:34 we have the author list 20:36 be a mixture of objects and strings. 20:39 So, let's put Jeffrey Ullman here just as a string. 20:43 We saw that that was still 20:44 valid JSON, but when we 20:46 try to validate now, we're gonna 20:47 get an error because we expected 20:49 to see an object; we have 20:50 specified that the authors 20:52 are objects, and instead we got a string. 20:54 Now JSON schema does allow 20:56 us to specify that we 20:58 can have different types of data 21:00 in the same context, and I'm 21:02 going to show that with a little bit of a simpler example here. 21:05 So, let's first take away our 21:06 author there so that we're back with a valid file. 21:09 And what I am going to look at is simply the year values.
21:13 So, let's suppose for whatever 21:15 reason that in our 21:16 magazines, one of the 21:17 years was a string and the other year was an integer. 21:21 So that's not gonna work out 21:22 right now because we have 21:23 specified clearly that the year must be an integer. 21:27 In JSON schema specifications, when we 21:29 want to allow multiple types 21:31 for values that are 21:32 used in the same context, we 21:34 actually make the type be an array. 21:36 So instead of just saying 21:37 integer, if we put 21:38 an array here that has 21:40 both integer and string, that's 21:42 telling us that our year 21:43 value can be either an 21:45 integer or a string, 21:46 and now when we validate, 21:48 we get a valid JSON file. 21:50 That concludes our demo of JSON schema validation. 21:53 Again, we've just seen 21:54 one example with a number 21:56 of the constructs that are available 21:58 in JSON schema, but it's not 21:59 nearly exhaustive; there are many 22:01 others, and I encourage you 22:02 to read a bit more about it. 22:04 You can download this data and 22:06 this schema as a starting 22:07 point, and start adding things and playing around, 22:09 and I think you'll get a 22:10 good feel for how JSON 22:12 schema can be used to 22:13 constrain the allowable data in a JSON file.
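As a starting point for that kind of experimenting, the last two constructs from the demo, the enumeration on month and the array of allowed types on year, can be sketched in a few lines of Python. This handles only the integer and string type names and the enum keyword; the helper names and the JSON_TYPES table are assumptions, not part of any validator.

```python
# Minimal mapping from schema type names to Python types (only what the sketch needs).
JSON_TYPES = {"integer": int, "string": str}

def type_ok(value, type_spec):
    """Accept a single type name or a list of alternative type names."""
    names = type_spec if isinstance(type_spec, list) else [type_spec]
    return any(
        isinstance(value, JSON_TYPES[name]) and not isinstance(value, bool)
        for name in names
    )

def enum_ok(value, schema):
    """True if the schema has no enum or the value is one of the listed values."""
    return "enum" not in schema or value in schema["enum"]

month_schema = {"type": "string", "enum": ["January", "February"]}
year_schema = {"type": ["integer", "string"]}

print(enum_ok("March", month_schema))        # False
month_schema["enum"].append("March")         # extend the enumeration, as in the demo
print(enum_ok("March", month_schema))        # True

print(type_ok("2009", "integer"))            # False: a string where an integer is required
print(type_ok("2009", year_schema["type"]))  # True: either type is accepted
print(type_ok(2009, year_schema["type"]))    # True
```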