0:00 In this video, we'll be learning about null values in SQL. 0:04 As usual, we'll be considering a 0:05 simple college admissions database, and 0:07 we'll be exploring the select 0:09 from where query expressions. 0:11 Now instead of extending what 0:13 we can write in queries, in 0:14 this video we'll be looking at extending what's in the data. 0:17 Specifically, in relational databases, 0:20 unless specified otherwise, any value 0:23 in an attribute can take on the special value null. 0:26 Null is usually used to 0:27 mean that the value is undefined or unknown. 0:30 For example, we might have 0:31 a student whose GPA is unknown 0:33 or who doesn't have a GPA, 0:35 and we would represent that in our data with null. 0:38 Or we might create our 0:40 reply relation with the decisions 0:42 not yet decided, and those 0:43 decision values would also be null. 0:45 So what we're going to look 0:46 at in our demo queries is what 0:48 happens when we have null 0:49 values and we run queries over the database. 0:51 Here is the set of students in our database. 0:55 In order to explore what happens 0:56 when we have null values, let's insert 0:59 two new students, Kevin and 1:01 Laurie, each of which have a null GPA. 1:05 We go back and we look 1:06 at our student table and now 1:08 we see Kevin and Laurie as 1:09 part of that table, and in 1:11 this particular interface null values are depicted as blanks. 1:15 Now let's run some queries. 1:17 Our first query finds students whose GPA is greater than 3.5. 1:21 It's a simple query to write. 1:23 We run the query and we find a number of students in our result. 1:27 We don't have Kevin and Laurie, 1:28 because their GPAs are null, 1:30 so we can't determine that they're greater than 3.5. 1:33 Let's run another query 1:35 that says GPA less than or equal to 3.5. 1:38 We'll get a different set of students, 1:40 as we see, but we still don't get Kevin and Laurie. 1:44 Now let's run a query 1:45 where we ask for the GPA 1:49 to be greater than 3.5 or less than or equal to 3.5. 1:51 And this is the one that's a little bit funny. 1:53 Actually, we talked about this 1:54 very same query back when we were talking about the relational model. 1:58 When you look at the query, 1:59 you would think that you 2:00 would get every student in the 2:02 database, because everybody's GPA obviously 2:04 is either greater than 3.5 or less or equal to 3.5. 2:06 Or, from a more logical point 2:09 of view, this is an 2:10 expression that's always true. 2:13 But you might not be 2:14 surprised when we run the 2:15 query, we still don't get 2:17 Kevin or Laurie, because we can't 2:18 determine that their GPA satisfies 2:20 either of these conditions. 2:22 So even when we have 2:23 a query that looks like the 2:25 where clause is a tautology, 2:27 meaning a logical expression that's always 2:29 true, we still don't get all the data in the result. 2:32 If we want this query to 2:33 return all students in the 2:35 database, we can do 2:36 that by adding one more clause 2:38 to the where, which is or GPA is null. 2:42 Now this is truly a condition 2:44 that will be matched by every tuple, 2:46 because either the GPA is greater 2:48 than 3.5, less than or equal 3.5, or it's a null value. 2:50 "Is null" is a keyword 2:52 phrase in SQL that matches values that are null. 2:55 Now when we run the query, 2:56 we finally get all of 2:57 the students in the database, including Kevin and Laurie. 3:01 Just to drive the point home 3:02 a bit further, let's continue looking 3:04 at queries where we have 3:05 null values involved in where clauses. 3:08 In this query, we're looking for students 3:09 where their GPA is greater than 3:11 3.5 or their size high school is less than 1600. 3:13 So let's run the query and see who we get. 3:17 We get all of these students, including Kevin. 3:20 So although his GPA is 3:21 null and therefore we can't 3:22 determine that it satisfies this 3:24 part of the condition, because since 3:25 his high school size is 1500, 3:27 he does satisfy the second 3:28 part of the condition and he does come out in our result. 3:32 Now let's add one more 3:33 condition, which is or size 3:36 high school is greater than or equal to 1600. 3:39 So now we have again 3:41 something in the condition that looks like a tautology. 3:43 It looks like it should always be true. 3:45 Everyone's high school is either smaller or larger than 1600. 3:47 Furthermore, in this case, we 3:50 have no null values for 3:52 the high school size, so when 3:54 we run this query, we should 3:56 get back everybody, and indeed 3:57 when we run the query, we do get back all of the students. 4:01 Now, I'm not going to go into details here because you can certainly read about it. 4:04 The way the where clause is 4:06 evaluated when null values 4:07 are involved is using a 4:09 three valued logic where every 4:11 expression is either true or false or unknown. 4:15 These logical values are combined 4:17 to finally yield a value 4:18 for the where expression that determines whether a couple is in the result. 4:22 Our last series of queries 4:24 explores the interaction between null 4:25 values and aggregate functions, specifically 4:28 in this case the count function. 4:29 There are actually quite a 4:30 few subtleties about null values 4:32 and aggregation and also null values and sub-queries. 4:35 We're not going to do an exhaustive 4:37 exploration of these subtleties 4:38 here but just give one example. 4:40 And we encourage you if you 4:42 are using a database with null 4:43 values to be very careful 4:44 when you write your queries to make 4:46 sure you understand how the null 4:47 values are going to influence the result. 4:50 OK, so our query here is 4:52 counting the number of students 4:53 whose GPA is not null for starters. 4:55 So we run the query and 4:57 we see that there twelve students who have a non-null GPA. 5:00 Those are the 12 students we had 5:01 originally in the database before we 5:03 added Kevin and Lorrie with their null GPA's. 5:06 Now instead of just counting 5:08 the students, let's count the distinct GPAs. 5:12 So among these twelve students, how many different GPAs are represented? 5:16 It turns out there are seven 5:17 distinct GPAs among those that are not null. 5:20 Let's further drop the 5:22 not null condition and just 5:24 count the the distinct GPAs among the students. 5:27 That will tell us whether count distinct 5:29 includes nulls or not in its count. 5:32 We'll run the query and we 5:33 get the same answer so we 5:34 still have 7 GPAs and 5:36 we are not counting the null values. 5:39 Now, let's do one 5:41 last query, which is to take 5:43 away the and just look 5:44 at the distinct GPAs themselves. 5:46 So, we had seven when we did the count. 5:48 We run the query now and we get eight. 5:51 What happened? 5:51 Well this first tuple is hard with the visualization. 5:54 This first answer is 5:55 empty and that's actually the null value. 5:58 So when we right select the distinct GPA. 6:00 We do include the null 6:01 value in our result if there is a null GPA. 6:04 However, when we count the distinct values, we don't include the null. 6:07 And that's really just one 6:09 example of the type of 6:11 subtleties and possibly unexpected 6:13 behavior we might get when we have null values. 6:16 So again, I encourage you to 6:17 be very careful when writing queries 6:19 over databases that include nulls 6:21 that you know exactly what you're going to get in your result.