0:00 In this demo, we'll be learning some more features of the SQL language. 0:04 Specifically, we'll be learning about 0:05 table variables and about set operators. 0:09 We already learned the basic select statement. 0:12 which can be quite powerful for 0:13 writing queries but we'll learn 0:14 some constructs in these demos 0:16 that will give us even more expressive power. 0:18 The first construct is table variables. 0:21 Table variables are in the FROM clause, and they actually serve two uses. 0:24 One is simply to make queries more readable, as we'll see. 0:28 But a second purpose is to 0:29 rename relations that are used in the FROM clause, 0:32 particularly when we have two instances of the same relation. 0:35 This is exactly what we 0:37 needed in the relational algebra when 0:39 we wrote joins that included 0:40 two instances of the same relation. 0:43 The second construct we'll be 0:44 learning, actually a set of constructs, 0:46 in this video, are the set operators. 0:48 And we'll be learning the same 0:49 three set operators we had 0:50 in relational algebra: the union 0:52 operator, the intersect operator, 0:54 and the except operator which is the minus operator. 0:58 We'll be doing a demo and 1:00 the demo will use the 1:01 same college admissions database that 1:02 we've been using in previous demos where 1:04 we have tables about college 1:05 information, student information and students applying to colleges. 1:10 Let's move to the demo. 1:11 Let's start with a big 1:13 join query that we'll use to introduce table variables. 1:16 This query involves all three relations. 1:18 It joins the three relations on 1:19 their shared attributes and then it selects a bunch of information. 1:23 So here we see the result of that query. 1:25 So the main point of this 1:26 query is not the result but 1:27 just to show you how table variables are used in the FROM clause. 1:31 We can add two, each of our relation names, a variable. 1:34 We'll use S for student 1:36 and C for college and A for apply. 1:38 And then everywhere else in the 1:40 query, instead of writing the 1:41 full relation name we can just use the variable. 1:44 In this case we're not changing the 1:46 expressiveness, we're not changing the 1:47 outcome of the query, we're really 1:49 just making it a bit 1:50 more readable and we can do 1:51 the same thing here in this left clause. 1:54 We'll take S and A and so on. 1:57 Then we'll run the query and we'll get exactly the same result, no change. 2:01 Now let's look at 2:02 where table variables are actually useful. 2:04 What we want to get 2:06 in this query is all pairs 2:07 of students who have the same 2:09 GPA. This is kind of 2:10 similar to the relational algebra 2:12 query we did where we found 2:13 all pairs of colleges that are in the same state. 2:16 In order to do that we 2:17 need to have two instances of the student relation. 2:20 So we'll call one instance, S1, and we'll call the other instance S2. 2:23 And the FROM will do the 2:25 cross-product of those two, 2:26 so it will consider every 2:27 every possible pair of students from the student relation. 2:30 From all those pairs, 2:32 we'll take the pairs where the 2:34 student had the same GPA 2:35 and will return the ID, 2:36 name, and GPA for each of the two students. 2:39 So let's go ahead and 2:40 execute the query; and here we can see the result. 2:44 Now, this result is exactly what we wrote. 2:47 It literally is every pair 2:49 of students that have the same 2:50 GPA, but it might not be what we intended. 2:53 Amy and Amy, the same student. 2:55 Well Amy has the same GPA 2:57 as herself, but more likely 2:59 we just wanted different students who had the same GPA. 3:02 So to do that we'll add 3:03 an and that says these are two different students. 3:07 The SIDs of the students are different. 3:12 Now, let's run the query and see what happens. 3:15 Now, we see that we no 3:16 longer have Amy and Amy, 3:18 and every student is paired with a different student. 3:20 We do have two Amy's here, 3:21 but don't be alarmed, this Amy 3:23 is 123 and this Amy is 654. 3:25 So things are looking 3:27 quite a bit better, but there's 3:29 still one thing that we 3:31 might not want in the result 3:32 of the query which is that 3:33 we have Amy paired with 3:34 Doris and then we 3:36 have Doris paired with Amy. 3:37 So we're actually getting every pair 3:39 of students twice in the two different orders. 3:42 As it turns out, that's very easy to fix. 3:44 We only need to erase one character to make that work. 3:48 Maybe you can think about what that character is. 3:50 Here it is. 3:51 Instead of looking at not equals, we'll just make it less than. 3:55 And then we'll get every pair of 3:56 students only once because we'll 3:58 always be listing the one with 3:59 the smaller SID first and 4:01 finally we get the answer 4:03 that we probably intended in the first place. 4:06 Now let's take a look at the set operators and we'll start with union. 4:09 Just like in our relational algebra video 4:11 let's use the union operator to 4:13 generate a list that includes names 4:15 of colleges together with names of students. 4:17 So here's the query that will 4:18 do it for us and we 4:19 go ahead and execute the query and we see our result. 4:23 Now I left the 4:25 schema as having the 4:26 C name in the first part 4:27 of the union and the S name in the second. 4:29 SQL allowed me to do 4:30 that and it chose to use the C name to label the result. 4:34 If I want to unify the 4:36 schemas of the two sides 4:37 of the union and give a 4:38 new label for the result, I 4:40 use the "as" as we saw earlier for 4:42 re-naming attributes in the result of queries. 4:44 So I'll add "as name" to 4:45 both sides of the union, 4:47 run the query, and now I see name in the result. 4:51 Now one thing you might have 4:52 noticed is that this result is actually sorted. 4:55 We didn't ask for it 4:56 to be sorted, but for some 4:57 reason the system sorted it for us. 4:59 And I can actually explain why that happened. 5:01 I'll also mention that if I 5:03 ran this same query on another system, it might not come out sorted. 5:06 In fact, it will not come out sorted because I tried it. 5:09 Here's the deal. 5:10 The union operator in SQL 5:12 by default eliminates duplicates in its results. 5:15 So if we have two Amy's, 5:17 which in fact we do, we only get one Amy in our result. 5:20 And similarly, for Craig, we have two of those as well. 5:23 So that's the default, and 5:25 it so happens the system I'm 5:26 using today, which is called 5:27 SQLite, eliminates duplicates gets by sorting the result. 5:31 So, it sorts the result, looks for 5:33 adjacent pairs that are the 5:34 same and eliminates all but 5:35 one of those, and then it gives us the answer. 5:37 But again, I want to 5:39 emphasize that's not something one 5:40 can count on when one runs 5:42 the same query on a different 5:43 system or even on the same system on a different day. 5:47 Now, if we want 5:48 to have the duplicates in our 5:50 result, that's something we can do quite easily. 5:52 We add to union the 5:54 word all that will turn 5:55 the set operator into what's 5:57 technically a multi-set operator that retains duplicates. 6:01 We run the query. 6:02 Well, the first thing we notice is it's not sorted anymore. 6:05 That's because it didn't need to eliminate the duplicates. 6:07 But if we look closely, we'll also see that the duplicates are now there. 6:10 We have two Amys, for example, 6:12 and we have two Craigs as well. 6:15 If we want this result to 6:16 be sorted and to guarantee 6:18 that the other one's sorted, we would add an order by clause. 6:21 So we can just say order by name. 6:23 We run the query and now 6:25 we have the result in sorted order. 6:28 Our next query demonstrates the intersect operator. 6:31 This query is going to 6:32 get the IDs of all 6:34 students who have applied to both 6:35 CS for a major and EE for a major. 6:38 So, very simple query. 6:39 We get the IDs of students 6:40 who applied to CS, the 6:42 IDs of students who applied to EE, 6:44 and then we perform the intersect operator 6:46 on the result of those two queries. 6:48 We execute it, and we find that there are indeed two students who applied to CS and EE. Some database systems don't support the intersect operator. They don't lose any expressive power. We just have to write our queries in different ways. So, this next query is computing exactly the same thing. The sIDs of students who have applied to both CS and EE, but this time we're doing it by doing two instances of the apply relation. One of these self joins, so we have to use table variables again, so we take every pair of apply tuples, we look at cases where it's the same student, and in one case, they're applying for CS, in the other case they're applying for EE, and we'll return the sID of those students. So we run the query and we get sort of the same answer, but not exactly, because we have a whole bunch of duplicates now that we didn't get when we did it with an intersect operator. Now, where did those duplicates come from? Let's take a look at the apply relation itself. Here, we see that student 123 applied to 6:49 that there are indeed two students 6:51 who applied to CS and EE. 6:55 Some database systems don't support the intersect operator. 6:58 They don't lose any expressive power. 7:00 We just have to write our queries in different ways. 7:03 So this next query is computing 7:04 exactly the same thing, the 7:06 SIDs of students who have 7:07 applied to both CS and 7:08 EE, but this time 7:10 we're doing it by doing two 7:11 instances of the apply relation, 7:14 one of these self joins, so we have to use table variables again. 7:17 So we take every pair 7:19 of apply tuples, we look 7:20 at cases where it's the same 7:21 student, and in one case, they're applying for CS. 7:24 In the other case, they're applying for 7:26 EE, and we'll return the SID of those students. 7:29 So we run the query and we 7:31 get sort of the same answer, 7:32 but not exactly because we have 7:34 a whole bunch of duplicates now 7:35 that we didn't get when we did it with an intersect operator. 7:39 Now where did those duplicates come from? 7:41 Let's take a look at the 7:42 apply relation itself. 7:47 Here we see that student 7:48 123 applied to CS and to 7:49 EE and to CS again and to EE again. 7:52 And we're gonna get all 7:53 pairs of 123 tuples, 7:56 where one pair pair of 7:57 the tuples is CS and the other is EE. 8:00 So, we'll get CS with EE, CS with EE and so on. 8:04 Going back to our query result. 8:05 Here it is. 8:07 We can see that we got the four 123's when we ran the query. 8:11 Well, that's easy to get rid of. 8:12 We just write select distinct and 8:14 that will get rid of duplicates and 8:16 now we're back to our original query result. 8:19 Now instead of finding students 8:20 who applied to both CS and 8:22 EE, let's find students 8:23 who applied to CS but did not apply to EE. 8:26 For that we need the difference operator. 8:28 It's called difference in relational Algebra. 8:30 Sometimes, it's called minus. 8:31 The word that's used in the 8:32 SQL standard is the word except. 8:34 So here's our query. 8:35 We find the student IDs who 8:37 applied to CS and then 8:38 we take away from those the 8:40 IDs of students who applied to EE. 8:42 We run the query and we 8:43 find that there are three students 8:44 who applied to CS and not to EE. 8:48 Some database systems don't support 8:50 the except operator either and 8:51 here, things get a little tricky. 8:53 So, let's try to rewrite 8:55 that query without using the except operator. 8:58 So as a reminder, we want to 8:59 find students who applied to 9:01 CS but did not apply to EE. 9:03 So, here's my attempt at writing that query. 9:05 I again do a self join 9:07 of apply with apply, and I 9:09 find all pairs where it's 9:11 the same student we're talking about 9:13 and the major in one 9:14 of the tuples of CS and the major in the other one is not EE. 9:17 Well, it looks pretty good. 9:19 Let's see what happens. 9:21 Whoa, we got a lot of results. 9:22 Okay, well that's probably just that 9:24 problem with duplicates again, so 9:26 let's just add distinct and go for it. 9:31 It still seems like a lot of results. 9:33 Let's go back to our previous 9:34 query that uses except and 9:36 then we found that there were three students in the result 9:38 where here we're still getting five in the result. 9:41 Well, if we think about 9:43 exactly what we wrote, 9:44 what we wrote is finding 9:46 all pairs of apply 9:48 records where it's the same student 9:49 and they applied to CS in 9:51 one of the pairs and they didn't 9:52 apply to EE in the other. 9:54 So, it could be, for example, biology or geology. 9:58 But the problem is that 9:59 when we consider these pairs, that 10:01 doesn't mean there's not another 10:02 pair with the same student 10:04 where they applied to CS and EE. 10:06 All this is actually finding is 10:08 students who applied to CS 10:10 and also applied to another 10:12 major that's not EE. 10:13 So, that's quite different from the query we're shooting for. 10:16 And actually, the interesting thing 10:18 is that with the constructs we've 10:20 seen so far in SQL, it's 10:21 not possible to write 10:23 the query we had earlier without using the except operator. 10:27 But in later videos we will 10:28 see additional constructs in SQL that do allow us to write that query.