0:00 Now let's talk about 0:01 duplicate values in the results of relational algebra queries. 0:05 Let's suppose we ask 0:06 for a list of the majors that 0:08 people have applied for and the decision for those majors. 0:10 So we write that as the 0:11 project of the major 0:14 and the decision on the applied relation. 0:19 You might think that when we get 0:20 the results of this query, we're going to have a lot of duplicate values. 0:23 So we'll have CS yes, CS 0:25 yes, CS no, EE yes, EE no, and so on. 0:28 You can imagine in a 0:29 large realistic database of applications, 0:31 there's going to be hundreds of 0:32 people applying for majors and having a yes or a no decision. 0:36 The semantics of relational 0:38 algebra says that duplicates are always eliminated. 0:42 So if you run a query 0:43 that would logically have a 0:44 lot of duplicate values, you just 0:45 get one value for each result. 0:48 That's actually a bit of a difference with the SQL language. 0:50 So, SQL is based on 0:52 what's known as multi-sets or 0:54 bags and that means 0:55 that we don't eliminate duplicates, whereas 0:58 relational algebra is based 0:59 on sets themselves and duplicates are eliminated. 1:02 There is a multi-set or 1:04 bad relational algebra defined as 1:06 well but we'll be fine by 1:07 just considering the set relational algebra in these videos. 1:11 Our first operator that combines two 1:13 relations is the cross-product operator, 1:15 also known as the Cartesian product. 1:17 What this operator does, is it 1:19 takes two relations and kinda 1:20 glues them together so that 1:21 their schema of the result 1:23 is the union of the 1:24 schemas of the two relations and 1:26 the contents of the result are every 1:27 combination of tuples from those relations. 1:30 This is in fact the normal 1:31 set cross product that you 1:32 might have learned way back in the elementary school. 1:35 So let's talk about, say, 1:36 doing the cross products of students and apply. 1:40 So if we do this 1:41 cross products, just to 1:43 save drawing, I'm just gonna glue 1:45 these two relations together here. 1:47 So if we do the 1:48 cross product we'll get at the 1:50 result a big relation, here, which 1:52 is going to have eight attributes. 1:54 The eight attributes across the student 1:55 and apply now the only 1:57 small little trick is that 1:59 when we glue two relations together 2:01 sometimes they'll have the same 2:02 attribute and we can see we have SID on both sides. 2:04 So just as a notational convention, 2:07 when cross-product is done 2:08 and there's two attributes that are 2:10 named, they're prefaced with the name of the relation they came from. 2:13 So this one would be referred 2:14 to in the cross-product as 2:15 the student dot SID where this 2:18 one over here would be referred 2:19 to as the apply dot SID. 2:22 So, again, we glue together in 2:23 the Cartesian product the two 2:25 relations with four attributes each, 2:26 we get a result with eight attributes. 2:29 Now let's talk about the contents of these. 2:31 So let's suppose that the student 2:33 relation had s-tuples in it 2:36 and that's how many tuples, while 2:37 the apply had 8 tuples in 2:39 it, the result of the 2:41 Cartesian products is gonna 2:42 have S times A tuples, 2:44 is going to have one tuple 2:46 for every combination of tuples 2:48 from the student relation and the apply relation. 2:52 Now, the cross-product seems 2:54 like it might not be that 2:55 helpful, but what is 2:56 interesting is when we use 2:58 the cross-product together with other operators. 3:00 And let's see a big example of that. 3:02 Let's suppose that we want 3:04 to get the names and GPAs of 3:06 students with a high school 3:07 size greater than a thousand who 3:09 applied to CS and were rejected. 3:11 Okay, so let's take a look. 3:13 We're going to have to access 3:15 the students and the apply 3:16 records in order to run this query. 3:19 So what we'll do is we'll 3:20 take student cross apply as our starting point. 3:23 So now we have 3:25 a big relation that contains 3:27 eight attributes and all of those tuples that we described previously. 3:31 But now we're going to 3:32 start making things more interesting, because 3:33 what we're going to do is 3:35 a big selection over this relation. 3:37 And that selection is first 3:38 of all going to make sure 3:39 that it only combines student and 3:41 apply tuples that are referring to the same student. 3:44 So to do that, we write 3:46 student dot SID equals 3:49 apply dot SID. 3:50 So now we've filtered the result 3:52 of that cross-product to only 3:54 include combinations of student 3:56 and apply by couples that make sets. 3:58 Now we have to do a little bit of additional filtering. 4:00 We said that we want the 4:01 high school size to be 4:03 greater than a thousand, so 4:04 we do an "and" operator in the high school. 4:07 We want them to have applied 4:09 to CS so that's and major equals CS. 4:11 We're getting a nice big query here. 4:13 And finally we want them to 4:15 have been rejected, so "and 4:16 decision" equals, we'll just be using R for reject. 4:21 So now, we've got that gigantic query. 4:23 But that gets us exactly what 4:25 we want except for one more thing, 4:26 which is, as I said, all we want is their names and GPAs. 4:29 So finally we take a 4:31 big parentheses around here and 4:33 we apply to that the 4:34 projection operator, getting the 4:36 student name and the GPA. 4:40 And that is the relational 4:41 algebra expression that produces 4:43 the query that we have written in English. 4:46 Now we have seen how the