0:01 This is the second of two videos about the relational algebra. 0:04 In the first video, we learned 0:06 about the select and project operators in various types of joins. 0:10 This video will cover set operators, 0:12 union difference and intersection, the 0:14 renaming operator, and different 0:16 notations for expressions of relational algebra. 0:20 Just as a reminder, we apply 0:21 a relational algebra query or 0:23 expression to a set 0:25 of relations and we get 0:26 as a result of that expression 0:28 a relation as our answer. 0:30 For our examples, we're using 0:31 an imaginary database about college admissions. 0:34 We have a relation of colleges, 0:36 a relation of students, and a 0:37 relation with information about students applying to colleges. 0:41 We'll keep at the bottom of 0:42 the video these three depictions of 0:44 those relations with a few 0:46 abbreviations used so that names aren't too long. 0:49 Let's move ahead to our first operator. 0:52 The first of three set operators 0:54 is the union operator, and it's 0:55 a standard set union that 0:57 you learned about in elementary school. 1:00 Let's suppose, for example, that we 1:01 want a list of the 1:02 college and student names in our database. 1:04 So we just want those as list. 1:05 For example, we might want 1:07 Stanford and Susan 1:10 and Cornell and Mary 1:15 and John and so on. 1:17 Now you might think 1:19 we can generate this list by 1:20 using one of the operators we've 1:22 already learned for combining information 1:23 from multiple relations, such as 1:25 the cross-product operator or the natural join operator. 1:29 The problem with those operators 1:30 is that they kind of combine information 1:32 from multiple relations horizontally. 1:34 They might take a tuple T1 1:36 from one relation and tuple T2 1:38 from the other and kind of match them. 1:40 But that's not what we want to do here. 1:41 We want to combine the information vertically to create our list. 1:45 And to do that we're going to use is the union operator. 1:48 So in order to get a list 1:49 of the college names and the 1:50 student names, we'll project the college name from the college relation. 1:55 That gives us a list of college names. 1:57 We'll similarly project the student 1:59 name from the student relation, 2:02 and we've got those two lists 2:03 and we'll just apply the union 2:04 operator between them and that will give us our result. 2:07 Now, technically, in relational algebra 2:10 in order to union two lists 2:11 they have to have the same schema, 2:13 that means that same attribute name 2:14 and these don't, but we'll correct that later. 2:17 For now, you get the basic idea of the union operator. 2:20 Our next set operator is 2:22 the difference operator, and this one can be extremely useful. 2:25 As an example, let's suppose 2:26 we want to find the IDs 2:28 of students who didn't apply to any colleges. 2:30 It sounds like a complicated query, but 2:32 we'll actually write it in a very simple fashion. 2:35 We'll start by projecting the student 2:37 ID from the student relation 2:39 itself and that will 2:40 give us all of this student IDs. 2:42 Then lets project the student 2:44 ID from the apply relation 2:47 and that gives us the IDs 2:48 of all students who have applied somewhere. 2:51 All we need to do is 2:52 take the difference operator, written 2:54 with the minus sign, and that gives us the result of our query. 2:57 It will take all IDs of 2:58 the students and then subtract 3:00 the ones who have applied somewhere. 3:03 Suppose instead that we 3:05 wanted the names of the students 3:06 who didn't apply anywhere, not just their IDs. 3:09 So that's a little bit more complicated. 3:11 You might think, "Oh, just add 3:12 student name to the projection 3:14 list here," but if we 3:15 do that, then we're trying to 3:17 subtract a set that has 3:18 just IDs from a set that has the pair of ID names. 3:20 And we can't have 3:22 the student name here because the 3:23 student name isn't part of the apply relation. 3:26 So there is a nice trick, however, that's going to do what we want. 3:29 Let me erase these here. 3:31 What we're going to do is 3:32 we're going to take this whole 3:34 expression here which gives us 3:35 the student IDs who didn't apply anywhere and watch this. 3:38 Pretty clever. 3:39 We're gonna do a natural join 3:42 with the student relation. 3:43 And now, that's called a join back. 3:45 So we've taken the IDs, 3:47 a certain select set of IDs 3:48 and we've joined them back to the student relation. 3:51 That's going to give us a 3:52 schema that's the student relation 3:54 itself, and then we're just 3:55 going to add to that a 3:57 projection of the student name. 3:59 And that will give us our desired answer. 4:03 The last of the three set operators is the intersection operator. 4:07 So let's suppose we want 4:08 to find names that are both 4:09 a college name and a student name. 4:11 So perhaps, Washington is the name of a student and a college. 4:15 To find those, we're going to 4:16 do something similar to what we've done in the previous examples. 4:19 Let's start by getting the college names. 4:24 Then let's get the student names, 4:29 and then what we're going to do 4:30 is just perform an intersection of 4:32 those two expressions and that will give us the result that we want. 4:36 Now like our previous example, 4:37 technically speaking, the two 4:39 expressions on the two sides 4:41 of the intersection ought to have 4:42 the same schema and again, I'll 4:43 show you just a little bit 4:44 later, how we take care of that. 4:47 Now, it turns out 4:49 that intersection actually doesn't add 4:51 any expressive power to our 4:53 language and I'm going 4:54 to show that actually in two different ways. 4:56 The first way is that 4:57 if we have two expressions, let's 4:59 say E1 and E2 and 5:02 we perform their intersection, that 5:04 is exactly equivalent to 5:06 writing E1 minus, using the 5:08 difference operator, E1 minus E2. 5:12 Now if you're not 5:14 convinced of that immediately, let's go 5:15 back to Venn diagrams, again 5:17 a concept you probably learned early in your schooling. 5:19 So let's make a picture of two circles. 5:23 And let's say that the first 5:24 circle Circle represents the result of 5:26 expression E1 and the 5:28 second rear circle represents the 5:29 result of expression E2. 5:32 Now if we 5:33 take the entire circle E1. 5:36 Let's shade that in purple. 5:39 And then we take the 5:41 result, so that's E1 here, 5:43 and then we take E1, the 5:45 result of the expression E1 5:46 minus E2 here, we'll write 5:48 that in green, so that's everything 5:50 in E1 that's not in 5:51 E2, that's this. Okay? 5:55 And if we take the purple minus 5:57 the green you will see 5:59 that we actually do get the intersection here. 6:02 So that's a simple property 6:04 of set Operations but what 6:05 that's telling us is that 6:07 this intersection operator here isn't 6:08 giving us more expressive power because 6:10 any expression that we can 6:11 write in this fashion, we can equivalently 6:14 right with the difference operator in this fashion. 6:17 Let me show you a completely 6:18 different way in which intersection 6:20 doesn't add any expressive power. 6:22 So, let's go back to 6:23 E1 intersect E2 and as 6:27 a reminder for this to be 6:28 completely correct these have to 6:30 have the same schema as equal between the two. 6:34 E1 intersect E2 turns out 6:35 to be exactly the 6:37 same as E1 natural join 6:40 E2 in this particular case 6:42 because the schema is the same. 6:45 Remember what natural join does. 6:46 Natural join says that you 6:48 match up all columns that 6:50 are equal and you eliminate duplicate values of columns. 6:52 So I'll just let you work 6:53 out for yourself that this 6:55 is indeed an equivalence and 6:57 a second reason that the 6:59 intersection doesn't add any expressive power. 7:01 Nevertheless, the intersection can be 7:03 very useful to use in queries. 7:07 Our last operator is the rename operator. 7:10 The rename operator is necessary 7:12 to express certain queries in relational algebra. 7:14 Let me first show the form of the operator and then we'll see it in use. 7:18 The rename operator uses the Greek symbol rho. 7:21 And like all of our 7:22 other operators, it applies to 7:23 the result of any expression of relational algebra. 7:26 And what the rename operator does 7:28 is it reassigns the schema 7:31 in the result of E. So 7:32 we compute E, we get 7:33 a relation as a result, 7:34 and it says that I'm 7:36 going to call the result of 7:37 that, relation R with 7:40 attributes A1 through An and 7:41 then when this expression itself 7:44 is embedded in more complex expression, 7:46 we can use this schema to 7:48 describe the result of the 7:49 E. Again we'll see shortly why that's useful. 7:52 There are a couple of the abbreviations 7:54 that are used in the rename 7:55 operator, this form is the general form here. 7:58 One abbreviation is if we 8:00 just want to use the 8:01 same attribute names that came 8:02 out of E, but change the 8:04 relation name, we write row 8:05 sub R applied to E, 8:08 and similarly, if we want 8:10 to use just the attribute 8:12 names, if we 8:14 want to change, I'm sorry, just 8:16 the attribute names then we 8:17 write attribute list here 8:19 and it would keep the same relation name. 8:21 This form of course has 8:23 to have a list of attributes 8:24 or we would not be able to 8:25 distinguish it from the previous form. 8:27 But again these are just abbreviations 8:28 and the general form is the one up here. 8:31 Okay, so now let's see the rename operator in use. 8:36 The first use of the rename 8:37 operator is something I alluded 8:39 to earlier in this video 8:40 which is the fact that when 8:41 we do the set operators, 8:43 the union, difference, and intersect 8:46 operators, we do expect 8:48 the schemas on the two 8:49 the sides of the operator to 8:50 match, and in a couple 8:52 of our examples they didn't match, 8:53 and the rename operator will allow us to fix that. 8:56 So, for example, if we're 8:58 doing the list of college and 8:59 student names, and let me 9:00 just remind you how we wrote that query. 9:02 We took the C name from 9:03 college and we 9:05 took the s name from students 9:09 and we did the big union of those. 9:11 Now, to make this technically 9:13 correct, these two attribute names would have to be the same. 9:16 So we're just going to apply the rename operator. 9:19 Let's say that we're gonna 9:20 rename the result of this 9:22 first expression to say 9:24 the relation name C with attribute name. 9:26 And let's make the 9:28 result of the second expression similarly 9:31 be the relation C with attribute name. 9:33 And now we have two 9:34 matching schemas and then 9:36 we can properly perform the union operator. 9:38 Again, this is just a 9:39 syntactic necessity to have 9:41 well-formed relational algebra expressions. 9:44 Now, the second use of 9:46 the rename operator is a 9:47 little more complicated and quite 9:50 a bit more important actually which 9:52 is disambiguation in self 9:54 joins and you probably have no 9:55 idea what I'm talking 9:57 about when I say that, but let me give an example. 10:00 Let's suppose that we wanted 10:01 to have a query that finds 10:03 pairs of colleges in the same state. 10:05 Now, think about that. 10:06 So we want to have, for example, 10:08 Stanford and Berkeley and 10:11 Berkeley and UCLA and so on. 10:14 So that, as you 10:16 can see, unlike the union 10:18 operator, we're looking for this 10:19 horizontal joining here. So 10:21 we're going to have to combine essentially 10:23 two instances of the college relation. 10:26 And that's exactly what we're going to do. 10:28 We're effectively going to do college join 10:31 college making the state equal. 10:33 So, let's work on that a little bit. 10:35 So, what we wanna 10:36 do is we wanna have college 10:39 and we want to, let's just 10:41 start with, say, the cross-product of college. 10:45 And then we want to somehow say, 10:47 "Well, the state equals the state." 10:49 But that's not gonna work. 10:50 Which state are these? 10:52 And how do we describe the two instances of college? 10:55 So what we're going to 10:56 do and let me just 10:57 erase this, is we're going 10:59 to rename those two instances 11:01 of colleges so they have different names. 11:03 So we're going to take the first 11:05 instance of college here and 11:07 we're going to apply a rename operator to that. 11:09 And we'll call it C1 and we'll 11:11 say that that has name1, state1, and enrollment1. 11:16 And then we'll take the second instance here. 11:18 We'll call it C2, so N2, 11:21 S2, E2 of college and 11:24 now we have two different relations. 11:26 So what we can do is 11:27 we can take the cross-product 11:29 of those two like that, 11:31 and then we can select where 11:34 S1 equals S2, okay? 11:38 And that gives us pairs of college in the same state. 11:41 Actually, let me show you 11:42 an even trickier, simpler way of doing this. 11:45 Let's take away the selection operator 11:47 here, okay? 11:49 And let's take away this. 11:52 And let's make this into a natural join. 11:55 Now that's not gonna work quite 11:56 yet because the natural join 11:58 requires attribute names to 11:59 be the same, and we don't 12:00 have any attribute names that are the same. 12:03 So the last little trick we're 12:04 gonna do is we're 12:06 gonna make those two attribute names, 12:07 S, be the same. 12:10 And now when we 12:12 do the natural join, it's gonna 12:13 require equality on those two 12:15 S's and everything is gonna be great. 12:18 Okay? 12:18 Now, things are still a little bit more complicated. 12:22 One problem with this query 12:24 is that we are going to 12:25 get colleges paired with themselves. 12:27 So we're going to get from 12:28 this, for example, Stanford Stanford. 12:31 If you think about it, right? 12:33 Berkley Berkeley, as well as Stanford Berkeley. 12:41 Now, that's not really what we want presumably. 12:44 Presumably we actually want different colleges. 12:47 but that's pretty easy to handle, actually. 12:49 Let's put a selection condition here 12:51 so that the name one is not equal to name two. 12:56 Great. We took care of that. 12:57 So in that case we will 12:58 no longer get Stanford Standford and Berkeley Berkeley. 13:02 Ah, but there's still one more problem. 13:04 We'll get Stanford Berkeley but we'll also get Berkeley Stanford. 13:09 Now, let me 13:12 pause for a moment and see 13:13 if you can think of a 13:14 simple way to solve this problem. 13:17 Actually, there's a surprisingly simple way, kind of clever. 13:20 We're gonna take away this not 13:22 equals and we're going replace it with a less than. 13:24 And now we'll only 13:27 get pairs where the first one is less than the second. 13:30 So Stanford and Berkeley goes away and we get Berkley Stanford. 13:34 And this is our final query 13:36 for what we wanted to do here. 13:37 Now what I really wanted to 13:39 show, aside from some of the 13:40 uses of relational algebra, is 13:42 the fact that the rename operator 13:43 was for this query absolutely necessary. 13:46 We could not have done this query without the rename operator. 13:50 Now we've seen all the operators of relational algebra. 13:53 Before we wrap up the 13:54 video I did want 13:55 to mention that there are 13:56 some other notations that can 13:58 be used for relational algebra expressions. 14:00 So far we've just been writing 14:01 our expressions in a standard form 14:03 with relation names and operators 14:05 between those names and applying to those names. 14:08 But sometimes people prefer to 14:09 write using a more 14:10 linear notation of assignment statements 14:13 and sometimes people like to write the expressions as trees. 14:15 So I'm just gonna briefly show a couple of examples of those and then we'll wrap up. 14:19 So assignment statements are a 14:21 way to break down relational 14:23 algebra expressions into their parts. 14:25 Let's do the same query we 14:26 just finished as a big 14:27 expression which is the 14:28 pairs of colleges that are on the same state. 14:31 We'll start by writing two assignment 14:33 statements that do the rename 14:34 of the two instances of the college relation. 14:37 So we'll start with 14:39 C1 colon equals and we'll 14:40 use a rename operator and now 14:42 we use the abbreviated form that just lists attribute names. 14:45 So we'll see say C one, 14:47 S, E one of 14:49 college and we'll similarly 14:52 say that C2 gets the 14:54 rename, and we'll call it 14:56 C2SE2 of college, 14:59 and remember we use the 15:00 same S here so that we can do the natural join. 15:03 So, now we'll say 15:04 that college pairs gets C1 15:07 natural join C2, and 15:10 then finally we'll do our selection condition. 15:12 So our final answer will be 15:14 the selection where N1 is 15:16 less than N2 of CP. 15:19 And again, this is equivalent to 15:21 the expression that we saw on the earlier slide. 15:23 It's just a notation that sometimes 15:25 people prefer to modularize their expressions. 15:29 The second alternate notation I'm going to show is expression trees. 15:33 And expression trees are actually commonly used in relational algebra. 15:36 They allow you to visualize the structure 15:38 of the expression a little bit better. 15:40 And as it turns out when SQL 15:42 is compiled in database systems, 15:44 it's often compiled into an 15:45 expression tree that looks very 15:46 much like what I'm gonna show you right now. 15:49 So for this example let's suppose 15:51 that we want to find the 15:52 GPAs of students who are applying to CS in California. 15:55 So that's going to 15:57 involve all three relations because 15:59 we're looking at the 16:00 state is in California, and 16:02 we're looking at the student GPA's 16:04 and we're looking at them applying to CS. 16:07 So what we're going 16:08 to do is we're going to 16:09 make a little tree notation here 16:11 where we're going to first do 16:12 the natural join of these three relations. 16:14 So technically the expression 16:16 I'm going to show you is going to stop down here. 16:18 It's not going to actually have the tables. 16:20 So the leaves of the expression are 16:21 going to be the three relations: college, students, and apply. 16:24 And in relational algebra trees, the 16:26 leaves are always relation names. 16:28 And we're going to do the natural 16:30 join of those three which 16:31 as a reminder enforces equality of 16:34 the college name against the 16:35 college name here against the 16:36 college name here, and the 16:37 student ID here and the student ID here. 16:40 That enforcement means that we 16:41 get triples that are talking 16:42 about a student applying to a particular college. 16:45 And then we're going to apply to 16:47 that, and so that's going to 16:49 be written as a new note above this one in the tree. 16:51 The selection condition that says 16:54 that the state equals California and the major equals CS. 17:02 And finally, we'll put on 17:04 top of that the projection that gets the GPA. 17:08 okay? 17:08 Now actually this expression is 17:11 exactly equivalent to if 17:12 we wrote it linearly, project the 17:14 GPA, select etc. of 17:16 the three college join student, join apply. 17:19 I'm just abbreviating here. 17:20 That would be an equivalent expression. 17:22 But again, people often like 17:24 to use the tree notation 17:26 because it does allow you to 17:27 visualize the structure of the 17:28 expression, and it is used 17:30 inside implementations of the SQL language. 17:34 Let me finish up by summarizing relational algebra. 17:37 Let's start with the core constructs of the language. 17:40 So a relation name is 17:42 a query in relational algebra, 17:45 and then we use operators that combine relations and filter relations. 17:48 So we have the select operator 17:50 that applies a condition to the result of an expression. 17:53 We have the project operator that 17:55 gives us a set of 17:57 attributes that we take from the result of an expression. 18:00 We have the expression one 18:03 cross-product expression two. 18:05 And again those can be any expressions. 18:08 Then we have expression one 18:10 union expression two. 18:13 And we have expression one minus expression two. 18:16 And finally we have 18:18 the rename operator that takes 18:20 an expression and renames the 18:22 result of that, the 18:24 schema in the result of that expression. 18:29 Now, you probably noticed that 18:30 I skipped a few of our 18:32 favorite operators, but this 18:33 is the core of the language 18:35 and all the other operators are 18:36 actually abbreviations that don't 18:38 increase the expressive power of 18:40 the language but they can be very useful performing queries. 18:43 And the abbreviations that we 18:45 learned were expression one 18:47 natural join expression two. 18:49 They were expression one 18:51 theta join expression two. 18:55 And, finally, expression one intersect expression two. 18:58 All of those where we had 18:59 a method of rewriting them using the core operators. 19:03 Just a small aside about parentheses. 19:06 A parentheses are used in 19:07 relational expressions for, relational 19:09 algebraic expressions, for disambiguation, similarly to arithmetic expressions. 19:13 I was a little cavalier about whether 19:15 I included parentheses or not, 19:17 but as you write your relational 19:18 algebra expressions you will 19:20 see that it's pretty straightforward 19:22 to figure out when disambiguation is needed. 19:25 So to conclude relational algebra, 19:27 entirely it's a formal language. 19:28 It's based on sets, set 19:30 operators and other operators 19:32 that combine data from multiple relations. 19:34 It takes relations as input, 19:36 it produces relations as answers 19:38 and it does form the formal 19:40 foundation of implemented relational database management.