0:00 Now we have seen how the 0:01 cross product allows us to 0:02 combine tuples and then 0:04 apply selection conditions to 0:06 get meaningful combinations of tuples. 0:09 It turns out that relational algebra 0:10 includes an operator called 0:12 the natural join that is 0:13 used pretty much for the exact purpose. 0:15 What the natural join does is 0:17 it performs a cross-product 0:18 but then it enforces equality 0:21 on all of the attributes with the same name. 0:23 So if we set up our 0:24 schema properly, for example, 0:26 we have student ID and student 0:28 ID here, meaning the same 0:30 thing, and when the cross 0:31 product is created, it's only 0:33 going to combine tuples where the student ID is the same. 0:36 And furthermore, if we add 0:37 college in, we can 0:39 see that we have the college name here and the college name here. 0:41 If we combine college and apply 0:44 tuples, we'll only combine tuples that are talking about the same college. 0:47 Now in addition, one more 0:49 thing that it does is it 0:50 gets rid of these pesky attributes that have the same names. 0:54 So since when we combine, 0:55 for example, student and apply 0:57 with the natural join, we're only 0:59 combining tuples where the 1:01 student SID is the same as the apply SID. 1:04 Then we don't need to keep two 1:05 copies of that 1:07 column because the values are always going to be equal. 1:11 So the natural join operator 1:13 is written using a bow 1:15 tie, that's just the convention. 1:18 You will find that in your text editing programs if you look carefully. 1:21 So let's do some examples now. 1:24 Let's go back to our same 1:26 query where we were finding 1:27 the names and GPAs of students 1:29 from large high schools who applied to CS and were rejected. 1:33 So now, instead of using 1:34 the cross-product we're gonna 1:35 use the natural join, which, as I said, was written with a bow tie. 1:39 What that allows us to 1:41 do, once we do that natural 1:42 join, is we don't have 1:43 to write that condition, that enforced 1:45 equality on those two 1:47 attributes, because it's going to do it itself. 1:49 And once we have done that then 1:50 all we need to do is 1:51 apply the rest of our conditions, 1:53 which were that the high school 1:54 is greater than a thousand and the 1:56 major is CS and the decision 1:59 is reject, again we'll 2:01 call that R. And then, 2:03 since we're only getting the names 2:05 and GPAs, we write the 2:07 student name and the GPA. 2:11 Okay. 2:12 And that's the result of the query using a natural join. 2:14 So, as you can see that's a 2:15 little bit simpler than the original 2:16 with the cross-product and by setting 2:18 up schemas correctly, natural join can be very useful. 2:21 Now let's add one more complication to our query. 2:24 Let's suppose that we're only interested 2:26 in applications to colleges where the enrollment is greater than 20,000. 2:29 So, so far in 2:31 our expression we refer to the 2:32 student relation and the apply 2:33 relation, but we haven't used the college relation. 2:36 But if we want to have a 2:38 filter on enrollment, we're going to have 2:39 to bring the college relation into the picture. 2:42 This turns out to perhaps be easier than you think. 2:45 Let's just erase a couple 2:47 of our parentheses here, and what 2:49 we're going to do is we're 2:50 going to join in the 2:52 college relation, with the two relations we have already. 2:55 Now, technically, the natural 2:59 join is the binary operator, people 3:01 often use it without parentheses 3:03 because it's associative, but if 3:04 we get pedantic about it we could add that and then we're in good shape. 3:07 Now we've joined all three relations together. 3:10 And remember, automatically the natural 3:12 join enforces equality on the shared attributes. 3:15 Very specifically, the college 3:17 name here is going to 3:18 be set equal to the apply college name as well. 3:21 Now once we've done that, we've got all the information we need. 3:24 We just need to add one 3:25 more filtering condition, which is 3:27 that the college enrollment is greater than 20,000. 3:30 And with that, we've solved our query. 3:35 So to summarize the 3:36 natural join, the natural join combines relations. 3:40 It automatically sets values equal 3:42 when attribute names are the 3:43 same and then it removes the duplicate columns. 3:46 The natural join actually does not 3:48 add any expressive power to relational algebra. 3:52 We can rewrite the natural join without it using the cross-product. 3:56 So let me just show that rewrite here. 3:58 If we have, and now I'm 4:00 going to use the general case of two expressions. 4:02 One expression, natural join 4:05 with another expression, that is 4:06 actually equivalent to doing 4:09 a projection on the 4:11 schema of the first 4:13 expression - I'll just 4:15 call it E1 now - union 4:17 the schema of the second expression. 4:18 That's a real union, so that 4:20 means if we have two copies we 4:21 just keep one of them. Over 4:23 the selection of. Now we're 4:26 going to set all the shared attributes 4:28 of the first expression to 4:30 be equal to the shared attributes of the second. 4:31 So I'll just write E1, A1 4:33 equals E2, A1 4:36 and E1, A2 equals E2 dot A2. 4:42 Now these are the cases where, 4:44 again, the attributes have the same names, and so on. 4:48 So we're setting all those equal, 4:50 and that is applied over 4:52 expression one cross-product expression two. 4:55 So again, the natural 4:58 join is not giving us 4:59 additional expressive power, but it is very convenient notationally.