0:00 So I've alluded a few 0:01 times to the fact that functional 0:02 dependencies generalize the notion of keys. 0:04 And let's make that connection explicit now. 0:07 Let's suppose we have a relation 0:08 R and R has no duplicate tuples. 0:11 R has some attributes A and it has some other attributes. 0:14 Let's call those B. And let's 0:16 suppose that we have a functional dependency that 0:17 A determines all of the attributes in the relation. 0:21 Now let me draw a picture of that. 0:23 So here's our relation R with 0:24 attributes A and attributes B. 0:26 And now let's suppose that 0:28 we have two tuples with the 0:29 same values for A. Now 0:31 we'll just write those as little a bar. 0:33 And now let's see what happens with the B values. 0:36 We'll make them B1 bar and B2 bar. 0:39 Because we have the functional 0:41 dependency, the equal values 0:43 here for A say that B1 and B2 have to be equal. 0:46 So B1 equals B2. 0:47 Let's just erase the little one and two. 0:50 But now we've generated duplicate tuples. 0:52 So what the functional 0:53 dependency tells us in 0:54 that case is that these two tuples 0:56 are actually the same, or rather 0:57 we cannot have two separate 0:59 tuples with the same A values. 1:01 So we cannot have two separate 1:03 tuples with the same A values 1:04 is exactly what it means 1:06 for A to be a key. 1:09 Now again, this is only the case when we have 1:10 no duplicates in R. But 1:12 if we have no duplicates, if 1:13 a set of attributes determines 1:15 all the other attributes, then those attributes are a key. 1:20 So here are a few other definitions related to functional dependencies. 1:23 We have a notion of a trivial functional dependency. 1:26 A functional dependency is trivial 1:28 A to B if B 1:30 is a subset of A. And 1:31 let's draw a little picture of what that means. 1:34 So here we have our attributes 1:36 A, and then we're saying 1:37 and that's all of the attributes 1:39 here, and what we're saying is 1:40 that B is a subset 1:42 of A. So in other words, 1:43 some portion of these attributes 1:45 here, we'll just mark that in 1:46 purple here are attributes 1:48 B. Well, it's pretty obvious 1:50 that if two tuples have 1:51 the same values across their 1:53 entire expanse here for A's, 1:55 then obviously they're also going 1:57 to have the same values for 1:58 just this portion here, the B portion. 2:00 So that's why it's called the trivial functional dependency. 2:04 So a nontrivial functional dependency 2:06 is a functional dependency that's not a trivial one. 2:08 By the way, FD is a common abbreviation for functional dependency. 2:12 So if we have A determines 2:13 B, then that is 2:14 non-trivial if it's not 2:16 the case that B is a 2:17 subset of A. Going to 2:19 our picture, let's have here 2:21 our attributes A. And now 2:23 we're saying there are some 2:24 attributes in B that are 2:26 not part of A. So 2:27 we can say, well maybe B 2:28 is partially part of A, 2:30 but there's some portion that is 2:31 not part of A. So 2:33 let's say that these are our B attributes here. 2:35 So now our functional dependency 2:37 is actually saying something. 2:38 It's saying we have two 2:40 attributes that agree in these 2:42 values, then they're also 2:44 going to agree in these values over here. 2:46 And the last definition is 2:48 a completely nontrivial functional dependency, 2:51 and that's A determines B 2:53 where A and B have no intersection at all. 2:56 So in that case, again, going 2:57 back to our picture, we'll have 2:59 our A attributes here and 3:01 then our B attributes are going 3:03 to be some portion of the remaining attributes. 3:05 And here we're saying a lot. 3:07 We're saying if these two have 3:08 the same value, then these two have the same value as well. 3:11 And the reality is that completely 3:13 nontrivial functional dependencies are the 3:15 ones that we're most interested in specifying. 3:18 I mentioned that there are some 3:19 rules that apply to all 3:20 functional dependencies and I'll give a couple of those right now. 3:23 The first one is the splitting rule. 3:25 The splitting rules says that if 3:26 we have a set of attributes 3:27 that determine another set of 3:29 attributes - and this time I'm 3:30 going to write it out instead of using 3:31 the bar notation - then 3:33 we also have...this implies that 3:36 we have A determines B-1 3:38 and A determines B-2 and so on. 3:42 In other words, we can split the right side of the functional dependency. 3:45 And if you think about it, 3:46 this is pretty If we say 3:48 that the when the A value 3:49 are the same all of the 3:50 B values have to be the 3:51 same then certainly when the 3:53 A values are the same the B values have to be the same independently. 3:56 Now you might wonder if the splitting rule also goes the other way. 3:59 So let's say we have, I'll 4:01 put the left hand side, I'll 4:02 write out the attributes explicitly so 4:04 let's say we have A-1 4:06 through A-N determines B then is 4:08 it from that the case 4:09 that A-1 determines B and 4:12 A-2 determines B on its own, and so on. 4:15 Well, the answer to that is no. 4:17 And I'll give a simple example from our college application database. 4:21 So let's say that we 4:22 have the functional dependency high school 4:24 name and high school 4:26 city determines high school code. 4:29 We talked about that one before. 4:31 Oops, here, that's an arrow there. 4:33 So that says that when we 4:34 have a particular name and 4:36 city combination for a high 4:37 school, that's identifying a single 4:39 high school, and so it will always have the same code. 4:42 So that makes a lot of sense 4:43 but it is not the case, 4:45 I'll put a big X here, 4:47 necessarily, that if we 4:48 split the left hand side 4:50 that high school name alone will determine a high school code. 4:53 So for example, I would 4:54 venture to guess that there 4:55 are a lot of Jefferson High Schools 4:57 all over the country and they 4:59 won't all will be the same high school. 5:01 So it's really the combination of attributes 5:02 on the left that together functionally 5:05 determine the right hand side 5:07 and so we do not then have 5:08 the splitting rule as a general principle. 5:10 The combining rule is the inverse of the splitting rule. 5:13 It says if we have A 5:15 determines B-1 and we have 5:16 A determines B-2 and so 5:18 on up to A determines 5:20 B-N, then we also 5:22 have A determines B-1 through B-N together. 5:28 Next, we have two trivial dependency rules. 5:30 Let me remind remind you what a trivial dependency is. 5:32 It's A determines B where 5:34 B is a subset of A. 5:36 So in other words, every left hand 5:37 side determines itself or any 5:39 subset of itself, and that's 5:41 what drives the two trivial dependency rules. 5:43 One of them says that if 5:44 we have A determines B, 5:46 then we also have A determines 5:49 A union B, so in 5:50 other words we can add to 5:52 the right hand side of every 5:53 dependency what's already on the left hand side. 5:55 Sort of as a converse, 5:57 we can also say that if 5:58 A determines B then A 6:01 determines A intersect B. 6:04 Actually this one is also 6:06 implied by the splitting rule. 6:07 So we have two different rules in this case that are doing the same thing. 6:11 And finally the transitive rule, which is the one we alluded to earlier. 6:14 It says if we have A 6:15 determines B and and 6:17 we have B determines C, then 6:19 we have A determines C. 6:21 Now all of these rules 6:22 can actually be proven formally 6:24 and I'm going to go through 6:25 a sketch of this particular one. 6:27 So here's my relation R and 6:29 I'll have attributes A, B, 6:31 C and then let's let D be the left of our attributes. 6:35 And my goal is to prove 6:37 that A determines C. And 6:39 the information I have to 6:40 prove that is these two functional dependencies here. 6:43 So to prove that A determines 6:44 C, I have to 6:46 prove that if two tupples 6:47 have the same A values--we'll put 6:49 little bars there--then they also have the same C values. 6:53 So I need to show that these two C values here are going to be the same. 6:56 So you can see what's going to happen. 6:58 Using the first functional dependency, 7:00 because these two A values 7:01 are the same, I know their B values must be the same. 7:05 And then I just use the second 7:06 functional dependency and because 7:08 the two B values are the 7:09 same, I then know that 7:10 the two C values are the 7:11 same, and that has 7:13 shown that this functional dependency holds. 7:15 And you can do a 7:16 similar thing to prove the other rules to yourself. 7:19 Now, I'm going to introduce the concept of closure of attributes. 7:22 Let's suppose we're given a 7:23 relation, a set of 7:25 functional dependencies for that 7:26 relation, and then a set 7:28 of attributes A bar that are part of that relation. 7:32 I'm interested in finding all attributes 7:33 B of the relation such 7:35 that A bar functionally determines 7:37 B. And this is what's 7:39 called the closure, and I'll 7:40 show in a bit why we might want to compute that. 7:43 Notationally the closure is written using the + sign. 7:46 So the closure of A bar is A bar +. 7:48 Let me be a little 7:50 more explicit, let me write 7:51 out A bar, because remember 7:52 whenever we write bar, we're actually talking about a list of attributes. 7:56 So we're going to write it 7:57 as A-1 through A-N, and I'm interested 7:59 in computing the closure of that set of attributes. 8:02 In other words, the entire set 8:04 of attributes that are functionally 8:05 determined by the attributes A-1 through A-N. 8:07 And I'm going to give an algorithm for doing that. 8:11 My algorithm says start with the set itself. 8:14 So I'm going to start with 8:16 A1 through AN, except I'm not going 8:19 to close that. 8:20 I'm going to leave a little space there. 8:21 And then I'm going to repeat until there's no change. 8:25 I'm going to add attributes to 8:27 that set until I get to the closure. 8:30 So I'm going to repeat. 8:31 If A determines B 8:34 - and now we'll put bars 8:35 in here - and all 8:37 of A are in the set, 8:39 then add B to the set. 8:43 So I might have my 8:44 attributes here, A1 through AN, 8:46 and it might be the case that 8:48 A, you know, 4 determines 8:49 attributes C and D, 8:51 so I'll add C and D to the set. 8:53 I repeat. 8:54 Maybe there's a C goes 8:55 to E and I'll add E to the set, and so on. 8:57 And when there's no more change, then I've computed the complete closure. 9:01 Now if you happen to be someone 9:02 who likes to think in terms of 9:03 rules instead, what we're effectively 9:05 doing is applying the combining 9:07 and transitive rules to the 9:09 attributes in the set until there's no change. 9:12 So let's run an example of the closure algorithm. 9:14 Let's go to our complete student 9:16 table and let's add three functional dependencies. 9:19 One that says that student's social 9:20 security number determines their name, 9:22 address and GPA and that 9:23 would be a normal... GPA determines 9:26 priority, and high school 9:28 code determines high school name and high school city. 9:30 Again, these are all examples we 9:31 gave earlier that would be 9:32 natural functional dependencies for this particular relation. 9:36 Let's suppose that we're interested 9:37 in computing the closure of 9:39 the two attributes - social security 9:41 number and high school code. 9:43 So in other words, we want 9:44 to find all attributes in 9:46 the student relation that are functionally 9:48 determined by these two attributes. 9:51 So the algorithm says start with 9:53 the two attributes themselves, social security 9:55 number and high school code, 9:57 and then add attributes that 9:58 are functionally determined by ones in the set. 10:01 So if we start with our 10:02 first functional dependency here, because 10:04 we have social security number, that 10:05 allows us to add the 10:06 student name, the address, 10:11 the GPA... and that's it for that one. 10:15 Now let's repeat again. 10:17 Because we have the GPA, our 10:19 second functional dependency tells 10:20 us we can add the priorityAnd that's it for that one. 10:25 And then since we have the 10:26 high school code in this set, 10:27 our third one tells us 10:28 that we can add the high school 10:30 name and the high school 10:31 city and at that 10:32 point we certainly know we're done 10:34 because we've actually got the entire relation at this point. 10:37 Now I didn't pick this particular example at random. 10:40 We took two attributes, social security 10:42 number and high school code, we 10:44 computed their closure, and we 10:45 discovered that they together functionally 10:47 determine all attributes of the student relation. 10:51 Now remember just a few slides 10:52 ago, when a set 10:54 of attributes functionally determine all the 10:55 attributes, then those attributes together form a key for the relation. 10:59 And in fact if you think 11:00 about it, social security number and 11:02 high school code together are a natural key for the relation. 11:05 A student might go to 11:06 multiple high schools, but there's 11:07 no reason to have more 11:09 than one tupple with the 11:10 combination of a particular 11:12 student and the high school they attended. 11:15 So let's formalize a bit this 11:16 relationship between the notion of closure and keys. 11:19 Let's suppose we're interested in 11:20 answering the question... is a 11:22 particular set of attributes 11:23 a key for a relation we can use closure to do that. 11:26 Remember, we have the relation, 11:28 and we have a set of 11:29 functional dependencies for the relation, 11:31 so what we do is we 11:32 compute the closure of 11:35 A that's going to give 11:36 us a set of attributes and 11:38 if that equals all attributes, then 11:41 A is the key. 11:43 As a more general question, let's 11:45 suppose we're given a set 11:46 of functional dependencies for a 11:47 relation, how can we use 11:48 it to find all of the keys for the relation? 11:51 So use the same basic idea 11:53 of computing closure, but this 11:54 time we have to do it for 11:55 every subset of the attributes. 11:58 So let's call each subset A 12:00 bar and we just 12:02 check if the closure of A bar determines all attributes. 12:05 And if it does, then it's a key. 12:08 And by considering every subset of 12:10 the attributes in R, then 12:11 we're considering every possible key 12:13 and we'll just check for 12:14 each one whether it's actually a key. 12:16 Now that seems fairly inefficient 12:18 and actually we can be a little more efficient than that. 12:20 We can consider these subsets in 12:23 increasing size order. 12:25 So for example, if 12:26 we determine that A and 12:29 B together determine all 12:31 attributes functionally determine all attributes in the relation. 12:33 That tells us AB is 12:35 a key, and that actually 12:36 also tells us that every 12:38 superset of AB is also a key. 12:40 And if you think about it, that fills out naturally. 12:43 So the real algorithm would 12:44 say let's start with single attributes and determine if they are key. 12:47 If a single attribute say 12:49 C, is a key then so is 12:50 every super set of C, and 12:52 then we go to pairs of 12:53 attributes and so on.