0:00 This is the first of seven 0:01 videos where we're going to learn the SQL language. 0:04 The videos are largely going to 0:05 be live demos of SQL queries 0:07 and updates running on an actual database. 0:09 The first video is going 0:10 to focus on the basics of the SELECT statement. 0:13 As a reminder, the SELECT statement 0:15 selects a set of attributes from 0:16 a set of relations satisfying a particular condition. 0:20 We will see in the demo that 0:21 even with the these three clauses, 0:23 we can write quite powerful queries. 0:26 All of the seven demos are 0:27 going to be using the simple 0:29 college admissions database that we 0:31 learned about in the relational algebra videos. 0:33 As a reminder, we have three relations. 0:36 We have the college relation: college 0:38 relation contains information about the 0:39 name of the colleges, the state, and the enrollment of those colleges. 0:43 We have the student relation, which 0:44 contains student IDs, their names, 0:46 their GPA, and the size of the high school that they come from. 0:49 And finally, the application information, that 0:51 tells us that a particular student 0:53 applied to a particular college 0:55 for a particular major and there 0:56 was a decision of that application 0:58 Now as a reminder, in the 1:00 relational model, when we underline 1:01 attributes, that means we're designating a key for the relation. 1:04 So, the underlying attributes in 1:06 our example say that the 1:07 knowledge name is going to be unique within the college relation. 1:10 The student's idea is unique within 1:11 the student relation and in 1:13 the applied relation, the combination of these three attributes is unique. 1:17 That means that student can, if 1:18 he or she wishes, apply to a 1:20 college many times, or apply 1:21 for a major many times, but 1:22 can only apply to a 1:24 college for a particular major once. 1:26 Let's turn to the demo. 1:28 Let's start by looking at the actual data that we're going to be querying over. 1:32 We have a set of four 1:33 colleges: Stanford, Berkeley, MIT and Cornell. 1:36 We have a bunch of students. 1:39 And a reminder, each student has an 1:40 ID, a name, a GPA, and a size of high school. 1:43 And finally, we have a set 1:45 of application records where a 1:47 student with a particular ID 1:48 applies to a college for a 1:50 particular major, and there's a 1:51 yes or no decision on that application. 1:53 So let's go to our first SQL query. 1:56 This query is going to find 1:57 the ID, name, and GPA of 1:59 students whose GPA is greater than 3.6. 2:01 So, very simple, it's the 2:03 basic SELECT FROM WHERE structure. 2:05 The SELECT gives our table name, 2:06 the WHERE gives our filtering condition 2:08 and the SELECT tells us what we want to get out of the query. 2:11 We'll execute that query and 2:13 we will find here all of 2:14 our students with a GPA greater than 3.6. 2:17 Now, it's not necessary 2:19 to include the GPA in 2:20 the result to the query even if we filter on the GPA. 2:24 So, I could just take GPA away 2:25 from the SELECT clause, run the 2:27 query again and now, we 2:28 see the same result but without the GPA. 2:32 Okay. 2:33 Let's go to our second query. 2:35 Our second query is going to combine two relations. 2:39 In this query, we're going to 2:40 find the names of the students 2:41 and the majors for which they've applied. 2:44 So, now, we're involving both the 2:45 student table and the 2:46 apply table and the 2:48 condition we see here is the 2:49 join condition that tells us 2:51 we want to combine students with 2:52 apply records that have the same student ID. 2:55 This is what would happen automatically in 2:57 a natural join of the 2:58 relational algebra, but in SQL 3:00 we need to always write the 3:00 join condition explicitly, and finally 3:03 we get the student name and the major. 3:05 And if we execute the 3:06 query, we get, expectedly, a 3:08 bunch of students and the majors that they've applied for. 3:11 Now, we do notice here that we have several duplicate values. 3:14 We have two copies of Amy 3:15 applying to CS and two copies of Craig applying to Bio-Engineering. 3:19 As we discussed in the relational 3:20 algebra video, in relational algebra 3:22 which underlies SQL, it's by 3:24 default the set model; we don't have duplicates. 3:26 But in the SQL language we 3:28 do have duplicates, it's based on a multi-set model. 3:31 If we don't like the duplicates in 3:32 our results SQL provides us a convenient way to get rid of them. 3:35 We simply add the keyword, 3:37 "distinct", to our query after 3:39 the word, "select", we execute, and 3:41 now we get the same result 3:42 but with the duplicate values eliminated. 3:47 Our next query is going 3:49 to be a little more complicated; it's 3:50 going to find the names 3:52 and GPAs of students whose 3:54 size high school is less 3:55 than a thousand, they've applied to 3:56 CS at Stanford, and we're going 3:57 to get the decision associated with that. 3:59 So again we have two 4:01 relations, two tables involved, the student and the apply. 4:04 We have the join condition, making 4:06 sure we're talking about the same 4:07 student and the student and apply tuples. 4:10 Very important to remember that one. 4:11 We are going to filter the result 4:13 based on size high school, major, 4:15 and the college to which they're applying. 4:18 So let's run this query and 4:20 we will see the result that 4:21 we have two students who 4:22 have applied to CS at Stanford from a small high school. 4:26 Our next query is again a join of two relations. 4:30 This time we're going to find all large 4:32 campuses that have someone applying to that campus in CS. 4:35 So this time we're going to join the college table and the apply table. 4:38 And again, we need to 4:39 be careful to make sure we 4:40 only join tuples that are talking about the same college. 4:42 So we have college.cname 4:43 equals apply.cname. 4:46 We have an enrollment that's greater 4:47 than 20,000 and a major that equals CS. 4:49 Let's run this query. 4:51 Oops, we got an error! 4:53 Well, actually I knew that was 4:54 coming, but I wanted to show you what happens here. 4:57 So the error is that we 4:58 have an ambiguous column name, 4:59 and that's the one right here, the C name. 5:02 So I haven't pointed it 5:03 out explicitly, but whenever I've 5:06 referred to attributes where there's 5:06 an attribute from both of 5:08 the relations we're querying, I prefaced 5:10 it with the name of 5:11 the relation that we cared about, the college here in the apply. 5:14 So the attribute name here 5:16 in the select clause is actually 5:17 ambiguous because there's a 5:19 C name attribute in college and there's one there in apply. 5:22 Now we happen to set those equal, 5:23 but in order for the query to 5:25 actually run we have to choose 5:26 So let's just say we're 5:27 going to take that C name from college. 5:30 Now, everything should be fine, and here we go. 5:33 So those are the colleges where we 5:34 have at least one 5:36 CS major and their enrollment is greater than 20,000. 5:38 Again, we see duplicates 5:41 so if we don't like 5:42 the two copies of Berkeley, we 5:43 simply add distinct and we run the query again. 5:46 And now we have Berkeley and Cornell. 5:48 Now, let's do a query with a bigger result. 5:52 This time we're finally going to join all three of our relations. 5:55 Student, college and apply. 5:56 And we're going to apply the 5:57 joint conditions that ensure that 5:59 we're talking about the same student and the same college. 6:03 And then from the result 6:04 of that big cross-product, that 6:06 big join, we're going to 6:07 get the student ID, their name, 6:08 their GPA, the college that 6:10 they're applying to and the enrollment of that college. 6:12 So just a whole bunch of 6:13 information associated with this students' applications. 6:17 And we execute this and here 6:19 we get the result with all the attributes that we asked for. 6:21 Now, one thing I haven't 6:23 mentioned yet is the order 6:24 of the results that we get when we run SQL queries. 6:27 SO SQL is, at its heart, an unordered model. 6:30 That means that we can get 6:31 the results of our queries in 6:32 any order, and in fact, 6:34 we could run a query today 6:35 and get our results in a particular order. 6:37 And then run the query tomorrow and get a different order. 6:39 And that's permitted with the 6:41 specification of SQL on relational databases. 6:44 If we care about the order 6:46 of our result SQL provides a 6:48 clause that we can ask for a 6:49 result to be sorted by 6:50 a particular attribute or set of attributes. So 6:55 let's say we want our application information here 6:56 sorted by descending GPA. 6:59 Then we add another clause called the order by clause. 7:02 We tell the attribute we'd like 7:04 to be ordering by and then 7:05 if we want it to be descending we write DESC. 7:08 The default behavior is actually ascending. 7:12 So if we run this query 7:13 now we get our results by 7:15 descending the GPA we 7:19 see all the 3.9's, 3.8, 3.7, and so forth. 7:21 Now we might still want 7:22 to further sort within all the 7:23 3.9s if we want 7:25 to do that we can specify another 7:27 attribute to sort each group by. 7:29 So, for example, if we 7:30 decide from that we 7:31 want to sort by enrollment 7:35 and ascending, we won't put 7:36 anything because ascending is the default. 7:38 And we execute. 7:39 Now we still have GPA 7:41 as descending as our primary 7:43 sort order and then within each 7:44 of those will be sorting by ascending enrollment. 7:47 This query introduces the like predicate. 7:51 Like is a built-in operator 7:53 in SQL that allows us 7:54 to do simple string matching on attribute values. 7:57 Let's suppose, for example, that we 7:58 wanted to find all students 7:59 who were applying for a major that had to do with bio. 8:02 Instead of listing all the 8:03 biology majors we can 8:05 simply pattern match the major 8:06 against the special string here 8:08 which says, match any major 8:10 where there's some set of characters, 8:12 followed by bio, followed by 8:14 some set of characters we execute 8:16 the query, and we'll find the 8:18 students who have applied for various 8:19 bio type majors. 8:21 Now, I want to introduce another construct. 8:23 I'm going to use the same query to 8:25 do it, which is the construct select star. 8:28 So far, we've always listed 8:30 explicitly the attributes that we 8:31 want to get in the result of a query. 8:33 But if we simply want to get 8:34 all attributes, then we can just write select star. 8:38 And when we do that, we 8:39 don't project away any attributes, 8:41 but we get all the attributes in the result of the from and where expression. 8:45 While we're at it, let's do a gigantic query. 8:48 We'll just do the cross-product 8:49 and student college without any 8:51 combination, and we'll do 8:52 select star to get all the attributes out. 8:54 So, here goes, and you can 8:56 see, we get all the attributes 8:57 and we get a whole lot of tuples as well. 9:00 Our last query is going to 9:01 demonstrate the ability to use 9:03 arithmetic within SQL clauses. 9:05 So we see here a query 9:07 that selects all the information 9:08 from the student relation but adds 9:10 to it a scaled GPA where 9:13 we're going to boost the student's 9:14 GPA if they're from a big 9:15 high school and reduce it if they're from a small one. 9:18 Specifically, we'll take their GPA, multiply 9:20 it by the size high school divided by a thousand. 9:21 So, let's run this 9:23 query and you can see 9:24 that we have the whole student table 9:25 here with an additional column that 9:28 has scaled their GPA based on the size of their high school. 9:31 Now, if we don't like the 9:33 label on this column, we 9:34 could change it and so 9:36 I'll use this query as an 9:37 example to demonstrate the 'as' 9:38 feature which allows us 9:40 to change the labeling of the schema in a query result. 9:43 Let's say as scaled GPA, 9:45 and we should get the same 9:46 result with a more nicely labeled attribute. 9:49 That concludes our video introducing the basic select statement. 9:52 We'll see many other features in 9:53 the upcoming six videos on SQL.