0:00 In this video, we're going to learn about querying relational databases. 0:04 We're not going to focus on a specific query language, we'll do that later. 0:07 We're just going to talk about querying relational databases in general. 0:11 Let's start by talking about 0:12 the basic steps in creating 0:14 and using a relational database. 0:16 So, by the way, I 0:17 should mention that database people 0:19 have this habit of drawing databases 0:21 and database systems as gigantic 0:23 disks. So, I'll be using that same habit. 0:26 So, the first step is to 0:28 design the schema of the 0:29 database and then create 0:30 the schema using a data definition language. 0:33 So as we discussed in previous 0:34 videos in a relational database 0:36 the schema consists 0:38 of the structure of 0:40 the relations and the attributes of those relations. 0:43 So we set those up inside our big disk. 0:45 Once that's ready, the next 0:47 step is to load up the database with the initial data. 0:50 So it's fairly common for the 0:52 database to be initially loaded 0:53 from data that comes from an outside source. 0:55 Maybe the data is just stored 0:56 in files of some type, and 0:58 then that data could be loaded into the database. 1:01 Once the data is loaded, then 1:02 we have a bunch of tuples in 1:03 our relation. Now, we're ready 1:05 for the fun part which is to query and modify the data. 1:08 And so that happens continuously 1:10 over time as long as the database is in existence. 1:13 So let's just say for now 1:14 that we're going to have human 1:15 users that are directly querying the database. 1:17 In reality, that typically happens 1:19 through say an application or a website. 1:22 So, a user will come along and 1:23 we'll ask a question of the database and we will get an answer. 1:27 He might come along and 1:28 ask another question Q2 and he'd get another answer back. 1:31 The same human or maybe a 1:32 different human might ask to modify the database. 1:35 So, they might want 1:37 to insert new data or 1:38 update some of the data 1:39 and the database will come back 1:40 and say, "Okay, I made that change for you." 1:43 So that's the basic paradigm 1:45 of querying and updating relational databases. 1:49 Relational databases support ad 1:50 hoc queries and high-level languages. 1:53 By ad hoc, I mean that 1:54 you can pose queries that you didn't think of in advance. 1:57 So it's not necessary to write long programs for specific queries. 2:00 Rather the language can be 2:02 used to pose a query 2:03 as you think about what you want to ask. 2:05 And as mentioned in previous videos 2:06 the languages supported by relational 2:08 systems are high level, meaning 2:10 you can write in a fairly compact 2:12 fashion rather complicated queries 2:14 and you don't have to write the 2:15 algorithms that get the data out of the database. 2:18 So, let's look at an example of 2:19 a few queries. Let's go to again to our 2:22 imaginary database of students who are applying to colleges. 2:25 And here's just three examples of the types of things 2:27 that you might ask of a relational database. 2:29 You might want to get all 2:31 students whose GPA is greater 2:32 than 3.7 who are applying 2:34 to Stanford and MIT only. 2:37 You might want to get all 2:38 engineering departments in California 2:39 with fewer than 500 applicants or 2:41 you might ask for the 2:42 college with the highest average 2:44 accept rate over the last five years. 2:46 Now these might seem 2:48 like a fairly complicated queries 2:49 but all of these can be 2:51 written in a few lines 2:52 in say the SQL language or 2:54 a pretty simple expression in relational algebra. 2:57 So, some queries are 2:58 easier to pose than others, that's certainly true. 3:00 Though the 3 queries you 3:02 see here are as I said pretty easy to pose. 3:04 Now some queries are easier 3:06 for the database system to execute efficiently than others. 3:09 And interestingly it's not necessarily. 3:12 These two things aren't necessarily correlated. 3:13 There are some queries that are easy 3:15 to post but hard to execute 3:16 efficiently and some that are vice-versa. 3:20 Now, just a bit about terminology. 3:22 Frequently, people talk about the 3:24 query language of the database system. 3:26 That's usually used sort 3:28 of synonymously with the DML 3:29 or Data Manipulation Language which 3:32 usually includes not only querying but also data modifications. 3:36 In all relational query languages, when 3:38 you ask a query over a 3:39 set of relations, you get a relation as a result. 3:42 So let's run a query 3:43 cue say over these three 3:44 relations shown here and what 3:46 we'll get back is another relation. 3:50 When you get back the 3:51 same type of object that 3:53 you query, that's known as closure of the language. 3:55 And it really is a nice feature. 3:57 For example, when I want 3:59 to run another query, say Q2, 4:01 that query could be posed over 4:02 the answer of my first query 4:04 and could even combine that answer 4:05 with some of the existing relations in the database. 4:08 That's known as compositionality, the 4:10 ability to run a query 4:12 over the result of our previous query. 4:15 Now, let me talk briefly about two query languages. 4:18 We'll be learning these languages in 4:19 detail later, but I'm just 4:20 going to give the basic flavor of the languages here. 4:23 Relational algebra is a formal language. 4:25 Well, it's an algebra as you can tell by its name. 4:28 So it's very theoretically well-grounded. 4:31 SQL by contrast is 4:34 what I'll call an actual language or an implemented language. 4:37 That 's the one you're going to run on an actual deployed database application. 4:41 But the SQL language does have 4:44 as its foundation relational algebra. 4:46 That's how the semantics of the SQL language are defined. 4:50 Now let me just give you 4:51 a flavor of these two languages and 4:53 I'm going to write one query in each of the two languages. 4:57 So, let me get rid of this little line here. 5:00 Let's start in relational algebra. 5:02 So we're looking for the 5:03 ID's of students whose GPA 5:05 is greater than 3.7 and they've applied to Stanford. 5:09 In relational algebra, the basic 5:10 operators language are Greek symbols. 5:12 Again, we'll learn the details later, 5:13 but this particular expression will 5:15 be written by a Phi followed by a Sigma. 5:19 The Phi says we're going to 5:20 get the ID, the Sigma 5:21 says we want students whose 5:22 GPA is greater than 3.7 5:25 and the college that the 5:27 students have applied to is Stanford. 5:31 And then that will operate 5:33 on what's called the 5:35 natural join of the 5:37 student relation with the apply relation. 5:43 Again, we'll learn the details of that in a later video. 5:46 Now, here's the same query in SQL. 5:49 And this is something that you would 5:50 actually run on a deployed 5:51 database system, and the SQL 5:53 query is, in fact, directly 5:55 equivalent to the relational algebra query. 5:58 Now, pedagogically, I would 5:59 highly recommend that you learn 6:01 the relational algebra by watching 6:02 the relational algebra videos before 6:04 you move on to the SQL videos, 6:06 but I'm not going to absolutely require 6:08 that. So, if you're in a big hurry 6:10 to learn SQL right away 6:11 you may move ahead to the SQL videos. 6:14 If you're interested in the formal 6:15 foundations and a deeper understanding, 6:17 I recommend moving next to the relational algebra video.