0:01 Welcome to the course Introduction to Databases. 0:04 I'm Jennifer Widom from Stanford University. 0:06 In this course we'll be learning 0:08 about databases and the use 0:09 of database management systems, primarily 0:12 from the viewpoint of the designer, 0:14 user and developer of database applications. 0:18 I'm going to start by describing in 0:21 one very long sentence what 0:22 a database management system provides for applications. 0:27 It provides a means of handling large amounts 0:29 of data primarily, but let's look at it in a little more detail. 0:32 What it provides, in a 0:33 long sentence, is efficient, reliable, 0:36 convenient and safe multi-user 0:40 storage of and access to 0:41 massive amounts of persistent data. 0:45 So, I'm going to go 0:46 into each one of those adjectives in 0:47 a little bit more detail in a moment. 0:49 But I did want to mention that database 0:51 systems are extremely prevalent in the world today. 0:54 They sit behind many websites, 0:56 they run your banking systems, 0:58 your telecommunications, deployments of 1:01 sensors, scientific experiments and much, much more. 1:04 Highly prevalent. 1:05 So let's talk a little 1:06 bit about why database systems are 1:08 so popular and so prevalent by looking at these seven adjectives. 1:13 The first aspect of database 1:15 systems is that they handle 1:16 data at a massive scale. 1:19 So if you think about 1:20 the amount of data that is 1:22 being produced today, database systems 1:24 are handling terabytes of data, 1:25 sometimes even terabytes of data every day. 1:29 And one of the critical 1:30 aspects is that the data 1:31 that's handled by database management 1:33 systems is much larger than can 1:35 fit in the memory of a typical computing system. 1:38 So memories are indeed growing 1:39 very, very fast, but the 1:41 amount of data in the world 1:42 and data to be handled by 1:43 database systems is growing much faster.
1:46 So database systems are 1:48 designed to handle data that is residing outside of memory. 1:52 Secondly, the data that's 1:54 handled by database management systems is typically persistent. 1:58 And what I mean by that is 1:59 that the data in the database 2:00 outlives the programs that execute on that data. 2:04 So if you run 2:06 a typical computer program, the program 2:08 will start, the variables will be created. 2:11 There will be data that's operated on 2:13 by the program, the program will finish and the data will go away. 2:16 It's sort of the other way with databases. 2:18 The data is what sits there 2:20 and then a program will start 2:21 up, it will operate on the 2:22 data, the program will stop and the data will still be there. 2:25 Very often actually multiple programs 2:27 will be operating on the same data. 2:31 Next, safety. 2:32 So database systems, since 2:34 they run critical applications such as 2:36 telecommunications and banking systems, 2:39 have to have guarantees that 2:40 the data managed by the system 2:42 will stay in a consistent 2:44 state, it won't be lost or 2:45 overwritten when there are 2:47 failures, and there can be hardware failures. 2:50 There can be software failures. 2:53 Even simple power outages. 2:55 You don't want your bank 2:57 balance to change because the 2:58 power went out at your bank branch. 3:00 And of course there is the problem 3:02 of malicious users that may try to corrupt data. 3:05 So database systems have a 3:06 number of built-in mechanisms that 3:08 ensure that the data remains consistent, 3:10 regardless of what happens. 3:12 Next, multi-user. So I 3:14 mentioned that multiple programs may operate on the same database. 3:18 And even with one program operating 3:20 on a database, that program may 3:22 allow many different users or 3:23 applications to access the data concurrently.
3:27 So when you have 3:28 multiple applications working on 3:30 the same data, the system 3:32 has to have some mechanisms, again, 3:33 to ensure that the data stays consistent. 3:36 That you don't have, for example, 3:37 half of a data item 3:39 overwritten by one person and 3:41 the other half overwritten by another. 3:43 So there are mechanisms in database 3:45 systems called concurrency control. 3:48 And the idea there is 3:49 that we control the way multiple users access the database. 3:53 Now we don't control it by 3:55 only having one user have 3:57 exclusive access to the database, 3:58 or the performance would slow down considerably. 4:01 So the control actually occurs at 4:03 the level of the data items in the database. 4:05 So many users might be operating 4:07 on the same database but be 4:09 operating on different individual data items. 4:11 It's a little bit similar 4:12 to, say, file system concurrency or 4:14 even variable concurrency in programs, 4:16 except it's more centered around the data itself. 4:21 The next adjective is convenience, and 4:24 convenience is actually one of the 4:26 critical features of database systems. 4:28 They really are designed to make 4:29 it easy to work with large 4:31 amounts of data and to 4:32 do very powerful and interesting processing on that data. 4:35 So there are a couple of levels at which that happens. 4:39 There's a notion in databases called Physical Data Independence. 4:44 It's kind of a mouthful, but 4:45 what that's saying is that 4:47 the way that data is actually 4:49 stored and laid out on 4:51 disk is independent of the 4:53 way that programs think about the structure of the data. 4:56 So you could have a program that 4:57 operates on a database and 4:59 underneath there could be a 5:00 complete change in the 5:02 way the data is stored, yet 5:04 the program itself would not have to be changed. 5:06 So the operations on the 5:07 data are independent from the way the data is laid out.
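As a rough illustration of physical data independence, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and index names are hypothetical and chosen only for this example. The "program" is a single query function; an index is added underneath it, which changes how the data is laid out and accessed, and the program runs unchanged with the same result.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.2), (3, "Cat", 3.7)],
)

QUERY = "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name"


def run_query(connection):
    """The 'program': it refers only to the logical structure of the data."""
    return [row[0] for row in connection.execute(QUERY)]


before = run_query(conn)

# Change the physical layout by adding an index on gpa.
# Neither QUERY nor run_query is modified.
conn.execute("CREATE INDEX student_gpa_idx ON student (gpa)")

after = run_query(conn)
print(before, after)
```

An index is only one kind of physical change; the same idea is meant to apply to other changes in how the system stores the data on disk.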
5:11 And somewhat related to 5:12 that is the notion of high level query languages. 5:15 So, databases are 5:17 usually queried with languages 5:20 that are relatively compact 5:23 and describe, really at a 5:24 very high level, what information you want from the database. 5:28 Specifically, they obey a 5:31 notion that's called declarative, and what 5:33 declarative is saying is that 5:36 in the query, you describe 5:37 what you want out of the 5:38 database but you don't need 5:40 to describe the algorithm to 5:42 get the data out, and that's a really nice feature. 5:44 It allows you to write queries in 5:45 a very simple way, and then 5:47 the system itself will find 5:48 the algorithm to get that data out efficiently. 5:52 And speaking of efficiency, that's 5:54 number six, but certainly not 5:56 sixth in importance. There's in 5:59 real estate, as a little 6:00 aside here, an old saying 6:02 that when you have a piece of 6:03 property, the most important three 6:05 aspects of the property are 6:06 the location of the property, the location and the location. 6:10 And people say the same 6:12 thing about databases, a similar 6:13 parallel joke, which is that the 6:15 three most important things in 6:17 a database system are first 6:19 performance, second performance and again performance. 6:23 So database systems have 6:24 to do really thousands of queries 6:28 or updates per second. 6:31 These are not simple queries necessarily. 6:34 These may be very complex operations. 6:36 So, constructing a 6:39 database system that can execute 6:40 queries, complex queries, at that 6:42 rate, over gigantic amounts of 6:44 data, terabytes of data, is no 6:46 simple task, and that is 6:47 one of the major features also provided 6:49 by a database management system. 6:51 And lastly, but again not last in importance, is reliability.
6:55 Again, looking back at, say, 6:56 your banking system or your telecommunications 6:58 system, it's critically important 7:00 that those are up all the time. 7:03 So 99.99999% uptime 7:07 is the type of guarantee that 7:08 database management systems are making for their applications. 7:13 So that gives us an idea 7:14 of all the terrific things that a database system provides. 7:17 I hope you're already convinced that 7:18 if you have an application you 7:21 want to build that involves data, it 7:22 would be great to have all 7:23 of these features provided for you in a database system. 7:27 Now let me mention a few 7:28 of the aspects surrounding database 7:30 systems and scope a little 7:31 bit what we're going to be covering in this course. 7:34 When people build database applications, 7:37 sometimes they program them with what's known as a framework. 7:40 Currently at the time of 7:41 this video, some of the 7:42 popular frameworks are Django 7:44 or Ruby on Rails, and these 7:46 are environments that help you 7:48 develop your programs, and help 7:49 you generate, say, the calls 7:51 to the database system. We're 7:53 not, in this set of 7:54 videos, going to be talking 7:55 about the frameworks, but rather we're 7:56 going to be talking about the 7:58 database system itself and how it is used and what it provides. 8:02 Second of all, database systems are 8:04 often used in conjunction with what's known as middleware. 8:07 Again, at the time of this 8:08 video, typical middleware might 8:10 be application servers, web servers, 8:12 so this middleware helps 8:14 applications interact with database 8:16 systems in certain types of ways. 8:18 Again, that's sort of outside the scope of the course. 8:20 We won't be talking about middleware in the course.
8:24 Finally, it's not the 8:25 case that every application that 8:27 involves data necessarily uses 8:29 a database system. So historically, 8:33 a lot of data has been stored 8:34 in files; I think that's a little bit less so these days. 8:37 Still, there's a lot of data out there that's simply sitting in files. 8:40 Excel spreadsheets are another 8:43 domain where there's a lot 8:45 of data sitting out there, and 8:47 it's useful in certain ways, and the 8:49 processing of data is not always 8:51 done through query languages associated with database systems. 8:54 For example, Hadoop is 8:56 a processing framework for running 8:59 operations on data that's stored in files. 9:02 Again, in this set of 9:04 videos we're going to focus 9:05 on the database management system 9:07 itself and on storing 9:09 and operating on data through a database management system. 9:13 So there are four key concepts that we're going to cover for now. 9:16 The first one is the data model. 9:18 The data model is a 9:20 description of, in general, how the data is structured. 9:23 One of the most common 9:24 data models is the relational 9:26 data model; we'll spend quite a bit of time on that. 9:28 In the relational data model 9:30 the data in the database is thought of as a set of records. 9:33 Now another popular way to 9:35 store data is, for example, 9:37 in XML documents. So an XML 9:39 document captures data, instead 9:40 of as a set of records, as a 9:42 hierarchical structure of labeled values. 9:45 Another possible data model 9:47 would be a graph data model, where 9:49 all data in the database is in the form of nodes and edges. 9:52 So again, a data model is 9:54 telling you the general form of 9:55 data that's going to be stored in the database. 9:58 Next is the concept of schema versus data. 10:02 One can think of this kind 10:03 of like types and variables in a programming language. 10:07 The schema sets up 10:09 the structure of the database.
10:11 Maybe I'm going to have information about 10:12 students with IDs and 10:15 GPAs, or about colleges, 10:17 and it's just going to tell 10:18 me the structure of the database, 10:19 whereas the data is the actual 10:21 data stored within the schema. 10:25 Again, in a program, you 10:26 set up types and then you 10:27 have variables of those types; we'll 10:28 set up a schema, and then 10:29 we will have a whole bunch of data that adheres to that schema. 10:32 Typically the schema is set 10:34 up at the beginning, and doesn't change 10:36 very much, whereas the data changes rapidly. 10:39 Now to set up the schema, 10:41 one normally uses what's known as a data definition language. 10:44 Sometimes people use higher level design 10:46 tools that help them think 10:48 about the design and then from 10:49 there go to the data definition language. 10:52 But it's used in general to set up 10:53 a schema or structure for a particular database. 10:57 Once the schema has been set up 10:58 and data has been loaded, then 11:00 it's possible to start querying 11:01 and modifying the data and 11:03 that's typically done with what's 11:04 known as the data manipulation language, 11:07 so for querying and modifying the database. 11:15 Okay, so those are some key concepts; 11:16 certainly we're going to get 11:17 into much more detail in later videos about each of these concepts. 11:21 Now let's talk about the 11:22 people that are involved in a database system. So 11:25 the first person we'll mention 11:26 is the person who implements the 11:28 database system itself, the database implementer. 11:31 That's the person who builds the 11:32 system, that's not going to be the focus of this course. 11:35 We're going to be focusing more on 11:37 the types of things that are 11:38 done by the other three people that I'm going to describe. 11:41 The next one is the database designer. 11:43 So the database designer is the 11:45 person who establishes the schema 11:47 for a database.
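To make the schema-versus-data distinction and the two kinds of language mentioned above a little more concrete, here is a minimal sketch using SQL through Python's built-in sqlite3 module. SQL is used here only as one familiar example of a data definition and data manipulation language, and the table and column names are assumptions based on the students-and-colleges example. The schema is set up once, then the data is loaded, modified, and queried while the schema stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: set up the schema (structure only, no data yet).
# Table and column names are hypothetical.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE college (name TEXT PRIMARY KEY, state TEXT)")

# Data manipulation: load data that adheres to the schema.
conn.executemany(
    "INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.2), (3, "Cat", 3.7)],
)

# Data manipulation: modify the data. The data changes; the schema does not.
conn.execute("UPDATE student SET gpa = 3.4 WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 3")

# Data manipulation: query the data. The query states what is wanted,
# not the algorithm for retrieving it.
rows = conn.execute("SELECT id, name, gpa FROM student ORDER BY id").fetchall()

# The column list of the schema is the same as when it was defined.
columns = [info[1] for info in conn.execute("PRAGMA table_info(student)")]

print(columns)
print(rows)
```

The split mirrors the types-and-variables analogy: the CREATE TABLE statements play the role of type declarations, and the inserted rows play the role of values.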
11:48 So, let's suppose we have an application. 11:51 We know there's going to be a 11:51 lot of data involved in the 11:53 application and we want to 11:54 figure out how we are gonna structure 11:55 that data before we build 11:57 the application. That's the job of the database designer. 11:59 It's a surprisingly difficult job 12:01 when you have very complex 12:03 data involved in an application. 12:05 Once you've established the 12:07 structure of the database 12:08 then it's time to build the 12:10 applications or programs that 12:11 are going to run on the 12:13 database, often interfacing between 12:15 the eventual user and the 12:16 data itself, and that's 12:18 the job of the application developer, 12:20 so those are the programs that operate on the database. 12:26 And again I've mentioned already 12:28 that you can have a database 12:29 with many different programs that operate on it; that's very common. 12:33 You might, for example, have a 12:34 sales database where some applications 12:37 are actually inserting the sales 12:39 as they happen, while others are analyzing the sales. 12:41 So it's not necessary to have 12:43 a one-to-one coupling between programs and databases. 12:46 And the last person is the database administrator. 12:50 So the database administrator is the 12:51 person who loads the data, 12:53 sort of gets the whole thing running and keeps it running smoothly. 12:57 So, this actually turns 12:59 out to be a very important job 13:00 for large database applications. 13:03 For better or worse, database systems 13:04 do tend to have a 13:06 number of tuning parameters 13:07 associated with them, and getting 13:09 those tuning parameters right can 13:11 make a significant difference in the 13:12 all-important performance of the database system.
13:15 So database administrators are 13:17 actually highly valued, very important, highly 13:20 paid as a matter of fact, 13:22 and are, for large deployments, 13:24 an important person in the entire process. 13:26 So those are the people that 13:28 are involved. Again, in this 13:29 class we'll be focusing mostly on 13:31 designing and developing applications, 13:33 a little bit on administration, but in 13:36 general thinking about databases and 13:37 the use of database management systems 13:40 from the perspective of the application builder and user. 13:43 To conclude, we're going to 13:45 be learning about databases, and whether 13:47 you know it or not, you're 13:48 already using a database every day. 13:50 In fact, more likely than not 13:52 you're using a database every hour.