0:00 Welcome, everyone. This is Dr. Soper And today we 0:06 will be going through our first lecture for our database class. 0:11 I need to operate under the assumption that not everyone in the class has had experience with databases in the past. 0:19 And so this first lecture is intended to bring everyone up to the same basic level of understanding 0:26 with respect to several important database concepts. We have several different objectives, 0:33 which we will seek to achieve in this lecture. These are listed on the screen right now. 0:39 First, we want to try to understand why storing data in a list is not necessarily 0:46 a very good idea. It can cause many different types of problems. 0:51 Second, we want to see if we can gain some insight into why an organization might want to use a database. 0:58 Third, we will see how the notion of related tables of data, which forms a core part of the relational database 1:07 model, provides a basis for eliminating many of the problems which are associated with storing data 1:14 in a list. We'll also explore the various components and elements that comprise a database or a database system. 1:23 And we will learn about the purpose of something that we call a DBMS, a database management system. 1:31 And along the way, we're going to explore some of the functions that a robust database 1:38 application can provide to us. What, then, is the purpose of a database? 1:43 Well, it's important to remember that a database does not have just a single purpose. 1:49 Instead, there are several key advantages that databases provide. First, a database provides a repository for storing data. 1:59 That's kind of implicit in the name. Database implies that we have a place to store data. 2:05 However, what might not be so obvious is that databases provide an organizational structure 2:13 for data. That is, we don't just have a place to store data, but the database also provides an organized structure 2:21 into which those data can be placed. Finally, a database provides us with a mechanism 2:30 for interacting with our data. Now, interacting with data can generally 2:36 be described in four different operations. Here they're listed as querying, creating, modifying, 2:46 and deleting data. But there's another more interesting acronym, which may help you to remember this. 2:52 And that acronym is CRUD, C-R-U-D. This stands for create, read, update, and delete. 3:02 These are the four basic operations that we can use when interacting with data. 3:09 A key point to remember here is that, in business, there are many natural, hierarchical relationships among data. 3:21 For example, a customer can place many orders. Another way of saying that is many different orders 3:29 can be associated with the same customer. Or another example is a department 3:35 can have many different employees, but a given employee might work in one, and only 3:43 one, department. So these are hierarchical relationships among the data. 3:49 And a relational database allows us to model and represent these relationships. 3:55 Let's take a look at a list of data. What I mean by a list here is a simple, two-dimensional table 4:05 of data. And in this table, we store information that is important to us for some reason. 4:11 An example might be, say that we have many different projects in our company, and we want to keep track of who the project 4:21 manager is for each project. Therefore, for each project, we may 4:27 track the project manager's name, their ID number, 4:32 and maybe their phone extension within our company. Now, ordinarily this wouldn't be such a big problem. 4:40 But imagine if the same person, the same project manager, is simultaneously managing multiple projects. 4:49 In that case, using a simple list, the project manager's information 4:55 would necessarily appear on the list multiple times. Now, what is the problem with this? 5:03 Well, there isn't really any major problem, aside from the fact that it's redundant. 5:08 Another way of saying that is we are using more space than is necessary to record which project manager is 5:17 associated with which projects. Another problem with storing data in a list 5:24 is that the list of data may contain more than one concept 5:30 or more than one theme. I want you to remember this idea of a business concept, 5:36 because we will see it again and again throughout our course. A business theme or concept refers 5:44 to a number of different attributes or properties 5:49 that are all naturally related to one type of business entity. 5:56 An example might be an employee. An employee is a business concept. 6:03 Employees have different attributes that we may want to track for each employee 6:09 in our organization. For example, we may want to track the employee's ID 6:14 number, the employee's name, the employee's salary. 6:20 Other examples of business concepts or business themes include things like departments, products, customers, orders, 6:31 locations, and so forth. Returning to lists, a major problem 6:37 is with these multiple themes. That is, for each row of the list, 6:43 we might be recording information about more than one of these business concepts. 6:49 As an example, consider our list of project managers. If we also wanted to include project information-- that 6:57 is, perhaps, a project name, a project ID, any additional information about the project-- 7:05 we might store that in the same row along with the project manager. 7:11 Aside from redundancy, as I mentioned earlier, the problem here is not necessarily 7:17 how we are storing this information, but what might happen to the information 7:23 if it is stored in this way. When we store information in a list, we introduce the possibility of something 7:32 called data anomalies. And specifically, there are three types of these anomalies. 7:40 Here they're listed as a deletion problem, an update problem, and an insertion problem. 7:46 Later in our course, we will refer to these as deletion anomalies, update anomalies, 7:51 and insertion anomalies. So the structure of a list and its associated problems 7:58 of redundancy and multiple themes can produce a situation in which these modification problems 8:06 potentially could occur. So let's take a look at some of these modification issues. 8:13 Here we have a list of data, which contains information on students. 8:18 So we have a student's last name, first name, their email address. And then we see who that student's advisor is. 8:27 So the last name of the student's advisor, the email of the student's advisor, 8:32 and the department in which they work, along with some additional information. 8:38 Let's take a look at how some of these modification anomalies might emerge. 8:43 In the first case, imagine that we want to change a student's advisor. And for this example, we're going 8:49 to change Advisor Baker to Advisor [? Tiang. ?] When we do this, not only are we going 8:56 to have to update the last name of the advisor in the list, but to maintain the overall quality of the data, a concept 9:04 that we call data integrity, we are going to have to also update the advisor's email address. 9:12 Note, in this case, that we do not need to update the department or the administrator's last name, 9:19 because those are the same for advisors [? Tiang ?] and Baker. However, if, for some reason, we wanted 9:28 to change the student's advisor from, say, Baker to Valdez, 9:34 well, now not only do we need to update the last name and the email address, 9:40 but we also need to update the department and the admin last name attributes as well. 9:47 Just to do something as simple as changing the student's advisor with this list approach requires 9:54 that four separate pieces of information be updated. So this is what we might call an update anomaly or an update 10:03 problem. Next, let's look at a deletion problem. 10:08 Imagine that our student, listed here as Chip Marino, 10:14 decides to drop out of the university. So we need to remove Chip from our list of students. 10:22 Now, look what happens if we delete this row of data. When the row of data is deleted, not only 10:30 do we lose the student's information, but we've also lost information about the advisor 10:38 and the department in which that advisor works. That is, you will notice that Advisor [? Tran ?] is not 10:45 currently assigned to advise any other students. So when we delete this row of data, 10:51 we may entirely lose the knowledge that Advisor [? Tran ?] even exists. 10:57 And that can be a problem. In this case, this is called a deletion problem or a deletion 11:04 anomaly. Finally, let's look at an insertion anomaly. 11:09 Let's say that at our university, we decide that we want to add a new department. 11:18 So we create a biology department. Well, this is fine. We add information to our list. 11:24 We now have a biology department and an administrator's last name. 11:31 However, we now have all of this missing data in our table. There are empty cells. 11:37 There's no information for a student. There's no information for an advisor. All we have here is information for the department. 11:46 This means that data are missing, and we're not efficiently utilizing our available storage 11:53 space. These are some of the problems with these lists. Now, remember, earlier I mentioned 12:00 that in the business world there are complex relationships 12:05 among different types of data. For example, a department may have many employees 12:14 who work in that department. Or a product may be assembled from many different components. 12:22 Or a customer might place many different orders with our company. 12:28 So there are these natural complexities that arise in business. 12:34 And relational databases, as we will see, not only solve 12:39 the problems that are associated with lists, but also allow us to model these natural relationships 12:47 among business data. Now, I've been using this term relational database. 12:52 So it would probably be a good idea for us to consider what a relational database actually is. 12:59 On a very basic level, relational databases store data in a table. 13:07 Now, a table is simply a two-dimensional grid of data that contains columns, and it contains rows. 13:18 The convention in the relational database world is that the columns represent different attributes 13:27 of an entity. And each row in the table represents 13:33 an instance of the entity. So for example, if I have an employee table 13:39 with an employee ID, an employee name, and an employee telephone 13:45 number, we would list those three attributes-- ID, name, and telephone number-- as columns in the table. 13:55 And then each row in the table would represent an individual employee. 14:02 Again, we said that there are these natural relationships among different business concepts 14:08 out there in the business world. In a relational database, data for each of these concepts 14:15 is stored in its own table. So I may have an employee table. 14:21 I may have a department table. And then I can create a relationship 14:27 between those tables, which will allow me to figure out which employees work in which departments. 14:35 So a good way of thinking about this is instead of storing all of the information in one big list, 14:42 we break the list apart into separate pieces according to which information is 14:50 associated with which business theme or business concept. 14:56 Therefore, all of the information associated with an employee might go into an employee table. 15:03 And all of the information associated with a department might go into a department table. 15:09 Although, inside a relational database, information about different business concepts or business 15:17 themes is stored in separate tables, it may be necessary-- say, for reporting purposes-- for us 15:27 to reassemble all of the data from these different tables 15:32 back into a list. Now, we can accomplish this in the relational database world 15:40 by performing something called a join operation. That is, we join the tables together. 15:47 Now, a very, very important concept for you to understand 15:52 is that in the relational database world, we link the records in different tables 15:59 together using matched pairs of values. 16:06 These matched pairs of values allow us to associate a row in one table 16:14 with one or more rows in another table. For example, if we have a project table 16:22 and we have a customer table, and we use an attribute called a customer ID 16:31 to uniquely identify each customer, then what we can do 16:36 is we can place a customer ID attribute in the project table. 16:43 And we can then use the ID of the customer in each table to link related rows together. 16:52 Now, of course, you may be listening to this, and you may think, why do we want to do all of this extra work? 16:59 And it is true that relational databases are more complicated 17:04 than storing data in a list. However, as we will see, relational databases 17:12 have many, many advantages over a list. Among these, first, a relational database 17:20 minimizes data redundancy. In the examples we saw earlier, we 17:25 saw that there exists with lists the possibility to have all sorts of redundant data. 17:33 Relational databases eliminate this. Relational databases also allow us 17:38 to model the natural relationships that exist among business concepts in the real world. 17:45 And there are several other advantages as well, including providing a solid basis from which to do things like generate reports 17:53 or build user interface forms. I mentioned a few minutes ago that there 18:00 are four basic ways of interacting with data-- create, 18:08 read, update, and delete. In the relational database world, 18:14 we have access to something called the Structured Query Language, often called SQL, or SEQUEL, if you like. 18:23 And this is the primary tool that we can use for performing these four 18:29 basic operations on our data. In a few weeks, we will begin learning the Structured Query 18:35 Language. This is an extremely valuable skill for you to develop. 18:42 If you are able to learn the Structured Query Language well, then you will be able to sit down and work 18:49 with virtually any modern database with a very, very 18:55 short learning curve. It doesn't matter if you need to work with a SQL Server 19:00 database, or an Access database, or an Oracle database, or a DB2, or a MySQL database, or even a database 19:10 for, say, a mobile device. Most databases, including all of those that I mentioned, 19:17 support the Structured Query Language. So if you can learn the Structured Query Language, 19:23 you will be very well positioned to work with almost any database. 19:28 Even though we're still early in our course, I want to give you an example of the Structured Query Language 19:34 now so that you can see that it's really not that difficult to understand. 19:41 This is certainly not a full scale computer programming language. 19:46 The Structured Query Language was designed from the ground up to be simple to use and to be simple to understand. 19:53 So what we have here is an example 19:58 where we have three tables. There is a customer table, and a course table, 20:07 and an enrollment table. So let's say that these are art courses. 20:15 We have customers who have signed up to take different art 20:20 courses to see, perhaps, if they can learn to paint or they can learn to sculpt. 20:26 Now, although we start with three tables, if we need to combine all of the information 20:32 together in order to restore the original list structure of the data, we can do that 20:40 by using something called a SQL SELECT statement. Here we see an example of such a statement. 20:48 I know this looks complicated. But hopefully, you are able to read this even right now 20:56 at the beginning of our course and get a good idea for what this statement is supposed to do. 21:03 So let's take a look. We are selecting the customer's last name, the customer's 21:09 first name, the customer's phone number, and the date of the art 21:15 course, the amount paid out of the enrollment table, 21:21 the course itself, and the total course fee from our three different tables-- 21:27 customer, enrollment, and course. We are then linking the related records in each table together 21:34 using these matched pairs of values that I mentioned earlier. In the first case, we are linking the customer table 21:42 to the enrollment table using matched pairs of customer 21:47 numbers. And in the second case, we're linking the course table 21:53 to the enrollment table using matched pairs of course numbers. What this will do for us is restore 22:02 the original list of data. So you can see our list. It contains the customer's last name and first name, 22:09 customer's phone number, the course date, the amount that the customer has paid so far, 22:17 the course they are taking, and the total course fee. Now, the SQL statement that we saw was a SELECT statement. 22:27 And that is a statement which just pulls data from the database. 22:33 There are many other things that we can do with SQL, like deleting data, inserting new data, 22:40 updating data in the database, and so forth. And we will learn how to do all of those various tasks 22:48 when we study the Structured Query Language in more depth in a few weeks. 22:55 Now I would like to turn our attention to some conceptual information. 23:02 Let's focus on a database system. It's important to remember that a database system is not 23:09 just the database itself. Rather, it exists in a broader hierarchy, which 23:17 includes users, application software programs which 23:22 rely on the database, something called a database management system, a DBMS, and then, of course, the database itself. 23:33 The broader picture of a database system includes these four components. 23:39 Here we see the relationships between the four components of a database system. 23:46 On the far left, we see users. Now, users are people, just like you 23:53 or me, who might need to use the data in a database. But a user doesn't necessarily have to be a human being. 24:02 It is possible, for example, for other software programs 24:07 to be users of our database as well. Now, you will see that users interact 24:14 with database applications. These are the programs that rely on the database. 24:20 So if we are a human user, we might, for example, use a website, which interacts with a database. 24:29 That website could be considered a database application. You will notice that the database application 24:37 talks to the DBMS. The DBMS, the database management system, 24:45 acts as a gatekeeper. All of the information flowing in or out of the database 24:53 must travel through the DBMS. It is a critical mechanism for maintaining the quality 25:02 of the data in the database. We do not allow our users or our database applications 25:09 to directly access the data in the database. To work with the data, to access those data, 25:17 database applications must go through the database management 25:22 system. And then, finally, far over on the right side of our screen, 25:28 we see the database itself. And this is where we actually store the raw data 25:34 in their separate tables. So what do the users of a database system do? 25:41 Well, they do many different things. You and I are users of database systems on almost a daily basis, even if we don't realize it. 25:50 So we can look up our checking account balances online. 25:55 We can produce reports. We can seek information on Wikipedia. 26:02 All of these are examples of us as users interacting with database applications. 26:10 A few interesting things to note about the database itself is that it is a self-describing collection 26:18 of related records. What this means is that the database does not just 26:24 contain the data, but it also contains a definition of its own structure. 26:30 Databases also contain metadata. The simplest way to understand metadata 26:36 is that metadata are data which describe other data. 26:42 And in the context of the database world, metadata are data which describe the structure 26:49 of the data in the database. So for example, I may say that an employee table contains 26:57 an attribute called Employee ID, and that ID value is 27:03 an integer. That knowledge that the Employee ID value is an integer 27:10 is metadata. And of course, databases allow us to establish 27:16 relationships between tables. So that means the tables within a database 27:22 are related to each other according to the relationships that we establish. 27:28 So it's important to remember that a database does not just contain the data that we have put into it. 27:35 A database also contains metadata, which describe our data. 27:41 It contains, or potentially can contain, what we might refer to as overhead data. 27:49 These are tables of data that are used to improve performance or track 27:57 the status of the database or how users are interacting with the database. 28:02 And one of the major types of these overhead data are called indexes. 28:08 Later in our course, we will examine database indexes. And of course, databases can also 28:15 contain application metadata. So these are data values that are 28:21 used by the applications which rely upon the database. 28:27 An application, for example, might store user preferences in the database. 28:34 Now, to reiterate, a DBMS, or a database management system, 28:39 serves as a gatekeeper or an intermediary between the database and the applications 28:47 which rely upon that database. The purpose of the DBMS is to control and manage 28:54 the operations of the database. The DBMS is a special kind of software program, which 29:02 is used to not only create databases, but also to process and administer those databases. 29:10 So a DBMS is a type of application program. But it is not the database. 29:18 The DBMS interacts with the database, but it itself is not the database. 29:26 What, then, can a database management system do for us? Well, the answer is it can do many different things, 29:33 as we see here on your screen. We can use the DBMS to create a new database. 29:39 And then, within that database, we can create tables and establish relationships between tables. 29:48 We can read, write, edit, and delete data using the DBMS. 29:56 We can establish rules which regulate and control the data in the database. 30:02 The DBMS provides an additional layer of security for us. And it also provides mechanisms through which 30:11 we can backup our data and recover our data if anything 30:17 ever goes wrong. So the DBMS is a very critical piece of software. 30:23 Now, businesses often operate with many different types of business rules or constraints. 30:31 And one of the great things about working with databases is that the DBMS can establish and enforce 30:39 many different types of constraints. One of the most useful types of constraint 30:45 that a DBMS can enforce is called a referential integrity constraint. 30:51 The purpose of a referential integrity constraint is to ensure that values that we are entering in one table 31:00 have matching values in another table. Now, that's a very abstract way of trying 31:07 to convey a simple concept. As an example, imagine that we have a customer 31:12 table and a project table. And we are trying to enter information 31:19 about a new project in the project table. And we specify that the customer ID 31:27 that is associated with this project is customer number 5. 31:32 Referential integrity means that the database will actually check in the customer table to see 31:40 if customer number 5 exists. If customer number 5 does not exist, 31:47 then the database will not allow us to add that customer ID to our project table. 31:55 And in this way, I hope you can see it helps to maintain the quality of the data 32:01 in the database, what we call data integrity. 32:06 Now, the fourth component of our database system were database applications. 32:12 And database applications are simply those software programs that need to use or work 32:20 with data in the database. So these might be data driven websites. 32:26 They might be mobile apps. They might be desktop applications or productivity software. 32:34 Just remember that these applications are not allowed to talk to or use the database directly. 32:43 But rather, they must go through the DBMS to work with or use those data. 32:49 Finally, for this lecture, I want to just talk about some 32:55 of the different types of database systems that are available. First, we can consider something called a personal database 33:03 system. These are very small databases. 33:08 They are typically used by a single user at a time. 33:13 They are not complicated. They are simple in design. And they're not really suitable for supporting 33:20 medium or large businesses. They might be used by very small businesses just out 33:26 of convenience. But as those companies grow, they will, at some point, 33:31 certainly need to choose a more robust database solution. 33:37 An example of a personal database system might be Microsoft Access. 33:43 Convenient, easy to use, but not very powerful. 33:48 One of the interesting characteristics of personal database systems is that they often 33:54 contain the capability to build a database application, or at least interfaces to the database 34:03 within the DBMS. So in Microsoft Access, for example, 34:09 I can create data entry forms or generate reports for my Microsoft Access database. 34:18 That is, it kind of combines the database application and the database management system into a single entity, 34:27 conceptually speaking. On the other hand, we have a class of database systems 34:33 that are called Enterprise-Level database systems. These are databases that are used by medium and large size 34:40 companies to support their operations. And it is Enterprise-Level databases 34:46 that we will learn to use and interact with in this class. 34:51 Compared to personal database systems, Enterprise-Level databases are designed 34:57 to support many different users at the same time. And this could be thousands or tens of thousands of users 35:07 all at the same time. Enterprise-Level databases also serve data 35:12 to more than one application. For example, if we are a retailer like Target 35:22 or Walmart, we may have a database that provides data to our website 35:30 and also provides data to the dashboards that are used by our managers. 35:38 Enterprise-Level databases are commonly spread across multiple physical computing devices. 35:47 So many different computers. So a single database might be running on many physical servers. 35:56 Enterprise-Level databases can also be geographically distributed. 36:02 So I might have part of my database in my corporate offices in Los Angeles, 36:08 another part in Beijing, another part in Berlin in Germany. 36:15 And Enterprise-Level database management systems support more than one database. 36:21 So within the same database management system, we might, for example, have our operational database, 36:29 which allows us to run our business and keep track of transactions in real time. 36:36 And we might also create a data warehouse or data marts. 36:42 And they can all be managed by the same Enterprise-Level database management system. 36:47 These are large databases. As an example, consider an ERP system like SAP. 36:56 A typical ERP implementation will have thousands of tables, all related 37:03 to each other in some way. An Enterprise-Level database management system 37:09 can handle this level of complexity with relative ease. 37:14 So here we see a graphical example of these Enterprise-Level database systems. 37:20 The DBMS, again, serves as an intermediary or a gatekeeper between the databases 37:27 and all of the various database applications that want to rely on the data in those databases. 37:34 And they can be mobile applications, applications that are written in Java, or C#, 37:41 or even web applications. It might be ASP.NET or PHP apps. 37:50 Just as examples of some commercial DBMS products, again, I mentioned that personal or desktop DBMS products 37:57 might include Microsoft Access. Whereas, Enterprise-Level products 38:04 include SQL Server, which is what we will be using in this class, Oracle, MySQL, and DB2. 38:15 MySQL, if you're interested, is a completely free and open 38:20 source database. Well, my friends, thus ends our first lecture 38:28 for our database class. I hope you learned something. And come back soon, and we'll begin our exploration 38:37 of our next topic. Until then, have a great day.