0:01
In this video, we'll learn about the Relational Model.
0:03
The Relational Model is more than
0:05
35 years old, and it's
0:07
really the foundation of database management systems.
0:09
It's spawned a many billion dollar industry.
0:13
The relational model underlies all
0:15
commercial database systems at this point in time.
0:18
It's actually an extremely simple model and that's one of its benefits.
0:22
Furthermore, it can be queried.
0:24
By that I mean we can ask
0:25
questions of databases in the
0:26
model using High Level Languages.
0:29
High Level Languages are simple, yet
0:31
extremely expressive for asking questions over the database.
0:35
And finally, very importantly there
0:37
are extremely efficient implementations of
0:39
the relational model and of
0:41
the query languages on that model.
0:43
So let's move ahead and
0:45
understand the basic constructs in the relational model.
0:48
So, the primary construct is in fact, the relation.
0:51
A database consists of a
0:53
set of relations or sometimes
0:55
referred to as "tables", each of which has a name.
0:58
So, we're gonna use two relations in our example.
1:01
Our example is gonna be a fictitious database
1:03
about students applying to colleges.
1:05
For now we're just gonna look at the students and colleges themselves.
1:08
So we're gonna have two tables,
1:09
and let's call those tables
1:11
the Student table and the College table.
1:15
Now, as an aside, there's
1:16
a healthy debate in the
1:18
database world about whether tables
1:20
relations ought to be named using the singular or the plural.
1:23
I personally don't have a stake in that debate.
1:25
I'm going to use the singular.
1:27
Next, we have the concept of attributes.
1:30
So every relation and relational
1:32
database has a predefined set
1:34
of columns or attributes each of which has a name.
1:37
So, for our student table,
1:39
let's say that each student
1:41
is gonna have an ID, a
1:42
name, a GPA and a photo.
1:46
And for our college
1:48
table, let's say that every
1:50
college is going to have
1:51
a name, a state, and an enrollment.
1:55
We'll just abbreviate that ENR.
1:57
So those are the labeled columns.
1:59
Now the actual data itself
2:01
is stored in what are
2:02
called the tuples (or the rows) in the tables.
2:05
So let's put a couple
2:06
of the data tables, data tuples in our tables.
2:09
So let's start with the
2:12
students and lets say that
2:14
our first student has ID 123.
2:15
Name is Amy -
2:18
GPA 3.9, and she's happy with that.
2:21
So she has a smiley photo.
2:23
And our second student is 234,
2:26
Bob - his GPA
2:29
is 3.4.
2:30
He's not quite as happy.
2:32
And typically of course a
2:33
table will have thousands, maybe
2:35
millions, even sometimes billions of
2:37
rows, each row containing a value for each attribute.
2:41
In our college table, let's suppose
2:43
we have .... well of course
2:44
we're going to start with Stanford in
2:46
the state of California and Stanford's enrollment is 15,000.
2:52
We'll include our cross-bay rival Berkeley
2:54
again, in the state of California.
2:56
Berkeley's enrollment is a whopping 36,000.
2:58
And last of all,
3:01
we are going to not be West Coast biased.
3:04
We'll include MIT in the
3:05
state of Massachusetts with an enrollment of 10,000.
3:08
Now of course there's
3:09
gonna be again many more
3:11
tuples in the college table, and
3:12
many more tuples in the student table.
3:15
Okay, next, let me
3:16
mention that in a relational
3:18
database, typically each attribute or
3:19
column has a type
3:21
sometimes referred to as a domain.
3:23
For example, the ID might
3:25
be an integer, the name
3:26
might be a string, GPA might
3:28
be a float, photo might
3:29
be a jpeg file.
3:32
We do also in most relational
3:34
databases have a concept of enumerated domain.
3:37
So for example, the state might
3:38
be an enumerated domain for
3:40
the 50 abbreviations for states.
3:43
Now, it's typical for relational
3:45
databases to have just
3:46
atomic types in their
3:48
attributes as we have
3:50
here, but many database
3:52
systems do also support structured
3:54
types inside attributes.
3:58
Okay, a little bit more terminology.
4:00
The schema of a
4:01
database is the structure of the relation.
4:04
So the schema includes the name
4:07
of the relation and the attributes
4:09
of the relation and the types of those attributes.
4:12
Where the instance is the
4:13
actual contents of the table at a given point in time.
4:16
So, typically you set up
4:17
a schema in advance, then the
4:19
instances of the data will change over time.
4:23
Now, I mentioned that most columns have types.
4:26
But, there's also a special value
4:28
that's in any type of
4:29
any column and that's a
4:31
special value known as null, and nulls
4:32
are actually quite important in relational databases.
4:35
Null values are used to
4:36
denote that a particular
4:38
value is maybe unknown or undefined.
4:42
And, so let's suppose, let's add
4:44
another tuple to our database.
4:47
Let's say 345, another student
4:50
named Craig, and for,
4:51
whatever reason Craig doesn't have a GPA.
4:53
Maybe Craig is home schooled,
4:55
maybe Craig doesn't want to reveal his GPA.
4:57
So then the database would contain
4:59
a null value for Craig
5:01
and we'll just put a neutral face there.
5:04
Or, for example, maybe Bob doesn't
5:07
want to have his photo in
5:08
the database, so then Bob
5:10
would have a null value
5:12
for his photo, again nulls can go anywhere.
5:15
Now null values are useful
5:17
but one has to be very
5:18
careful in a database system
5:19
when you run queries
5:21
over relations that have null values.
5:23
In a later video we'll go
5:24
into this in more detail, but I
5:25
just wanted to give a just
5:26
sort of example of what can happen.
5:29
So, let's suppose we're
5:30
asking a query over our
5:31
student table of all students
5:33
whose GPA is greater than 3.5.
5:36
So when we run
5:37
that query on our database
5:38
obviously we'll get Amy out,
5:40
obviously we won't get Bob out, but should we get Craig?
5:43
The answer is No.
5:44
We don't know for a fact
5:46
that Craig's GPA is greater
5:47
than 3.5, so we'll only get
5:48
one student out from that query.
5:50
Now let's suppose we
5:52
had another query, where we
5:53
were gonna ask for the GPA
5:55
less than or equal to 3.5.
5:57
So, similarly where we
5:59
would not have Amy in result
6:01
and we would certainly have Bob in
6:02
the result and similarly would
6:04
not have Craig in the
6:05
result because we don't know that
6:06
his GPA is less than or equal to 3.5.
6:08
So far so good, but
6:10
it gets a little weird is
6:11
when we add an or here
6:13
in our query, we say I
6:14
want everyone who's GPA is greater
6:16
than 3.5 or who's GPA is less than or equal to 3.5.
6:18
And even though it
6:20
looks like every tuple should
6:22
satisfy this condition, that it's
6:23
always true, that's not the case when we have null values.
6:27
So, that's why one has to
6:28
be careful when one uses
6:29
null values in relational databases.
6:32
Let me erase this now and
6:33
let's move on to our next concept
6:35
which is the concept of Key.
6:38
Key is again another important concept in relational databases.
6:42
And, a key is an
6:43
attribute in of a relation
6:45
where every value for that attribute is unique.
6:49
So if we look at the
6:50
student relation, we can feel
6:52
pretty confident that the ID is going to be a key.
6:54
In other words, every tuple is
6:56
going to have a unique for ID.
6:59
Thinking about the college relation, it's a little less clear.
7:02
We might be tempted to say
7:03
that the name of the college is
7:04
an ID, that actually college
7:06
names probably are not unique across the country.
7:10
There's probably a lot of or
7:10
several colleges named Washington college for example.
7:14
You know what, we're allowed to
7:15
have sets of attributes that
7:16
are unique and that makes sense in the college relation.
7:19
Most likely the combination of
7:21
the name and state of
7:22
a college is unique, and that's
7:24
what we would identify as the key for the college relation.
7:27
Now, you might wonder why it's
7:28
even important to have attributes that are identified as keys.
7:32
There's actually several uses for them.
7:34
One of them is just to identify specific tuples.
7:37
So if you want to run
7:38
a query to get a specific
7:40
tuple out of the database you
7:42
would do that by asking for that tuple by its key.
7:45
And related to that
7:46
database systems for efficiency
7:48
tend to build special index
7:50
structures or store the database in a particular way.
7:53
So it's very fast to find a tuple based on its key.
7:56
And lastly, if one relation
7:58
in a relational database wants
8:00
to refer to tuples of another,
8:02
there 's no concept of pointer in relational databases.
8:05
Therefore, the first relation
8:07
will typically refer to a
8:08
tuple in the second relation by its unique key.
8:11
As our videos develop we'll see the importance of keys.
8:15
Okay, just to wrap up,
8:16
I'll mention how one creates relations
8:19
or tables in the SQL language.
8:21
It's very simple, you just
8:22
say "create table," give the
8:24
name of the relation and a list of the attributes.
8:27
And if you want to give types for the attributes.
8:29
It's similar except you follow
8:31
each attribute name with its type.
8:33
So to wrap up, the relational
8:35
model has been around a long time.
8:37
Has started a huge industry.
8:39
It's used by all database systems.
8:41
As you've seen it's a very
8:42
simple model and will shortly
8:43
see that it can be queried with very nice languages.
8:46
And, finally, it's been implemented very efficiently.