0:06 hello everyone welcome back in this 0:08 presentation we will see the advantages 0:11 of having dbms over file system so the 0:14 topic of today's lecture is file system 0:17 versus database management systems dbms 0:21 in the last presentation we have seen a 0:23 variety of applications of dbms and we 0:26 concluded the last presentation with 0:28 this question why do we need dbms when we already 0:30 have file systems in place 0:32 and i told you there are a lot of 0:34 advantages of having dbms over file 0:37 system so in this presentation we are 0:39 going to exclusively focus on the 0:41 disadvantages of having file system and 0:43 why we need to move towards database 0:46 management system so let's now see 0:48 file system versus database management 0:51 system we are going to compare file 0:53 system and database management system as 0:56 per the following seven points the first 0:58 point is data redundancy and 1:01 inconsistency point number two 1:03 difficulty in accessing the data point 1:06 number three data isolation 1:09 point number four integrity problems 1:12 point number five atomicity problems 1:14 point number six concurrent access 1:16 anomalies and point number seven 1:19 security problems i mean to say these 1:22 seven problems are there in file systems 1:25 and that's why file systems are not 1:26 suitable for organizations for efficient 1:29 storing and retrieving purposes 1:32 let's start with the first one 1:34 data redundancy and inconsistency when 1:37 we talk about software applications we 1:39 know that software applications are 1:42 developed by different programmers say 1:44 if we have a software program that 1:46 software is not the outcome of the 1:47 effort of only one programmer there are 1:50 multiple programmers involved when we 1:52 have different programmers involved in 1:54 creating a software program so obviously 1:57 each programmer will choose a different 1:59 file for their own activities 2:01 when they go for obviously 
different 2:03 files obviously file structures are also 2:05 going to be different see different 2:08 programmers choose different 2:09 files obviously each file will 2:12 differ in the structure and the 2:13 way they organize the data and at the 2:15 same time when we have different 2:17 programmers involved in a software 2:18 project they may be using different 2:21 programming languages software is 2:23 generally a collective effort of 2:25 multiple programmers so each programmer 2:27 has their own uniqueness in that 2:29 perspective suppose if they store the 2:32 data in different places 2:34 obviously data is duplicated so 2:37 what i mean to say here is when we have 2:40 duplication in the data that duplication 2:43 we call redundancy what is the 2:45 disadvantage of having duplication or 2:47 redundancy obviously the storage will be 2:50 increased and the access cost is also 2:52 increased so when we have redundancy 2:55 this is a serious problem to address 2:58 because redundancy invites higher 3:00 storage cost and access cost and this is 3:03 a problem mainly with the file systems 3:04 because each file can be stored in 3:06 different locations as well if we are 3:08 working on windows operating system one 3:10 file may be in c drive another file may 3:12 be in d drive one file may be in one 3:15 directory another file may be existing 3:17 in another directory which is not in the 3:18 same drive in such a situation if there 3:21 are duplicate records it's really 3:22 difficult to find out which is the 3:24 duplicate record at the same time when 3:26 we want to access the data it certainly 3:29 invites a lot of access cost and the 3:32 storage size is also getting increased 3:34 because of the redundancy i will bring 3:36 in a file to understand data redundancy 3:38 and inconsistency now 3:41 this is a table that i have created in 3:42 microsoft word where this table contains 3:45 employee id employee name 
department and 3:48 the salary of each employee we can see 3:50 different departments are there human 3:52 resource quality assurance accounts 3:54 sales etc and i have listed a few 3:57 employees here 4:00 and the problem with maintaining this 4:02 file system is data redundancy say 4:05 if i insert a new row here 4:07 i'm just duplicating the first row 4:09 101 alicia 4:13 human resource 4:16 and some salary now if you see here this 4:19 101 is a duplicate record 4:21 because we know 101 already 4:23 exists see file system is not able to 4:25 identify that this is a duplicate record 4:27 you may be asking we can handle this 4:29 situation well in microsoft excel by 4:32 doing conditional formatting and 4:33 formulas of course we can do that but 4:35 think of an organization that wants to store and 4:38 retrieve the data effectively in such a 4:40 case if they go for file system it's 4:42 really difficult to access the data 4:45 because handling redundancy is one of 4:47 the biggest problems with file systems 4:49 at the same time let's assume i am 4:51 renaming the department human resource 4:53 as simply hr in this case is it 4:56 affecting the other places 4:58 no we are making some change and we 5:00 expect that change to be reflected in all 5:03 associated places but unfortunately file 5:06 system is not able to do this and this 5:08 actually leads to inconsistency at the 5:10 same time let's assume one more 5:12 situation this file is located in c 5:15 drive of my computer i have copied the 5:17 same file in d drive as well i am 5:20 copying the file to another location 5:21 just to have a backup having a backup is 5:24 always good because if one file gets 5:26 corrupted or lost we can still have 5:28 another file let's assume i am making a 5:30 change in the file that is existing in c 5:32 drive will the file present in d drive 5:35 automatically get updated with the 5:37 change that we have made in the 5:38 file that is present 
in c drive 5:41 no this is also leading to inconsistency 5:44 manually we need to go and update all 5:45 the associated places if there is a 5:48 change in the file system but database 5:50 systems are good in this perspective 5:52 when you make a change in one place it 5:55 will automatically reflect the changes 5:57 in all the associated places that's the 5:59 power of having dbms from this we 6:02 understood that duplication i mean 6:04 redundancy is a problem with file 6:06 systems and databases can handle this 6:08 well and also file systems have 6:11 inconsistency which can also be handled 6:14 well in database systems how a database 6:16 system handles this well don't worry about 6:19 this now as the course progresses we 6:21 will be able to understand all the stuff 6:23 completely so we are done with the first 6:25 point data redundancy and inconsistency 6:29 let's now move on to the second point 6:31 difficulty in accessing the data let's 6:34 take the same file 6:35 the example that we have been seeing i am 6:37 asking you to retrieve the names of all the 6:40 employees who draw a salary greater than 6:43 50000 6:43 in that case retrieving the data from a 6:45 file system is really difficult as 6:48 already mentioned if you work with the 6:49 file system if you apply filters or 6:51 conditional formatting you can retrieve 6:52 it but think when there is an 6:54 application and that application 6:56 requires the data from your file 6:58 in that case it's really difficult the only 7:01 option is to manually go and 7:03 check is it greater than fifty thousand 7:05 yes is it greater than fifty thousand 7:06 yes we have to manually retrieve all the 7:09 records and in this case here this is 7:11 not greater than fifty thousand dollars 7:12 so we need to remove this or we need to 7:15 copy all these rows it involves a lot of 7:18 manual effort 7:19 and that's why i told you file systems 7:21 have difficulty in accessing the data 7:24 let's take two examples just imagine i 
have given you a file which contains all 7:28 student information in a table and i'm 7:31 asking you to find out all students who 7:33 reside in a particular city as already 7:35 mentioned we need to manually do it in 7:38 case that city is stored under multiple names 7:41 that all refer to the same city 7:43 it really invites a 7:45 lot of inconvenience so file system is 7:48 really having difficulty in accessing 7:50 the data 7:51 and i'm asking you to find out all the 7:53 students who gained 25 credits in such a 7:56 case manual intervention is really 7:58 required and see the real difficulty 8:00 again now i am asking you to list both i 8:03 mean all the students who reside in a 8:05 particular city and also they should 8:07 have gained 25 credits in this case 8:10 accessing the data in a file system is really difficult 8:12 but database systems make this very 8:15 easy when we supply the queries with the 8:17 condition they should belong to this 8:19 city and they should have gained 25 8:21 credits it's really easy in dbms to 8:23 access or retrieve the records as per 8:25 our convenience and conditions 8:27 from this we understood that file 8:29 systems are not convenient and efficient 8:32 whereas database systems are so 8:34 obviously when we want to have easy 8:36 access easy retrieval of data then a 8:40 more responsive data retrieval system is 8:42 needed which file systems cannot offer 8:44 fortunately dbms can do this easily 8:48 we are done with the second point 8:49 difficulty in accessing the data let's 8:52 now move on to the third point data 8:54 isolation what do we mean by this we 8:57 know software programs are the efforts 8:59 of different programmers where they use 9:01 different files where each file uses 9:03 different file structures and obviously 9:05 they may store the data in different 9:07 locations as well so obviously the data 9:09 are scattered in different files in file 9:12 systems in such a case isolating the 9:14 real data 
that we require is a problem 9:17 because our files may be in different 9:19 locations and that's why data isolation 9:21 is really difficult with file systems 9:24 but in database management system all 9:26 the records are going to be stored in a 9:28 central place which is the database and 9:30 that's why whenever any change is made 9:32 in one place all the associated places 9:34 get updated automatically at the same 9:37 time data can be easily isolated because 9:40 all the data are in one database 9:42 and you may be asking what if that 9:44 database fails what should we do all 9:46 data will be gone right no problem we 9:49 can take backups periodically and we can 9:51 synchronize the backup with the 9:52 up-to-date data so we are done with the 9:55 third point the data isolation let's now 9:57 move on to the fourth one the integrity 9:59 problems 10:00 what do we mean by integrity problems 10:03 suppose i want to enforce some 10:04 constraints or conditions on my data 10:07 which is actually stored in the file 10:08 system let me bring an example file we 10:11 know all employees will have salary but 10:13 here i am updating alicia's salary to 0. 
10:16 do you think 0 is a valid salary 10:19 no but see file system is accepting this 10:22 value isn't it because there is no way 10:24 in this file system to enforce some 10:26 conditions that the salary value should 10:28 not be zero 10:30 so that's one of the biggest problems 10:31 with file system so when we want to 10:33 enforce some conditions then definitely 10:36 it is not possible with this file system 10:38 to some extent we can handle this in 10:40 microsoft excel but again as i mentioned 10:43 this file system cannot be a complete 10:45 back end because there are several 10:47 drawbacks and we are dealing with all the 10:49 drawbacks one by one at the same time if 10:51 the employee id is purely a number type 10:53 and when i start giving a random text 10:55 it's accepting because no constraints 10:57 can be enforced here so this is also one 10:59 of the problems with the file systems so 11:01 coming to the integrity problems we 11:03 cannot enforce consistency constraints 11:06 on the files example the account 11:08 balance in a banking database should not 11:10 be zero 11:12 if this is maintained in a file system 11:14 definitely it's really difficult but 11:15 when it is a database system when we 11:17 enforce a condition that account balance 11:19 should not be zero whenever any update 11:21 or insert is carried out on that 11:23 particular row and if the condition is 11:25 not satisfied then that activity will 11:27 not be permitted in databases see how 11:30 powerful databases are 11:32 of course we can easily enforce these 11:33 constraints or conditions in a software 11:36 code so it's really easy to add the 11:37 conditions in software code but not in 11:40 the files and that's what we have seen 11:42 just now suppose you have enforced some 11:44 constraints and suddenly we want to 11:46 enforce some new constraints again this 11:48 is also a problem with the file systems 11:50 but in database management systems 11:52 anytime whatever constraints 
we want to 11:54 enforce we can do that because databases 11:57 are really a savior for all these 12:00 problems and we have one more big 12:02 integrity problem with file system which 12:04 is this problem the integrity problem is 12:07 compounded when the constraints involve 12:09 several data items from different files 12:12 imagine our data is scattered or stored 12:14 in different places different files 12:16 different locations in that case 12:18 enforcing some constraints when it 12:21 involves several data items will be 12:22 really a tough task but this will not be 12:25 the case in database management system 12:27 because all the required data will be in 12:29 our database in one place 12:31 so we are done with the fourth problem 12:33 the integrity problems let's now move on 12:36 to the fifth point that is the atomicity 12:38 problems file systems are really 12:40 suffering from atomicity problems what 12:43 do we mean by this it's really difficult to 12:45 ensure atomicity in a conventional file 12:48 processing system what do we mean by 12:50 this don't worry i am going to explain 12:52 that now see any system whether it is a 12:55 hardware system or software system any 12:57 system is prone to failures is subject 13:00 to failure isn't it when a failure has 13:02 happened it is an expectation from that 13:04 system to recover from failures 13:07 of course backup will be really helpful 13:09 in doing this so obviously restoration 13:12 of data is a very very essential thing 13:14 when the system encounters failure now 13:17 when we have a backup we can easily 13:19 restore the data i am not talking about 13:21 the problem of backup here i am talking 13:23 about atomicity problems what do we mean 13:26 by this see the example then you will be 13:28 able to understand this let's say i am 13:30 doing a fund transfer and i am going to 13:32 enforce a condition that the 13:34 transaction should be atomic it means 13:36 the transaction should fully 
complete or 13:38 none 13:39 i know things will be unclear when we 13:41 see an example then it will be easy for 13:43 you to understand let's bring in a table 13:45 here let's assume i'm gonna do a fund 13:47 transfer from a's account to b's 13:49 account so this is the example that 13:51 i was talking about let's assume before 13:53 the transaction is started a is having 13:55 the account balance of one thousand 13:57 dollars and b is actually having the 13:59 account balance of 500 14:02 and let's assume the transfer operation 14:04 is carried out from a's account i mean 14:06 transfer of 500 dollars from a to b it 14:09 means from a i'm withdrawing 500 14:13 and the new balance for a will be only 14:15 500 because this 500 is debited from a's 14:18 account and it is expected to be 14:19 credited to b's account isn't it so at 14:22 this stage a is having a new balance of 14:25 500 remember the debit operation is 14:28 completed but still the credit operation 14:30 is not completed after debit is 14:32 completed let's assume the system has 14:34 undergone a failure in that case the 14:37 value is actually debited and a has ended 14:39 up with only 500 and if you note here 14:42 the value is not credited to b's account 14:45 b is also left with 500 only the state of the system is 14:48 acceptable if a is 500 and b is 1000 it 14:51 means the amount is debited from a's account 14:53 and credited to b's account that's 14:55 perfect otherwise the money should 14:57 not have been debited from a's account so that 15:00 consistency is maintained see this is a 15:02 real problem when we deal with 15:04 transactions when we use files as a 15:06 backend so what condition i am enforcing 15:08 is if a transaction is started it should 15:11 run to completion when the source 15:13 account is debited then the destination 15:15 account should be credited so i'm 15:18 enforcing a condition if a transaction 15:20 is started it should complete all 15:22 otherwise none say if a failure has 
15:25 happened here then the transaction 15:27 should be rolled back to the previous state 15:29 in which a is equal to 1000 and b is equal 15:31 to 500 even if there are some statements 15:34 executed and if there is a failure the 15:36 system should identify this failure and 15:39 it should roll it back to the original 15:40 state like this a is equal to 1000 then 15:43 b is equal to 500 so what we are 15:45 achieving 15:46 all or none right if all then good if 15:50 none that is also good partial execution 15:53 obviously is not good in this stage and 15:55 that's why we want to enforce atomicity 15:57 constraints on this and that's why i'm 16:00 telling you file system has a lot of 16:02 atomicity problems whereas dbms is very 16:05 good in this perspective so we are done 16:08 with the fifth problem the atomicity 16:10 problems let's now move on to the sixth 16:12 point which is concurrent access 16:14 anomalies 16:16 is concurrency good or bad 16:18 when multiple people work at the same 16:20 time it's always good we know databases 16:22 store all the data in one place isn't it 16:25 let's assume if one person is accessing 16:26 the database another person is prevented 16:29 from getting the access because already 16:31 one person is accessing the database 16:33 it's not good right let's say we are 16:35 accessing google.com see multiple people 16:38 from multiple different locations are 16:40 accessing google server at the same time 16:42 so concurrent access is always good but 16:44 there are some serious problems with 16:45 concurrent access 16:47 and that too when they are accessing the 16:49 same data let's see that now so 16:51 concurrency is always good because 16:54 multiple people can access the resource 16:56 at the same time but it may lead to 16:59 inconsistency which we need to address 17:01 how let me take a real-time example 17:04 let's say there is a common balance for 17:06 the department which is one thousand 17:08 dollars so this department has an 
17:10 account balance of one thousand let's 17:12 assume there are two clerks clerk a and 17:14 clerk b who can access this shared data item 17:17 let's assume clerk a wants to withdraw 17:19 some money at the same time clerk b also 17:22 wants to withdraw some money let's 17:24 assume the situation that both are doing 17:26 this at the same time when clerk a reads 17:29 the shared data item the account balance 17:31 of the department he sees it as one 17:33 thousand dollars clerk b 17:36 is also going to see the same value because this is a 17:38 concurrent access both clerk a and b are 17:41 accessing this shared resource the 17:43 common data at the same time 17:45 after reading let's assume clerk a is 17:48 debiting 200 and he is writing a new 17:51 value as 800 and this is correct 17:54 because 1 000 is the actual value that 17:56 he has read and he debited 200 from the 17:59 account and finally is writing a new 18:01 value 800 dollars which is perfectly 18:03 fine after a completes its execution he 18:06 has written a new value 800 and he has 18:08 left the place 18:10 b after he has read the value 1000 after 18:13 some time now he is debiting 300 from the 18:16 account it means he has read 1000 which 18:19 is a wrong read actually because a new 18:22 update was already carried out but b is 18:24 not aware of this b assumes that the 18:26 account has one thousand dollars as the 18:28 balance which is this and he writes a 18:30 new balance after debiting 300 the new 18:33 balance is what 700 according to b and 18:36 if b writes this as the final value to 18:38 the common balance or the common data 18:41 then this is a problematic one because 18:43 the new balance is 500 only a has 18:46 already debited 200 and b is debiting 18:48 300 obviously 200 plus 300 is 500 and 18:52 the new balance should be what 500 only 18:55 but see what is the final value that is 18:57 written by b it's 700 which is 18:59 definitely incorrect and that's why i 19:02 told you concurrent 
access anomalies are 19:04 there in file systems but databases are 19:07 really good in this perspective even if 19:10 multiple people access the resource at 19:12 the same time databases are good in 19:14 handling this i will also give you one 19:16 more example let's assume there is a 19:18 file in a shared directory what happens 19:20 when two people open the file at the 19:22 same time 19:23 the first person when he opens he gets 19:25 both read and write permission on that 19:26 file and the second person who 19:28 opens the same file which is in the same 19:30 location because it's a shared location 19:33 then obviously he gets the file with the 19:36 read-only permission not with the write 19:38 permission because it leads to 19:39 inconsistency when concurrent access is 19:41 there in a file system isn't it so file 19:44 systems are not recommended when we are 19:46 going for concurrent access because 19:48 files have a lot of anomalies when we 19:51 deal with concurrent access 19:53 we are done with the sixth point the 19:54 concurrent access anomalies in file 19:56 system 19:57 let's now move on to the last one the 19:59 security problems and we know security 20:02 is important for anything and we know 20:04 databases or file systems both are going 20:07 to deal with the data suppose the 20:09 data is confidential data so obviously 20:12 not everyone should be permitted to 20:13 access the data so we are required to 20:15 restrict the data access 20:17 so what i mean to say in simple terms is 20:20 we want to provide authentication as 20:22 well as authorization so there are 20:24 multiple authentication mechanisms so 20:26 that we can identify whether the right 20:28 user is accessing the system or not so 20:31 authentication is all about identifying 20:33 whether the right person is accessing the system 20:34 or not say password is one of the 20:36 authentication mechanisms biometrics 20:39 facial recognition voice recognition all 20:41 these are 
authentication mechanisms 20:44 let's take the file 20:45 in this file let's assume we want to 20:47 restrict access to this file so what we 20:49 can do we can go to the option security 20:51 and we can lock this file so whenever 20:53 any person opens this file it will ask 20:56 for the password so if he gives the 20:57 right password this file will be opened 20:59 otherwise this file will not be opened 21:02 let's assume two people have access to 21:03 this file and both know the password 21:06 and obviously both will be getting the 21:08 same privilege let's name them as a and 21:10 b a can see employee id employee name 21:13 department and salary similarly b can 21:15 also see all these four columns let's 21:17 assume b should be restricted from 21:19 seeing the salary column in that case is 21:22 there any way to lock this particular 21:24 column from b 21:26 in file system it's really difficult 21:28 isn't it but in databases we can create 21:31 multiple views and we can grant the 21:32 privilege according to the user's roles 21:35 and responsibilities a normal user will 21:37 have fewer privileges when compared to 21:39 database administrator so database 21:41 administrator has the complete privilege 21:43 over the database but not an end user or 21:46 a naive user don't worry about the users 21:49 in the coming lectures we are going to 21:50 see different users of databases so we 21:53 will understand things clearly there 21:55 so what i mean to say here is we need to 21:58 have authentication and authorization 22:00 techniques for the data that is stored 22:02 in the file system or in the database 22:04 system but the problem is providing 22:06 these things to the file system is 22:08 really difficult so databases are really 22:10 powerful even in addressing security 22:12 problems and i'll give you one more 22:14 example suppose if we have multiple 22:16 files in multiple different locations 22:18 and each file is having different 22:20 data in a 
university there is 22:22 a finance staff member and this person should 22:24 not access academic records and 22:26 enforcing these access constraints in 22:28 file system is really difficult we 22:30 understand that finance related data 22:32 only finance personnel should deal with 22:35 but if it is in a file system enforcing 22:37 such access constraints is really 22:39 difficult so we have discussed all the 22:41 seven points in this presentation i hope 22:44 now you can understand why we need 22:46 database management systems over file 22:48 systems because we have seen a lot of 22:50 drawbacks with file systems and 22:53 that's why we wanted to move our 22:54 attention towards database management 22:56 systems i hope the session is 22:58 informative and thank you for watching
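
As a rough illustration of points one, two, four and five from the lecture, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for a dbms. The table and column names, the helper functions rejected, transfer and balances, and the sample rows other than alicia and the a/b balances are assumptions made for this example; they are not shown in the lecture. The sketch only suggests how a duplicate record and a zero salary might be refused, how a conditional query replaces manual filtering, and how a failed fund transfer might be rolled back so that it is all or none.

```python
import sqlite3

# Assumed schema, loosely modelled on the lecture's employee table and
# fund-transfer example; names and sample values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           emp_id     INTEGER PRIMARY KEY,                -- no duplicate ids
           name       TEXT    NOT NULL,
           department TEXT    NOT NULL,
           salary     INTEGER NOT NULL CHECK (salary > 0) -- 0 is refused
       )"""
)
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:  # one committed transaction for the starting data
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(101, "alicia", "human resource", 60000),
         (102, "bob", "sales", 45000),
         (103, "carol", "accounts", 52000)],
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("a", 1000), ("b", 500)])


def rejected(conn, sql, params=()):
    """Return True if the database refuses the statement."""
    try:
        with conn:  # rolls back automatically if the statement fails
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one all-or-none transaction."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:  # simulated crash between debit and credit
                raise RuntimeError("failure after debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


# Point 1: a second row with employee id 101 is refused.
duplicate_rejected = rejected(
    conn, "INSERT INTO employee VALUES (?, ?, ?, ?)",
    (101, "alicia", "human resource", 60000))

# Point 4: setting a salary to 0 violates the CHECK constraint.
zero_salary_rejected = rejected(
    conn, "UPDATE employee SET salary = 0 WHERE emp_id = ?", (101,))

# Point 2: a query with a condition instead of checking rows by hand.
high_paid = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE salary > 50000 ORDER BY emp_id")]

# Point 5: a failure after the debit leaves both balances unchanged.
failed_transfer = transfer(conn, "a", "b", 500, fail_midway=True)
after_failure = balances(conn)
completed_transfer = transfer(conn, "a", "b", 500)
after_success = balances(conn)

print(duplicate_rejected, zero_salary_rejected, high_paid)
print(after_failure, after_success)
```

The rollback here relies on sqlite3's connection context manager, which commits when the block finishes normally and rolls back when an exception escapes it. A production dbms would offer the same all-or-none behaviour through its own transaction commands.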