![]() |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Introduction to Evolution of DatabasesAt the core of all information systems are databases. Information needs to be stored, and this is what databases are for. In fact, databases not need to be computer systems, although we typically talk about databases in terms of ones stored and processed by computers. It seems crucial to ask a basic question in the beginning? What is a database? Consider the following questions: Is an Excel table with students’ grades a database? Is your notebook a database? Is a phonebook a database? Is the GMU schedule of classes a database? Is a medical record of a patient a database? Is a list of nurses working in a hospital a database? The answer to almost all of the above questions is yes, with the exception of a notebook which allows free text, drawings, etc. to be written onside. More precisely, database is a collection of data with defined structure and purpose. Obviously, a computer database is a database stored on a computer. In order to use databases, computers need to be equipped with a Database Management System (DBMS). Many commercial and publically available DBMSs are available. Flat DatabasesFlat data models have the least amount of structure. They typically take the form of one large Table, where the first row is the list of the variables and subsequent rows are data. Each case has a row of data. Many statisticians use flat models. Here is an example of a flat file for students in a class:
AdvantagesMost software include free access to flat data files. For a small number of cases, flat databases do a reasonably fast job. DisadvantagesFlat databases waste computer storage by requiring it to keep information on items that logically cannot be available. For example, if we are keeping information on zip codes, flat files require us to enter a missing information for zip codes in foreign countries. Hierarchical and structural databases avoid this problem by defining different tables for classes of countries in which zip code is not available. Thus flat files keep large sparse data full of missing information. When the size of database is large, search through the data takes a long time. Flat databases are not conducive to complicated search queries that divide the database further. For example, in a flat file about students it would be difficult to find all students that live in a certain zip code and have received grade of C. Such a query will require repeated pass through the data. First pass may identify all students who live in a zip code, second pass may identify all students who have grade of C and the third pass may find students in both groups. Such multiple passes through data are inefficient and take long time periods. Since every simple if statement (if zip code is equal to 22101 then ...) in a computer program takes a fraction of a second, efficient search methods are important for large databases. Relational DatabasesIn a relational data base, one stores a record with related fields as data. All items on the same record are said to be related to each other. Thus, one may store the information that patients have medical records in the following fashion:
This information in essence says that the patient ID number, name and medical record number belong to the same person. To effectively store both information items and the relationships among these items, relational data are kept in table formats. The first row of the table shows the name of the variables and subsequent rows are data. All items in the same row usually belong to the same case and are related. One column of the table is treated as the key to the table. Numbers or characters in this column corresponds to items in other Tables. Thus it is possible to move from one table to another. In a relational database, tables do not need to be of the same size but they all need to have a key column that connects them to each other and that uniquely defines the elements inside the table. ExampleHere is an example of a Table of grades for the example introduced under flat files:
This is an example of an additional table for contact information:
When a query is made, the relational database searches through its tables to find the answer. Often the answer involves information pooled together from different tables. For example, a query for names of people with grade of A, can be answered from the Table of Grades. The query for grades of people in zip code area 22101 must be answered from both Tables. Can you try to answer this question: What is the average final grade of the persons in zip code area 22101? Did you notice how you move from one table to another to get your answer. You use the key shared between the two tables, to find the relevant information in the other table. AdvantagesA relational database makes life difficult for the designer of the database but easy for the user of the database. It is more difficult for the designer because many possible relationships must be anticipated. It is easier for the user because data can be examined from many different perspectives. In addition, if Tables are appropriately defined it is possible to avoid having to enter missing information for variables that are not logically possible. This helps data entry and data processing speed. A relational database is also easy to modify because adding new concepts involves adding new Tables, not altering old ones. Hierarchical DatabasesHierarchical database models are one type of relational data models in which the relationship between any two adjacent item is similar to a father-child relationship. Hierarchical database models resemble a-cyclical tree structures (directional tree structures that do not include circular paths). ExampleThe file directory in your desk top is an example of a hierarchical database system. A folder may contains other folders which may contain other files, which contain data. AdvantagesIn hierarchical models, children inherit the relationships and characteristics of their fathers. An operation on the father affects the children; if you tell the computer to do an operation on a folder, the operation affects all the folders and files that it contains. For example, if you delete the top folder, you would delete all of the folders and files it contains. This feature saves time for the person maintaining the files. Sometimes, when the real world relationship that is being modeled is not hierarchical, it is difficult to fit relationships into a hierarchical database model. In these circumstances, an operation on the father should not be executed on the children. Distributed DatabasesMost databases physically reside in one place and are managed by a single computer. They may be backed up to another place but the elements of the database are not maintained in different places. In a distributed database, data are kept in different settings and on different computers. One or several central computers maintain indexes to where the data are. Using the address of the data then computers can communicate and find the information needed. Distributed databases need not only addresses for where the data are but also need an audit trail of who has updated data or retrieved it. Audit data are needed in order to pinpoint errors in the system and in order to understand where confidentiality of the system breaks down. When a computer requests data from another, an audit trail is created by storing who sent data where and when. When this computer passes the data to another, the information needs to be updated in the original computer. As the number of computers receiving the data increases the task of auditing becomes more difficult. At some point (at least theoretically), it is necessary to cutoff the original computer from being updated about where the information has traveled. ExampleImagine a healthcare organization with multiple geographically distributed sites. Each site manages its own patients, but sometimes patients may visit ther locations (i.e., when on vacation). One way of implementing information system in such as organization is through distributed database. Each site can maintain data (i.e. medical and financial records) cocrresponding to own patients, while the data are accessible from all other sites. Advantages and disadvantagesThe information system manager must make decisions on whether to use centralized or de-centralized databases based on a number of issues including the following:
Federated DatabasesFederated databases are an approach to solve some of the problems with distributed datbases. Their goal is to provide transparent interface to distributed datbases. Users are able to access federated databases without knowing where physically required data are stored and in what formats. In a federated database, all local databases maintain their autonomy. It is local databases that decide if to joint the federation, execute incoming queries and how. Privacy and security are maintained locally. Federated databases are, thus, ideal for multiple organizations, i.e. healthcare providers, that wish to share information. What Do You Know?Advanced learners like you, often need different ways of understanding a topic. Reading is just one way of understanding. Another way is through writing. When you write you not only recall what you have written but also may need to make inferences about what you have read. Please complete the following assessment:
If you are taking the course online, submit your responses as a word document attached to an email to your instructor, otherwise bring your work to class. Do not copy-and-paste. Write answers in own words. PresentationsFollowing resources are available to assist you:
Narrated lectures and videos require use of Flash. Recently Asked Questions
Ask a question and we will answer it within the next 48 hours. If you
have no questions, please review the answer to the questions asked by others:
Question: How can i merge cells in a row without altering the other rows in a table. Answer: If your goal is to display specific row with merged cells, this can be done by specific queries as described in section 8. It is, however, not possible to permanently modify table to have different cells in different rows. In databases, each row in a table always have the same cells (fields). This question was asked on 9/7/2009 7:47:00 PM and answered on 9/8/2009 10:23:03 AM. Question: For question #2 am I using the data tables from the flat databases example, relational database example, or both? Also, am I telling you how I find out the answer to your question in my head or how I would type the code into Access to get my answer? Answer: You should use the data tables from the relational database. In this lecture you do not need to know how to write queries in Access yet, so it is sufficient to describe steps. This question was asked on 8/25/2008 8:53:06 AM and answered on 8/26/2008 10:42:34 PM. Question: Why can relational databases allow for missing data that is logically not possible? Answer: Because they do not force the collection of such data. Since data are collected in separate tables, only relevant data is collected and irrelevant and inappropriate data is not collected at all. In contrast, in flat files inapprorpriate data needs to be collected or marked as not available. This question was asked on 2/12/2006 5:29:14 PM and answered on 2/13/2006 4:30:31 PM. Question: what is an index adn a cluster? Answer: I do not see where we have used the word cluster, so I am not sure the context on which you are asking this question. Index is used often in the first lecture and it is not used to designate any specific technical meaning. It is intended to mean a pointer to content of another web page. This question was asked on 9/26/2005 10:31:59 AM and answered on 9/26/2005 8:27:52 PM. Question: ARe there currently any data-less information systems being utilized in any industry sector? If so, how long was the implementation phase? Answer: I am not aware of such system in practice, though with growth of HIPAA regulations more of these systems should be expected. Of course, Google is a good example of dataless system. This question was asked on 9/6/2005 5:24:52 PM and answered on 9/6/2005 9:02:34 PM. Question: Regarding question #3 from the "What do you know" section: I feel that the lecture didn't elaborate on this topic very much, other than to say that the Tables need to be appropriately defined. Answer: There is a video that shows how tables are organized and we will get into this topic at much length later, when we discuss entity relationship diagrams This question was asked on 9/6/2005 4:10:40 PM and answered on 9/6/2005 9:01:33 PM. Question: Is it possible to have a database that is a combination of two (ie, say a relational AND hierarchical Database)? Answer: No. The definition of one excludes the other. For example a hirearchhial database does not allow other types of relationships and therefore cannot be a relational database or a flat database. This question was asked on 9/5/2005 2:41:12 PM and answered on 9/5/2005 11:58:06 PM. Question: Would the concept of a patient's electronic medical record within a system such as the Kaiser Permanente of California be an example of a data-less system? Answer: No. These medical records acutally have data in them. The best example of a dataless system is your browser. When you do a google search the data is assembled for you. You do not have access to the data before then. This question was asked on 2/10/2005 9:28:12 AM and answered on 2/10/2005 2:54:46 PM. Suggested Changes
Add
your own suggestions or read below suggestions made by others regarding
how to improve this session: Comment: The lecture was very clear and understandable. Left on 2/12/2006 5:23:40 PM Comment: The lecture material and explanation of Table creation was clear. Practice in creating the tables was very helpful in learning how to create the tables. Left on 9/1/2006 10:15:27 PM Comment: Lesson was well organized and easy to understand. It would be easier to understand the issue of data redundancy for flat files vs relational databases if the examples used included say two sets of grades for one student. Left on 9/7/2006 11:13:14 AM Comment: The lecture was easy and informative. Left on 8/29/2006 5:33:36 PM Comment: The overview was understandable. Left on 2/13/2005 11:18:09 AM Comment: Presentation of topics is well organized, easy to understand, and without fluff. Left on 9/5/2007 2:53:16 PM Comment: test Left on 1/15/2005 1:36:51 PM Comment: So far, I like the class and the lecture was clear and easy to follow. Left on 8/26/2008 9:07:59 AM Comment: Enjoyed the lecture and the background material. Looking forward to improving my knowledge about databases through the course. Left on 9/2/2008 8:44:37 AM Comment: This chapter is well organized as it fulfills its objectvive of introducing the concept of databases in general. Left on 9/3/2008 8:58:37 PM More
This page is part of the course on Healthcare Databases, the lecture on Evolution of Databases. It was last edited on Saturday August 30, 2008 by Farrokh Alemi, Ph.D. © Copyright protected. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||