Import, export data tables across platforms
Structure queries to describe complex relationships between fields in healthcare databases.
Databases contain a great deal of information in separate tables. An analyst is often called upon to integrate these separate tables, to find answers to specific questions. This process is called querying a database. In this lecture, you learn how to query a database.
Standard Query Language (SQL) is the list of commands that can be used to query a database. An example is provided in Figure 1. This list of commands is widely available and many different databases use the same SQL commands. In order to help people query a database without knowing SQL, some Databases, such as Microsoft Access, have created graphical interfaces. This lecture is about Microsoft Access graphical interface for querying a database. A later section teaches the use of SQL.
This lecture corresponds to pages 192 through 226 in assigned required book titled “Teaching yourself Visually Access 2003.”
During this presentation, you learn by doing. Each concept is demonstrated by showing you how it is done in Access database. In this sense, this session is more of a laboratory for you than an actual didactic lecture. Do not expect to master this section without actually repeating everything demonstrated here on your own inside a database. Keep in mind that practice makes perfect.
|SELECT Claims.PatientID, Last(ICD9.ICDDescription) AS LastOfICDDescription FROM Claims INNER JOIN ICD9 ON Claims.DiagnosisCode = ICD9.ICD9Codes WHERE (((ICD9.ICDDescription) Like "*diabete*")) GROUP BY Claims.PatientID ORDER BY Claims.PatientID;|
|Figure 1: Example of Standard Query Language|
Figure 2 shows the Microsoft Access’s view for designing a query. This query window replaces SQL commands. You find the query design window by opening a database and selecting the option to design a query.
Figure 2: A Graphical View for Designing a Query
The top part is called the toolbar. There are many database options here. Two rows of options are shown in Figure 2. One row in text and another in graphical icons. The first row is focused on database options, like opening a database. The second row shows the query toolbar, displayed here in graphical icons. On the left lower corner you see a button that allows you to see various views of a query. For example, you can see an SQL view of the query. You can go back and forth between the graphical and SQL view of the query by clicking on the View button in lower left toolbar. Next to the View icon, you see the icon for saving the changes you have made to query. To the right of save icon, you see the option to search as well as a series of typical options available in any software such as print, copy, cut, paste, undo and redo. To the right of redo you see the option of setting type of query and next to it the exclamation mark is for running a query. Once you have set up your query, you can use this option to run the query and get your results. Next to run icon, you see the button for showing tables on which the query is run. Obviously you must specify which of the many tables in your database are involved in the query. To the right, you see the Total button shown as a summation sign. This sets the option for summarizing several rows of data into one row, for example, when you sum the data in several rows to show as one value. To the right of the total button, you see a command to restrict how much of the output of the query should be displayed. For example, if you want only the first few rows of the query results displayed, you can do so using this option. Still further to the right is the option query properties, for example, here you can specify that values used should be unique and non-duplicating. Still to the right of this you see a wand. This allows you to use internal Access functions to calculate a new field from existing fields in the tables. To the right of the wand you see the option to return to the database view, so you can go back and forth between query window and database window. To the right of this feature you see the new object icon, where you can automatically generate reports from a query that you have created.
Right below the toolbars in Figure 2, you see an area where the tables involved in the query and their relationships can be displayed. You might use this area to specify several tables and set the keys that these tables share with each other. Obviously, only fields within the tables you set inside the query are available for display in the result section of a query. If your database has many databases, this is one way of focusing on a small subset of tables.
At the bottom of Figure 2, you can see the space for entering the various fields to be displayed in the result of the query. Some of these fields may come from different existing tables. Others may be calculated. For each field listed we need to specify where it comes from. If it comes from a table, in the second row we specify the table. We can also specify whether the field should be used to sort the data in the query display. This is done in the third row. In the fourth row we have the option of showing the field or hiding it from the display. The last two rows are for criteria that can be used for filtering the field. When a criterion is set, the query will only display the records that meet the criterion.
Figure 3: Including Relevant Tables in the Query
The first step in constructing a query is to include all relevant tables (see Figure 3). If a field is used in a query, a table containing the field must be included in the query. This is done through show table icon, which opens a list of queries and tables that one can include. If more than one table is included, one must also specify the relationship between the tables or queries. Please note that a query can have as input another query so one can build multi-step filters of the database tables.
The relationship between the tables is specified by dragging one field from one table on top of another field in another table. Access shows the relationship between two tables by creating a line from a field in one table to a field in another table. When a relationships is created between two tables, it is like the tables are joined together to make a larger table. Records in this larger table are arranged so that the two fields dragged on each other have the same values. For example, one could use patient’s ID in the claims table to join it to the primary key of the patient’s table. Or one could use the diagnosis code in the claims table to a table describing the code.
Figure 4: Including Appropriate Fields in Query Design
In a query you need to specify what fields should be included in the resulting data. This is done by entering a field name in the field row in the lower part of query design window (See Figure 4). Fields can be included from all joined tables. In this fashion, a select query makes information in various tables available in one setting. Fields can be dragged from the shown tables to the field cell or one can click on the right of field cell and select a field.
Figure 5: Computing a New Field
A new field can be calculated from one or more fields in the joined tables or queries. The easiest way for doing so is to put the cursor into the cell for fields, then click on the expression builder icon, the wand. A new window opens called Expression Builder window (See Figure 5), This window allows you to type the name of the new field followed by colon and followed by the expression to be used to calculate the field. The new name should not contain space or if it does have space it should be put inside brackets. The expression could be composed of built in functions available in Access. Under functions, you will see many examples of existing built in function. The most commonly used function is iif, which allows one to assign a value if a test is met and otherwise assign a different value. There are also numerous functions for dates, the most used one is for calculating difference between two dates.
Here is an example of an expression for computing a new field called Diagnosis:
Diagnosis: iif(ICD9!Description like “*diabetes*”, “Diabetes”, “Other)
In English this expression says the following: If the field Description in the table ICD9 contains the word diabetes assign to the field diagnosis the value “Diabetes” otherwise assign to the field diagnosis the value “Other.”
Here is another expression:
This expression says: set the field DaysTillNow to be equal to the number of days of difference between today and the field called Date in the table called Claims. The expression Date() is a built in function of Microsoft Access that provides today's date.
There are six different types of queries allowed in Access:
The option to change a query type is available on the query toolbar and can be reached by right clicking on the show tables area.
A select query filters the data by using one or more criteria. In the select query, the records or rows in a table are reduced to the rows that meet a set of criteria. For example, we might have a table of claims and want to restrict it to patients who had a claim of influenza. Select query restricts the content by criteria that a field should meet before it is allowed to be included in the display.
Figure 6: A select query displays records that match the pre-specified criteria
Microsoft Access also allows several other methods for selecting a subset of a table. One way is to restrict the cases by matching a field in another table. This is typically done when setting the relationship among tables. First, you need to show the tables that are used in the query using the show table icon. Once the tables have been added to the show table area of the query, then you can set the relationship among the table by setting different types of joins. There are three different joins.
One to one join: The typical join is when the fields in both tables have to be exactly the same before the content of the tables are joined together.
One to many join: The other two joins allow the field in one table to be always included and the field from the other table included only when it matches. When the two fields do not match, the record is still kept but there will be a null value in place of the missing match.
No join: The last type of join occurs when two tables are present and not joined together. In this circumstance, every record in one table is coupled with every record in another table. For example if one table has two records A and B, and another table has records 1, 2, 3, and 4. Then the consequence of having the two tables in a query but without a join is 8 records of all possible combinations of the two tables: A1, A2, A3 A4, B1, B2, B3, and B4.
For example, one might have a table claims that contains diagnosis codes. The meaning of these diagnostic codes might be available in a separate table called diagnosis codes. A join can select the text for the diagnostic code and combine it with the claim data. A one-to-one join will lead to listing of all claims in which the diagnostic code has a corresponding text in diagnosis table. In a one-to-one join, if the description of the diagnosis code is missing in the diagnosis code table, then all corresponding claims will not show. In contrast, in a one to many join, we can display all claims and their corresponding text for diagnosis; where the diagnosis text is missing in the diagnosis code table, a missing value is entered.
You can specify criteria for a field by entering an expression in the criteria cell for that field. An expression calculates a single value from a combination of fields using mathematical operations, word operations, or logical tests. For example, the following is an expression that can be entered in the criteria row under the field called ICDdescription:
In this expression, the field ICDdescription is restricted by the criterion that it should contain the word diabetes. "Like" is one of the built in functions available through Access. The like function is followed by quotes that contain star or question mark. A star means any number of letters and question mark means 1 additional letter. A star before and after a word means that any word can follow or precede the target word. In here, a sentence such as "diabetes mellitus" is matched because it contains the word "diabete" and is followed by other text. The sentence "Severe Diabetes" is also matched because text precedes diabetes. None of these two sentences are exact match to the word "diabete" as they contain other text. If we had asked for an exact match there would have been zero records selected.
To set up a criterion, you put your cursor inside the cell and click on the expression builder icon, the wand. You can now use any combination of field names from tables and queries that have been included in the query. There are a number of built in functions available. These include the function iif for testing an expression and various functions for dates.
Here are some examples: The expression >120 means greater than 120. The expression > 6/12/05 means passed June 12th 2005. The expression Not “*ism” means not any text ending with the word suffix ism. The expression <Date() means before today’s date. The expression Is Null means that there are no data in the field. The expression Not Diabetes means not matching exactly to the word diabetes, with first letter in of the word diabetes in capital. The expression Like "dia*" matches any text that starts with dia, such as diabetes, dialog, diagram and so on. The expression Between A and D matches any text starting with A, B and C.
Figure 7: Summarizing Several Records
You can summarizes several records into one value by using built in functions such as:
Group by, where all records having the same value will be grouped into one.
Average, where in all records in the group are averaged
Standard deviation, where in the standard deviation of all records in the same group are calculated.
Count, where all values in the group are counted and the count is entered.
Max and min functions, where the maximum and minimum value for the field in the group are selected.
Last or first, where the last or first value in the field in the group are selected.
The summary row, called Total, is added to a query by selecting the icon that looks like a summation sign. When you do so, a row is added to the fields area called “total”. In this row you have to select which of the many features you want to use. For example, by entering "Group By" under the field for Patient ID, the records are grouped by patient ID. If under the field ICDDiagnosis you also enter "last" in the total row, then the last diagnosis listed in the group is selected. In this fashion you can find the last diagnosis for each patient.
If you need to select a subset of the query, you can use the function "Where" in the total row. You can set "Where" in the total row under ICDDescription field. In a "Where" function, only grouped records matching the criterion under the where field are included in the query. Often when the "Where" function is used, the field is entered twice in the query because the "Where" function cannot be displayed in the query. The second entry of the field is used to display the field. Please note that the "Where" function is executed before the "Group by" function is executed. Thus if a patient has three diagnosis, two of which are diabetes related and one of which is not, the "Group by" function will select 3 records without the "Where" function and select 2 records with the "Where" function.
Once you have organized your query you can select the exclamation icon to run the query. You can also run a query by selecting from the view option at the lower left corner of the toolbar the tabular view of the query.
Besides the select query there are also other types of queries. Crosstab query creates a table view of the two fields in the query. Action queries make a table, delete selected records, or append selected records to a table.
Figure 8: Effect of Crosstab query
A Crosstab query takes the data from different columns of the table and organizes a summary of them into a new table. In the example in Figure 8, the billing data is broken into two groups, less than one thousand and more than one thousand. The diagnosis are broken into diabetes and other diseases, The new table counts the number of patients that fall into every cell.
A Crosstab query allows us to classify records based on content of two fields. To start with, each field should be summarized using group by command. You can create a Crosstab by deciding which field will be the row heading, which field you want to be the column heading and which field do you want to summarize in the cells inside the Crosstab table you are producing. Again keep in mind that you should use a function to summarize the content of records picked up for the cell values. For example, you can select as row headings the gender of the patients and as column headings their age calculated in decades. For the cell content you may select count of records. Thus, you will have a count of patients in each cell. Keep in mind that only one field can be used for column heading while several fields can be used for row headings.
You can change the display of query results in several ways. You can sort the display by using the sort row in the lower part of the query design window. You can also select a field by clicking on the bar right above it and drag it to a different position in order to change the order of fields being displayed. Finally you can rename a field by typing before its name a new name followed by colon. You can change the number of decimals displayed in a calculated field by using the Round function in the calculation. But you cannot change how numbers or dates are displayed in the query without change their format in the original table.
Until now we have been discussing the ideas behind the graphical interface for design of query. No one can learn much by reading these concepts. You can only learn by seeing examples of queries and then practicing on your own. We have put together several videos to help you see how this is done. The following topics are covered in these videos:
How to import data into tables
How to filter content of a table by using select query. Includes:
How to use multiple criteria to find records
How to display highest, lowest, and perform calculations
How to summarize data
How to find unmatched records
SQL equivalent statements for queries
Before you can start learning about queries, you need to create a database and fill it with data. In order to do so, please download the following four files and important them into the corresponding Tables:
When you download the files, keep them somewhere you can easily remember as you will need access to the above four files to set up the database used in this lecture.
During conversion of these data into corresponding tables, please change the field names to more descriptive names:
|Table name||Name of field before||New field name|
During importing a database, you need to set primary keys for each table. Any non-duplicating field can be a primary key. If there are no fields that can be set as primary key, you can allow access to set an auto-number as primary key.
See the enclosed video about how to import data.
In some situations, you have to compare a field value against the value of the same field in different records. Oneway of doing so is to copy the table twice and compare the values of the field to the copy of the field. A situation like this arises when we want to find duplicates in a field. First a copy of the table is created and cases where the IDs do not match but the fields do are identified. These are cases with duplicates. See a video on how this is done.
There are three several presentations for this lecture:
Listen to the lecture on graphical interface for querying databases
See how to import data
See an example for identifying diabetic patients
See an example for calculating average cost per diagnosis
See an example of Crosstab query in identifying if women are more likely to have diabetes
See an example of using date functions (number of diagnoses by month)
See an example of how duplicate cases are identified and eliminated.
See the slides for the lecture
Narrated slides and videos require Flash.
Advanced learners like you, often need different ways of understanding a topic. Reading is just one way of understanding. Another way is through doing and practicing the concepts learned in this section. The following assessment is designed to get you to think more about the concepts taught in this session.
Do Until Correct Policy. If you do not do well in these assignments, you can resubmit corrected version of the assignment for a grade of B. You can do so as often as you like.
Ask Us Policy. You can complete these assignments in class, when the teacher is available.
No Time Pressure. You can also request more time for completing these assignments.
Example videos. Note that exmaple videos may not precisely correspond to your instrutions. They may show solutions to slightly different proglems, or not be entirely correct. Their goal is to show you a general idea how to solve the problem, not to give the solution to be directly copied. You need to understand what you do to do it right! Think!
Identify patients that have diabetes in the above database. See it done.
Calculate the average cost of each diagnosis sorted from most expensive to least expensive. Exclude all bills with negative or 0 values. See it done.
Show if men are more likely to have diabetes than women. See it done.
Calculate which month is most likely to have a diagnosis reported. See it done.
Please bring your work to class and show it to your instructor during class time. Alternatively, email screen captures to your instructor. See an example done by Rhonda Richardson.Ask a question and we will answer it within the next 48 hours. If you have no questions, please review the answer to the questions asked by others:
Question: I have tried at least eight times to do Question 2, finding the clients with Diabetes, without success. For some reason after I have downloaded the data, set up the tables and the query following Dr. Alemi's video demonstrations as closely as I can tell, when I run the query all I get are the two column headings with nothing in them. Can you give me some insight into what I could possily be missing? Thank you. I can't proceed with the rest of the steps until I get this resolved. Answer: Please send your work to date to Vikas Arya and he will review it and let you know what you might be missing. Let him know that you left this question and received this answer. In general, if you find yourself unable to proceed, please send your work to the instructor for additional guidance. This question was asked on 10/30/2006 12:28:58 PM and answered on 11/1/2006 8:15:32 AM.
Question: This lecture corresponds to pages 192 through 226 in assigned required book titled “Teaching yourself Visually Access 2003.” However, the book is not among the twon only listed as required for the course, as currently listed Fall 2006 syllabus. Answer: Good point, we will review the required books in the course This question was asked on 10/26/2006 9:19:21 AM and answered on 11/1/2006 8:12:16 AM.
Question: In question 4, the results show that you have 6591 F patient with "Other" diagnoses, how did you get this number since there is a total of only 360 records of patients in the database? I might be missing something or there is something wrong with the query. Answer: This is a very good point. The analysis is reporting the number of claims made by female patient and not the number of female patients. Good point. This question was asked on 10/12/2006 9:16:50 PM and answered on 10/13/2006 8:11:53 AM.
Question: In the expression builder, can you please provide an explanation or description of what function the "+" plays at the outset of counting the incidence of diabetes by gender? In the lecture slides the 'addition' of the "+" symbol takes place without further comment. I missed including it in my first attemtp to run the querey. It would help a great deal to understand why it is necessary for inclusion. Answer: The + symbol indicates addition. In an expression all functions must precede with an operand, + is one such operand. In many cases adding the + sign to a number that is already positive makes no difference but since an operand is needed, the + sign is added This question was asked on 10/12/2006 12:47:14 PM and answered on 10/13/2006 8:05:40 AM.
Question: In question #5 at the end I get 1057 cases instead of 1055 and the original has 2112 instead of what the video show 1063, does that make a difference? Thanks Answer: Yes, the video is showing the process but the actual data is slightly different so your results maybe slightly different This question was asked on 10/30/2005 1:31:22 AM and answered on 10/30/2005 3:18:33 PM.
Question: In doing question #6, when I imported the table, it detected the error and created a separate table and named it "RevisedICD9_Import Error." Is this truely a problem with the way I imported it or is Access actually picking up the intended error and correcting it by placing it in a separate table? Answer: This is because one of the fields in Excel is not being read as planned, for example the field is supposed to be numbers and the computer detects text. Access deletes the record with the error and puts it in the table with RevisedICD9_ImportError. If this is not extensive (not more than one or two records), it should not affect your findings. This question was asked on 10/29/2005 9:08:47 PM and answered on 10/30/2005 3:17:52 PM.
Question: In#1 after importing the file and following the steps when I got to changing the field names I can't get to the next Column the header doesn't change it stays PatientID to the rest, I restarted the PC but nothing happens, can you help? Thanks Answer: Import the data without changing the names and then change the names after the table has been created in the table design mode. This question was asked on 10/27/2005 4:39:58 PM and answered on 10/27/2005 10:07:22 PM.
Question: When importing the data, I cannot change any other column names but the first one. WHat am I doing wrong-it worked for me in class?? Answer: Perhaps you should import the data into a table and then change the names of the fields through the design table options This question was asked on 10/26/2005 7:50:06 PM and answered on 10/26/2005 10:05:22 PM.
Question: For Question #3, I'm not seeing the "Round" option in the expression builder's built-in functions as depicted in the presentation. The closest one I see is "Rnd". Is there a way I can manually build the "Round" expression since it does not appear there? Answer: I beleive rnd has the same function This question was asked on 10/23/2005 4:16:56 PM and answered on 10/24/2005 8:00:34 PM.
Question: We have used the Data Flow Diagrams to aid the design of database. What's your opinion about this methodology. Do you agree with it or not? If you are then in what ways the Data Flow Diagrams will improve our design? If not then why? Answer: The real question is should the design of the database be based on current flow of data or business processes or should it anticipate future needs. I beleive future needs analysis is more appropriate and attract your attention to the following description for needs assessment: http://gunston.doit.gmu.edu/healthscience/740/frstrat.htm This question was asked on 5/1/2005 6:31:15 PM and answered on 5/2/2005 6:51:41 PM.
Question: I do not understand when do I need to compare a table to itself? Answer: When you want to examine patterns across records as opposed to fields. This question was asked on 4/8/2005 11:16:51 PM and answered on 4/9/2005 7:46:57 PM.
Question: In question 6 of Analze Data when I go to run it there is no data appearing in the field, what am I doing wrong? I have your tutorial running while I am doing it so that I can do a step-by-step. I am also running this using Access 2000, does that matter? Answer: I am not sure. I would need to see this. Can you show it to Vikas or me in class. This question was asked on 3/28/2005 11:13:20 AM and answered on 3/29/2005 10:56:47 AM.
Question: My Access at home does not have the option Round for question 3 Any Ideas??? Thanks Kathy Answer: Hmm. Which version of Access you have? It is unlikely that it does not have the function Round, but maybe so. Of course you can do the assignment without rounding the values. But are you sure it does not have it? This question was asked on 3/27/2005 12:18:00 PM and answered on 3/27/2005 9:09:20 PM.
Comment: I found this lecture to be very thorough. The examples that were shown assisted me with the analyze data assignment. This comment was left on 10/26/2006 9:28:31 PM.
Comment: Some of the demos went at a workable pace, but the later ones went much too quickly to get the finer points. I had to replay the videos numerous times to catch all of the steps. This comment was left on 10/30/2006 12:58:34 AM.
Comment: I would like to see more examples. It was some what easy to copy the queries we have learned in class, but I think we needed to come acroos more examples so we can have more sence of what kind of queries to run in different situations This comment was left on 5/4/2005 11:25:56 PM.
Comment: The examples were helpful, but Dr. Alemi goes too quickly for us slow access users This comment was left on 3/27/2005 12:17:06 PM.
Comment: The examples were helpful, but Dr. Alemi goes too quickly for us slow access users This comment was left on 3/27/2005 12:17:13 PM.
Comment: This lecture was very informative for me. It explained in detail all the steps required for queries. This comment was left on 3/30/2005 8:12:00 AM.
Comment: The lecture on graphical view of queries was extremely fast, therefore very difficult to follow - Rate as 1. The import data and different see it done videos were great - Rate as 5s. Huge concern though -- I spent several hours trying to figure out what I was doing wrong for question #6 only to realize that my first repeated numbers were the same as Dr. Alemi's example. There are far more than 7 or 8 repeated cases. This was very frustrating! This comment was left on 4/5/2006 10:57:32 PM.
Comment: Question: When saving "Relationships" is there a technique to "Save As." For example, in my databases to date, all relationships appear in one window which, I must access by scrolling through the list. I would prefer to save new Relationships in unique files...is this possible? How does one perform this task? This comment was left on 10/12/2006 8:16:11 AM.
Comment: The lecture would be improved with a brief overview to describe the rather abstract process for comparing a table to itself in order to eliminate duplicate cases. Then too, please make explicit, when the duplicates are eliminated why are the cases eliminated? Specifically, there is no dicussion of how one determines which cases to eliminiate. In the example shown, case "202" we note the cases has two differrent descriptions. The steps indicate how to eliminate duplicate cases (and thus definitions) from the table by virtue of it being a duplicate...not according to which one has the 'correct' description. This is a bit confusing. Can you explain the process by providing first an overview of what is occuring and second why the step is necessary? It seems intuitive that at least some of the duplicates might be the 'correct' description while others are indeed errors. Or else what am I missing? This comment was left on 11/14/2006 5:06:43 AM.
Comment: The examples were fairly thorough and the individual videos were helpful, however, it seemed like some of the assignment requests were in conflict with the information shown in the video. This comment was left on 11/18/2008 6:33:49 PM.