Healthcare Databases
 

Integrity and Security of Databases

Healthcare databases require data integrity and security. In fact, in healthcare data describe patients and needs to be protected more rigorously than in other diagnoses. Incorrect information presented to clinicians may lead to misdiagnosis, incorrect treatment, and eventually negative outcomes, including patient's death. Security of health data is also required by law, including HIPPA and other regulations.

This lecture describes two interrelated issues in databases: integrity of data and security. It also presents mechanisms built into many SQL-based database management systems that help maintain integrity and security of data.

Both integrity of data and security of data can (and should) be approached in different ways. For example, an EMR system built on top of a database should check for correctness of data, users' authorization, and other issues. This can be done independently from the mechanisms built into database systems.


Database Integrity

When talking about database integrity, one of the most important concept is one of a constraint. A constraint ensures that a database has specific properties. For, example, a constraint may not allow to enter date of visit in the past. Another constraint may prevent creating an order for non-existing patient. In general, constraints check if data physically makes sense, and are not intended to act as clinical decision support systems that provide alerts, warnings, etc.

A database is said to be in a consistent state if all constraints are satisfied (no constraints are violated). Obviously, a consistent database is one in consistent state. The following examples illustrate typical problems that violate database consistency. In many cases these can be detected by common sense when looking at the data, but can also be automatically prevented.

Some of the problems with database integrity can be avoided using normalization of databases as described in an earlier lecture.

Example 1

Consider database with two tables: patient and visit depicted below.

Patient Table:

Visit Table:

One can immediately detect several problems with the two tables:

  • There is no date for visit 3.
  • Visit 2 is for a patient that does not exist in the database
  • Assuming that the database is looked at in 2011, visit number 4 is scheduled several years before the patient is born.

As the above example shows, some problems can be easily detected and others require more checks.

Example 2

Here, let's assume that the database is viewed in 2010. Consider the following patient table.

In the above table date of birth of patient 3 is missing. Also, the age of patient 2 is inconsistent with his/her date of birth.

Example 3

In this last example, the database consists of two tables -- order and order item. It can be easily checked that for some items, total values of order are inconsistent with individual values of items.

Reasons for the Lost of Integrity

Database integrity can be lost for multiple reasons. Most often, the integrity of database is lost when data are entered or imported incorrectly. These types of problems are usually easy to prevent by adding additional checks to the database system. Other reasons for the loss if data integrity are:

  • Computer malfunction, including hardware, software, and network errors. These types of errors are typically impossible to prevent, although redundant storage, computation, and network resources can be used.
  • Computation failure, including user errors, overflow and logical errors, division by zero, etc. These types of error occur because of incorrect data entered (i.e. division by zero), problems with software programming, etc. The majority of computation errors can be prevented. Additional checks before computations within the software built on top of database systems, can prevent majority of them. Also, database constraints can prevent entering incorrect values, i.e. zero or negative values.
  • Concurrency control failure, occur when multiple users access and modify the same resources at the same time. These types of errors are much harder to discover and maintain than computation errors. This is because concurrency errors occur only in specific conditions in which multiple users access the same resources at the same time.

Recovery from Failure

Recovery form failure may be long, labor intensive,a nd complicated process. Particularly when data are lost due to a catastrophic failure. When data are physically lost, they need to be recovered from backup. Very often during the recovery process, when the loss is extensive the database system is not operational.

For non-catastrophic failure, it may be possible to:

  • Reverse/fix only changes that made database inconsistent
  • Redo operations using log (very slow)

These sometimes require to:

  • Check what is the correct state, checkpoints
  • Add redundancy to database

Most database systems, including PostgreSQL, include mechanisms for preventing inconsistency. Two main mechanisms are:

  • Constraints, that check if data inserted into the database system follows pre-defined checks. One type of constraint that was covered earlier in this course is relationships. It requires that values of foreign keys to be existing vales of primary keys following the relationships.
  • Transactions, that allow "reversing" operations on database.

These mechanisms will be discussed in the following sections.

Maintaining Integrity of Databases Using SQL

Structured query language provides mechanisms for enforcinf integrity of data.

Constraints

Constraints provide the simplest method for checking integrity of data. Here are the most common types of constraints:

  • NOT NULL, a field must contain a value
  • UNIQUE, two or more records cannot have the same values for specified fields
  • PRIMARY KEY, combination of the above
  • FOREIGN KEY, value of a field must match value in other table
  • CHECK, enter specified conditions for validity of values, e.g. discounted price < price, Age > 0, …
  • DOMAIN, define a set of possible values, use when there are several fields with the same domain

Constraints are created in SQL when creating pr modifying tables using CREATE and ALTER commands. For example, the following command creates table patient with several constraints.

CREATE TABLE patient
( RecordNumber char(10) PRIMARY KEY,
FirstName char(30).
LastName char(50) NOT NULL,
DateOfBirth date CHECK (DateOfBirth > '1/1/1900'
);

SOme additional infomration on how to use constraints in SQL are available in the W3Schools.com. Unfortunately, in many complicated conditions, it is not possible to create constraints.

Database management systems often provide graphical interfaces to allow for specifying constraints. For example, MS Access includes "Validation Rule," "Default Value," "Allow Zero Length" and other options in definition of columns. Check Access help for more details.

Transactions

Transactions provide mechanism to execute a set of operation in a block and store results only if all of these operations were successful. This idea is simewhat similar to "undo" operation available in many programs. Each transaction needs to be started with BEGIN command, then a set of operations is performed, and the transaction is ended with COMMIT, ROLLBACK or END commands. The END command automatically

The following example illustrates the use of transactions in the manual process of updating tables in PostgreSQL. The left side shows SQL commands and the right side shows comments.

claims=# begin;
BEGIN
claims=# alter table patient alter column pat_sex char(10);
ERROR:  syntax error at or near "char"
LINE 1: alter table patient alter column pat_sex char(10);
                                                 ^
claims=# end;
ROLLBACK

Start the transaction.
Confirmation of the transaction .
Make changes to the patient table.
Problem with command. The syntax is wrong.

End the transaction.
Transaction reversed, because error was found.

Here is another example in which the transaction is aborted manually, bacause of incorrect condition in query.

claims=# begin;
BEGIN
claims=# alter table patient alter column pat_sex type char(10);
ALTER TABLE
claims=# update patient set pat_sex = 'Female';
UPDATE 346
claims=# select * from patient;
ptid |  pat_sex   |    doe
------+------------+------------
    1 | Female     | 1996-09-06
    2 | Female     | 1996-04-26
    3 | Female     | 1996-05-31
    4 | Female     | 1996-07-12
    5 | Female     | 1996-04-11
    6 | Female     | 1996-04-11
    7 | Female     | 1996-04-11
    8 | Female     | 1996-05-03
    9 | Female     | 1996-05-14
   10 | Female     | 1996-08-23
   11 | Female     | 1996-08-09
   12 | Female     | 1996-05-31
   13 | Female     | 1996-04-11
   14 | Female     | 1996-07-08
   15 | Female     | 1996-06-24
   16 | Female     | 1996-04-11
….

claims=# rollback;
ROLLBACK
claims=# select * from patient;
ptid | pat_sex |    doe
------+---------+------------
    1 | F       | 1996-09-06
    2 | F       | 1996-04-26
    3 | M       | 1996-05-31
    4 | F       | 1996-07-12
    5 | F       | 1996-04-11
    6 | F       | 1996-04-11
    7 | F       | 1996-04-11
    8 | M       | 1996-05-03
    9 | F       | 1996-05-14
   10 | F       | 1996-08-23
   11 | F       | 1996-08-09
   12 | M       | 1996-05-31
   13 | F       | 1996-04-11

Start the transaction.

Change definition of the table.

Update table with new infomrmation.

Check if result looks good.

Here is the problem, we accidentally made all patients female. the condition in the update query was missing.

 

 

 

 

 

 

 

 

Rollback the transaction and restore database to the original state.

 

Now all problems are undone.

Similarly, the transaction with correct answer can be commited by the user. Here is the example script.

claims=# begin;
BEGIN
claims=# alter table patient alter column pat_sex type char(10);
ALTER TABLE
claims=# update patient set pat_sex = 'Female' where pat_sex like '%F%';
UPDATE 220
claims=# select * from patient order by ptid;
ptid |  pat_sex   |    doe
------+------------+------------
    1 | Female     | 1996-09-06
    2 | Female     | 1996-04-26
    3 | M          | 1996-05-31
    4 | Female     | 1996-07-12
    5 | Female     | 1996-04-11
    6 | Female     | 1996-04-11
    7 | Female     | 1996-04-11
    8 | M          | 1996-05-03
    9 | Female     | 1996-05-14
   10 | Female     | 1996-08-23
   11 | Female     | 1996-08-09
   12 | M          | 1996-05-31
   13 | Female     | 1996-04-11
   14 | Female     | 1996-07-08
   15 | M          | 1996-06-24
   16 | Female     | 1996-04-11
   17 | Female     | 1996-05-14
   18 | M          | 1996-08-16
   19 | Female     | 1996-09-13
   20 | Female     | 1996-04-26
   21 | Female     | 1996-04-26
   22 | Female     | 1996-05-17
claims=# update patient set pat_sex = 'Male' where pat_sex not like '%F%';
UPDATE 126
claims=# select * from patient order by ptid;
ptid |  pat_sex   |    doe
------+------------+------------
    1 | Female     | 1996-09-06
    2 | Female     | 1996-04-26
    3 | Male       | 1996-05-31
    4 | Female     | 1996-07-12
    5 | Female     | 1996-04-11
    6 | Female     | 1996-04-11
    7 | Female     | 1996-04-11
    8 | Male       | 1996-05-03
    9 | Female     | 1996-05-14
   10 | Female     | 1996-08-23
   11 | Female     | 1996-08-09
   12 | Male       | 1996-05-31
   13 | Female     | 1996-04-11
   14 | Female     | 1996-07-08
   15 | Male       | 1996-06-24
   16 | Female     | 1996-04-11
   17 | Female     | 1996-05-14
   18 | Male       | 1996-08-16
   19 | Female     | 1996-09-13
   20 | Female     | 1996-04-26
   21 | Female     | 1996-04-26
   22 | Female     | 1996-05-17
claims=# commit;
COMMIT
claims=#

Start the transaction.

Update female patients.

 

 

Check if update correct.

 

 

 

 

 

 

 

 

 

 

 

Update male patients.

 

 

Check if correct.

 

 

 

 

 

 

 

 

 

 

Commit the transaction.
All results are stored in the databse.

Transactions are particularly useful when running long scripts with multiple operations, all of which need to be completed to guarantee consistency of the database. Transactions can also be used to prevent some concurrency related problems by checking if data were updated correctly.


Security of Databases

When considering security of databases (and in general all information systems), one needs to keep in mind two important things:

  • Users should be able to access or modify data they are allowed to. Users need to work with the database system, and the security policies, implementation and software should not prevent them from doing so.
  • Users should not be able to access or modify data they are not allowed to. Users should not be able to access data that is beyond their security level, or is not needed for the functions they perform. Here again, policies, implementations and software should preventing them from doing so. This also includes preventing access by unauthorized individuals/systems.

The privacy and security issues are very extensive and go well beyond the scope of this course. In fact, specialized courses are offered that deal with these issues. This lecture is limited to basic commands that show how basic security can be managed in SQL-based database systems.

It is important to distinguish security policy from security mechanism. Security policy simply defines who has access to what data, based on regulations, performed functions, etc. Security policies are not always enforced. Often, systems do not prevent users from accessing data, but rather detect "suspicious" uses and report them to appropriate authorities. For example, most EMR systems allow all clinicians to access all patients' data. However, most systems have implemented mechanisms that check for suspicious uses, like access to data of family members, access to data of patients of other physicians, etc.

On the other hand, security mechanisms are used to enforce security policies. Security mechanisms can be:

  • Mandatory Access Control – based on system-wide policies that cannot be changed by individual users
  • Discretionary access control – creator of an object gets full rights to it, and can assign rights to others

The following section shows basic commands for the discretionary access control in PostgreSQL.

Setting Database Security Using SQL

Security mechanisms in SQL-based are based on users and their privileges. Each object in the database (i.e. table, view) has its owner that can grant or revoke permissions to other users. Database administrators are also allowed to create users and groups of users in the database using ADD USER and ADD GROUP comamnds.

Grant and Revoke Permmissions

The main commad in SQL that lets other users access resources is GRANT. The syntax of teh colums is as follows:

GRANT <privileges> ON <object> TO <users> [WITH <options>]

For example, the command

GRANT select ON patient to jwojt;

allows user jwojt to execute select command on the table patient. Similarly, teh command

GRANT all ON patient to smith;

allows user smith to execute all SQLcommands on the table patient.

To remove permissions, the REVOKE command is used:

REVOKE <permission> ON <object> FROM <users>

Views in Databses

Permissions to access data in SQL are given based on entire tables. It is not possible to grant access to specific fields or records in a table. In order to further restrict access to data, views can ve used. A view is a "virtual table" resulting from executing a query. Access can then be granted to a view that shows only data to which an user should have access. In order to create views, the CREATE VIEW command is used:

CREATE VIEW <name> AS <query>

For example:

CREATE VIEW patients AS SELECT * PERSON WHERE p_type = ‘Patient’;

creates a view "patients" which includes only patient records from the person table (and does not include employees etc.). Then, access can be granted to the view:

GRANT select on patients to smith;

 

What Do You Know?

Advanced learners like you, often need different ways of understanding a topic. Reading is just one way of understanding. Another way is through writing. When you write you not only recall what you have written but also may need to make inferences about what you have read. Please complete the following assessment:

  1. Add constraints to the database for your semester-long project. To do so, (1) update list of fields in the requirements section of your project, and (2) update fields in your database. You can use SQL commands, or graphical interface in Access.

  2. In your semester long project, describe security policies. Do do so, describe types of users (i.e. clinicians, administrative staff, etc.) and write data elements (tables, fields) to which these users should have access. Be consistent with your use cases and scenarios!
  3. (optional) Implement security pllicies in your database. Note that this can't be done in Access, so it is possible only if your project uses otehr type of database.

Submit answers by email to your instructor. 

Presentations

Following resources are available to assist you:

  1. See slides for thi slecture.

More
  1. See W2Schools website for more SQL.

This page is part of the course on Healthcare Databases, the lecture on Integrity and Security of Databases.  It was last edited on by Janusz Wojtusiak, PhD  © Copyright protected.