Building and Using a Clinical Data Repository

Dean F. Sittig, John Pappas, Patricia Rubalcaba

A clinical data repository (CDR) is a real-time database that consolidates data from a variety of clinical sources to present a unified view of a single patient. It is optimized to allow clinicians to retrieve data for a single patient rather than to identify a population of patients with common characteristics or to facilitate the management of a specific clinical department. Data types typically found within a CDR include clinical laboratory test results, patient demographics, pharmacy information, radiology reports and images, pathology reports, hospital admission/discharge/transfer dates, ICD-9 codes, discharge summaries, and progress notes.

Building such a complex database is a major undertaking. Its basic component is the physical database itself. Key issues that must be addressed here include: storage capacity (is it big enough to handle all the required data now and in the foreseeable future?); computing power (can it process all incoming data while hundreds, or even thousands, of simultaneous users are performing searches?); reliability (when a piece of the computing infrastructure breaks, are the data still available?); and accessibility of the data (how well is the structure of the data defined?).

In addition to creating the physical database, several supplemental components often become major developmental efforts. The most important of these is the Master Patient Index (MPI), which contains a set of demographic data along with a number that uniquely identifies each patient. All data stored in the CDR have this patient identifier as their primary key. Next in importance is an electronic interface between all the ancillary data sources and the CDR. In addition to the network connection between all the systems, one must also have a semantic mapping, which allows similar data types from disparate systems to be grouped. Finally, the CDR must have a user interface that allows clinicians to review a patient's clinical data quickly and easily in a variety of ways.
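The relationship between the MPI, the semantic mapping, and the patient-keyed store can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the source-system names ("chem_lab", "poc_device"), the local codes, and the shared concept names are hypothetical, and matching patients on exact name and birth date is a deliberate simplification of what a production MPI must do.

```python
from collections import defaultdict
from itertools import count


class MasterPatientIndex:
    """Hypothetical MPI: assigns one unique number per set of demographics."""

    def __init__(self):
        self._next_id = count(1)
        self._by_demographics = {}

    def register(self, name, birth_date):
        # Simplification: exact match on normalized name plus birth date.
        key = (name.strip().lower(), birth_date)
        if key not in self._by_demographics:
            self._by_demographics[key] = next(self._next_id)
        return self._by_demographics[key]


# Hypothetical semantic mapping: (source system, local code) -> shared concept.
SEMANTIC_MAP = {
    ("chem_lab", "NA"): "serum_sodium",
    ("poc_device", "SODIUM"): "serum_sodium",
    ("chem_lab", "K"): "serum_potassium",
}


class Repository:
    """Toy store in which every result is filed under the MPI patient number."""

    def __init__(self):
        self._results = defaultdict(list)  # patient_id -> [(concept, value, source)]

    def store(self, patient_id, source, local_code, value):
        # Translate the source system's local code into the shared concept.
        concept = SEMANTIC_MAP[(source, local_code)]
        self._results[patient_id].append((concept, value, source))

    def results_for(self, patient_id, concept):
        stored = self._results.get(patient_id, [])
        return [(value, source) for c, value, source in stored if c == concept]


mpi = MasterPatientIndex()
repo = Repository()

patient_id = mpi.register("Jane Doe", "1950-01-01")
same_patient = mpi.register("  jane doe ", "1950-01-01")  # resolves to the same number

repo.store(patient_id, "chem_lab", "NA", 140.0)
repo.store(patient_id, "poc_device", "SODIUM", 138.0)
repo.store(patient_id, "chem_lab", "K", 4.1)

sodium = repo.results_for(patient_id, "serum_sodium")
print(sodium)
```

In this sketch the two sodium results arrive from different systems under different local codes, yet a single lookup by patient number and shared concept returns both, which is the grouping the semantic mapping is meant to provide.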

We are in the process of building and implementing a CDR throughout Partners HealthCare System. The initial deployment is at the Massachusetts General Hospital. There are currently 2,500 authorized users who are able to access over 120 gigabytes of clinical data. These data consist of over 203 million laboratory test results and 5 million radiology reports. A typical query for a single patient's last known laboratory test results is answered in less than 2 seconds.
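As an illustration of the kind of query described above, the following Python sketch retrieves the most recent value of each laboratory test for one patient. It is not the system's actual schema or database technology: the in-memory SQLite database, the table and column names, and the sample rows are all assumptions chosen for illustration. The composite index on patient, test, and collection time reflects one common way to keep single-patient lookups fast as the table grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; timestamps are ISO 8601 text so text order equals time order.
conn.execute(
    """
    CREATE TABLE lab_result (
        patient_id   INTEGER NOT NULL,
        test_code    TEXT    NOT NULL,
        collected_at TEXT    NOT NULL,
        value        REAL    NOT NULL
    )
    """
)

# Index that supports lookups by patient, then test, then time.
conn.execute(
    "CREATE INDEX idx_patient_test_time "
    "ON lab_result (patient_id, test_code, collected_at)"
)

# Made-up sample rows for two patients.
sample_rows = [
    (1, "serum_sodium", "1999-01-10T08:00", 141.0),
    (1, "serum_sodium", "1999-02-01T07:30", 138.0),
    (1, "serum_potassium", "1999-01-28T09:15", 4.2),
    (2, "serum_sodium", "1999-02-03T10:00", 135.0),
]
conn.executemany("INSERT INTO lab_result VALUES (?, ?, ?, ?)", sample_rows)


def last_known_results(connection, patient_id):
    """Return the latest (test_code, collected_at, value) per test for one patient."""
    query = """
        SELECT r.test_code, r.collected_at, r.value
        FROM lab_result AS r
        WHERE r.patient_id = ?
          AND r.collected_at = (
              SELECT MAX(collected_at)
              FROM lab_result
              WHERE patient_id = r.patient_id
                AND test_code = r.test_code
          )
        ORDER BY r.test_code
    """
    return connection.execute(query, (patient_id,)).fetchall()


latest = last_known_results(conn, 1)
for test_code, collected_at, value in latest:
    print(test_code, collected_at, value)
```

For patient 1 the older sodium result is skipped and only the latest value of each test is returned; a patient with no stored results yields an empty list.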

The CDR forms the basis of a clinical information system and contains most of the data required by clinicians to care for patients. If well designed and carefully developed, a CDR can enable health care organizations to meet their goals of improving the quality and reducing the cost of health care.

© 1999 Dean F. Sittig

dfs 2/5/99