Commonwealth Informatics

By David Fram, Vice President Research, Co-Founder, Commonwealth Informatics

A CVW Analytics Case Study

Project Background

Sepsis is a significant public health problem and a major cause of death in hospitals worldwide. The Centers for Disease Control and Prevention (CDC) defines sepsis as “a complication caused by the body’s overwhelming and life-threatening response to infection, which can lead to tissue damage, organ failure and death.” The CDC is working to better understand the incidence of sepsis in the population and is looking for new ways to rapidly identify and treat the disease. This is challenging, however, since there is no definitive diagnostic test for sepsis. Estimates of sepsis incidence and mortality in the US population vary significantly depending on the surveillance definition and data source used. Multiple reports based on administrative claims data suggest that sepsis incidence has increased significantly in recent years, but it is not clear whether this reflects a true rise in the population or is an artifact of inaccuracies in using claims data for surveillance.

Dr. Chanu Rhee, MD MPH, Assistant Professor of Population Medicine, and Dr. Michael Klompas, MD MPH, Professor of Population Medicine, both at Harvard Medical School and Harvard Pilgrim Health Care Institute, decided to assess whether sepsis surveillance could be improved by using electronic health record (EHR) data instead of administrative data. Dr. Rhee et al. compared the accuracy of sepsis surveillance using EHR data versus claims data at two academic hospitals. As part of the research, the authors created definitions of sepsis and septic shock based on EHR clinical indicators. Results from this pilot study showed that the EHR-based clinical surveillance definition of sepsis was a more reliable indicator of disease incidence over time than administrative data.


The Challenge

A Complex, Multi-Factored Definition of Sepsis

Building upon their initial findings, Dr. Klompas and Dr. Rhee undertook a new project (CDC National Sepsis Burden Surveillance Project) to estimate the national burden of sepsis using EHR data. An important part of this project was to determine whether the methodology used in the earlier research was generalizable and scalable to much larger and more diverse EHR datasets. Two of the data sources selected for the project were the Cerner Health Facts database, a very large, multi-terabyte de-identified database with EHR data for more than 22 million inpatient encounters at a diverse set of hospitals, and the Institute for Health Metrics (IHM) database, which has EHR data for 2.7 million in-hospital encounters at community hospitals across the United States.

The data specification created by Drs. Rhee and Klompas for the CDC National Sepsis Burden Surveillance Study describes the many different variables that need to be present in electronic health records and lab data in order to identify patients who likely had sepsis during their hospitalization. These include laboratory test results, blood culture orders, antibiotic administrations, and diagnosis and procedure codes. For those patients who meet the EHR-based sepsis clinical surveillance definition, a results dataset is produced that reports sepsis incidence and outcomes by both patient demographics and clinical characteristics.

Working with Different EHR Data Sources

The two EHR data sources selected for the project were organized very differently. For example, the Cerner Health Facts data provided standardized medication codes for all medicines administered, with links to generic and brand names. This was beneficial in the analysis process, as the sepsis surveillance definition simply listed the relevant antibiotics by generic name. The IHM data provided only verbatim drug names, which were not standardized across the treatment facilities, so identifying the antibiotics that met the specification required extensive searching and sifting through the data.
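To make the verbatim-name problem concrete, here is a minimal Python sketch of matching free-text drug strings against a list of generic antibiotic names. It is an illustration only, not the method used on the project: the function name, the sample strings, and the short antibiotic list are all assumptions, and the study's actual antibiotic list is not reproduced here.

```python
import re

# Hypothetical, deliberately incomplete list of generic antibiotic names.
# The project's real specification listed many more and is not shown here.
GENERIC_ANTIBIOTICS = ["vancomycin", "ceftriaxone", "piperacillin", "meropenem"]


def match_generic(verbatim_name):
    """Return the first listed generic name found in a free-text drug string, or None."""
    # Lowercase and turn every run of non-letters into a space so that
    # strings such as "VANCOMYCIN 1G/250ML" split into clean word tokens.
    cleaned = re.sub(r"[^a-z]+", " ", verbatim_name.lower())
    tokens = cleaned.split()
    for generic in GENERIC_ANTIBIOTICS:
        if generic in tokens:
            return generic
    return None


if __name__ == "__main__":
    samples = [
        "VANCOMYCIN 1G/250ML NS IVPB",
        "Piperacillin-Tazobactam 3.375 g",
        "acetaminophen 500 mg tab",
    ]
    for name in samples:
        print(name, "->", match_generic(name))
```

Exact token matching like this misses misspellings, abbreviations and brand names, which is one reason unstandardized drug names call for manual review of whatever fails to match.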

The data sources used were very large, and some of the clinical criteria were hard to express. For example, some values needed to be checked within a window of two days before or after another event, which made certain computations very challenging using relational data sources. As a result, the Commonwealth team used several different analysis approaches over the life of the project, collaborating with Drs. Klompas and Rhee to review, discuss and apply the new CDC sepsis specification to terabytes of EHR data and to develop correct, repeatable analyses and results in which the team had full confidence.
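The two-day window criterion can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's implementation: the function names and sample data are hypothetical, the window is treated as an inclusive 48-hour interval on either side of the anchor event, and the real specification may define its windows differently (for example, by calendar day).

```python
from datetime import datetime, timedelta


def within_window(event_time, anchor_time, days=2):
    """True if event_time is at most `days` before or after anchor_time (inclusive)."""
    return abs(event_time - anchor_time) <= timedelta(days=days)


def events_near_anchor(events, anchor_time, days=2):
    """Keep the (timestamp, value) pairs that fall inside the window around anchor_time."""
    return [(t, v) for t, v in events if within_window(t, anchor_time, days)]


if __name__ == "__main__":
    # Hypothetical anchor event (e.g. a blood culture order) and lab results.
    anchor = datetime(2015, 3, 10, 8, 0)
    labs = [
        (datetime(2015, 3, 7, 23, 0), 1.1),  # more than 2 days before the anchor
        (datetime(2015, 3, 9, 6, 0), 2.4),   # inside the window
        (datetime(2015, 3, 12, 8, 0), 3.0),  # exactly 2 days after, kept
        (datetime(2015, 3, 12, 9, 0), 1.9),  # just outside the window
    ]
    for when, value in events_near_anchor(labs, anchor):
        print(when.isoformat(), value)
```

For a single encounter this is trivial; the difficulty described above comes from applying such comparisons across millions of encounters, where each candidate value must be paired with every relevant anchor event.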

CVW Analytics for Complex Data Analysis

Drs. Klompas and Rhee selected Commonwealth as the data analysis collaborator for the project because of their previous successful experience working with Commonwealth as the informatics provider for the Harvard Department of Population Medicine’s Electronic Medical Record Support for Public Health (ESP) platform. “We chose Commonwealth as our informatics collaborator for this project because they are knowledgeable, collaborative and understand the nuances of EHR data and public health surveillance. We were confident they could help us explore these large EHR data sets and successfully apply our experimental methodology for sepsis surveillance,” said Michael Klompas.

To work with and analyze the large and complex EHR datasets, Commonwealth used Commonwealth Vigilance Workbench (CVW) Analytics, a software platform that allows users to conduct complex analyses on large clinical data sets. Commonwealth project leader and co-founder David Fram chose to work in the CVW Analytics environment for the CDC National Sepsis Burden Surveillance Study because it gave him and the team the ability to explore the EHR data in depth, which was necessary given the size of the data sources and the complexity of the sepsis surveillance specification.

CVW Analytics helped with data mapping, including resolving the specified antibiotic names and reviewing summaries of all the reported values of the many different variables available. It was also used for exploring and combing through laboratory and procedure data in the Cerner and IHM databases.

Examples of the necessary data exploration included figuring out how to identify the required types of measurements in the microbiology and laboratory data sets (e.g., measurements of platelets, bilirubin, lactate, creatinine, etc.) and developing standard processes for handling missing or incongruous data (admit and discharge dates out of order, medication stop dates before medication start dates, etc.).
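Checks for incongruous data of the kind just described can be sketched as follows. This is a hedged illustration, not the project's actual processing: the record layout, field names and flag strings are all assumptions made for the example.

```python
from datetime import date


def encounter_issues(record):
    """Return a list of data-quality flags for one encounter (a plain dict)."""
    issues = []
    admit = record.get("admit_date")
    discharge = record.get("discharge_date")
    if admit is None or discharge is None:
        issues.append("missing_admit_or_discharge")
    elif discharge < admit:
        issues.append("discharge_before_admit")
    # Flag any medication whose stop date precedes its start date.
    for med in record.get("medications", []):
        start, stop = med.get("start"), med.get("stop")
        if start is not None and stop is not None and stop < start:
            issues.append("med_stop_before_start:" + med.get("name", "unknown"))
    return issues


if __name__ == "__main__":
    # Hypothetical encounter with both kinds of date problem.
    suspect = {
        "admit_date": date(2014, 5, 3),
        "discharge_date": date(2014, 5, 1),
        "medications": [
            {"name": "drug_a", "start": date(2014, 5, 2), "stop": date(2014, 5, 1)},
        ],
    }
    print(encounter_issues(suspect))
```

Flagging records rather than silently dropping or repairing them keeps the decision about how to handle each kind of problem with the investigators.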

CVW Analytics also provides a visual environment for storing and tracking multiple complex analyses and the resulting data subsets, which was critical for managing a project of this scale.


Example of visual analysis diagram in CVW Analytics

Results & Conclusions

The final deliverables from Commonwealth to the research team included the subset of patients, vetted by the investigators, who met the EHR-based clinical sepsis surveillance definition, along with their corresponding records. The results were combined with data from other partners to generate the most credible estimates of sepsis incidence, outcomes, and trends to date and, importantly, demonstrated that sepsis trends derived from EHR clinical data differ substantially from those derived from administrative claims data. The results of this study were published in JAMA. The work from this project will be used in future research to study additional important questions related to sepsis epidemiology and clinical care.

Project conclusions about the process of working with EHR data to apply the various sepsis criteria include:

  • EHR data is inherently complex, and the size of the databases and the volume of records and unstructured data are challenging to work with.
  • With the right combination of data analytic tools, people and processes, EHR data can be used to answer important research questions such as quantifying the burden of sepsis.
  • CVW Analytics is a valuable platform for data scientists to use to work with EHR data to answer complicated epidemiological research questions.


To schedule a brief demonstration or learn more about the Commonwealth Vigilance Workbench modules, please contact us.


About the Author

David Fram has an extensive background in the conception, development, and support of software products for clinical data management, data visualization, clinical and safety data analysis, and data mining.  Prior to co-founding Commonwealth Informatics, he led the development of Lincoln Technology’s safety data mining activities, and he served as Principal Investigator for Lincoln’s data mining Cooperative Research and Development Agreement with the Food and Drug Administration; he also served as Principal Investigator for the Department of Defense project that eventually led to the Pharmacovigilance Defense Application System.  Over the years, David played a lead role in the creation of a series of life sciences products that achieved widespread deployment, including the Prophet System for biomedical research, the CLINFO system for clinical research data management and analysis, and the Clintrial system for large-scale clinical trial data management.



Date posted: April 2, 2018
