Introduction to the Healthcare Data Quality Blog

Health care spending represents almost 20% of United States GDP, and just about every dollar of that spending is recorded on a medical or pharmacy claim, creating one of the most useful sources of data in existence. Despite huge gains in standardization over the last twenty years, administrative claims data remains one of the hardest sources of information to unlock in a reliable and repeatable way. As payers shift more and more risk to providers and create increasingly complicated reimbursement contracts, the importance (and complexity!) of this data source continues to grow.

I’ve been working with claims data for twenty years. I’ve used this data to create risk adjustment models used by both commercial payers and Medicare, predictive models for population health targeting, program evaluations, quality measurement, disease burden and cost effectiveness studies, as well as bundled payment and accountable care research. For any of this to be useful, the first and most important step is to create a reliable and correctly interpreted database of medical claims, pharmacy claims, and enrollment data.

Unfortunately, this data generally arrives with no instruction manual and plenty of room for misinterpretation and error. Through this blog I hope to identify common (and less common but important) ways that health care data is transformed, blinded, and misinterpreted. Using Freedman Healthcare’s data publishing system, I will highlight ways to identify issues and to determine which fields can be trusted when conflicts occur.