One of the really big trends in healthcare right now is “Big Data”. It seems everyone is very, very excited about it and what it promises to do. It is the health information Golden Child of the day. There are high hopes that “Big Data” will make healthcare more evidence-based, and that policy decisions will improve as a result of applying “Big Data” approaches.
However, “Big Data” and its use also carry “Big Risks”, and might lead policy makers (and others) to some rather wrong conclusions about what should or should not be done.
To understand the risks associated with “Big Data”, you first have to understand what “Big Data” is: the linking together of data sets (data sets that might never have been intended to be linked) and the use of that larger data set for analysis. In healthcare, at the provincial level, a “Big Data” approach might involve linking the information contained in the various administrative databases. These include information on hospitalizations (the Discharge Abstract Database), physician services paid for by the medical services plan, Pharmacare data, home and community care data, and so on. If data sits in an administrative database and carries a unique personal identifier (e.g., a Personal Health Number or a Social Insurance Number), it can be “linked” to other databases.
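The linking step can be illustrated with a minimal sketch. Everything in it is hypothetical: the records, the field names (`phn`, `admit_date`, `fee_code` and so on) and the `link_by_identifier` helper are invented for illustration. It assumes exact matching on a single shared identifier; real linkage projects may also have to cope with missing or mistyped identifiers. The sketch groups records from three made-up sources under each Personal Health Number.

```python
# Hypothetical, heavily simplified extracts from three administrative databases.
# The identifiers and field names are invented for illustration.
hospitalizations = [
    {"phn": "9001", "admit_date": "2023-01-10", "diagnosis": "hip fracture"},
    {"phn": "9002", "admit_date": "2023-02-03", "diagnosis": "pneumonia"},
]
physician_claims = [
    {"phn": "9001", "service_date": "2023-01-20", "fee_code": "A100"},
    {"phn": "9003", "service_date": "2023-03-15", "fee_code": "B200"},
]
pharmacare = [
    {"phn": "9001", "dispense_date": "2023-01-22", "drug": "analgesic"},
    {"phn": "9002", "dispense_date": "2023-02-10", "drug": "antibiotic"},
]


def link_by_identifier(**sources):
    """Exact-match linkage: group every source's records under the shared 'phn' key."""
    linked = {}
    for source_name, records in sources.items():
        for record in records:
            # First time this person appears: give them an empty list per source.
            person = linked.setdefault(record["phn"], {name: [] for name in sources})
            person[source_name].append(record)
    return linked


linked = link_by_identifier(
    hospitalizations=hospitalizations,
    physician_claims=physician_claims,
    pharmacare=pharmacare,
)

# Summarize how many records each person has in each source.
for phn, person in sorted(linked.items()):
    counts = {name: len(records) for name, records in person.items()}
    print(phn, counts)
```

Person `9003` ends up with empty hospitalization and Pharmacare lists. The linked record cannot tell whether that person had no hospital stays or publicly paid prescriptions, or whether that care simply was not captured in these databases.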
It sounds great. After all, one of the massive problems in healthcare is a failure to do analysis “across the system”. A “Big Data” approach allows for that kind of analysis and enables health policy researchers to answer some interesting and important questions. “Big Data” is also inexpensive, in the sense that it makes use of data that is already collected, so there is no need to go out and collect additional data. It has potential, and it is certainly an informational step forward.
However, there are a few important things to consider, things that should make policy makers and health researchers at least pause before relying on “Big Data” to provide answers to pressing problems.
Caution Needed: Blind Spots, Systematic Biases, Reliability Issues and Dodgy Conclusions
Large databases are now being linked together in “Big Data” projects. The thing is, each database has a lot of nuances that a person can be blissfully unaware of without having spent a significant amount of time working with the data or having access to an expert in that particular database (and it is safe to say that the “experts” have spent more than a year working with just one of the databases). Sometimes the data is not quite what a person thinks it is. Sometimes a definitional change drives large differences in the numbers. And the quality of the information varies a lot from one database to another.
What is absolutely stunning about current health data is the data that is unavailable in administrative databases: the blind spots. Sometimes the data is nowhere near complete. For example, data on services paid on a fee-for-service basis is reasonably complete, but data on services paid through “alternative payment” arrangements is limited, so if a doctor is paid a salary to provide services, there is little information on what services were provided or to whom. Further, there is very little data on outcomes and experiences as reported by patients themselves: there is data on the length of a hospital stay, but little on whether the procedure had an impact on the patient’s quality of life. Then there is another big black box: health services that were not publicly paid for, which make up around 30 percent of total health spending. If all the health databases are collated, roughly 65 percent of public health spending could be accounted for. Since public spending is about 70 percent of total spending, this would “paint” a picture of the health system that is a little less than half complete (65 percent of 70 percent, or 45.5 percent of the total system as measured by expenditures).
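The arithmetic behind the “less than half complete” figure can be written out as a short calculation. The only inputs are the approximate shares quoted above (about 70 percent of total health spending is public, and about 65 percent of that public spending is covered by the linked databases); the variable names are just labels chosen for illustration.

```python
# Approximate shares quoted in the text, expressed as fractions.
public_share_of_total = 0.70     # publicly paid spending as a share of all health spending
captured_share_of_public = 0.65  # share of public spending covered by the linked databases

# Share of *total* spending that the linked databases can "see".
captured = captured_share_of_public * public_share_of_total
# Public spending that exists but is not in the linked databases.
public_not_captured = (1 - captured_share_of_public) * public_share_of_total
# Privately paid spending, which is not in public administrative databases at all.
private = 1 - public_share_of_total

print(f"Captured by linked databases: {captured:.1%}")
print(f"Public but not captured:      {public_not_captured:.1%}")
print(f"Privately paid:               {private:.1%}")
```

The three shares add up to 100 percent: about 45.5 percent is visible, and the remaining 54.5 percent splits into public spending the databases miss (about 24.5 percent) and private spending (about 30 percent).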
Now consider that many of the blind spots are systematic. The parts of the system for which there is no data, little data, or poor-quality data are not randomly distributed; they are specific pockets of care about which little is known. Further, it is very likely that these pockets of care disproportionately affect particular groups within the population.
Then there is the issue of data reliability. Understanding how the data is collected, why it was collected, and for what purpose is critical to judging whether it can be expected to be reliable. “Big Data” collates data from several databases, usually administrative ones, and those databases were not established with “Big Data” in mind. The reliability of the databases that contribute to a “Big Data” database varies. Some health data should be taken with a whole shaker of salt. Wait times data is one example. Wait times are measured from the time the surgeon submits the booking form to the hospital to the time the procedure is completed. If the procedure is never completed, for whatever reason, the time the patient spent waiting for access to care “doesn’t count”. Any data entered by someone with little incentive to enter it correctly may be vulnerable to problems of reliability and quality. As such, before a “Big Data” approach is used, the data elements involved should be carefully scrutinized.
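How the wait times definition drops uncompleted cases can be sketched with a handful of made-up booking records. The records, the census date and the helper names are all hypothetical. The second calculation rests on a loudly stated assumption: that patients whose procedures were never completed were still waiting on the census date. In reality some may have dropped off the list earlier, so this is only a rough comparison, not a corrected wait time measure.

```python
from datetime import date

# Hypothetical booking records: (booking form submitted, procedure completed or None).
bookings = [
    (date(2023, 1, 5), date(2023, 2, 20)),
    (date(2023, 1, 12), date(2023, 3, 1)),
    (date(2023, 2, 1), date(2023, 3, 15)),
    (date(2023, 1, 3), None),  # procedure never completed
    (date(2023, 1, 9), None),  # procedure never completed
]

# ASSUMPTION for illustration: uncompleted patients are still waiting on this date.
census_date = date(2023, 6, 30)


def reported_waits(records):
    """Booking-to-completion waits in days; uncompleted bookings are silently dropped."""
    return [(done - booked).days for booked, done in records if done is not None]


def waits_including_uncompleted(records, as_of):
    """Days waited, counting uncompleted bookings up to the census date."""
    return [((done if done is not None else as_of) - booked).days for booked, done in records]


def mean(values):
    return sum(values) / len(values)


reported = reported_waits(bookings)
everyone = waits_including_uncompleted(bookings, census_date)
print(f"Reported: {len(reported)} patients, mean wait {mean(reported):.1f} days")
print(f"Including uncompleted: {len(everyone)} patients, mean wait {mean(everyone):.1f} days")
```

With these invented records, the reported figure covers three of the five patients, with a mean wait of about 45 days; counting the two uncompleted bookings up to the census date raises the mean to about 97 days. The numbers are illustrative only; the point is that the gap does not show up anywhere in the reported data.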
If a “Big Data” approach includes large blind spots and/or data of questionable reliability or quality, there is a risk of drawing some rather dodgy conclusions. It is the same risk that emerges when a meta-analysis (a study that compiles the results of studies already done) is undertaken without closely examining the contributing studies: the risk of a dangerously misleading conclusion that does not reflect reality and may ultimately harm the health and well-being of the system or of the individuals it serves.
“Big Data” is not a silver bullet for the health system, and its inappropriate use may well prove to be a poison pill. Given the current limitations and nuances of the information available, and, just as importantly, the information not available, the use of “Big Data” to do much more than highlight areas for further investigation should be met with skepticism. In many ways, the conclusions drawn from well-thought-out and well-conducted original research (the kind where data had to be collected from primary sources rather than harvested from collated administrative databases) might be of significantly higher quality than results from “Big Data” studies. Evidence-based decisions deserve the highest-quality information, based on the analysis of quality data and not just “Big Data”; otherwise, there is a tremendous risk that the decisions made will be “evidence-based” and completely wrong.