Medical advancement over the past few decades has proceeded at an astounding pace, bringing tremendous gains in health outcomes. Global child mortality, the share of children who die before age 5, fell from 19% in 1960 to just below 4% in 2017, according to United Nations data. Infant mortality in the United States fell from 26 per 1,000 live births to under 6 per 1,000 during the same period. But improvement from medical advancement is slowing. Where progress through medical science levels off, data analysis can help providers and policymakers target and optimize programming for those who most need it. Linking data across sources, and thinking beyond health data to the social factors that affect well-being, can redefine treatment and outreach.
For example, we have demonstrated the potential of data sharing by identifying the population at highest risk for infant mortality: A small subset of 1.6% of the population accounts for 50% of infant deaths. Programs and services may also be tailored to targeted subgroups based on their historical efficacy in similar circumstances.
Joint analysis of combined disparate datasets makes both of these scenarios possible, producing insight and providing decision makers at all levels with social context for the individuals they serve.
Through our work in government, healthcare, and other large organizations, we have found that the biggest hurdles to achieving this goal are clarity of ownership, fear and unease around data sharing, and technology barriers to effectively analyzing disparate data.
Taking ownership of outcomes
A battle between structure and function too often prevents program administration from delivering real results for citizens. That is, governments are structured to efficiently administer programs rather than to effectively create the outcomes those programs were designed for in the first place. The purpose of the Medicaid program, for example, was not simply to fund health care for low-income Americans but to improve the health and well-being of American families and bring greater economic security to the country. Yet complex Medicaid systems track activity and collect information related to running the program rather than health and wellness outcomes for citizens.
The transactional systems that support program delivery rarely maintain the information required to quantify metrics that trace back to a program's core purpose, or to understand the context of the individuals the program serves. However, that information often exists elsewhere within government.
Forward-looking government leaders don’t see their jobs as simply administering a program. They instead take ownership over creating outcomes. They view program evaluation and efficacy through a wider lens, which may mean looking across all programs that serve a particular population or leading collaboration across agencies to address a problem.
Overcoming the fear of data sharing
In our experience, everyone wants others to share data with them but few are eager to share their own. That’s understandable: An abundance of caution is appropriate for ensuring sensitive data does not exceed the bounds of its permitted use. However, we have found that this reluctance is often grounded not in law or program restrictions but in fear.
Many program leaders we have worked with find that permitted use extends much further than they had anticipated. Moreover, the data becomes far more valuable when shared for the collaborative analysis that fuels insight and creates better outcomes for individuals.
Getting over the data-sharing hurdle becomes much easier after leaders conclude that data sharing is in the best interest of the public and permitted by their program. Privacy officers and legal counsel typically find a path to permitted use within the allowances of governing documents.
Unfortunately, permitting use is not the end of the data-sharing road. After data is approved for sharing, organizations often struggle to ensure that it is used safely, securely, and responsibly. This is true for data sharing across agencies and becomes further complicated when external entities or analysts are involved.
Keep reading. Technology can help.
Using technology for secure, effective data sharing
Technology cannot completely solve the above problems, but it can make things dramatically easier. Workflows and document management speed and simplify execution of data-use agreements. Metadata management, data governance, and other data documentation solutions decrease the probability that data is misinterpreted and can expedite data interpretation. Point-and-click business intelligence and data visualization tools lower the barriers to producing analysis. Such tools are fairly common and readily available from commercial vendors. However, few tools exist to address the unique concerns related to data maintained by governments and other highly regulated organizations.
Two of these concerns are accurately combining data and enabling advanced analysis while ensuring appropriate use. Addressing them demands a scalable, fault-tolerant way to join disparate data and a secure, elastic platform to enable complex analysis and collaboration.
Public and health data can be complex, redundant, misleading, conflicting, ridden with quality issues, and otherwise difficult to analyze. These factors, combined with the absence of shared unique identifiers across datasets, make linking disparate data extremely complex. Over the past eight years, we have iteratively developed a tunable, probabilistic record linkage solution that addresses all of these issues, provides false positive and false negative match bounds, and runs at scale. It can link and deduplicate billions of records in minutes on a platform-agnostic, common cloud architecture.
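To make the idea of probabilistic record linkage concrete, here is a minimal sketch in Python in the style of the classic Fellegi-Sunter model. It is not the solution described above, only an illustration of the general technique under stated assumptions: the field names, the m and u probabilities, the similarity cutoff, and the two decision thresholds are all hypothetical values chosen for the example. Each compared field adds evidence for a match when it agrees and evidence against when it disagrees, and the total weight is sorted into match, non-match, or manual review. In a real system, the thresholds would be tuned to meet target false positive and false negative rates.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field parameters (illustrative values only):
#   m = P(field agrees | the two records describe the same person)
#   u = P(field agrees | the two records describe different people)
#   fuzzy = tolerate small spelling differences when comparing
FIELDS = {
    "first_name": {"m": 0.95, "u": 0.010, "fuzzy": True},
    "last_name":  {"m": 0.95, "u": 0.005, "fuzzy": True},
    "birth_date": {"m": 0.98, "u": 0.003, "fuzzy": False},
    "zip":        {"m": 0.90, "u": 0.050, "fuzzy": False},
}

def agrees(a, b, fuzzy, cutoff=0.85):
    """Return True/False for agreement, or None if either value is missing."""
    a = str(a or "").strip().lower()
    b = str(b or "").strip().lower()
    if not a or not b:
        return None  # a missing value contributes no evidence either way
    if fuzzy:
        return SequenceMatcher(None, a, b).ratio() >= cutoff
    return a == b

def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over all fields that can be compared."""
    total = 0.0
    for name, p in FIELDS.items():
        result = agrees(rec_a.get(name), rec_b.get(name), p["fuzzy"])
        if result is None:
            continue
        if result:
            total += math.log2(p["m"] / p["u"])              # agreement weight
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return total

def classify(weight, upper=10.0, lower=0.0):
    """Hypothetical thresholds; scores between them go to manual review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"

# Made-up records from two sources with no shared identifier.
rec_a = {"first_name": "Katherine", "last_name": "Johnson",
         "birth_date": "1984-03-12", "zip": "27601"}
rec_b = {"first_name": "Katharine", "last_name": "Johnson ",
         "birth_date": "1984-03-12", "zip": "27603"}
rec_c = {"first_name": "Marcus", "last_name": "Lee",
         "birth_date": "1990-11-02", "zip": "27601"}

for other in (rec_b, rec_c):
    w = match_weight(rec_a, other)
    print(other["first_name"], round(w, 2), classify(w))
```

The first pair differs in spelling and ZIP code yet still accumulates enough evidence to be linked, while the second pair shares only a ZIP code and is rejected. Comparing every record against every other does not scale to billions of records, so production systems typically add a blocking step that restricts comparisons to plausible candidate pairs.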
A secure, virtual data analysis room addresses many barriers to successful collaborative analysis by distributed internal and external teams. Just as code repository management solutions revolutionized the software development lifecycle, a virtual data analysis room lets teams work from the same version of data, benefit from useful transformations and corrections, work unencumbered by computational limitations, and still manage code effectively using a shared repository. Most important, data stewards and owners, those responsible for ensuring the secure and appropriate use of data, can oversee everything being done with the data and approve analysis before it is put into action. Our team developed the Collaborative Research Environment (CoRE) to serve as this important foundation for data analysis.
Governments, health organizations, nonprofit assistance programs, and others maintain a wealth of information that can be used to tailor programs to the individuals who will benefit most from them. Data can and should revolutionize our approach to public health and the application of need-based programs. It can keep us constantly driving toward the outcomes these programs were initially envisioned to create.
Working together, we can address the barriers of ownership and fear while leveraging technology to facilitate responsible, cross-organization data sharing and collaboration. Insight from the analysis of disparate data can help us make the best use of our current medical technology and apply social programs where they are needed most. We can accelerate the impact of medical advancement by connecting that knowledge to the people it impacts, fine-tuning care and outreach through data, and improving more lives. And we can do it right now.