Department of Public Health and Environment Improves Readiness to Combat Health Threats with Integrated Data Ecosystem
Challenge
Solution
Data Lakehouse Integration
Our first priority was ingesting data from multiple sources into a data lakehouse, a unified platform for storing and analyzing a variety of data types. We drew on our experience integrating public health data assets across numerous public and private cloud environments, along with deep in-house expertise in public health and epidemiology, to guide the agency through this process.
Key Technologies
The platform buildout began with designing and developing the foundational architecture, then iterating on solutions to meet the agency's specific needs. The data lakehouse, built on Google Cloud Platform (GCP), uses the following tools:
Efficiency and Security
Once the data lakehouse was established and populated, we implemented automated processes to clean and link data in a person-centric way using Resultant’s proprietary Probabilistic Record Linkage running on a Databricks compute engine. We deployed a Secure Data Enclave (SDE), or Collaborative Research Environment (CoRE), to enable data sharing and analysis in a secure, isolated environment.
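Resultant's Probabilistic Record Linkage is proprietary, and this story does not describe how it works. As a generic illustration of what probabilistic linkage involves, here is a minimal sketch of the classic Fellegi-Sunter approach: each compared field adds a log-likelihood weight for agreement or disagreement, and two thresholds sort candidate pairs into match, clerical review, or non-match. The field names, m/u probabilities, thresholds, blocking field, and sample records are all hypothetical; a production system would estimate m/u values from data (for example with EM) and use fuzzier comparators than exact string equality.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not estimated from real data):
#   m = P(field agrees | the two records refer to the same person)
#   u = P(field agrees | the two records refer to different people)
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "zip": (0.90, 0.05),
}


def normalize(value):
    """Light standardization before comparison: trim whitespace, lowercase."""
    return str(value).strip().lower() if value is not None else ""


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum Fellegi-Sunter log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = normalize(rec_a.get(field)), normalize(rec_b.get(field))
        if not a or not b:
            continue  # missing on either side: no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Two thresholds split pairs into match / clerical review / non-match."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


def link(source_a, source_b, block_field="dob"):
    """Compare only pairs sharing a blocking value, then score and classify each pair."""
    blocks = {}
    for rec in source_b:
        key = normalize(rec.get(block_field))
        if key:
            blocks.setdefault(key, []).append(rec)
    results = []
    for rec in source_a:
        for cand in blocks.get(normalize(rec.get(block_field)), []):
            w = match_weight(rec, cand)
            results.append((rec["id"], cand["id"], round(w, 2), classify(w)))
    return results


if __name__ == "__main__":
    # Hypothetical records from two feeds.
    labs = [{"id": "L1", "first_name": "Ana", "last_name": "Diaz",
             "dob": "1990-04-02", "zip": "80203"}]
    immunizations = [
        {"id": "I7", "first_name": "ana ", "last_name": "Diaz", "dob": "1990-04-02", "zip": "80220"},
        {"id": "I9", "first_name": "Ben", "last_name": "Cho", "dob": "1990-04-02", "zip": "80203"},
    ]
    for pair in link(labs, immunizations):
        print(pair)
```

With these illustrative numbers, I7 (same person, different ZIP) scores as a match, while I9 (shared birth date and ZIP but a different name) falls into the review band. Blocking on a field such as birth date keeps the number of comparisons manageable, at the cost of missing pairs whose blocking value was entered differently.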
Custom Change Data Capture (CDC) Capabilities
A critical requirement for the agency was the ability to track field-level value changes over time across the data sources feeding the data lakehouse, so we adopted Data Vault 2.0 as the underlying data modeling methodology, paired with real-time data replication. For sources that could not natively support real-time replication, we developed custom capabilities to replicate their data in near real time.
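The story does not say how the custom capabilities were built. One common way to approximate change capture for a source without native CDC is to poll it periodically and compare each record with its latest stored version. The sketch below assumes that polling approach and shows it feeding a Data Vault 2.0-style satellite: rows are insert-only, keyed by a hash of the business key, and a "hash diff" over the tracked fields decides whether a new version is written, which is what preserves field-level history. The in-memory `Satellite` class, field names, and source name are hypothetical stand-ins for lakehouse tables and real feeds; MD5 is used because it is a common Data Vault hashing choice, not because the source specifies it.

```python
import hashlib
from datetime import datetime, timezone

DELIM = "||"


def _md5(parts):
    """Join normalized parts with a delimiter and hash them with MD5."""
    return hashlib.md5(DELIM.join(parts).encode("utf-8")).hexdigest()


def hash_key(business_key):
    """Hash key derived from a business key (trimmed, upper-cased)."""
    return _md5([str(business_key).strip().upper()])


def hash_diff(record, fields):
    """Hash over the tracked descriptive fields; it changes when any of them changes."""
    return _md5(["" if record.get(f) is None else str(record.get(f)).strip() for f in fields])


class Satellite:
    """In-memory stand-in for an insert-only Data Vault satellite table."""

    def __init__(self, fields):
        self.fields = fields
        self.rows = []     # full append-only history
        self._latest = {}  # hash key -> most recent row for that key

    def load(self, records, business_key, source, load_ts=None):
        """Append a row only when the hash diff differs from the latest one; return field-level changes."""
        load_ts = load_ts or datetime.now(timezone.utc)
        changes = []
        for rec in records:
            hk = hash_key(rec[business_key])
            hd = hash_diff(rec, self.fields)
            prev = self._latest.get(hk)
            if prev is not None and prev["hash_diff"] == hd:
                continue  # nothing changed since the last load
            row = {"hash_key": hk, "load_ts": load_ts, "hash_diff": hd, "record_source": source}
            row.update({f: rec.get(f) for f in self.fields})
            self.rows.append(row)
            self._latest[hk] = row
            if prev is not None:
                changes += [(rec[business_key], f, prev[f], row[f])
                            for f in self.fields if prev[f] != row[f]]
        return changes

    def history(self, business_key_value):
        """All stored versions for one business key, oldest first."""
        hk = hash_key(business_key_value)
        return [r for r in self.rows if r["hash_key"] == hk]


if __name__ == "__main__":
    sat = Satellite(fields=["address", "phone"])
    # First poll of a hypothetical source with no native CDC: every record is new.
    sat.load([{"person_id": "P1", "address": "1 Main St", "phone": "555-0100"}],
             "person_id", "hypothetical_registry")
    # Second poll: only the address changed.
    print(sat.load([{"person_id": "P1", "address": "9 Oak Ave", "phone": "555-0100"}],
                   "person_id", "hypothetical_registry"))
    print(len(sat.history("P1")))
```

Comparing a single hash diff per record keeps the "did anything change?" check cheap; the per-field comparison runs only for records that actually changed, so field-level change events come almost for free. A real deployment would also need to handle deletes, which a simple snapshot comparison like this one does not detect.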