Payer-Scale Medallion Lakehouse
Project Overview
Developed a Medallion Architecture (Bronze, Silver, Gold) on Databricks to identify high-risk, high-cost members in US healthcare. The project transforms fragmented CMS synthetic claims data into a unified "Patient 360" view for analysis.
Key Technical Contributions
- Architecture: Built an ELT pipeline using PySpark and SparkSQL to process Inpatient, Outpatient, and Pharmacy claims.
- Security & Compliance: Implemented HIPAA-compliant de-identification using salted SHA-256 hashing and the Safe Harbor method.
- Advanced Analytics: Engineered custom metrics including:
  - Total Cost of Care (TCOC): Aggregates spend across claim silos (Inpatient, Outpatient, Pharmacy).
  - Provider Fragmentation Score: Measures risk through the number of unique providers per patient.
  - Rescue-to-Preventive Ratio: Identifies gaps in preventive care coordination.
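
A minimal sketch of how the de-identification and the three metrics could fit together, written in plain Python rather than PySpark so it runs without a cluster. Everything named here is an assumption for illustration: the `SALT` value, the toy `CLAIMS` rows, the `care_type` labels, and the definition of the Rescue-to-Preventive Ratio as rescue claim count divided by preventive claim count. The project's actual definitions may differ.

```python
import hashlib
from collections import defaultdict

# Hypothetical salt; a real pipeline would load it from a secret store.
SALT = b"example-salt-not-for-production"


def pseudonymize(member_id: str, salt: bytes = SALT) -> str:
    """Return the hex SHA-256 digest of salt + member_id."""
    return hashlib.sha256(salt + member_id.encode("utf-8")).hexdigest()


# Toy claims invented for illustration:
# (member_id, source, provider_id, paid_amount, care_type)
CLAIMS = [
    ("M1", "inpatient", "P1", 1000.0, "rescue"),
    ("M1", "outpatient", "P2", 200.0, "preventive"),
    ("M1", "pharmacy", "P3", 50.0, "other"),
    ("M1", "outpatient", "P2", 150.0, "rescue"),
    ("M2", "outpatient", "P1", 100.0, "preventive"),
]


def member_metrics(claims):
    """Aggregate per-member metrics, keyed by pseudonymized member ID."""
    acc = defaultdict(
        lambda: {"tcoc": 0.0, "providers": set(), "rescue": 0, "preventive": 0}
    )
    for member_id, _source, provider_id, paid, care_type in claims:
        row = acc[pseudonymize(member_id)]
        row["tcoc"] += paid                # Total Cost of Care across all sources
        row["providers"].add(provider_id)  # distinct providers seen by the member
        if care_type in ("rescue", "preventive"):
            row[care_type] += 1

    result = {}
    for key, row in acc.items():
        # Assumed definition: rescue claims / preventive claims.
        # None when the member has no preventive claims (ratio undefined).
        ratio = row["rescue"] / row["preventive"] if row["preventive"] else None
        result[key] = {
            "tcoc": row["tcoc"],
            "fragmentation_score": len(row["providers"]),
            "rescue_to_preventive": ratio,
        }
    return result


METRICS = member_metrics(CLAIMS)
```

Keying the output by the hashed ID means the raw member ID never appears in the aggregated result. In Spark, the same shape would likely be a `groupBy` on the hashed key with `sum` and `countDistinct` aggregations.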
Impact & Results
- Identified a high-risk patient cohort (diabetes + chronic kidney disease, CKD) driving 3.5x higher costs than the general population.
- Created a scalable Gold Layer table for longitudinal patient tracking.
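
As a rough illustration of how a cohort cost multiple like the one above might be derived from a Gold-layer table, the sketch below flags members who carry both conditions and compares their mean TCOC with the mean across all members. The rows, the condition labels, and the choice of "all members" as the comparison baseline are assumptions. The toy numbers are invented and do not reproduce the reported result.

```python
from statistics import mean

# Toy Gold-layer rows invented for illustration:
# (pseudonymized_member_key, condition_flags, tcoc)
GOLD_ROWS = [
    ("a1", {"diabetes", "ckd"}, 9000.0),
    ("b2", {"diabetes"}, 3000.0),
    ("c3", set(), 1000.0),
    ("d4", {"ckd", "diabetes", "hypertension"}, 7000.0),
]

# Hypothetical cohort definition: members with BOTH conditions.
HIGH_RISK = {"diabetes", "ckd"}


def cohort_cost_multiple(rows, required=HIGH_RISK):
    """Mean TCOC of the cohort divided by mean TCOC of all members."""
    cohort = [tcoc for _key, conditions, tcoc in rows if required <= conditions]
    everyone = [tcoc for _key, _conditions, tcoc in rows]
    if not cohort or not everyone or mean(everyone) == 0:
        return None  # multiple is undefined without data or with zero baseline
    return mean(cohort) / mean(everyone)


MULTIPLE = cohort_cost_multiple(GOLD_ROWS)
```

Comparing against members outside the cohort instead of all members would give a larger multiple, so the baseline is worth stating alongside any reported figure.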
Tech Stack
Databricks | PySpark | SparkSQL | Medallion Architecture