HealthSparq Provider Data Engine

Project Overview

Engineered a custom synchronization engine to aggregate provider information from the HealthSparq Nevada Medicaid portal. The system successfully navigated an Imperva WAF environment and bypassed a 300-result visibility cap to retrieve a complete dataset of 3,865 unique provider records.

Key Technical Contributions

Advanced Sharding: Implemented Specialty-Based Sharding to work around the 300-result cap, extracting 129 internal specialty codes and querying by code to ensure 100% database visibility.
Security & Authentication: Developed a Stateless Token Handshake to replace volatile cookie-based sessions, performing backend “activation” via insurer gateways for higher stability.
Polite Scraping Architecture: Integrated linear backoff with jitter and reactive re-authentication, and presented a legitimate browser fingerprint (Chrome 120 TLS) to avoid robotic signatures.
Data Integrity: Built a transformation pipeline to de-duplicate “Provider-at-Location” instances into a clean, analysis-ready JSONL dataset.
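
As an illustration of two of the contributions above, the following is a minimal sketch of linear backoff with jitter and of collapsing “Provider-at-Location” rows into one record per provider. It is not the project's actual code: the function names, the delay parameters, and the field names (provider_id, name, location) are all assumptions made for the example.

```python
import json
import random


def backoff_delay(attempt, base=2.0, jitter=1.0):
    """Linear backoff: the wait grows by `base` seconds per attempt,
    plus a random jitter so retries do not land on a fixed rhythm.
    `base` and `jitter` are illustrative values, not the project's."""
    return base * attempt + random.uniform(0.0, jitter)


def dedupe_providers(rows, key="provider_id"):
    """Collapse provider-at-location rows into one record per provider.
    Field names here are hypothetical."""
    merged = {}
    for row in rows:
        pid = row[key]
        record = merged.setdefault(
            pid, {key: pid, "name": row.get("name"), "locations": []}
        )
        location = row.get("location")
        # Keep each distinct location once, in first-seen order.
        if location is not None and location not in record["locations"]:
            record["locations"].append(location)
    return list(merged.values())


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


if __name__ == "__main__":
    sample = [
        {"provider_id": "p1", "name": "Dr. A", "location": "Reno"},
        {"provider_id": "p1", "name": "Dr. A", "location": "Las Vegas"},
        {"provider_id": "p2", "name": "Dr. B", "location": "Reno"},
    ]
    print(to_jsonl(dedupe_providers(sample)))
```

Because the same provider can surface in more than one specialty shard, merging on a provider key after all shards are fetched is one way to arrive at a unique-provider count.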

Impact & Results

100% Coverage: Worked around the 300-result visibility cap to extract all 3,865 unique providers.
Resilience: Created a “Resume Capability” feature that skips completed shards, saving bandwidth and processing time when a run is restarted.
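
A resume feature of this kind can be sketched as a checkpoint file listing finished shards. This is an assumed design, not the project's implementation: the JSON checkpoint format, the function names, and the fetch_shard callback are hypothetical.

```python
import json
from pathlib import Path


def load_completed(checkpoint_path):
    """Return the set of shard codes recorded as finished, or an empty set."""
    path = Path(checkpoint_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def run_shards(specialty_codes, fetch_shard, checkpoint_path):
    """Fetch each specialty shard once, skipping any already completed.

    `fetch_shard` is a hypothetical callable that downloads one shard.
    The checkpoint is rewritten after every shard so that an interrupted
    run can pick up where it stopped.
    """
    completed = load_completed(checkpoint_path)
    fetched = []
    for code in specialty_codes:
        if code in completed:
            continue  # already done in a previous run
        fetch_shard(code)
        completed.add(code)
        Path(checkpoint_path).write_text(json.dumps(sorted(completed)))
        fetched.append(code)
    return fetched
```

Writing the checkpoint after each shard rather than once at the end trades a little extra disk I/O for losing at most one shard of work on a crash.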

Tech Stack

Python 3.12 | curl_cffi | Stateless Auth | Data Orchestration

Mekhma Tamang
