Data Methodology
How we collect, process, and validate healthcare pricing data at scale.
Data Sources
CarePrices.ai combines two federally mandated data sources to provide the most comprehensive view of healthcare pricing available:
1. Hospital Price Transparency Files
Under the CMS Hospital Price Transparency final rule (CMS-1717-F2, effective January 1, 2021), hospitals operating in the United States must publish machine-readable files containing:
- Gross charges (list prices)
- Discounted cash prices (self-pay rates)
- Payer-specific negotiated charges (per-insurer rates)
- De-identified minimum and maximum negotiated charges
We have indexed 380,000+ facility files containing 11.4 billion pricing rows, covering thousands of CPT and DRG codes.
2. Insurer Transparency in Coverage (TiC) Files
Under the Transparency in Coverage Final Rule (CMS-9915-F), health insurers must publish machine-readable files disclosing their negotiated rates with providers. We ingest and process TiC files from five major national carriers:
- Aetna — 68.6 million aggregated rate records; 2.1 billion plan-level records
- Blue Cross Blue Shield — 72 million aggregated; 337 million plan-level (covering 42 regional BCBS plans)
- Cigna — 28 million aggregated; 1.1 billion plan-level
- UnitedHealthcare — 30 million aggregated; 177 million plan-level
- Kaiser Permanente — 13 million aggregated; 20 million plan-level
In total, the insurer data adds 3.76 billion plan-level rate records and 212 million payer-aggregate records across 22,000+ rated provider NPIs.
Combined Coverage
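Together, our hospital and insurer datasets contain more than 15 billion pricing records (11.4 billion hospital rows plus 3.76 billion plan-level and 212 million payer-aggregate insurer records), making CarePrices.ai one of the most comprehensive healthcare pricing databases publicly available.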
Data Pipeline
Discovery & Ingestion
We locate and download more than 380,000 machine-readable facility files. Files arrive as CSV, JSON, and various non-standard formats. Our system handles URL redirects, authentication barriers, and broken links.
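As a rough illustration of the format-handling step, the sketch below guesses whether a downloaded file is JSON or CSV from its first bytes. The function name `sniff_format` and the heuristics are assumptions made for this example; they do not describe the production system.

```python
import csv


def sniff_format(head: bytes) -> str:
    """Guess a file's format from its first bytes (hypothetical helper)."""
    # utf-8-sig drops a leading byte-order mark if one is present.
    text = head.decode("utf-8-sig", errors="replace").lstrip()
    if not text:
        return "empty"
    # JSON documents start with an object or an array.
    if text[0] in "{[":
        return "json"
    try:
        # csv.Sniffer raises csv.Error when it cannot find a delimiter.
        csv.Sniffer().sniff(text, delimiters=",|\t;")
        return "csv"
    except csv.Error:
        return "unknown"
```

A file classified as "unknown" would be routed to a custom parser or to manual review rather than discarded.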
Parsing & Normalization
Custom parsers handle the wide variety of file formats. We normalize CPT/DRG codes, standardize payer names, resolve hospital identifiers via NPI and CMS Certification Numbers, and geocode facility locations.
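A minimal sketch of code and payer-name normalization follows. The alias table, the function names, and the accepted code patterns are hypothetical simplifications for illustration; real files contain many more variants, including HCPCS Level II codes that this sketch ignores.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table; a real one would be far larger.
PAYER_ALIASES = {
    "uhc": "UnitedHealthcare",
    "united healthcare": "UnitedHealthcare",
    "unitedhealthcare": "UnitedHealthcare",
    "bcbs": "Blue Cross Blue Shield",
    "blue cross blue shield": "Blue Cross Blue Shield",
    "aetna": "Aetna",
}


def normalize_payer(raw: str) -> str:
    """Map a free-text payer name to a canonical name when an alias is known."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    # Unknown names pass through unchanged (apart from trimming).
    return PAYER_ALIASES.get(key, raw.strip())


def normalize_code(raw: str) -> Optional[Tuple[str, str]]:
    """Return (code_type, code), or None if the value matches neither pattern."""
    s = raw.strip().upper()
    # CPT: five characters, four digits followed by a digit or a letter.
    m = re.fullmatch(r"(?:CPT)?[\s:\-]*(\d{4}[0-9A-Z])", s)
    if m:
        return ("CPT", m.group(1))
    # DRG: up to three digits, zero-padded; assumes an explicit DRG prefix.
    m = re.fullmatch(r"(?:MS-?DRG|DRG)[\s:\-]*(\d{1,3})", s)
    if m:
        return ("DRG", m.group(1).zfill(3))
    return None
```

Values that return None would correspond to the internal charge codes that need separate mapping.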
Validation & Quality Control
Automated checks flag outliers, duplicates, and other data quality issues. We cross-reference records against CMS fee schedules, NPPES provider data, and known pricing ranges to catch implausible values.
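The duplicate and outlier checks can be sketched as below. The interquartile-range rule and the row field names (`facility`, `code`, `payer`, `rate`) are assumptions chosen for illustration; the actual checks may use different thresholds or methods.

```python
import statistics


def dedupe(rows):
    """Drop exact repeats of (facility, code, payer, rate), keeping the first."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["facility"], row["code"], row["payer"], row["rate"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def flag_outliers(rates, k=1.5):
    """Return indices of rates outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(rates) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(rates, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, r in enumerate(rates) if r < low or r > high]
```

Flagged rows would typically be held for review rather than silently deleted, since an unusual price is sometimes real.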
Aggregation & Indexing
Validated data is aggregated into a queryable database optimized for fast lookups by procedure, location, payer, and facility. Statistical summaries (medians, percentiles, distributions) are precomputed.
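A simple way to picture the precomputed summaries is a group-by over validated rows. The tuple layout `(code, payer, rate)` and the function name `summarize` are hypothetical; the real index also covers location and facility.

```python
import statistics
from collections import defaultdict


def summarize(rows):
    """Compute count, median, and quartiles per (code, payer) group."""
    groups = defaultdict(list)
    for code, payer, rate in rows:
        groups[(code, payer)].append(rate)

    summary = {}
    for key, rates in groups.items():
        entry = {"n": len(rates), "median": statistics.median(rates)}
        # statistics.quantiles needs at least two data points.
        if len(rates) >= 2:
            p25, _, p75 = statistics.quantiles(rates, n=4, method="inclusive")
            entry.update(p25=p25, p75=p75)
        summary[key] = entry
    return summary
```

Storing these summaries ahead of time means a lookup reads a single small record instead of scanning every underlying rate.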
Payer MRF Integration
We ingest Transparency in Coverage (TiC) files from five major carriers (Aetna, BCBS, Cigna, Kaiser, UHC). Rates are aggregated by provider, CPT code, site of care, and plan. Sentinel values (placeholder rates that do not represent real prices) are filtered out, and the remaining rates are cross-referenced with hospital data for consistency.
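Sentinel filtering ahead of aggregation might look like the following sketch. The specific placeholder values in `SENTINELS`, the record fields, and the use of the median as the aggregate are all assumptions for illustration, not the values or method actually used.

```python
import statistics
from collections import defaultdict

# Hypothetical placeholder rates that do not represent real prices.
SENTINELS = {0.01, 99999.99, 999999.99}


def aggregate_rates(records):
    """Median rate per (npi, cpt, site, plan), skipping sentinel values."""
    buckets = defaultdict(list)
    for r in records:
        rate = r["rate"]
        # Non-positive rates and known placeholders are dropped.
        if rate <= 0 or rate in SENTINELS:
            continue
        buckets[(r["npi"], r["cpt"], r["site"], r["plan"])].append(rate)
    return {key: statistics.median(rates) for key, rates in buckets.items()}
```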
Component Analysis
For common procedures, we identify all billing components (facility fee, professional fee, anesthesia, supplies) and model the expected total cost by site of care — so users see the full picture, not just one line item.
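In its simplest form, the component model amounts to summing the typical rate of each billing component per site of care. The `Component` class, its fields, and the plain-sum approach are illustrative assumptions; the actual model may weight or adjust components differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str           # e.g. "facility fee", "professional fee"
    site: str           # site of care, e.g. "hospital outpatient"
    median_rate: float  # typical rate for this component at this site


def expected_total_by_site(components):
    """Sum component rates per site of care."""
    totals = {}
    for c in components:
        totals[c.site] = totals.get(c.site, 0.0) + c.median_rate
    return totals
```

For example, a facility fee of 1200.0, a professional fee of 450.0, and an anesthesia fee of 300.0 at the same site would give an expected total of 1950.0 for that site.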
Coverage Statistics
- Hospitals: 7,401 facilities indexed from hospital chargemasters
- Providers: 22,029 additional provider NPIs from insurer MRF data
- Total data points: 15+ billion pricing records
- Hospital chargemaster: 11.4 billion rows (gross charges, cash prices, negotiated rates)
- Insurer MRF aggregates: 212 million payer-level + 3.76 billion plan-level records
- Carriers: 5 national carriers (Aetna, BCBS, Cigna, Kaiser, UHC) covering thousands of plans
- Geography: All 50 states + DC
- Procedures: Thousands of CPT and DRG codes
Limitations & Caveats
While we strive for comprehensive and accurate data, users should be aware of several limitations:
- Hospital compliance varies. Not all hospitals publish complete or correctly formatted files. Some facilities are missing or have incomplete data.
- Update frequency differs. Hospitals update their files on different schedules (quarterly, annually, or irregularly).
- Prices may not reflect your actual cost. Published rates are a starting point. Your out-of-pocket cost depends on insurance coverage, deductibles, copays, and whether the provider is in-network.
- Facility fees vs professional fees. Hospital chargemaster prices typically cover only the facility component. Professional fees (doctor, anesthesiologist, radiologist) are billed separately. We estimate total procedure costs by combining components, but actual bills may vary.
- Insurer data limitations. Carrier MRF rates include all plan types (commercial, Medicare Advantage, Medicaid managed care). We filter outliers but cannot always distinguish plan types in aggregated views. Some carriers have limited NPI coverage.
- Code mapping challenges. Not all hospitals use standard CPT codes consistently. Some use internal charge codes that require mapping.
Contact
For questions about our methodology or data licensing, or to report data issues, contact us at [email protected].