
Data Methodology

How we collect, process, and validate healthcare pricing data at scale.

Data Sources

CarePrices.ai combines two federally mandated data sources to provide one of the most comprehensive views of healthcare pricing available:

1. Hospital Price Transparency Files

Under CMS rule CMS-1717-F2 (effective January 2021), each hospital operating in the United States must publish a machine-readable file containing:

  • Gross charges (list prices)
  • Discounted cash prices (self-pay rates)
  • Payer-specific negotiated charges (per-insurer rates)
  • De-identified minimum and maximum negotiated charges

We have indexed 380,000+ facility files containing 11.4 billion pricing rows, covering thousands of CPT and DRG codes.

2. Insurer Transparency in Coverage (TiC) Files

Under the Transparency in Coverage Final Rule (CMS-9915-F), health insurers must publish machine-readable files disclosing their negotiated rates with providers. We ingest and process TiC files from five major national carriers:

  • Aetna — 68.6 million aggregated rate records; 2.1 billion plan-level records
  • Blue Cross Blue Shield — 72 million aggregated; 337 million plan-level (covering 42 regional BCBS plans)
  • Cigna — 28 million aggregated; 1.1 billion plan-level
  • UnitedHealthcare — 30 million aggregated; 177 million plan-level
  • Kaiser Permanente — 13 million aggregated; 20 million plan-level

In total, the insurer data adds 3.76 billion plan-level rate records and 212 million payer-aggregate records across 22,000+ rated provider NPIs.
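As a quick arithmetic check, the per-carrier figures listed above can be summed and compared with the stated totals. The sketch below uses only the rounded numbers from this page; the variable names are illustrative.

```python
# Per-carrier record counts in millions, as listed above (rounded figures).
carriers = {
    "Aetna": {"aggregated": 68.6, "plan_level": 2100},
    "Blue Cross Blue Shield": {"aggregated": 72, "plan_level": 337},
    "Cigna": {"aggregated": 28, "plan_level": 1100},
    "UnitedHealthcare": {"aggregated": 30, "plan_level": 177},
    "Kaiser Permanente": {"aggregated": 13, "plan_level": 20},
}

# Sum each column across the five carriers.
aggregated_total = sum(c["aggregated"] for c in carriers.values())
plan_level_total = sum(c["plan_level"] for c in carriers.values())

print(f"aggregated: {aggregated_total:.1f} million")
print(f"plan-level: {plan_level_total / 1000:.2f} billion")
```

The rounded inputs sum to about 211.6 million aggregated and 3.73 billion plan-level records, slightly below the stated 212 million and 3.76 billion, which is what one would expect when the per-carrier figures are rounded.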

Combined Coverage

Together, our hospital and insurer datasets contain more than 15 billion pricing records (11.4 billion hospital rows plus roughly 3.97 billion insurer records), making CarePrices.ai one of the most comprehensive healthcare pricing databases publicly available.

Data Pipeline

1. Discovery & Ingestion

We locate and download 380,000+ machine-readable files published by healthcare facilities. Files come in CSV, JSON, and various non-standard formats. Our system handles URL redirects, authentication barriers, and broken links.
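To illustrate one part of ingestion, the sketch below guesses whether a downloaded text sample is JSON or CSV. It is a minimal example using Python's standard library, not our production code: the function name `sniff_format` and the returned labels are hypothetical, and it does not cover redirects, authentication, or broken links.

```python
import csv
import json


def sniff_format(sample: str) -> str:
    """Guess whether a text sample looks like JSON, CSV, or something else."""
    # Drop a byte-order mark and leading whitespace before inspecting.
    stripped = sample.lstrip("\ufeff \t\r\n")
    if not stripped:
        return "empty"
    if stripped[0] in "{[":
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            # Possibly a truncated sample taken from a large JSON file.
            return "json-like"
    try:
        # csv.Sniffer raises csv.Error when it cannot determine a delimiter.
        csv.Sniffer().sniff(stripped, delimiters=",|\t;")
        return "csv"
    except csv.Error:
        return "unknown"


print(sniff_format('{"hospital_name": "Example"}'))
print(sniff_format("code,description,gross_charge\n99213,Office visit,150.00\n"))
```

Sniffing is a heuristic, so a real pipeline would likely fall back to per-source parser rules when the guess is "json-like" or "unknown".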

2. Parsing & Normalization

Custom parsers handle the wide variety of file formats. We normalize CPT/DRG codes, standardize payer names, resolve hospital identifiers via NPI and CMS Certification Numbers, and geocode facility locations.
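A simplified sketch of code and payer-name normalization follows. The alias table, the function names, and the validation patterns are assumptions made for illustration (five-character CPT codes with an optional trailing letter, DRG codes zero-padded to three digits); the real mappings are much larger.

```python
import re
from typing import Optional

# Hypothetical alias table; a real mapping would be far larger.
PAYER_ALIASES = {
    "uhc": "UnitedHealthcare",
    "united healthcare": "UnitedHealthcare",
    "unitedhealthcare": "UnitedHealthcare",
    "bcbs": "Blue Cross Blue Shield",
    "blue cross blue shield": "Blue Cross Blue Shield",
    "aetna": "Aetna",
    "cigna": "Cigna",
    "kaiser": "Kaiser Permanente",
    "kaiser permanente": "Kaiser Permanente",
}

CPT_PATTERN = re.compile(r"\d{4}[0-9A-Z]")  # e.g. 99213 or 0042T
DRG_PATTERN = re.compile(r"\d{1,3}")        # e.g. 470 or 39


def normalize_code(raw: str, code_type: str) -> Optional[str]:
    """Return a cleaned code, or None if it does not look valid."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw).upper()
    if code_type == "CPT":
        return cleaned if CPT_PATTERN.fullmatch(cleaned) else None
    if code_type == "DRG":
        return cleaned.zfill(3) if DRG_PATTERN.fullmatch(cleaned) else None
    return None


def normalize_payer(raw: str) -> str:
    """Map known aliases to one canonical name; otherwise tidy whitespace."""
    key = " ".join(raw.lower().split())
    return PAYER_ALIASES.get(key, " ".join(raw.split()))


print(normalize_code(" 99213 ", "CPT"), normalize_payer("  United   Healthcare "))
```

Codes that fail validation return `None` rather than being guessed at, so they can be routed to a manual mapping step.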

3. Validation & Quality Control

Automated checks flag outliers, duplicates, and data quality issues. We cross-reference against CMS fee schedules, NPPES provider data, and known pricing ranges to ensure accuracy.
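The sketch below shows one common way to flag outliers and remove exact duplicates. The interquartile-range rule and the multiplier `k = 1.5` are assumptions chosen for illustration; this page does not specify which statistical rule the automated checks use, and the sample prices are made up.

```python
from statistics import quantiles


def flag_outliers(prices, k=1.5):
    """Flag prices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p < low or p > high for p in prices]


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping first occurrences in order."""
    return list(dict.fromkeys(rows))


# Made-up prices for a single procedure; the last value is implausibly high.
prices = [100, 105, 110, 115, 120, 125, 130, 135, 5000]
print(flag_outliers(prices))
print(drop_duplicates([("99213", 120.0), ("99214", 180.0), ("99213", 120.0)]))
```

Flagged rows would typically be reviewed or excluded from summaries rather than silently corrected.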

4. Aggregation & Indexing

Validated data is aggregated into a queryable database optimized for fast lookups by procedure, location, payer, and facility. Statistical summaries (medians, percentiles, distributions) are precomputed.
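As an illustration of precomputed summaries, the sketch below groups prices by procedure code and payer and computes the median and quartiles for each group. The grouping key, the record layout, and the function name `summarize` are assumptions; the sample prices are made up.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize(records):
    """records: iterable of (code, payer, price). Returns per-group statistics."""
    groups = defaultdict(list)
    for code, payer, price in records:
        groups[(code, payer)].append(price)

    summary = {}
    for key, prices in groups.items():
        if len(prices) >= 2:
            p25, _, p75 = quantiles(prices, n=4, method="inclusive")
        else:
            # A single observation has no spread; report it for both quartiles.
            p25 = p75 = prices[0]
        summary[key] = {
            "count": len(prices),
            "median": median(prices),
            "p25": p25,
            "p75": p75,
        }
    return summary


records = [
    ("99213", "Aetna", 100),
    ("99213", "Aetna", 120),
    ("99213", "Aetna", 140),
    ("99213", "Cigna", 90),
]
print(summarize(records))
```

Computing these statistics once at indexing time means a lookup only has to read a small summary row instead of scanning every underlying price record.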

5. Payer MRF Integration

We ingest Transparency in Coverage (TiC) files from 5 major carriers (Aetna, BCBS, Cigna, Kaiser, UHC). Rates are aggregated by provider, CPT, site of care, and plan. Sentinel values (placeholder rates that do not represent real prices) are filtered, and rates are cross-referenced with hospital data for consistency.
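A minimal sketch of sentinel filtering followed by aggregation is shown below. The specific sentinel values, the field names, and the use of a median per group are assumptions for illustration; the actual filter rules are not listed on this page.

```python
from collections import defaultdict
from statistics import median

# Hypothetical placeholder rates that would not represent real prices.
SENTINEL_RATES = {0.01, 99999.99, 999999.99}


def is_sentinel(rate):
    """Treat non-positive rates and known placeholder values as sentinels."""
    return rate <= 0 or round(rate, 2) in SENTINEL_RATES


def aggregate_rates(rows):
    """Group non-sentinel rates by (npi, cpt, site, plan); return group medians."""
    groups = defaultdict(list)
    for row in rows:
        if is_sentinel(row["rate"]):
            continue
        key = (row["npi"], row["cpt"], row["site"], row["plan"])
        groups[key].append(row["rate"])
    return {key: median(rates) for key, rates in groups.items()}


# Made-up rows sharing one provider, code, site of care, and plan.
base = {"npi": "1234567890", "cpt": "99213", "site": "office", "plan": "PPO"}
rows = [dict(base, rate=r) for r in (120.0, 130.0, 99999.99, 0)]
print(aggregate_rates(rows))
```

Filtering before aggregation matters because a single placeholder such as 99999.99 would otherwise distort a group's summary statistics.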

6. Component Analysis

For common procedures, we identify all billing components (facility fee, professional fee, anesthesia, supplies) and model the expected total cost by site of care — so users see the full picture, not just one line item.
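The simplest form of this model is a sum of the billing components for each site of care, sketched below. The component keys, site names, and dollar amounts are made up for illustration, and a real model may weight or adjust components rather than simply adding them.

```python
# Component names follow the list above; the keys themselves are illustrative.
COMPONENTS = ("facility_fee", "professional_fee", "anesthesia", "supplies")


def expected_total(component_prices):
    """Sum the known billing components; missing components count as zero."""
    return sum(component_prices.get(name, 0.0) for name in COMPONENTS)


def totals_by_site(prices_by_site):
    """Return the expected total cost for each site of care."""
    return {site: expected_total(parts) for site, parts in prices_by_site.items()}


# Made-up component prices for one procedure at two sites of care.
example = {
    "hospital_outpatient": {
        "facility_fee": 2000.0,
        "professional_fee": 600.0,
        "anesthesia": 400.0,
    },
    "ambulatory_surgery_center": {
        "facility_fee": 1200.0,
        "professional_fee": 600.0,
        "anesthesia": 350.0,
        "supplies": 50.0,
    },
}
print(totals_by_site(example))
```

Treating a missing component as zero keeps the example short, but it can understate the total, so a production model would more likely mark such estimates as incomplete.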

Coverage Statistics

  • Hospitals: 7,401 facilities indexed from hospital chargemasters
  • Providers: 22,029 additional provider NPIs from insurer MRF data
  • Total data points: 15+ billion pricing records
  • Hospital chargemaster: 11.4 billion rows (gross charges, cash prices, negotiated rates)
  • Insurer MRF aggregates: 212 million payer-level + 3.76 billion plan-level records
  • Carriers: 5 national carriers (Aetna, BCBS, Cigna, Kaiser, UHC) covering thousands of plans
  • Geography: All 50 states + DC
  • Procedures: Thousands of CPT and DRG codes

Limitations & Caveats

While we strive for comprehensive and accurate data, users should be aware of several limitations:

  • Hospital compliance varies. Not all hospitals publish complete or correctly formatted files. Some facilities are missing or have incomplete data.
  • Update frequency differs. Hospitals update their files on different schedules (quarterly, annually, or irregularly).
  • Prices may not reflect your actual cost. Published rates are a starting point. Your out-of-pocket cost depends on insurance coverage, deductibles, copays, and whether the provider is in-network.
  • Facility fees vs professional fees. Hospital chargemaster prices typically cover only the facility component. Professional fees (doctor, anesthesiologist, radiologist) are billed separately. We estimate total procedure costs by combining components, but actual bills may vary.
  • Insurer data limitations. Carrier MRF rates include all plan types (commercial, Medicare Advantage, Medicaid managed care). We filter outliers but cannot always distinguish plan types in aggregated views. Some carriers have limited NPI coverage.
  • Code mapping challenges. Not all hospitals use standard CPT codes consistently. Some use internal charge codes that require mapping.

Contact

For questions about our methodology, data licensing, or to report data issues, contact us at [email protected].
