MAPFRE

MAPFRE improves homeowner insurance fraud detection by 31% using synthetic data

+31% Fraud Detection Rate (Recall)
$310,000 Estimated Annual Savings per 100 Fraud Reports
+0.85% Precision Improvement

The Challenge

Property and casualty insurers face a persistent challenge: homeowner insurance fraud is both more costly per claim and significantly rarer than auto fraud, creating severe class imbalance in training data. When MAPFRE extended its existing AI fraud detection system, originally built for auto claims, to homeowner policies, the team encountered a dataset with too few genuine fraud examples to produce a reliable model. Standard remediation techniques such as under-sampling were off the table: with so few fraud examples, the training set could not be cut down further without destroying signal. Without a viable training corpus, the model struggled to generalize, and costly fraudulent claims slipped through undetected.

The Solution

MAPFRE addressed the data scarcity problem by augmenting real claims data with AI-generated synthetic records using DataCebo's CTGAN model from the open-source Synthetic Data Vault framework. Unlike standard generative adversarial networks, CTGAN conditions its generator on a vector constructed from the categorical variables, which helps it learn the complex, imbalanced distributions typical of tabular insurance data and makes it less prone to mode collapse. The team conducted systematic experiments varying both the volume and composition of synthetic data added to the training set, drawing on a rich feature set that included claims history, policy attributes, graph-based interconnection data, geocode information, and weather inputs. The validated synthetic augmentation pipeline was subsequently deployed to production, integrating directly with MAPFRE's existing fraud detection infrastructure.
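To make the conditional vector concrete, here is a minimal Python sketch of how such a vector can be sampled, loosely following the training-by-sampling procedure described in the CTGAN paper: pick one discrete column uniformly, pick one of its categories with probability proportional to the log of its frequency, and emit concatenated one-hot masks containing a single 1. The function name, the toy rows, and the log(count + 1) smoothing are illustrative assumptions; this is not MAPFRE's code and not the SDV API.

```python
import math
import random
from collections import Counter


def build_conditional_vector(rows, discrete_columns, rng):
    """Sample one CTGAN-style conditional vector from the training rows.

    Returns (vector, column, category). Hypothetical helper for illustration.
    """
    # Fix a category order per column so vector positions are stable.
    categories = {
        col: sorted({row[col] for row in rows}) for col in discrete_columns
    }

    # 1. Pick one discrete column uniformly at random.
    column = rng.choice(discrete_columns)

    # 2. Pick a category with probability proportional to log(count + 1).
    #    The log flattens the distribution, so rare values (such as fraud)
    #    are selected far more often than their raw frequency would allow.
    counts = Counter(row[column] for row in rows)
    weights = [math.log(counts[cat] + 1) for cat in categories[column]]
    category = rng.choices(categories[column], weights=weights, k=1)[0]

    # 3. Concatenate one mask per column; only the chosen category is set.
    vector = [
        1 if (col == column and cat == category) else 0
        for col in discrete_columns
        for cat in categories[col]
    ]
    return vector, column, category


# Toy claims table with invented values, for illustration only.
claims = (
    [{"claim_type": "water", "is_fraud": "no"}] * 95
    + [{"claim_type": "theft", "is_fraud": "no"}] * 3
    + [{"claim_type": "theft", "is_fraud": "yes"}] * 2
)
rng = random.Random(0)
vector, column, category = build_conditional_vector(
    claims, ["claim_type", "is_fraud"], rng
)
print(column, category, vector)
```

During training, the generator is asked to produce a row matching the selected category, and a real row with that category is drawn for the discriminator, which is how rare categories get adequate exposure.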

Results

Synthetic data augmentation improved both primary detection metrics at once, an outcome that is uncommon in fraud modeling:

  • 31% increase in fraud detection recall, meaning significantly more fraudulent claims are now flagged for investigation
  • 0.85% improvement in precision, reducing false positives that consume investigator time and delay legitimate claim payouts
  • ~$310,000 in estimated annual savings per 100 fraudulent claims identified, reflecting both recovered losses and reduced investigation overhead

The simultaneous gain in recall and precision, metrics that typically move in opposite directions when a model's decision threshold is adjusted, gave MAPFRE the confidence to move the model from experimentation into full production deployment.
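For readers less familiar with the two metrics, the following Python sketch computes precision and recall from labels and model scores, and shows the usual tradeoff: lowering the flagging threshold on a fixed model catches more fraud (higher recall) but admits more false alarms (lower precision). The labels and scores are invented for illustration and are not MAPFRE data; improving both metrics together generally requires a better model rather than a different threshold.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def flag(scores, threshold):
    """Flag a claim for investigation when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Invented toy data: 4 fraudulent claims followed by 6 legitimate ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.3, 0.6, 0.4, 0.2, 0.1, 0.1, 0.05]

strict = precision_recall(y_true, flag(scores, 0.7))   # few, confident flags
loose = precision_recall(y_true, flag(scores, 0.25))   # many flags

print("strict threshold:", strict)
print("loose threshold:", loose)
```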

Key Takeaways

  • Synthetic data is a viable remedy for extreme class imbalance when the minority class is too small for under-sampling to work without losing meaningful signal.
  • Achieving simultaneous recall and precision gains is possible with well-tuned synthetic augmentation; teams should not assume a tradeoff is inevitable.
  • Model selection matters: CTGAN outperformed offerings from competing synthetic data vendors for this tabular use case, underscoring the need to benchmark tools against your specific data structure.
  • Feature richness amplifies results: incorporating graph-based, geospatial, and weather data alongside standard claims features meaningfully improved model performance.
  • Start with the fraud type where labeled examples are scarcest; that is where synthetic augmentation delivers the highest marginal return.
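The volume-and-composition experiments described under The Solution could be organized as a small grid like the Python sketch below. It assumes "composition" means the share of synthetic rows labeled as fraud, which the case study does not specify. `toy_sampler` is a hypothetical stand-in for a fitted synthesizer; a real pipeline would draw rows from the trained generative model instead, and all row values here are invented.

```python
import random


def augment(real_rows, sample_synthetic, n_synthetic, fraud_share):
    """Append n_synthetic generated rows, fraud_share of them labeled fraud."""
    n_fraud = round(n_synthetic * fraud_share)
    n_legit = n_synthetic - n_fraud
    return (
        list(real_rows)
        + sample_synthetic(is_fraud=1, n=n_fraud)
        + sample_synthetic(is_fraud=0, n=n_legit)
    )


def fraud_rate(rows):
    """Fraction of rows labeled as fraud."""
    return sum(row["is_fraud"] for row in rows) / len(rows)


rng = random.Random(0)


def toy_sampler(is_fraud, n):
    """Hypothetical stand-in for sampling from a fitted synthesizer."""
    return [
        {"amount": round(rng.uniform(500, 20000), 2),
         "is_fraud": is_fraud,
         "synthetic": True}
        for _ in range(n)
    ]


# Invented real data: 1,000 claims, of which 10 are fraud (1%).
real = (
    [{"amount": 1000.0, "is_fraud": 0, "synthetic": False}] * 990
    + [{"amount": 9000.0, "is_fraud": 1, "synthetic": False}] * 10
)

grid = {}
for n_synthetic in (100, 500):          # volume of synthetic rows
    for fraud_share in (0.5, 1.0):      # composition of synthetic rows
        train = augment(real, toy_sampler, n_synthetic, fraud_share)
        grid[(n_synthetic, fraud_share)] = (len(train), fraud_rate(train))
        # A real experiment would fit the fraud model on `train` here and
        # score it on a held-out set of real claims only.

for setting, (size, rate) in grid.items():
    print(setting, size, round(rate, 3))
```

Evaluating each configuration on real, held-out claims only is a common safeguard, since scoring on synthetic rows can overstate how well the model detects actual fraud.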

Details

AI Technology: Generative AI
Company Size: Enterprise
Company: MAPFRE
Quality: Verified
