Data Science As the Panacea for Healthcare Fraud, Waste, and Abuse

Posted by Daniel D. Gutierrez on Thu, Jun 12, 2014

Edited by Yan Zhang

Healthcare fraud, waste, and abuse (FWA) are national problems that affect all of us either directly or indirectly. National estimates project that hundreds of billions of dollars are lost to healthcare FWA on an annual basis. These losses lead to increased healthcare costs and subsequently increased insurance premiums. 

In one brazen scheme, a group set up a network of fraudulent medical-supply stores in the Southwest, hoping to cheat Medicaid and Medicare. The gang hired recruiters to bring them innocent patients eligible for the government services. They then paid off local doctors to prescribe motorized wheelchairs worth $7,500 but instead gave the patients motor scooters worth just $1,500, pocketing the difference. Investigators shut down the scheme after noticing billings for wheelchairs in Arizona, Texas, and other states scaling into the hundreds of millions of dollars.

The size of the healthcare sector, the enormous amount of money involved, and the lack of surveillance and monitoring mechanisms across the healthcare ecosystem make it an attractive target for FWA. According to the Office of Management and Budget, about 9%, or $47.9 billion, was lost to fraud in Medicare alone in 2010. It is therefore imperative to develop effective FWA technologies and solutions for reducing the costs associated with our healthcare system.

Given the scope and scale of national healthcare, advanced data science is necessary to detect and mitigate FWA. Opera Solutions is the industry leader in applying Big Data technologies to the most challenging and significant business problems. We are the company charged with developing the analytics to identify fraud for the Centers for Medicare & Medicaid Services (CMS) on the health insurance exchanges. Here’s a look into just how big this challenge is — and some of the approaches we’re taking to overcome one of the costliest burdens in America.

Types of Healthcare Fraud

Many types of fraud are perpetrated on the healthcare system. Here is a sampling of common patterns found in healthcare fraud:

Service Provider Fraud

  • Billing for services that are not actually performed
  • Performing medically unnecessary services solely for the purpose of generating insurance payments
  • Unbundling — billing each stage of a procedure as if it were a separate treatment
  • Use of a single patient ID to generate billing across multiple providers
  • Upcoding — billing for more costly services than the ones actually performed
  • Home healthcare companies demanding payment for treating clients actually in the hospital
  • Home healthcare companies and visiting nurses billing additional amounts
  • Patient transportation services claiming charges for patients who were never moved
  • Durable Medical Equipment (DME) claims for services and supplies not provided
  • Using stolen patient IDs to submit claims
  • Billing cosmetic surgeries as necessary repairs
  • Routinely overusing modifiers that exempt claims from editing
  • Excessively charging more than the unit thresholds
  • Billing for individual services within a global surgery billing period

Insurance Subscriber Fraud

  • Falsifying records of employment/eligibility to obtain a lower premium
  • Filing claims for medical services that were not actually received
  • Using another person’s coverage or insurance card to illegally claim the insurance benefits
  • Falsifying information on health insurance exchange to obtain government subsidies

General Fraud Detection Strategies

Data science technologies can help in healthcare FWA detection and prevention in a number of important ways:

  • Detecting the patterns of FWA in the billing produced by doctors and hospitals using a proprietary bottom-up Ensemble method
  • Profiling and segmenting claimants to identify those who are likely to commit fraud using unsupervised learning methods
  • Identifying connections among fraudsters via social network analysis
  • Detecting abnormal medical event sequences for the patients
  • Defining the similarity between claims to identify hidden claims duplicates  
  • Detecting fraud by applying analytics to huge volumes of Medicare claims data and using a combination of anomaly detection, business rules, and predictive models
  • Revealing fraudulent activities by analyzing unstructured data (e.g., tweets and e-mails) using advanced text analytics
  • Incorporating user inputs and domain knowledge by implementing feedback loops in the analytics models
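
As an illustration of the claim-similarity idea in the list above, here is a minimal sketch that scores pairs of claims by the Jaccard overlap of their procedure codes. The field names, the placeholder codes, and the 0.8 threshold are assumptions made for this example, not details of any production system.

```python
from itertools import combinations

def claim_similarity(a, b):
    """Jaccard overlap of procedure codes, gated on same patient and service date."""
    if a["patient_id"] != b["patient_id"] or a["service_date"] != b["service_date"]:
        return 0.0
    codes_a, codes_b = set(a["procedure_codes"]), set(b["procedure_codes"])
    union = codes_a | codes_b
    if not union:
        return 0.0
    return len(codes_a & codes_b) / len(union)

def find_possible_duplicates(claims, threshold=0.8):
    """Return (claim_id, claim_id, score) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(claims, 2):
        score = claim_similarity(a, b)
        if score >= threshold:
            pairs.append((a["claim_id"], b["claim_id"], score))
    return pairs

# Hypothetical claims; "A100" and "B200" are placeholder codes, not real ones.
claims = [
    {"claim_id": "C1", "patient_id": "P1", "service_date": "2014-03-01",
     "procedure_codes": ["A100", "B200"]},
    {"claim_id": "C2", "patient_id": "P1", "service_date": "2014-03-01",
     "procedure_codes": ["B200", "A100"]},
    {"claim_id": "C3", "patient_id": "P2", "service_date": "2014-03-01",
     "procedure_codes": ["A100"]},
]

print(find_possible_duplicates(claims))
```

Comparing every pair of claims is quadratic, so a system working at Medicare scale would likely first group claims by patient and date and compare only within each group.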

Machine Learning for Fraud Detection

An efficient FWA analytics solution requires a combination of advanced predictive modeling algorithms, a user-friendly interactive interface, efficient workflow management, and the ability to integrate seamlessly with the existing system. Many existing FWA solutions struggle to meet all of these requirements. For example:

  1. Rules-based approaches apply simple logic derived from known schemes, obvious patterns, hotlists, and retrospective review. They can be both too aggressive, flagging too many suspects for review and wasting resources on false positives, and too conservative, missing ever-changing FWA schemes and producing many false negatives. Rules are also time-consuming to maintain and require subject-matter expertise for every addition or edit.
  2. Many predictive modeling solutions are limited by the statistical approaches and data used to build them. First, these models are static and can only capture FWA patterns similar to the historically identified fraud cases in the model-building data. Second, they examine FWA patterns at only a subset of the claim, beneficiary, and provider levels, and in isolation from one another. As a result, an anomaly detection model can misjudge aberrant behavior because it does not fully consider the context, producing many false positives and false negatives. Lastly, the risk scores are often difficult to interpret and act upon.
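
To make the rules-based approach concrete, here is a minimal sketch of one hand-written rule: flag a home healthcare claim whose visit date falls strictly inside an inpatient stay for the same patient. The record layout and identifiers are hypothetical.

```python
from datetime import date

def home_hospital_conflicts(home_visits, hospital_stays):
    """Pair each home-care claim with any inpatient stay that contains its visit date."""
    conflicts = []
    for visit in home_visits:
        for stay in hospital_stays:
            same_patient = visit["patient_id"] == stay["patient_id"]
            # Strict inequalities: admission and discharge days are not flagged,
            # since a home visit could legitimately occur on either day.
            inside_stay = stay["admitted"] < visit["visit_date"] < stay["discharged"]
            if same_patient and inside_stay:
                conflicts.append((visit["claim_id"], stay["claim_id"]))
    return conflicts

# Hypothetical records for illustration only.
hospital_stays = [
    {"claim_id": "IP-1", "patient_id": "P1",
     "admitted": date(2014, 5, 1), "discharged": date(2014, 5, 6)},
]
home_visits = [
    {"claim_id": "HH-1", "patient_id": "P1", "visit_date": date(2014, 5, 3)},
    {"claim_id": "HH-2", "patient_id": "P1", "visit_date": date(2014, 5, 6)},
    {"claim_id": "HH-3", "patient_id": "P2", "visit_date": date(2014, 5, 3)},
]

print(home_hospital_conflicts(home_visits, hospital_stays))
```

The rule is easy to read and audit, but it only catches the one scheme it was written for, which is the maintenance burden described in item 1.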

Opera Solutions’ FWA Radar utilizes its industry-leading expertise in Big Data predictive analytics and extensive experience in developing healthcare-focused predictive modeling solutions to provide a comprehensive FWA offering. With the integration of multiple data sources and advanced machine-learning techniques — which produce results in real time — our solution identifies not only known but also newly emerging FWA patterns through a self-learning module, which incorporates user feedback.

Fraud detection methods based on data science can be roughly divided into two categories: supervised and unsupervised machine learning. Supervised learning requires all cases in the training data set to be labeled by domain experts. Unsupervised learning does not have this requirement, as the objective is to find outliers in the cases. Examples of the algorithms that have been applied to Medicare fraud detection include neural networks, decision trees, association rules, Bayesian networks, and genetic algorithms, among others. As a result of applying these methods, some fraud behaviors can now be detected, including home/hospital stay conflict, hospital stay with no associated physician inpatient visit, excessive lab/radiology services per client per day, x-ray duplicate billing, fragmented lab and x-ray procedures, lab/x-ray interpretation with no associated technical portion, and ambulance trips with no associated medical service.
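
The unsupervised case can be illustrated with a small sketch that needs no labels: it flags clients whose daily count of lab/radiology services is far above the rest, using a modified z-score based on the median and the median absolute deviation (MAD). The client names, counts, and the 3.5 cutoff are assumptions chosen for the example.

```python
from statistics import median

def robust_outliers(counts, cutoff=3.5):
    """Flag keys whose value lies far above the median, measured in MAD units."""
    values = list(counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    flagged = []
    for key, value in counts.items():
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        score = 0.6745 * (value - med) / mad
        if score > cutoff:
            flagged.append(key)
    return flagged

# Hypothetical services billed per client on a single day.
counts = {
    "client_a": 2, "client_b": 3, "client_c": 2,
    "client_d": 4, "client_e": 3, "client_f": 25,
}

print(robust_outliers(counts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme biller would otherwise inflate the very statistics used to judge it.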

One example is neural networks, which have been used extensively in fraud detection thanks to their ability to handle complex data structures and nonlinear relationships among variables. A common concern with neural networks, however, is overfitting, in which the network produces a relatively small error on the training data but a much larger error on new data. Overfitting is especially prominent with skewed data such as healthcare claims, where legitimate cases far outnumber fraudulent ones. Fortunately, several strategies address the problem. One adds a weight decay term to the error function to penalize large weights. Another, called “early stopping,” splits the available data into two sets: a training set used to update the weights and biases, and a validation set used to stop training when the network begins to overfit. Both improve the network’s generalization from the training data to unseen data.
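
Both safeguards can be sketched in a few lines. The example below trains a single sigmoid unit (the simplest possible network, used here only to keep the code short) with an L2 weight decay term and stops when the loss on a held-out set stops improving. The synthetic data, learning rate, decay strength, and patience value are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def log_loss(weights, bias, rows):
    total = 0.0
    for x, y in rows:
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def train(train_rows, val_rows, lr=0.1, weight_decay=0.01,
          patience=10, max_epochs=500):
    n = len(train_rows[0][0])
    weights, bias = [0.0] * n, 0.0
    best = (log_loss(weights, bias, val_rows), list(weights), bias)
    stale = 0
    for _ in range(max_epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in train_rows:
            err = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        m = len(train_rows)
        # Weight decay: the penalty (weight_decay / 2) * sum(w^2) adds
        # weight_decay * w to each weight's gradient.
        weights = [w - lr * (g / m + weight_decay * w)
                   for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
        # Early stopping: watch the held-out loss, keep the best parameters.
        val = log_loss(weights, bias, val_rows)
        if val < best[0] - 1e-6:
            best, stale = (val, list(weights), bias), 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best[1], best[2], best[0]

def make_rows(n):
    """Synthetic, skewed data: only a minority of rows are labeled 1."""
    rows = []
    for _ in range(n):
        x = [random.gauss(0, 1), random.gauss(0, 1)]
        y = 1 if x[0] + 0.5 * random.gauss(0, 1) > 1.0 else 0
        rows.append((x, y))
    return rows

random.seed(0)
train_rows, val_rows = make_rows(200), make_rows(100)
weights, bias, val_loss = train(train_rows, val_rows)
print(round(val_loss, 3))
```

The function returns the parameters from the epoch with the lowest held-out loss rather than the final epoch, which is the usual way early stopping is applied.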

Applying supervised learning to Medicare fraud detection often involves combining several supervised algorithms (as an ensemble) to improve classification performance. Here are some examples of fraud-detection learning ensembles:

  • A Bayesian network whose weights were refined by a rule generator
  • The use of a k-nearest neighbor algorithm whose distance metric is optimized by a genetic algorithm in detecting two types of fraud: inappropriate practice of service providers and “doctor-shoppers” (soliciting multiple physicians using a variety of false pretenses to receive prescriptions for controlled substances)
  • A model combining fuzzy-set theory and a Bayesian classifier to detect suspicious claims
  • Association rules and a neural segmentation algorithm for fraud detection
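
The general idea of an ensemble can be sketched as a weighted average of scores from several models. The three scorers below are hand-written stand-ins for trained classifiers, and the field names and weights are hypothetical; none of the published ensembles listed above is reproduced here.

```python
def ensemble_score(claim, scorers, weights=None):
    """Weighted average (soft vote) of each scorer's fraud score in [0, 1]."""
    weights = weights or [1.0] * len(scorers)
    total = sum(w * scorer(claim) for w, scorer in zip(weights, scorers))
    return total / sum(weights)

# Stand-ins for trained models; each returns a score between 0 and 1.
def billed_amount_scorer(claim):
    return min(claim["billed_amount"] / 10000.0, 1.0)

def units_scorer(claim):
    return 1.0 if claim["units"] > claim["unit_threshold"] else 0.0

def modifier_scorer(claim):
    return 0.8 if claim["edit_exempt_modifier"] else 0.1

scorers = [billed_amount_scorer, units_scorer, modifier_scorer]
claim = {
    "billed_amount": 7500.0,
    "units": 12,
    "unit_threshold": 10,
    "edit_exempt_modifier": True,
}

score = ensemble_score(claim, scorers)
print(round(score, 2))
```

Averaging lets a claim that looks only mildly unusual to each individual model still rise to the top when several models agree.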

Most unsupervised methods are combined with supervised methods in Medicare fraud detection. For example, a clustering algorithm can be applied to divide all insurance subscriber profiles into groups. Then a decision tree can be built for each group and converted into a set of rules. When an unsupervised method is followed by a supervised method, the objective is usually to discover knowledge in a hierarchical way.
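
The cluster-then-classify pattern can be sketched as follows. A tiny one-dimensional k-means groups subscriber profiles by age, and a depth-one decision tree (a single threshold) is then fitted inside each group and read off as a rule. The profiles, features, and labels are invented for the example, and a real system would use many features and deeper trees.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means; returns a cluster index for each value."""
    ordered = sorted(values)
    # Spread the initial centers across the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)]
               for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def fit_stump(rows):
    """Depth-1 tree: threshold t minimizing errors of 'feature > t means fraud'."""
    best_t, best_err = None, len(rows) + 1
    for t in sorted({x for x, _ in rows}):
        err = sum((x > t) != bool(y) for x, y in rows)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Hypothetical subscriber profiles: (age, claims_per_year, known_fraud_label)
profiles = [
    (25, 2, 0), (30, 3, 0), (28, 9, 1), (27, 2, 0),
    (70, 10, 0), (75, 12, 0), (72, 30, 1), (68, 11, 0),
]

labels = kmeans_1d([p[0] for p in profiles], k=2)
rules = {}
for c in sorted(set(labels)):
    rows = [(p[1], p[2]) for p, lab in zip(profiles, labels) if lab == c]
    rules[c] = fit_stump(rows)

for c, t in rules.items():
    print(f"cluster {c}: flag if claims_per_year > {t}")
```

In this toy data, nine claims a year is suspicious in the younger group but well within the normal range for the older one, which is the kind of context a single global rule would miss.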

Opera Solutions’ healthcare FWA capability also includes a proprietary Ensemble scoring methodology that combines multiple models at various levels of granularity, each with a unique ability to address a particular aspect of the problem. This allows the solution to capture the complicated structure of procedure and diagnosis codes at the visit level and maximize performance in predicting outliers.


The development of a safe, high-quality, and cost-effective healthcare system like Medicare requires effective ways to detect and mitigate FWA. The application of machine learning to healthcare FWA is necessary but highly challenging. The overall objective is to develop FWA detection methods and algorithms that are scalable, accurate, and fast. Ultimately, they need to be able to handle the immense volume of healthcare data in real time while keeping costs and error rates low. If you’d like to learn more about how we detect and prevent healthcare fraud, download our FWA Radar sheet.




Daniel D. Gutierrez is a Los Angeles–based data scientist. He is also a recognized Big Data journalist and is working on a new machine-learning book due out later this year.

Yan Zhang is an Opera Solutions data scientist based in San Diego. He leads Opera Solutions’ healthcare analytics team.


Topics: Healthcare, Big Data, Data Science, Machine Learning