Michael Jahrer, VP of applied machine learning at Opera Solutions, proves his data science mettle by using deep learning to predict who will be a safe driver in the year ahead.
Porto Seguro, the third-largest insurance company in Brazil, set out to improve its predictions of who would file an insurance claim in the next year. The company sponsored a competition through Kaggle, the premier platform for predictive modeling and analytics competitions, and our own Michael Jahrer, VP of applied machine learning, took home the win.
With more than 5,000 teams and nearly 6,000 data scientists competing, this contest attracted a wide range of data science royalty. For all intents and purposes, Jahrer won by a landslide. He correctly predicted that six drivers would file claims in the next year, compared with four for his nearest competitors. The gap between his score and the second-place score (0.00285) was so large that the next 25 teams after second place all finished within that same margin of the runner-up.
Jahrer said three key decisions made the difference for him: cross-validating his scores with the private leaderboard, using an autoencoder for feature engineering, and blending six models to produce his final score. And while these three approaches may not sound revolutionary, his fellow competitors were surprised and delighted by them. Jahrer explained his approach on the contest’s discussion board, and his peers laid on the praise, with one competitor calling his solution “on another level.” Others described it as “elegant,” “beautiful,” “amazing” and “jaw-dropping.”
What impressed his peers in this contest was not so much his skill, or any technique that had never been used before, as his unorthodox choice of techniques for this specific problem. In the process, he debunked three widely held beliefs in the data science community:
(1) That deep learning works only with very large datasets
(2) That unsupervised learning isn’t useful
(3) That humans should do the creative tasks and let machines do the number crunching
For the data science community, seeing new applications for old tricks is exciting. For the business community, Jahrer’s accomplishment points to higher achievable predictive accuracy, which means greater potential to earn and save money for businesses everywhere, provided they can learn from his approach and emulate it.
To learn more about the competition and Jahrer’s approach, download the full case study. You can also check out our press release.
Sarah Anderson is the marketing director at Opera Solutions.