“Didn’t you just go to a similar Big Data conference recently?” my wife asked me. “How much could have changed in a few months?” I was hesitant to attend another conference so soon. My wife is right about most things, but in this case I’m glad I didn’t listen and went anyway. I learned about many new advances in both commercial and open source tools across the whole technology stack: new hardware, in-memory databases, and new and improved software.
Everybody seems to be talking about machine learning these days; a quick Google search returns 21.2 million results for the term. Clearly, this is a popular topic. Yet “machine learning” can mean many different things depending on the context in which it is discussed, and it is associated with a long list of data science techniques and technologies. Business leaders often feel overwhelmed by this bewildering array of terms, analytical approaches, and technology solutions. There are hundreds of algorithms, with new variants seemingly appearing every day. Researching them online rarely clarifies the choices or points to an obviously superior option, because most articles are written for deep experts and focus on the nuances of a particular model or open source package. As a result, business leaders can be reluctant to adopt something they don’t fully understand, and they miss opportunities.
The struggle is real, and it’s becoming increasingly apparent to companies that have dipped their toes into popular data science tools. As enterprises test the limits of their new tools, their legacy technology, and their data scientists’ time, their infrastructure is starting to show cracks. Read on to see how these issues reveal themselves and, more importantly, to gather some ideas on what to do about them.
Over the past year, I have averaged two to three customer meetings per week, for a total of more than 100 customer and partner conversations about Big Data, analytics, and data science in the enterprise. One theme recurs throughout these conversations: scale. Large enterprises no longer want to build a single model quickly or put just one use case into production. They all struggle with a large backlog of ideas, and they need a way to turn those ideas rapidly into real use cases that deliver tangible business value.
However, many companies simply can’t find a path to make this happen. Across these conversations, I noticed very similar patterns and identified five common obstacles that can prevent companies from scaling data science.
Most companies realize they are sitting on a treasure trove of customer data with the potential to deliver tremendous business benefits; however, most also have no idea how to realize those benefits. How can companies use their data to attract more customers, increase how much those customers spend, and make them more loyal? How can they use data to turn unhappy customers into loyal champions of the brand? And perhaps most importantly, how can they use that data to drive a significant increase in revenue?
Most credit card rewards programs suffer from the same challenge: delivering rewards that customers actually want. To address this, programs offer ever more rewards, which puts the onus on customers to find desirable ways to spend their points. In the end, redeeming points can feel more like a chore than a reward, diminishing the value of the very program that was supposed to create value and differentiation in a crowded market. And with millions of customers, no single reward, or even a hundred, will satisfy everyone. So what are credit card issuers to do? How can they put the value back into these programs so that customers are motivated to choose one card over another?
We’ve all been there: standing in the airport, having just learned that your flight has been canceled. If the weather’s bad, you can understand and roll with it. But if the cancellation seems arbitrary and royally screws up your plans, that’s another story. As a customer, you wonder, “Why do airlines do this? And how do they decide that my flight is canceled when others are not?” Airlines, meanwhile, are asking their own question: “We have to cancel X flights, but which ones should we cancel to minimize the loss of revenue and customer loyalty?” Here, we delve into both sides of the issue; the answers should provide some context and, hopefully, ease the frustration for everyone.
It was 7:00 a.m. on a Saturday, and the 10th floor of Los Angeles City Hall was filled with more than 450 people who had given up their day off to bury their noses in their laptops. They were data scientists, and they had come to build new technologies that tackle complex social problems using the city’s newly released open data.
The Hadoop Summit, hosted by Hortonworks and Yahoo, has become a must-attend Big Data event. The Hadoop distributed computing architecture is now an integral part of what it means to be a data scientist, and a few days of concentrated attention each year is enough to get a sense of where the industry is headed. The Hadoop Summit serves this purpose well, offering thought-provoking technical sessions, keynote addresses, and a vendor exhibition that brings many of the major players in the Hadoop ecosystem together under one roof.
Edited by Yan Zhang
Healthcare fraud, waste, and abuse (FWA) are national problems that affect all of us, directly or indirectly. National estimates project that hundreds of billions of dollars are lost to healthcare FWA each year. These losses drive up healthcare costs and, in turn, insurance premiums.
Ever wonder how services like Netflix or Pandora choose media to suggest to you? If you’ve been reading this blog for a while, you’re already at least somewhat familiar with recommender engines. In our post “How Machine Learning Will Affect Your Next Vacation,” we discussed the impact that machine-learning recommender engines have on everyday consumers. Here, we want to dive deeper into the math and science behind recommender engines.