Shopping around for a Big Data analytics solution is a daunting task for anyone. But for those who are somewhat familiar with data science, a common area of misunderstanding (and underestimation) is clustering techniques. Whether the assumption is that all clustering techniques are created equal or that a company needs only one or two of them, business buyers are often left scratching their heads. The fact is, several types of clustering techniques exist, each with its own strengths and weaknesses, and companies need access to a variety of techniques to achieve optimal results.
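To make that concrete, here is a minimal, illustrative sketch (not from the original article) of why one clustering technique is rarely enough. Both algorithms below are toy 1-D implementations written for clarity, not production use: a naive k-means, which requires you to choose the number of clusters up front and is sensitive to outliers, and a simple density-style grouping, which finds the number of clusters on its own but depends on a gap threshold.

```python
def kmeans_1d(points, k=2, iters=20):
    """Naive 1-D k-means with deterministic initialization (min and max of the data)."""
    centroids = [min(points), max(points)][:k]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

def gap_clusters(points, max_gap=2.5):
    """Density-style grouping: start a new cluster whenever the gap between
    consecutive sorted points exceeds max_gap. Finds k automatically."""
    pts = sorted(points)
    clusters = [[pts[0]]]
    for prev, cur in zip(pts, pts[1:]):
        if cur - prev > max_gap:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters

# Two tight groups plus one far-away outlier.
data = [0, 1, 2, 5, 6, 7, 100]

# The outlier captures one of the two k-means centroids, so the two
# tight groups get merged into a single cluster.
print(kmeans_1d(data, k=2))   # -> [0, 0, 0, 0, 0, 0, 1]

# The gap-based approach recovers three groups without being told k.
print(gap_clusters(data))     # -> [[0, 1, 2], [5, 6, 7], [100]]
```

Neither technique is "right": on this data the gap-based method wins, but with overlapping clusters of varying density the trade-off can reverse, which is exactly why a portfolio of techniques matters.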
In my role leading Product Management and Presales at Opera Solutions, I am constantly exposed to direct customer interactions, most often in the early stages of the sales cycle. In these meetings, part of my job is to assess our prospective customers' pain points and needs, even as they assess our products, technology, and capabilities. Given this exposure, I am well positioned to understand real-world business needs around analytics across industries.
We often talk about corporate culture, but my experience with hundreds of customers led me to think about corporate psychology, and to map Maslow's hierarchy of needs onto the current state of data science adoption and readiness in the industry. Companies need to recognize the stage they are in and not be seduced by the hype or promise of the technology. Data science adoption need not follow a strictly sequential maturity process; dynamic corporations can certainly accelerate things when the need and the will exist. So, for fun, here is a take on Maslow's hierarchy of needs adapted to data science.
Everywhere you turn, both business and IT talk about data science. But there’s also trepidation about how to get started, especially in the context of attaining an organization’s business goals and objectives beyond the realm of lab or departmental experimentation.
“Didn’t you just go to a similar Big Data conference recently?” my wife asked me. “How much could have changed in a few months?” I was hesitant about attending another conference so soon. My wife is right about most things, but in this case, I am glad I didn’t listen and went anyway. I learned about many new advances in both commercial and open source tools, across the whole technology stack: new hardware, in-memory databases, and new and improved software.
Everybody seems to be talking about machine learning these days, and a quick check of Google returns 21.2 million search results for the term. Clearly, this is a popular topic. Yet the term “machine learning” can have many different meanings, depending on the context in which it is being discussed. It is also associated with an equally lengthy list of data science techniques and technologies. Business leaders often feel overwhelmed by this rather bewildering array of terms, analytical approaches, and technology solutions. There are hundreds of algorithms, with new variants seemingly appearing every day. Researching them online does not seem to clarify the choices or point to an obviously superior decision, as most articles target deep experts and revolve around nuances of a particular model or open source package. As a result, business leaders can be reluctant to adopt something they don’t fully understand, resulting in missed opportunities.
In a previous post, “5 Obstacles to Achieving Scalable Data Science, and How to Overcome Them,” we talked about perspectives distilled from hundreds of conversations with our customers and partners and the challenges they face in trying to achieve a scalable data science capability. All of these customers have an extensive backlog of ideas, but they struggle to convert these ideas into actual use cases, or mini-applications, that can run in a production environment and generate real business value. These businesses universally encounter the following key obstacles:
(1) They have too many tools and technologies to manage effectively.
(2) Data is everywhere, but deriving value from it is extremely difficult.
(3) The traditional “artisan” approach to use cases severely limits the number of business problems they can solve.
(4) Operationalizing data science, with hundreds of models in production, is extremely difficult.
(5) Companies are willing to experiment but are afraid to make the long-term commitment necessary to foster widespread adoption.
The struggle is real, and it is becoming increasingly apparent to companies that have dipped their toes into popular data science tools. As enterprises test the limits of their new tools, their old technology, and their data scientists’ time, their infrastructure is starting to show cracks. Read on to see how these issues reveal themselves and, more importantly, to gather some ideas on what to do about them.
Over the past year, I have been averaging 2–3 customer meetings per week, resulting in over 100 customer and partner conversations around Big Data, analytics, and data science for the enterprise. From these conversations, I have found one key recurring theme: scale. Large enterprises no longer want to build one model quickly or implement just one use case in production. They all struggle with a large backlog of ideas. They need a way to rapidly turn these many ideas into real use cases that deliver tangible business value.
However, many companies simply can’t find a pathway to make this happen. Across my numerous conversations, I noticed very similar patterns and identified 5 common obstacles that can prevent companies from achieving scale for data science.