In a previous post, “5 Obstacles to Achieving Scalable Data Science, and How to Overcome Them,” we shared perspectives distilled from hundreds of conversations with our customers and partners about the challenges they face in building a scalable data science capability. All of these customers have an extensive backlog of ideas, but they struggle to convert those ideas into actual use cases, or mini-applications, that can run in a production environment and generate real business value. These businesses universally encounter the following key obstacles:
(1) They have too many tools and technologies to manage effectively.
(2) Data is everywhere, but deriving value from it is extremely difficult.
(3) The traditional “artisan” approach to use cases severely limits the number of business problems they can solve.
(4) Operationalizing data science, with hundreds of models in production, is extremely difficult.
(5) Companies are willing to experiment but are afraid to make the long-term commitment necessary to foster widespread adoption.
How can companies address such obstacles? How can they eliminate this non-scalable, “one use case at a time” approach? Answer: the same way Henry Ford solved a similar problem a hundred years ago. In the early days of the automobile industry, assembling a car was a slow, difficult process; each vehicle was effectively a custom project. Recognizing the inefficiency and expense of this approach, Ford streamlined and accelerated the process with two key innovations: interchangeable parts and the assembly line. Ford’s approach allowed him to cut average assembly time by nearly 90%, which significantly reduced costs and ushered in the era of ubiquitous car ownership in the United States.
Similar innovations are now available in the Big Data space, and they are already revolutionizing the analytics industry. In our lexicon, the interchangeable parts are Signals, and our assembly line is called Signal Hub. Signals are mathematical transformations of raw data: raw inputs refined into reusable variables that carry valuable descriptive and predictive information. Signals are modular units of intelligence that a business can readily use to test hypotheses, learn which combinations of factors accurately predict future activity, and formulate sophisticated business decisions based on rigorous data science. Signals can be descriptive (e.g., the total amount a customer has spent over the past 30 days) or predictive (e.g., a customer’s propensity to buy a certain product category).
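To make the idea concrete, here is a minimal sketch of what a descriptive Signal and a predictive Signal might look like as reusable Python functions. This is purely illustrative, not Signal Hub’s actual implementation; the transaction columns (customer_id, amount, timestamp) and the pre-fitted classifier are assumptions for the example.

```python
# Illustrative sketch only -- not Signal Hub's actual implementation.
# Assumes a pandas DataFrame of transactions with customer_id, amount,
# and timestamp columns, plus a pre-fitted scikit-learn-style classifier.
import pandas as pd

def spend_last_30_days(transactions: pd.DataFrame, as_of: pd.Timestamp) -> pd.Series:
    """Descriptive Signal: total amount each customer spent in the
    30 days before as_of."""
    window = transactions[
        (transactions["timestamp"] > as_of - pd.Timedelta(days=30))
        & (transactions["timestamp"] <= as_of)
    ]
    return window.groupby("customer_id")["amount"].sum()

def propensity_to_buy(customer_features: pd.DataFrame, model) -> pd.Series:
    """Predictive Signal: each customer's modeled probability of buying
    from a given product category."""
    return pd.Series(model.predict_proba(customer_features)[:, 1],
                     index=customer_features.index)
```

Either function yields one reusable variable per customer; that reusability, rather than the specific math, is what makes a Signal an interchangeable part.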
Signal Hub is an integrated analytics environment that allows data scientists and analytics teams to create, use, and monitor a Signal Layer — not one model or one variable at a time, but thousands of them, executed in parallel. This architecture allows users to always have access to the freshest Signals relevant to their needs.
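One way to picture the Signal Layer is as a registry of named Signal definitions that the platform refreshes together. The sketch below is a deliberate simplification, with the class name, threading model, and single shared input table all invented for illustration rather than taken from Signal Hub’s design, but it shows the pattern of computing many Signals in parallel instead of one at a time.

```python
# Hypothetical "Signal Layer" registry -- an illustrative pattern, not
# Signal Hub's architecture. Every registered Signal is recomputed in
# parallel from a shared input table.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

import pandas as pd

class SignalLayer:
    def __init__(self) -> None:
        self._signals: Dict[str, Callable[[pd.DataFrame], pd.Series]] = {}

    def register(self, name: str, fn: Callable[[pd.DataFrame], pd.Series]) -> None:
        """Add a named Signal definition to the layer."""
        self._signals[name] = fn

    def refresh(self, data: pd.DataFrame) -> pd.DataFrame:
        """Recompute every registered Signal in parallel and return one
        table with a column per Signal."""
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn, data)
                       for name, fn in self._signals.items()}
        return pd.DataFrame({name: f.result() for name, f in futures.items()})
```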
For example, in one implementation for one of our customers, we loaded over 100TB of historical data from 100+ source tables. We now use that data to calculate 4,000+ Signals every day for each of their 90+ million customers by running over 50 workflows in parallel. Each workflow has automatic monitoring, alerts, and actions. If a particular workflow has issues, because of missing data or too much Signal drift (changes in a Signal’s values over time), the system can automatically roll back the workflow to the last successful run. This ensures that the Signal Layer always contains the latest and most accurate Signals, ready for consumption. Signal Hub then simultaneously feeds actionable insight to enterprise applications and execution systems, powering dozens of use cases, such as email marketing campaigns, and enabling rapid, sophisticated, and precise business action.
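The drift check itself can be sketched as follows. This is a hypothetical stand-in for the monitoring described above, assuming a population stability index (PSI) drift measure and a 0.2 alert threshold, neither of which is taken from Signal Hub; the point is the pattern of comparing fresh Signal values against a baseline and falling back to the last successful run.

```python
# Hypothetical drift monitor in the spirit described above. The PSI
# measure and 0.2 threshold are illustrative assumptions, not Signal
# Hub's actual monitoring logic.
import numpy as np

def population_stability_index(baseline: np.ndarray, current: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a baseline and current distribution of one Signal."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b, _ = np.histogram(baseline, bins=edges)
    c, _ = np.histogram(current, bins=edges)
    b_pct = np.clip(b / b.sum(), 1e-6, None)
    c_pct = np.clip(c / c.sum(), 1e-6, None)
    return float(np.sum((c_pct - b_pct) * np.log(c_pct / b_pct)))

def publish_or_roll_back(signal_name, baseline, current, last_good,
                         threshold=0.2):
    """Publish fresh Signal values unless drift exceeds the threshold,
    in which case keep the last successful run."""
    psi = population_stability_index(baseline, current)
    if psi > threshold:
        print(f"ALERT: {signal_name} drifted (PSI={psi:.3f}); rolling back.")
        return last_good
    return current
```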
Once this Signal Layer is in place, creating a new use case requires only days, not the months typical of Big Data analytics efforts that rely on traditional methodologies. Just choose the Signals, apply some business logic, and you are ready for execution. Complex data science is still taking place, but that complexity is abstracted away in favor of a simple workflow. This design philosophy gives data scientists and analytics teams more time to think, experiment, learn, and drive value for the business.
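To illustrate how thin that business logic can be once the Signals exist, a use case might reduce to little more than a filter over the Signal table; the Signal names and thresholds below are invented for the example and are not from any real deployment.

```python
# Sketch of "choose the Signals, apply some business logic": selecting an
# audience for a hypothetical email campaign from a table with one column
# per Signal. Signal names and thresholds are illustrative assumptions.
import pandas as pd

def select_campaign_audience(signals: pd.DataFrame) -> pd.DataFrame:
    """Business logic layered on existing Signals -- no new modeling needed."""
    return signals[
        (signals["spend_last_30_days"] > 100)
        & (signals["propensity_to_buy_category"] > 0.7)
    ]
```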
Over the past few years, the industry has made substantial investments in tools that make “one model at a time” creation relatively easy. It is now time to tackle the next big challenge. Creating a Signal Layer for the enterprise is the approach that will move advanced data science from sideshow to ubiquitous enterprise necessity.
Want to learn more about how Signal Hub can drive value for your business? Download our Signal Hub Technical Brief.
Anatoli Olkhovets is Vice President of Product Management at Opera Solutions.