Everybody seems to be talking about machine learning these days, and a quick check of Google returns 21.2 million search results for the term. Clearly, this is a popular topic. Yet the term “machine learning” can have many different meanings depending on the context, and it is associated with a lengthy list of data science techniques and technologies. Business leaders often feel overwhelmed by this bewildering array of terms, analytical approaches, and technology solutions. There are hundreds of algorithms, with new variants seemingly appearing every day. Researching them online rarely clarifies the choices or points to an obviously superior option, as most articles target deep experts and revolve around the nuances of a particular model or open-source package. As a result, business leaders can be reluctant to adopt something they don’t fully understand, and opportunities are missed.
While data science is indeed complex, the reality is that a business can address over 90% of its use cases with only a handful of techniques. In this short article, we attempt to demystify the field to bridge the gap between science and business. The most widely used machine learning techniques can be grouped into three categories: (1) supervised, (2) unsupervised, and (3) reinforcement learning.
1. Supervised learning techniques can be used when so-called “labeled” historical training data is available. Labeled data refers to data that has an informative tag, or label, which offers valuable information pertaining to a particular application (for example, knowing all customers who bought a product after an email campaign). Models can be trained on past events and then used to predict future outcomes. This approach is very common for predicting various propensities, such as the propensity to buy a product in a particular category or take a particular action. Popular techniques include the following (ranked from least to most sophisticated):
Linear and logistic regression: Used to predict an outcome based on one or more predictive variables.
Linear regression, specifically, is used to predict a continuous value (e.g., price), whereas logistic regression is used to predict discrete values (e.g., yes/no). A great deal of effort usually goes into feature engineering, the process of finding appropriate predictive input variables. The benefits of these techniques are that they are easy to use and that training them is not computationally intensive.
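To make this concrete, here is a minimal sketch of ordinary least squares fitting a one-variable linear regression in plain Python. The data is invented for illustration; in practice one would use a library such as scikit-learn, which also handles multiple predictors and logistic regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# toy data, roughly y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
slope, intercept = fit_line(xs, ys)
```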
Decision trees: Used to automatically discover a set of rules that split data points into increasingly smaller groups. There are several types of trees, including classification trees and regression trees, which handle different types of target variables (categorical and continuous, respectively). One of the main benefits of decision trees is that their prediction results are easy to explain.
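The rule-discovery idea can be illustrated with a one-level tree (a "decision stump") that scans for the single threshold that best separates two classes. The data below is fabricated for illustration; a full decision tree applies this same search recursively to each resulting subgroup.

```python
def majority(labels):
    """Most common label in a group."""
    return max(set(labels), key=labels.count)

def best_split(xs, labels):
    """Find the threshold on x that minimizes misclassifications."""
    values = sorted(set(xs))
    best_t, best_err = None, len(labels) + 1
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [l for x, l in zip(xs, labels) if x <= t]
        right = [l for x, l in zip(xs, labels) if x > t]
        # count points that disagree with their side's majority vote
        err = (sum(1 for l in left if l != majority(left))
               + sum(1 for l in right if l != majority(right)))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

xs = [1, 2, 3, 10, 11, 12]       # e.g., number of site visits
labels = [0, 0, 0, 1, 1, 1]      # e.g., did not buy / bought
threshold, errors = best_split(xs, labels)
```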
Random forest and gradient boosting machine (GBM): Learning approaches that build a collection of multiple models (called an “ensemble”), most commonly decision trees, which together produce higher predictive accuracy than any single model can deliver. These approaches are particularly adept at handling noisy data or data sets with many missing values.
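A back-of-the-envelope calculation shows why ensembles help: if each model is right 70% of the time and their errors are independent (an idealized assumption; real ensemble members are correlated), a majority vote is right more often than any single model, and more voters help further.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent models,
    each correct with probability p, votes for the right answer."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

single = 0.7
ensemble_3 = majority_vote_accuracy(single, 3)    # about 0.784
ensemble_25 = majority_vote_accuracy(single, 25)  # higher still
```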
Deep learning: Uses algorithms that mimic the operation of the human brain, analyzing raw data via a series of processing layers, each extracting features at a higher level of abstraction (e.g., starting with millions of pixels and ending with the recognition that the image shows a cat). Deep learning techniques are extremely computationally intensive, but recent advances in both software and hardware have made them widely applicable. With proper training, deep learning networks can be used in the widest range of applications and can pull the most nuanced nuggets of information out of raw data.
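The layered idea can be sketched with a tiny two-layer network that computes XOR, a function no single neuron can learn. The weights here are set by hand for clarity (a real network learns them from data via backpropagation); the point is that each layer turns its inputs into higher-level features.

```python
def step(x):
    """Simple threshold activation."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    # weighted sum of inputs followed by a nonlinear activation
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_net(x1, x2):
    # hidden layer: two intermediate features extracted from raw inputs
    h_or = neuron([x1, x2], [1, 1], -0.5)    # "at least one input is on"
    h_and = neuron([x1, x2], [1, 1], -1.5)   # "both inputs are on"
    # output layer combines the features: on if OR fires but AND does not
    return neuron([h_or, h_and], [1, -1], -0.5)
```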
2. Unsupervised learning techniques are used when historical outcomes (called “labels”) are unknown, so the algorithm must discover patterns in the data itself. These techniques are most commonly used for customer segmentation and anomaly detection. Popular techniques, from simpler to more sophisticated, are as follows:
Clustering: Grouping data into clusters (or segments), with entries in each segment having similar characteristics. This technique is most commonly used for customer or product segmentation. The most popular techniques are k-means and k-medoids. More advanced models, which may work better for noisy data, include GMM (Gaussian Mixture Models) and DBSCAN (density-based spatial clustering of applications with noise).
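The core k-means loop fits in a few lines: assign each point to its nearest center, then move each center to the mean of its points, and repeat. The one-dimensional data below is fabricated to show two obvious groups; real segmentation would use many customer attributes at once.

```python
def kmeans_1d(points, init_centers, iterations=10):
    centers = list(init_centers)
    for _ in range(iterations):
        # assignment step: each point joins its nearest center's group
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # update step: move each center to the mean of its group
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]   # two well-separated groups
centers = kmeans_1d(points, init_centers=[0.0, 5.0])
```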
Dimension reduction: A technique to represent complex data with fewer parameters (dimensions). The most common technique is principal component analysis (PCA), which converts many potentially correlated variables into a smaller set of uncorrelated components.
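A minimal sketch of PCA's first step, finding the top principal direction, using power iteration on the covariance matrix. The toy data lies along the diagonal line y = x, so the recovered direction should have equal components; production code would use a library routine instead.

```python
def first_principal_component(data, iterations=50):
    n, d = len(data), len(data[0])
    # center the data
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # covariance matrix of the centered data
    cov = [[sum(row[a] * row[b] for row in centered) / n
            for b in range(d)] for a in range(d)]
    # power iteration: repeatedly multiply a vector by cov and normalize
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

data = [[1, 1], [2, 2], [3, 3], [-1, -1], [0, 0]]   # points on y = x
direction = first_principal_component(data)
```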
Deep autoencoder: Uses neural network techniques to find hidden patterns in data. The network is trained to reconstruct its own input (so the input serves as its own label), after which its compressed internal representation can be used for feature extraction, dimension reduction (similar to how one would use PCA), and anomaly detection (inputs that reconstruct poorly are likely anomalies).
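To illustrate the reconstruction idea, here is a deliberately tiny linear autoencoder with a one-number bottleneck, trained by gradient descent (numerical gradients keep the sketch short; real autoencoders use deep nonlinear layers and backpropagation). The toy data lies along a line, so one number is enough to reconstruct each two-dimensional point, and the reconstruction loss falls as training proceeds.

```python
def reconstruction_loss(w, data):
    """Encode each point to one number (dot with w); decode by scaling w."""
    total = 0.0
    for x in data:
        code = sum(wi * xi for wi, xi in zip(w, x))   # encoder: 2 -> 1
        decoded = [wi * code for wi in w]             # decoder: 1 -> 2
        total += sum((a - b) ** 2 for a, b in zip(x, decoded))
    return total / len(data)

def numeric_gradient(w, data, eps=1e-6):
    """Finite-difference gradient of the loss with respect to w."""
    grads = []
    for k in range(len(w)):
        plus, minus = list(w), list(w)
        plus[k] += eps
        minus[k] -= eps
        grads.append((reconstruction_loss(plus, data)
                      - reconstruction_loss(minus, data)) / (2 * eps))
    return grads

data = [[1, 1], [2, 2], [-1, -1], [0.5, 0.5]]   # points on the line y = x
w = [0.3, 0.1]
loss_before = reconstruction_loss(w, data)
for _ in range(300):
    g = numeric_gradient(w, data)
    w = [wi - 0.02 * gi for wi, gi in zip(w, g)]
loss_after = reconstruction_loss(w, data)
```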
3. Reinforcement learning is used to train systems that take a particular action, get feedback, and then use that feedback to improve subsequent actions. These techniques are most common in gaming systems (e.g., learning to play a game by making moves, with the feedback being winning or losing). However, they are not commonly applied in the enterprise space, due to the difficulty of setting up experimental “moves” and creating an appropriate feedback loop.
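A minimal Q-learning sketch on a made-up five-state corridor: the agent can move left or right, and reaching the rightmost state pays a reward of 1. Real applications replace the exhaustive sweep below with actual exploratory moves and a live feedback loop, which is exactly the setup difficulty the paragraph above describes.

```python
def corridor_step(state, action):
    """Environment: states 0..4 in a row; reward 1 for reaching state 4."""
    nxt = max(0, min(4, state + action))
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

actions = (-1, +1)                      # left, right
Q = {(s, a): 0.0 for s in range(5) for a in actions}
alpha, gamma = 0.5, 0.9                 # learning rate, discount factor
for _ in range(100):                    # sweeps stand in for exploration
    for s in range(4):                  # state 4 is terminal
        for a in actions:
            nxt, reward, done = corridor_step(s, a)
            target = reward if done else (
                reward + gamma * max(Q[(nxt, b)] for b in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # move toward target

# greedy policy: in every state, the learned best move is "right"
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in range(4)}
```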
No single technique is best. Data scientists often need to try many approaches, spending time preparing the data, engineering features, and tweaking model and learning parameters before picking what works best for their targeted use cases.
How does one push the frontier of predictive accuracy? By combining multiple models. For example, users may place a PCA model before a logistic regression model to help with feature engineering. In an extreme example, a team from Opera Solutions won the 2015 KDD Cup, a premier international competition for data scientists with 821 teams participating, by using an ensemble of 60(!) sophisticated machine learning models: 26 gradient boosting machines, 14 neural networks, 6 logistic regressions, and many others. Each model was tuned to a particular step of the analysis, and together they extracted the last ounce of accuracy from the source data.
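The PCA-before-logistic-regression combination mentioned above can be sketched with a scikit-learn Pipeline. The dataset here is synthetic and the parameters are illustrative choices, not a recipe from the competition entry.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# synthetic labeled data: 20 raw features, 5 of them informative
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

model = Pipeline([
    ("reduce", PCA(n_components=5)),     # compress 20 inputs to 5 components
    ("classify", LogisticRegression()),  # predict the label from components
])
model.fit(X, y)
accuracy = model.score(X, y)
```

Chaining the steps in one Pipeline object means the dimension reduction is refit consistently whenever the model is retrained, avoiding leakage between the two stages.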
While it takes 60+ models to win competitions, most business problems can be solved with just one or two of the techniques described above. Their accuracy will already be far greater than that of the rules-based approaches commonly used today. Getting started with data science for business problems does not have to be mysterious or complex.
Want to learn more about how to drive business results with machine learning and Big Data? Check out our white paper, "Delivering Big Data Success with the Signal Hub Platform."
Anatoli Olkhovets is Vice President of Product Management at Opera Solutions.