Here at Opera Solutions, we often refer to data equity, which we define as not just the amount of data you have, but also the ability to derive value from it. And to get value from your data, you need to ensure it is high quality. But how? Knowing the answer could make the difference between data equity and data bust.
Companies have good reason to doubt the quality of their third-party data. Data brokers have grown into massive enterprises, supplying the world’s major companies with the data needed to connect with customers. Not only is selling data a rapidly growing industry, it has also been a largely unregulated one. Brokers know that companies are willing to pay a premium for particularly large and complete datasets. And with very few methods for companies to validate third-party data, brokers focus on quantity over quality.
Things may be changing for the better, however, as the data brokerage industry matures. Certain protected categories of data, such as medical data or personally identifiable information (PII), are more strictly regulated. And with the General Data Protection Regulation (GDPR) taking effect in the EU in May 2018, companies that do business in Europe will be pressured to properly vet both their third-party data and the methods used to collect it in order to maintain the accuracy of their records. Outside of Europe, however, most data is still unregulated, so it’s best for businesses to take matters into their own hands and verify that the data they’re buying is accurate.
To illustrate some of the inaccuracies that can be found in purchased data, we decided to conduct our own informal test. We looked up our own profiles with a popular third-party data vendor, and the results were as expected, which is to say, inaccurate. For one author, some household attributes were correct, but almost all the individual attributes were wrong. Her address and home data were over five years out of date. The vendor also claimed she had two teenage children and a mortgage, both untrue. Only one credit card purchase at “selected retailers” was recorded in the past two years even though she uses a credit card regularly. The household interests section was also inaccurate: The vendor listed her as interested in 42 categories, but only 10 were correct.
Unfortunately, the data for our second author was no more accurate. Single with a college degree, he was profiled as married with only a high school diploma. While he has never owned a MasterCard credit card, this vendor thought it was his primary method of payment. Under “interests,” not one of the listed categories was remotely correct: As an apartment-dwelling New Yorker, he had no interest in gardening, children’s items, or cooking, despite those being listed as primary interests in the dataset. In fact, the only correct data points were his birthdate, his gender, and the fact that he had a Visa credit card, which are not exactly sophisticated insights.
This is just the tip of the iceberg in terms of data accuracy problems. While virtually all companies run their data through rudimentary quality control processes (cleaning, de-duplicating, and standardizing are the first steps toward boosting data accuracy), this is not enough. Other methods can go even further to ensure that the data businesses rely on is accurate and dependable. In particular, here are three rules of thumb that can help you overcome inaccuracies in your third-party data and extract the most value for your business.
1. Know Your Data
The first and easiest step to illuminating data quality problems is simply exploring your data and applying common sense. Those who frequently work with B2B and B2C marketing data and are familiar with Big Data will be aware of how datasets should look and may be able to catch surface-level problems. They can see, for example, if the count of records in a certain dataset is of the right order of magnitude, if the match rate holds steady from one month to the next, or if the number of null values for a field is within an acceptable range for an analysis or model to be run. Data brokers won’t perform these checks, but you can.
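As a rough illustration, the three checks described above could be sketched in a few lines of Python. The function names, the record layout (a list of dictionaries), and the 5-point tolerance are our own assumptions for the example, not part of any vendor's tooling.

```python
import math


def null_rate(records, field):
    """Share of records where `field` is missing, None, or an empty string."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def within_order_of_magnitude(count, expected):
    """True when `count` is within a factor of ten of `expected`."""
    if count <= 0 or expected <= 0:
        return False
    return abs(math.log10(count / expected)) < 1


def match_rate_is_steady(previous, current, tolerance=0.05):
    """True when the match rate moved by no more than `tolerance` (assumed threshold)."""
    return abs(current - previous) <= tolerance


# Hypothetical sample of a monthly refresh.
sample = [
    {"email": "a@example.com"},
    {"email": None},
    {"email": ""},
    {"email": "b@example.com"},
]
print(null_rate(sample, "email"))
print(within_order_of_magnitude(1_200, 1_000))
print(match_rate_is_steady(0.62, 0.60))
```

What counts as an acceptable threshold depends on the analysis or model you plan to run, so treat these numbers as placeholders to tune against your own history.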
To explore your data further, you need a platform that enables data to be uploaded, accessed, and manipulated by both data scientists and business users. Data scientists can create charts, pull out summary statistics, and compare refreshes to historical datasets to determine whether any anomalies exist. Business users can check for trends and patterns and apply business rules and sense to flag potential problems.
2. Analytics Add Value
“All models are wrong, but some are useful.” Statistician George Box was first credited with this quote in 1976, before the idea of Big Data even existed, but it is as relevant in Big Data today as it was to statistics in the 1970s. Third-party data may be inaccurate, but it is still useful in aggregate because there is enough accurate information to generate some reliable insights, and the inaccuracies are largely random, so they tend to cancel out rather than compound when many records are combined. This means that models and analyses applied to large numbers of individuals, households, or businesses can produce useful results even though drawing conclusions about a single record or data point is difficult. A platform that has built-in capabilities to generate these analyses and predictive models can simplify the process of digging around in a messy dataset to unearth business insights.
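A small simulation can make the aggregate argument concrete. The sketch below is purely illustrative: the income range, the noise level, and the `simulate` function are invented for the example, and it assumes the vendor's errors are independent and centered on zero. Systematic errors, such as a vendor that consistently overstates a field, would not average out this way.

```python
import random
import statistics


def simulate(n=10_000, noise_sd=20_000, seed=42):
    """Compare per-record error with the error of the aggregate mean."""
    rng = random.Random(seed)
    # Hypothetical "true" household incomes.
    true_values = [rng.uniform(30_000, 120_000) for _ in range(n)]
    # Vendor-reported values: truth plus independent zero-mean noise (assumption).
    reported = [v + rng.gauss(0, noise_sd) for v in true_values]

    avg_record_error = statistics.fmean(
        abs(r - t) for r, t in zip(reported, true_values)
    )
    aggregate_error = abs(
        statistics.fmean(reported) - statistics.fmean(true_values)
    )
    return avg_record_error, aggregate_error


record_error, aggregate_error = simulate()
print(f"typical error on a single record: {record_error:,.0f}")
print(f"error on the average across all records: {aggregate_error:,.0f}")
```

Under these assumptions the error on the average shrinks roughly with the square root of the number of records, which is why a segment-level insight can be dependable even when any single profile is not.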
3. The More Data Sources the Better
In that same vein, all data brokers are wrong, but they are wrong in different ways, which is why more data sources will give you a more accurate picture of your customer base. Since data brokers use multiple sources and methods to collect and aggregate their data, the fields and records they offer for purchase will differ. Even if they claim to offer thousands of data points per consumer or business record, they often specialize in certain areas, and the accuracy and completeness of fields vary depending on their source. The records they supply can also differ, with varying levels of coverage based on demographics, location, or industry.
By knowing these tendencies, purchasing data files that complement one another, and combining them with first-party data you trust, you can create an aggregated dataset that has fuller and more accurate coverage of your target audience. Using multiple data sources can help fill in gaps, cross-validate data, and weed out inconsistencies. This does require more effort and a greater initial investment, which may not be feasible for smaller companies, but remember that even incremental improvements can give your company a competitive edge. Being able to target more customers more accurately will pay off in the end.
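One simple way to cross-validate a field across sources is a majority vote, with trusted first-party data taking precedence. The sketch below is a minimal illustration under that assumption. The `reconcile` function, the status labels, and the broker names are hypothetical, and a production approach might instead weight each source by its known accuracy for a given field.

```python
from collections import Counter


def reconcile(field_values, trusted=None):
    """Pick one value for a field from several sources.

    field_values maps a source name to its value (None when the source
    has no value). A first-party value, when present, always wins.
    Returns a (value, status) pair.
    """
    if trusted is not None:
        return trusted, "first_party"

    votes = Counter(v for v in field_values.values() if v is not None)
    if not votes:
        return None, "missing"

    ranked = votes.most_common(2)
    top_value, top_count = ranked[0]
    # A tie between the two leading values cannot be resolved by voting.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, "conflict"

    status = "majority" if top_count > 1 else "single_source"
    return top_value, status


# Hypothetical marital-status values from three brokers.
print(reconcile({"broker_a": "married", "broker_b": "single", "broker_c": "single"}))
print(reconcile({"broker_a": "married", "broker_b": "single"}))
```

Flagging ties as conflicts rather than guessing keeps the inconsistencies visible, so they can be reviewed or resolved with an additional source.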
And you don’t have to do it alone. A well-designed Big Data analytics platform can provide many of the functions required to merge multiple data sources effectively. With the appropriate solution design and some tweaking, an automated end-to-end data workflow can be scheduled or triggered to ingest, cleanse, standardize, verify, match, and dedupe multiple data sources to produce a single merged dataset that is ready to plug into marketing campaign or business intelligence processes.
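To give a sense of what the standardize, match, and dedupe steps involve, here is a deliberately simplified sketch. It is not how any particular platform works: the field names, the `standardize` and `merge_sources` helpers, and the choice to match on an exact, normalized email address are all assumptions made for the example. Real workflows typically add verification steps and fuzzier matching on names and addresses.

```python
def standardize(record):
    """Normalize the fields used for matching and merging."""
    return {
        "email": (record.get("email") or "").strip().lower(),
        "name": " ".join((record.get("name") or "").split()).title(),
        "zip": (record.get("zip") or "").strip()[:5],
    }


def merge_sources(*sources):
    """Merge record lists into one deduplicated list keyed on email.

    Earlier sources win when two sources disagree; later sources only
    fill in fields that are still empty.
    """
    merged = {}
    for source in sources:
        for raw in source:
            rec = standardize(raw)
            key = rec["email"]
            if not key:
                continue  # no usable match key, so skip the record
            existing = merged.setdefault(key, rec)
            for field, value in rec.items():
                if not existing[field] and value:
                    existing[field] = value
    return list(merged.values())


# Hypothetical files from two brokers.
broker_a = [{"email": " Ann@Example.com ", "name": "ann  lee", "zip": ""}]
broker_b = [
    {"email": "ann@example.com", "name": "", "zip": "10001-1234"},
    {"email": "bob@example.com", "name": "bob ray", "zip": "94105"},
]
for row in merge_sources(broker_a, broker_b):
    print(row)
```

Passing your most trusted source first means its values are kept whenever sources disagree, which is one straightforward way to let first-party data anchor the merged record.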
To obtain a clean, trustworthy third-party dataset that can be leveraged for your business needs, you need a platform that can do all of the above — exploring and visualizing data to catch obvious issues, applying analyses and models to extract insights, and merging multiple datasets to obtain a golden record. This will help you move toward true data equity. Opera Solutions’ Signal Hub platform solves all these challenges and more.
Download our Signal Hub Technical Brief to see how Signal Hub can help your company manage and extract value from all your data sources.
Nicholas Wetherbee and Alissa Zhang are associate product managers at Opera Solutions.