The Hadoop Summit conference, hosted by Hortonworks and Yahoo, has become a must-see Big Data event. The Hadoop distributed computing architecture is now an integral part of what it means to be a data scientist, and a few days of concentrated effort each year is enough to get a vision for where the industry is headed. The Hadoop Summit serves this purpose well by providing thought-provoking technical sessions, keynote addresses, and a vendor exhibition that brings many of the major players in the Hadoop ecosystem together under one roof.
Edited by Yan Zhang
Healthcare fraud, waste, and abuse (FWA) are national problems that affect all of us either directly or indirectly. National estimates project that hundreds of billions of dollars are lost to healthcare FWA on an annual basis. These losses lead to increased healthcare costs and subsequently increased insurance premiums.
As a relatively new term, “data science” can mean different things to different people, due in part to all the hype surrounding the field. We also hear a lot about “big data,” often in the same breath, and how it is changing the way that companies interact with their customers. This raises the question: how are the two related? Unfortunately, the hype often masks reality and worsens the signal-to-noise ratio when it comes to our increasingly data-driven society. Rest assured, a genuine paradigm shift surrounding data is under way in our society, but the hype isn’t helping to clarify data science’s exact role in Big Data. In this article, we strive to put to rest many of the misunderstandings surrounding data science.
Switching to a new medical coding system won’t be easy, but when combined with data science and machine learning, ICD-10 presents enormous potential benefits for both the financial and the clinical sides of healthcare.
Part of why the healthcare industry is such a notorious laggard in jumping on the Big Data bandwagon is that every attempted change sets off a huge domino effect, rendering many good ideas useless until everyone, and everything, is ready. One big step in the right direction, however, is an important upgrade to the computerized codes used for electronic medical records (EMR), which will take hold in the next year or two. These codes, known as ICD, or International Classification of Diseases, record what ailments patients have and help determine how much they and their insurers should pay for a treatment. The current set of codes, called ICD-9, is scheduled to give way to its 10th revision this fall, though there may be a year-long delay. The new ICD-10 codes allow for much greater detail than the existing codes in describing illnesses, injuries, and treatment procedures. Through the use of data science techniques, this increased granularity is expected to allow for improved tracking of public health threats and trends as well as better analysis of treatments.
You may not be paying much attention to data science, but it’s paying attention to you — and will deliver personalized search results in due time.
For those of us who prefer to stay on the shopping side of recommender engines, how online retailers seem to know which hotels we’ll book, what flights we’ll take, or what brand of pots and pans we’ll buy next is simply math or, for those with a little more knowledge, algorithms. For almost everyone else, it falls under science so advanced it might as well be magic. Of course, for the machine learning data scientists who create these recommender engines, every engine is different and complex, and definitely not magic.
Think you know English? Think again. See if you have what it takes to teach a computer how to understand humans.
Anyone who has tried to learn English as a second language is only too familiar with its many, many challenges. In addition to idioms, sarcasm, and verbs that take on a wide array of meanings when combined with various prepositions (think: make up, make out, make it, and of course, makeup), there’s also pop culture, trends, products, and more to keep straight. Luckily for us, we’re human, and even those well established in their native languages will be able to speak and decipher English with enough practice and exposure. But what about machines? How do we even begin to program them in a way that they can read and understand sentiment? Answer: very carefully. The process requires machine learning data scientists to use Natural Language Processing (NLP) techniques, a form of advanced analytics. They use these techniques to build models that can decipher sentiment and pick out the meaningful information from the noise.
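To make the challenge concrete, here is a minimal sketch of the simplest possible approach: scoring text against a word list. It is not the method the article's data scientists use. The lexicon, its weights, and the function name sentiment_score are all hypothetical, chosen only for illustration, and a real model would learn from labeled data rather than a hand-written dictionary.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
LEXICON = {
    "great": 1.0, "love": 1.0, "good": 0.5,
    "bad": -0.5, "awful": -1.0, "hate": -1.0,
}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Sum lexicon weights, flipping the sign of the next sentiment word after a negator."""
    # Normalize curly apostrophes so contractions like "isn't" stay one token.
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", normalized)

    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS or tok.endswith("n't"):
            negate = True
            continue
        weight = LEXICON.get(tok, 0.0)
        if weight:
            score += -weight if negate else weight
            negate = False  # a negator applies to one sentiment word only
    return score


if __name__ == "__main__":
    for sentence in ["I love this hotel", "It isn't bad", "Oh great, another delay"]:
        print(sentence, "->", sentiment_score(sentence))
```

Even this toy handles simple negation, but it scores the sarcastic “Oh great, another delay” as positive, which is exactly the kind of gap that idioms, sarcasm, and phrasal verbs open up for naive approaches.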
A new report, featured in The New York Times, highlights the benefits of Big Data and machine learning in the airline industry and beyond.
The travel industry has always been a vast data collector. Every airline reservation, every hotel booking, every car rental ends up in a conventional database of structured data. But today Big Data — the unstructured data that includes ratings on blog sites, likes on social media, conversations with call centers, customer clickstreams, and more — is becoming increasingly important in determining how travel companies keep customers coming back.
Opera Solutions’ Arnab Gupta explores the vast opportunities in Big Data on CNBC’s “Squawk Box.”
"The Big Data phenomenon is so vast, it can support 50 IBMs."
That was just one of the thought-provoking statements Opera Solutions’ CEO, Arnab Gupta, made during his appearance on CNBC’s “Squawk Box” as he spoke with hosts Andrew Ross Sorkin, Becky Quick, and Joe Kernen on April 24th.
After spending several painful months negotiating contracts with your vendor partners and haggling over every cent for IT resource rates, you finally managed to get the best rate card across multiple roles and geographies. So with a little help from a run-of-the-mill rate-compliance monitoring system, the hundreds or thousands of projects with these contracted vendors will run on cruise control, right? Wrong. A few months down the line, you’ll notice you are paying more for every hour of work than you were prior to the rate negotiations, even though the rates were negotiated down by 5–10% across the board.
Here at Opera Solutions, we eat, drink, and breathe the concept of Man + Machine. But its implications hold power in any organization that’s trying to leverage its data to improve its bottom line. In medicine, trading, and many other domains, humans can achieve extraordinary (possibly even super-human) results when partnered with a machine. This is because people and machines each play an important role in extracting meaning from Big Data flows.