For the very few who are not familiar with the term “killer app,” we will start with a definition. Merriam-Webster defines “killer app” as “a computer application of such great value or popularity that it assures the success of the technology with which it is associated.” In layman’s terms, it is the ‘thing’ that you must have and use, either because you want to or because everyone else is using it. And in order to use it, you must acquire the device on which it runs. The killer app is the ‘last straw’ that pushes slow adopters, and even those disinclined toward the technology, to finally adopt it. If all my friends and family are text messaging and I am the only one left out, I probably need to get a mobile phone and start texting – like it or not. It is often said that the spreadsheet was the killer app for PCs and email was the killer app for internet access (although there is plenty of debate on both points).
For many years, data governance was the thing we knew we should do because it was the right thing to do, but somehow it never got the priority it deserved. It was like exercising or flossing or eating vegetables (depending upon your proclivities). We did it, but not necessarily with passion, or as regularly or as deeply as we should have.
Then along came artificial intelligence (AI) in all its many forms and practices: machine learning, deep learning, artificial neural networks, reinforcement learning, generative adversarial networks, predictive analytics, recommendation systems, natural language processing, and the list goes on. All of these practices require data. AI promises game-changing advancements in business models, customer experience, personalization, preventive maintenance, automation, efficiency, and many other areas. A September 2018 report from McKinsey & Company predicts that AI will boost the global economy by $13 trillion by 2030, adding roughly 1.2% to global GDP a year. In short, companies that ignore AI do so at their own peril. That said, because the stakes are so high and the risk of being leapfrogged is very real, few companies are ignoring AI. The question is not ‘whether’ but ‘how’ to apply AI.
Bad Data ≠ Good AI
We often hear that AI systems get better with more data. This is true. But what we don’t often hear is that the data must be good data (unless you are a data scientist who already spends 90% of her time scrubbing and cleansing data – then the quest for good data is your life). With reports now emerging about implicit and unconscious bias in AI models, we are just starting to see the implications of bad data in AI. As AI is embedded in more and more areas within companies, and as decision making becomes more AI-based and automated, the consequences of continuing with bad data will become more severe.
To make this more real, let’s look at the consequences of bad data in some common AI use cases.
First, let’s take a simple example using photos. If we were training a machine learning algorithm to identify oranges and we mislabeled a whole bunch of apples as oranges in our training data set, the model would ‘learn’ incorrectly and conclude that apples are oranges. Apply this concept to facial recognition or self-driving cars and the potential negative consequences are self-evident.
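To make the effect concrete, here is a minimal sketch (assuming Python with scikit-learn and NumPy available) that flips a growing fraction of training labels – the ‘apples labeled as oranges’ problem – and watches test accuracy fall. The dataset is synthetic and the numbers are illustrative only.

```python
# A minimal sketch of how mislabeled training data degrades a classifier.
# The dataset is synthetic; label noise simulates apples tagged as oranges.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for noise in (0.0, 0.2, 0.4):
    y_noisy = y_train.copy()
    flip = np.random.default_rng(0).random(len(y_noisy)) < noise
    y_noisy[flip] = 1 - y_noisy[flip]  # mislabel a fraction of examples
    model = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
    print(f"label noise {noise:.0%}: test accuracy {model.score(X_test, y_test):.3f}")
```

The more labels we corrupt, the worse the model scores on clean test data – garbage in, garbage out, measured.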
If we are building a recommendation system and we use the wrong product code for customer purchases, we could end up recommending the wrong products to our customers. Recommending wrong or inappropriate products can be more damaging to customer relationships than recommending no products at all.
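The toy co-purchase recommender below shows how a single mis-keyed product code propagates straight into a bad recommendation. The customers, product codes, and scoring logic here are all hypothetical.

```python
# A toy sketch of how one wrong product code corrupts recommendations.
from collections import Counter

# Order history as (customer, product_code). Suppose bob's second order
# was keyed with the wrong code: "PET-FOOD-01" instead of "WIPES-03".
orders = [
    ("alice", "BABY-FOOD-01"), ("alice", "DIAPERS-02"),
    ("bob",   "BABY-FOOD-01"), ("bob",   "PET-FOOD-01"),  # bad code
    ("carol", "DIAPERS-02"),
]

def recommend(customer, orders):
    """Recommend products bought by customers who share a purchase."""
    mine = {p for c, p in orders if c == customer}
    peers = {c for c, p in orders if p in mine and c != customer}
    counts = Counter(p for c, p in orders if c in peers and p not in mine)
    return [p for p, _ in counts.most_common(3)]

print(recommend("alice", orders))  # ['PET-FOOD-01'] -- thanks to the bad code
```

One bad record, and a new parent is being offered pet food instead of baby wipes.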
If we are building a system to predict HR attrition and we don’t have good data about compensation, overtime, work-life balance, job satisfaction, promotions, etc., then we could wind up losing good people who could otherwise have been retained if we had better data feeding our predictive model.
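A quick sketch of the same idea (again assuming scikit-learn; the data is synthetic and merely stands in for HR records): a model trained without some of its informative features – think missing compensation or overtime fields – predicts measurably worse than one trained on complete data.

```python
# A minimal sketch of how missing informative features weakens a model.
# Synthetic stand-in for HR data; the column semantics are hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 10 features, the first 4 informative (shuffle=False keeps them first).
X, y = make_classification(n_samples=4000, n_features=10, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

full = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
# Drop two informative columns to mimic absent compensation/overtime data.
partial = RandomForestClassifier(random_state=1).fit(X_tr[:, 2:], y_tr)

print(f"all features:     {full.score(X_te, y_te):.3f}")
print(f"missing features: {partial.score(X_te[:, 2:], y_te):.3f}")
```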
If we are using machine learning to do predictive forecasting for our supply chain based on past customer spending patterns and we have bad customer order data, then our forecasts will be off and we might over-purchase and over-stock.
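Even a trivial forecast shows the mechanism. In the sketch below (all quantities invented), a single order quantity keyed as 1002 instead of 102 more than triples a simple trailing-average forecast, and the overstock follows directly.

```python
# A minimal sketch of how one bad order record skews a naive demand forecast.
weekly_orders = [100, 105, 98, 102, 101, 99]   # clean order history
corrupted     = [100, 105, 98, 1002, 101, 99]  # 102 mis-keyed as 1002

def forecast(history, window=4):
    """Forecast next week's demand as the average of the last `window` weeks."""
    return sum(history[-window:]) / window

print(f"clean forecast:     {forecast(weekly_orders):.0f} units")  # 100
print(f"corrupted forecast: {forecast(corrupted):.0f} units")      # 325
```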
If we are building an AI system to predict when machines will need maintenance, and we are automating the dispatch of field service technicians based on the predictions from the AI model, we could waste a good deal of money sending technicians to investigate healthy machines.
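A back-of-the-envelope sketch makes the waste tangible. Every figure here – alert volume, model precision, cost per dispatch – is an invented assumption, but the arithmetic is the point.

```python
# Back-of-the-envelope cost of dispatching technicians on false alarms.
# All figures below are illustrative assumptions.
alerts_per_month = 200   # maintenance alerts raised by the model
precision = 0.60         # fraction of alerts that are real problems
dispatch_cost = 350.0    # assumed cost per technician dispatch, in dollars

false_alarms = alerts_per_month * (1 - precision)
print(f"wasted dispatches per month: {false_alarms:.0f}")              # 80
print(f"wasted cost per month: ${false_alarms * dispatch_cost:,.0f}")  # $28,000
```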
If we are creating a customer personalization system and we have bad customer data, we could do more harm than good (e.g., greeting Michele with “Hello Michael”).
Simply put, the stakes of living with bad data, and with the practices that lead to bad data, have risen. The consequences of bad data in the age of AI, machine learning, and decision automation are simply too high.
Data is a valuable business asset and should be treated like one. Good data is the foundation for AI and machine learning. AI may, in fact, be the ‘killer app’ that pushes even the stodgiest of companies to embrace a data culture and improve their data governance and data quality.