Diagnosis and prescription are terms most often used in a medical context. Today, however, I want to walk you through how they relate to data analysis.
There are four main types of data analysis: descriptive, diagnostic, predictive and prescriptive.
The focus of this blog is on diagnostic and prescriptive data analysis.
Diagnostic Data Analysis:
Diagnosis means identifying the nature of an illness by examining its symptoms. Diagnostic data analysis is similar: when something is not right with the data (a data illness), you analyze the data, facts and figures to conclude what happened and why.
Two kinds of complaint usually call for diagnostic data analysis:
- Data or reports are incorrect
- Retail offers resulted in a loss
In my experience, data loss or incorrect data is mainly caused by data flowing across different systems. It is very similar to “The Whisper Challenge” game: the first person says “Million is One with 6 zeros” and the last person hears “One in a million is zero”.
As data analysts, we need to be good listeners and careful readers to understand the problem. The following are my preferred approaches for diagnostic data analysis:
- Identify “What has happened”: Read the problem statement again and list your understanding of the data illness and its impacts.
- When possible, replicate the issue in a Dev/QA environment and begin the analysis there.
- Identify the source of the data: Go to the root of the data by tracing the system that generates it and how it is passed on downstream/upstream, and check for gaps. Also verify that it reached the target system in the expected format.
- Identify the correlations: If there are interdependent processes, identify and analyze each one of them.
- Identify the integration points:
  - Once the data is received downstream/upstream, find the integration points where it is processed or transformed.
  - Check all the programs involved in data transformation and analyze them one by one.
  - The best approach here is to take a few sample records and analyze them thoroughly. For example, take an identification number and verify its data in every table, one by one.
  - If a program involves a UNION over a large amount of data, comment it out to limit the number of records.
  - If an ETL tool is merging multiple files, disable the nonessential workflows.
  - Compare like with like; do not compare apples with oranges.
- Data validation and verification:
  - Verify the source and destination counts.
  - Verify the aggregates, if any.
  - Verify the data accuracy.
  - Check the data quality.
  - Check the transformations.
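The count, aggregate and record-level checks above can be sketched in a few lines of Python. This is only a minimal illustration, not a full reconciliation tool: the function name `reconcile`, the fields `order_id` and `amount`, and all sample records are made up for this example, and the sketch assumes the key is unique within each system.

```python
def reconcile(source_rows, target_rows, key, amount_field):
    """Compare two lists of dict records: counts, totals and record-level gaps.

    Assumes `key` uniquely identifies a record in each list (hypothetical setup).
    """
    source_by_key = {row[key]: row for row in source_rows}
    target_by_key = {row[key]: row for row in target_rows}

    # Keys that never arrived, keys that appeared from nowhere,
    # and keys present on both sides whose content differs.
    missing_in_target = sorted(source_by_key.keys() - target_by_key.keys())
    unexpected_in_target = sorted(target_by_key.keys() - source_by_key.keys())
    changed = sorted(
        k for k in source_by_key.keys() & target_by_key.keys()
        if source_by_key[k] != target_by_key[k]
    )

    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "source_total": sum(row[amount_field] for row in source_rows),
        "target_total": sum(row[amount_field] for row in target_rows),
        "missing_in_target": missing_in_target,
        "unexpected_in_target": unexpected_in_target,
        "changed": changed,
    }


# Made-up sample records standing in for a source and a downstream system.
source = [
    {"order_id": 101, "amount": 1000000},
    {"order_id": 102, "amount": 250},
    {"order_id": 103, "amount": 75},
]
target = [
    {"order_id": 101, "amount": 1},    # value garbled on the way downstream
    {"order_id": 102, "amount": 250},  # arrived intact
]                                      # order 103 never arrived

report = reconcile(source, target, key="order_id", amount_field="amount")
for name, value in report.items():
    print(f"{name}: {value}")
```

The lists of missing and changed keys give you the few sample records worth tracing table by table.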
As I am writing this, I remember this scenario:
A large pharma chain offered 30% off to senior citizens, and initially it went well. However, the business observed that the profit margin declined slightly in the quarter after the offer was launched.
When we did the analysis, we found that customer footfall had decreased instead of increasing. The customer data suggested that senior citizens were collecting orders for their extended families across households and buying everything together at the 30% discount.
The goal of diagnostic data analysis is to uncover the root cause of the problem at hand.
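A pattern like this tends to show up as fewer visits combined with larger baskets. Here is a rough sketch of that check; the field names (`customer_id`, `quarter`, `amount`) and every number in the sample are invented for illustration and are not the retailer's data.

```python
from collections import defaultdict


def quarterly_summary(transactions):
    """Visits, distinct customers and average basket value per quarter."""
    visits = defaultdict(int)
    revenue = defaultdict(int)
    customers = defaultdict(set)
    for t in transactions:
        quarter = t["quarter"]
        visits[quarter] += 1
        revenue[quarter] += t["amount"]
        customers[quarter].add(t["customer_id"])
    return {
        quarter: {
            "visits": visits[quarter],
            "customers": len(customers[quarter]),
            "avg_basket": revenue[quarter] / visits[quarter],
        }
        for quarter in sorted(visits)
    }


# Invented transactions: four small baskets before the offer,
# two large baskets after it.
transactions = [
    {"customer_id": "A", "quarter": "Q1", "amount": 50},
    {"customer_id": "B", "quarter": "Q1", "amount": 50},
    {"customer_id": "C", "quarter": "Q1", "amount": 50},
    {"customer_id": "D", "quarter": "Q1", "amount": 50},
    {"customer_id": "A", "quarter": "Q2", "amount": 120},
    {"customer_id": "C", "quarter": "Q2", "amount": 120},
]

summary = quarterly_summary(transactions)
for quarter, stats in summary.items():
    print(quarter, stats)
```

If visits and distinct customers fall while the average basket grows, consolidated shopping is a hypothesis worth checking against the customer data.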
Prescriptive Data Analysis:
Once the diagnosis is complete, you have a prescription ready; or, when the business asks for solutions based on the diagnosis, you give the prescription. Prescriptive data analysis is often defined as answering “What should we do next?”, which is why it is called the future of data analytics.
The following are the steps for prescriptive data analysis:
- Define the business result that you need to achieve.
- Gather the data from all relevant sources.
- Cleanse the data to bring uniformity for analysis.
- Create your own models to test this data. You can also select a suitable off-the-shelf product for predictive analysis. I am not going to recommend any particular one here; pick the one that suits your case and style.
- Evaluation and validation: The model you created in the previous step needs to be evaluated and validated for its robustness when fed a wide range of data.
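To make the cleansing step concrete, here is a small sketch that brings records from different sources into one uniform shape. The field names, the list of accepted date formats and the sample rows are all assumptions made for this example; real feeds will need their own rules.

```python
from datetime import datetime

# Assumed date formats; extend this tuple for the formats your sources use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def cleanse(rows):
    """Normalize IDs, categories and dates, then drop bad and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "customer_id": row["customer_id"].strip().upper(),
            "category": " ".join(row["category"].split()).lower(),
            "purchase_date": parse_date(row["purchase_date"]),
        }
        if record["purchase_date"] is None:
            continue  # in practice, route these rows to a rejects file
        fingerprint = tuple(record.values())
        if fingerprint in seen:
            continue  # same record already seen from another source
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


# Made-up rows as they might arrive from two different systems.
raw = [
    {"customer_id": " c001 ", "category": "OTC  Drugs", "purchase_date": "2023-03-05"},
    {"customer_id": "C001", "category": "otc drugs", "purchase_date": "05/03/2023"},
    {"customer_id": "c002", "category": "Dairy", "purchase_date": "not a date"},
]

cleaned = cleanse(raw)
print(cleaned)
```

The first two rows describe the same purchase in different formats, so only one survives; the third is rejected because its date cannot be parsed.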
Big retailers often come to IT and ask for reports based on planograms and buying patterns. Data analysts apply various prescriptive analysis techniques to uncover customer buying habits. On the basis of the results, they provide insights to the business team to rearrange store planograms.
A point to note here is that, in retail, the store planogram contributes a lot to impulse buying. Let’s take a moment to understand what a planogram is. A planogram is a store map that defines the location of items and how the aisles are laid out. Biscuits beside tea and conditioner beside shampoo are common examples of planogram design.
Eggs beside bread is another arrangement seen in almost every retail store planogram; it encourages impulse buying and reflects an understanding of shopper behavior built up over the years.
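One simple way to surface such pairings from transaction data is to count how often two items appear in the same basket. The sketch below does only that; it is a starting point rather than a full market-basket analysis, and the baskets are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Invented baskets, one list of items per checkout.
baskets = [
    ["bread", "eggs", "milk"],
    ["bread", "eggs"],
    ["tea", "biscuits"],
    ["shampoo", "conditioner", "bread"],
    ["tea", "biscuits", "eggs", "bread"],
]

counts = pair_counts(baskets)
for pair, n in counts.most_common(2):
    print(pair, n)
```

Pairs that co-occur often are candidates for adjacent placement in the planogram.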
Let’s come back to our large pharma example from diagnostic data analysis. As I mentioned earlier, the 30% discount offer to senior citizens hurt the retailer. So we analyzed the data and, based on buying behavior, suggested replacing the offer with 30% off for senior citizens only on products they use themselves, such as adult diapers, walking sticks and some over-the-counter drugs. This was the prescription for the loss diagnosed above.
This helped increase footfall and revenue, because someone who comes in to buy a cough and cold drug also buys a few other front-end items such as dairy, cosmetics or seasonal items.
The goal of prescriptive data analysis is to help the business or practitioner make data-driven decisions that improve profit margins or the end-user experience.
Let’s dive deep into data.
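The difference between the two offers can be shown with a small what-if calculation on a single basket. All prices, costs and category names below are invented (amounts in cents), since the actual figures are not part of this post; the sketch only shows the mechanics of comparing a blanket discount with a targeted one.

```python
def basket_margin(items, discount_pct, eligible_categories=None):
    """Margin for one basket of (category, price, cost) tuples, in cents.

    If eligible_categories is None, the discount applies to every item;
    otherwise only to items whose category is in that set.
    """
    margin = 0
    for category, price, cost in items:
        discounted = eligible_categories is None or category in eligible_categories
        sale_price = price * (100 - discount_pct) // 100 if discounted else price
        margin += sale_price - cost
    return margin


# Hypothetical basket: one senior-specific item plus front-end items.
basket = [
    ("otc_drug", 1000, 600),
    ("dairy", 500, 400),
    ("cosmetics", 2000, 1500),
]

# Assumed list of categories covered by the targeted offer.
senior_categories = {"otc_drug", "adult_diapers", "walking_stick"}

blanket = basket_margin(basket, discount_pct=30)
targeted = basket_margin(basket, discount_pct=30, eligible_categories=senior_categories)
print("blanket offer margin:", blanket)
print("targeted offer margin:", targeted)
```

With these made-up numbers, the blanket offer loses money on the basket while the targeted offer keeps full margin on the front-end items.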