

How to Build a Winning Data Platform

Recently, at Informatica World 2019, I heard about the importance of a data platform in building AI capabilities for an organization. What is interesting is that Informatica, known for products that deliver the “Switzerland of Data”, is now using AI to enhance its own product suite through CLAIRE. While exploring other articles on the importance of data, I also came across Monica Rogati’s Data Science Hierarchy of Needs and was impressed by the way she relates the AI structure to Maslow’s Hierarchy of Needs.

In a way, the “self-actualization” that Maslow defines as “achieving one’s full potential” corresponds to AI capability. To get there, however, you need the basics of a data platform foundation. An important distinction between Monica Rogati’s Data Science Hierarchy and my pyramid structure is that mine assumes you will use software products such as Informatica, which offer GUI-based capabilities, so you can spend more time on governance, analysis, and quality and less time writing custom code. Please keep that in mind as you read this article.

Data Platform Model

Data Platform Path

It’s paramount to identify and clearly define the “use case” that the AI team is going after. Without a meaningful use case, building machine learning and automation for the sake of exploration doesn’t provide any value. Once the use case is defined, find where the data resides, inside or outside the enterprise (benchmark data, third-party data, etc.).

With the commercial and open source tools available in the data marketplace, you can quickly build data integration to collect real-time or batch data into a data lake. Don’t overthink data quality at this point.


Once you have collected data into a data lake, understand it by profiling the datasets and mapping them back to your use case. You can also define tags that give your datasets business context. In addition, make the effort to classify the data into categories that make business sense.
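As a rough illustration of the profiling, tagging, and classification step, here is a minimal Python sketch. It is not how Informatica or any other product does this; GUI-based tools provide these capabilities out of the box. The `profile` function, the sample `customers` records, and the tag and classification names are all hypothetical.

```python
def profile(records):
    """Return row count, null rate, and distinct-value count per column."""
    columns = sorted({key for row in records for key in row})
    total = len(records)
    report = {}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat None and empty strings as missing values.
        non_null = [v for v in values if v not in (None, "")]
        report[col] = {
            "rows": total,
            "null_rate": round(1 - len(non_null) / total, 2) if total else 0.0,
            "distinct": len(set(non_null)),
        }
    return report


# Hypothetical sample pulled from the data lake.
customers = [
    {"customer_id": 1, "region": "East", "email": "a@example.com"},
    {"customer_id": 2, "region": "West", "email": None},
    {"customer_id": 3, "region": "East", "email": ""},
    {"customer_id": 4, "region": None, "email": "d@example.com"},
]

# Tags and a classification give the dataset business context
# and map it back to the use case (names are illustrative only).
catalog_entry = {
    "dataset": "customers",
    "tags": ["customer-360", "churn-use-case"],
    "classification": "customer master data",
    "profile": profile(customers),
}

for column, stats in catalog_entry["profile"].items():
    print(column, stats)
```

Even a simple profile like this shows which columns are too sparse to support the use case before you invest in integration.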

Once you have tagged and classified your datasets, integrate data from multiple sources into one data model that can support your defined use cases. In some cases, this means enhancing your existing data model to support multiple use cases.

Integration should also include data enrichment. Many open datasets, covering weather, traffic patterns, currency, disasters, and health conditions, are available for the public to consume. In addition, third-party datasets such as Dun & Bradstreet can help validate customer addresses.
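The integration and enrichment steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `integrate` function, the CRM and billing sources, the field names, and the weather values are all hypothetical, and a real platform would do this with an integration tool at much larger scale.

```python
def integrate(crm_rows, billing_rows, weather_by_region):
    """Merge CRM and billing records on customer_id, then enrich with weather."""
    billing_by_id = {row["customer_id"]: row for row in billing_rows}
    unified = []
    for crm in crm_rows:
        billing = billing_by_id.get(crm["customer_id"], {})
        unified.append({
            "customer_id": crm["customer_id"],
            "region": crm["region"],
            # None when the customer has no billing record.
            "monthly_spend": billing.get("monthly_spend"),
            # Enrichment from an open dataset, keyed by region.
            "avg_temp_c": weather_by_region.get(crm["region"]),
        })
    return unified


# Hypothetical source extracts.
crm = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]
billing = [{"customer_id": 1, "monthly_spend": 120.0}]
weather = {"East": 11.5, "West": 17.0}  # made-up values for illustration

unified = integrate(crm, billing, weather)
for row in unified:
    print(row)
```

Keeping unmatched customers in the output, with missing values left as `None`, makes gaps between sources visible instead of silently dropping records.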

To integrate such large, disparate datasets and build data models from them, your cloud or on-premises data platform must be able to perform at scale. Use performance tuning and storage/compute techniques that deliver results on time.

Good-quality data doesn’t mean anything unless results are shown in a format that different audience levels (line level to executives) can consume. Reporting platforms such as Power BI, Tableau, and MicroStrategy have been market leaders for a reason: they can build beautiful visualizations from streaming or large batch datasets. This is why large cloud vendors such as Salesforce have been acquiring BI companies like Tableau to enhance their visualization.

Defining Metrics

Another important factor is to define metrics and measures clearly so that actions are based on facts.

Building the data platform is not a one-time activity. Data, like infrastructure, needs continuous monitoring and improvement based on feedback from business subject matter experts (SMEs), who also act as data SMEs. Therefore, as you build your data platform, use monitoring services and build notifications and alerts based on thresholds driven by business needs. Additionally, you can rate your datasets by their relevance to your decision-making process. This improves the quality of the data that matters most to the organization. It also helps you prioritize critical datasets over others, much like putting tighter SLAs on important systems and their recovery procedures.
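To make the idea of threshold-based alerts and relevance ratings concrete, here is a small Python sketch. The `DatasetCheck` class, the `alerts` function, the dataset names, and the threshold values are hypothetical; in practice the thresholds would come from business SMEs and the checks from your monitoring service.

```python
from dataclasses import dataclass


@dataclass
class DatasetCheck:
    name: str
    null_rate: float      # observed share of missing values
    max_null_rate: float  # threshold agreed with business SMEs
    relevance: int        # 1 (low) to 5 (critical to decision making)


def alerts(checks):
    """Return datasets that breach their threshold, most relevant first."""
    breached = [c for c in checks if c.null_rate > c.max_null_rate]
    return sorted(breached, key=lambda c: c.relevance, reverse=True)


# Hypothetical monitoring results for three datasets.
checks = [
    DatasetCheck("web_clicks", null_rate=0.20, max_null_rate=0.10, relevance=2),
    DatasetCheck("customer_master", null_rate=0.06, max_null_rate=0.02, relevance=5),
    DatasetCheck("store_hours", null_rate=0.01, max_null_rate=0.05, relevance=1),
]

for check in alerts(checks):
    print(f"ALERT: {check.name} null rate {check.null_rate:.0%} "
          f"exceeds {check.max_null_rate:.0%} (relevance {check.relevance})")
```

Sorting by relevance means the most decision-critical dataset is handled first, even when a less important one has the larger breach.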

All the steps above lead to building machine learning algorithms and automation processes that provide relevant opportunities and a direct impact on your organization’s bottom line.

While the above sequence of steps manages your data through the data preparation lifecycle, data security and data governance also play a key role in managing the data lifecycle. In addition, DevOps brings agility to building the data platform, keeping pace with a business that keeps changing as mergers and acquisitions dominate the current landscape.


Arvind Murali, Chief Data Strategist

Arvind Murali is the Chief Data Strategist for Data Governance with Perficient. His role includes defining data strategy and governance to deliver transformative data platforms. Arvind has served as an executive advisor for data strategy and governance to organizations across several industries. Arvind’s dedication to solving challenges and identifying new opportunities has provided valuable business-focused results for clients, such as providing self-service access to data for global sales teams; helping physicians create informed wellness plans; and delivering insights about current supply chain inventories. He is a passionate Vlogger on YouTube and discusses real-world insights, data platform trends, and the importance of governance as big data continues its exponential growth.
