In today’s business world, a major challenge enterprises face is maintaining a single source of business data (master data) that is non-redundant, consistent, accurate, complete, and error free. An enterprise’s master data may describe its customers, products, employees, suppliers, business locations, and so on, and it plays an important role in key business decisions. An organization may draw on several source systems, each with its own data model and structure, and with heterogeneous sources the probability of inaccurate and redundant data is high.
The first step in any MDM implementation is to assess the quality of master data across the different data sources. This is achieved by establishing a quality process that collects, validates, and cleanses data from those sources to produce a single, unified master version. Data integration plays a vital role in MDM implementation: it is the combination of technical and business processes used to combine data from disparate sources into meaningful, valuable information. A complete data integration solution encompasses discovery, cleansing, monitoring, transformation, and delivery of data from a variety of sources. In practice, data integration is complicated; there is no single universal approach, and organizations adopt different integration disciplines according to business need.
Data integration enables enterprises to merge or consolidate different sources into a single data source. It covers both data warehousing and data migration. ETL (Extract, Transform, Load) is a data integration process that comes from data warehousing and describes how data is loaded into the warehouse from different sources. Some ETL processes also involve cleansing, in which case the sequence becomes ECTL (Extract, Clean, Transform, Load). The software market offers several data integration tools from different vendors; some of the major ETL tools are listed below.
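The ECTL sequence described above can be sketched in a few lines. The following is a minimal, hypothetical illustration in Python (the source rows, field names, and cleaning rules are invented for demonstration), not a real warehouse pipeline:

```python
# Minimal ECTL sketch: Extract, Clean, Transform, Load.
# All data and field names here are hypothetical.

def extract():
    # Extract: pull raw rows from a source system (hard-coded here).
    return [
        {"id": "1", "name": " Alice ", "country": "us"},
        {"id": "2", "name": "Bob", "country": None},
        {"id": "2", "name": "Bob", "country": None},   # duplicate row
    ]

def clean(rows):
    # Clean: trim whitespace, drop rows missing a mandatory field,
    # and remove exact duplicates.
    seen, cleaned = set(), []
    for row in rows:
        if row["country"] is None:
            continue
        row = {k: v.strip() if isinstance(v, str) else v
               for k, v in row.items()}
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

def transform(rows):
    # Transform: standardize codes and types for the target model.
    return [{"id": int(r["id"]), "name": r["name"],
             "country": r["country"].upper()} for r in rows]

def load(rows, target):
    # Load: append the conformed rows to the target store (a list here).
    target.extend(rows)

warehouse = []
load(transform(clean(extract())), warehouse)
print(warehouse)  # one clean, conformed row survives
```

The same four stages appear in any real ETL/ECTL tool; only the scale and the connectivity to actual source and target systems differ.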
Tool Name                           Vendor
----------------------------------  --------------------
Oracle Warehouse Builder (OWB)      Oracle
Oracle Data Integrator              Oracle
IBM InfoSphere Information Server   IBM
DB2 InfoSphere Warehouse Edition    IBM
SAS Data Integration Studio         SAS Institute
Informatica PowerCenter             Informatica
SQL Server Integration Services     Microsoft
Data Services                       SAP Business Objects
Talend Studio for Data Integration  Talend
In order to provide reliable master data, MDM implementations must perform the following activities before or during the integration process.
Data Profiling
Data profiling starts with understanding the source data in detail. It enables the enterprise to assess the quality of the source data and gives a clear picture of source structures, metadata, and the relationships among them. A proper data profiling process helps avoid inconsistencies, anomalies, and redundancies, and it can be applied to any type of source system, whether a warehouse or a master data store.
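A profiling pass typically reports, per column, how many values are missing, how many are distinct, and what character patterns appear. The sketch below is a toy illustration of that idea (the sample records and pattern notation are hypothetical, not the output of any profiling tool):

```python
# A toy data-profiling pass: for each column, count nulls, distinct
# values, and the character patterns seen (A = letter, 9 = digit).
from collections import Counter

def pattern(value):
    # Map each character to a pattern symbol, e.g. "AB12" -> "AA99".
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c
                   for c in value)

def profile(rows):
    stats = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "patterns": Counter(pattern(v) for v in non_null),
        }
    return stats

rows = [
    {"zip": "95014", "phone": "555-0100"},
    {"zip": "9501",  "phone": None},
    {"zip": "95014", "phone": "555-0199"},
]
report = profile(rows)
print(report["zip"]["patterns"])  # the "9999" pattern flags the short zip
```

A real profiler adds frequency distributions, type inference, and cross-column relationship discovery, but the principle is the same: summarize the data so anomalies become visible before integration begins.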
Data Audit
A data audit enables an organization to assess how well particular data fits the business need. It requires extensive study and understanding of the source systems, and data profiling plays an important role in it. A proper data audit greatly reduces the rate of data inconsistencies and redundancies.
Data Cleansing/Data Scrubbing
Data cleansing, or data scrubbing, is the process of identifying and correcting errors during the master data integration stage. It identifies inaccurate, inconsistent, and incorrect data in the sources and corrects it by substituting the relevant information; it may also involve data standardization and data harmonization. Cleansing is a routine activity for data warehousing systems that continuously refresh huge amounts of data from a variety of sources, where the probability of receiving dirty data is high. Proper data cleansing policies should therefore be defined to remove or avoid these improper data elements.
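Standardization and harmonization are often implemented as lookup tables plus simple repair rules. The sketch below is a hypothetical example of that approach (the lookup table, field names, and records are invented for illustration):

```python
# A toy cleansing step that standardizes country names and repairs
# obviously bad values before loading.

# Standardization map: many source spellings -> one canonical value.
COUNTRY = {"usa": "US", "u.s.a.": "US", "united states": "US", "us": "US"}

def cleanse(record):
    fixed = dict(record)
    # Harmonize country spellings via the lookup table.
    country = (record.get("country") or "").strip().lower()
    fixed["country"] = COUNTRY.get(country, country.upper() or None)
    # Correct a common error: phone numbers stored with stray punctuation.
    if record.get("phone"):
        fixed["phone"] = "".join(c for c in record["phone"] if c.isdigit())
    return fixed

dirty = {"name": "Acme Corp", "country": " U.S.A. ", "phone": "(555) 010-0345"}
print(cleanse(dirty))
# {'name': 'Acme Corp', 'country': 'US', 'phone': '5550100345'}
```

In production, tools apply far richer rule sets (address verification, phonetic matching, survivorship rules), but each rule still reduces many inconsistent source representations to one canonical form.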
Data Consistency
Data consistency provides a uniform view of data across an enterprise and enables the use of a single, accurate data source. Better data cleansing yields better data consistency, which in turn yields better data quality. Inconsistency can be checked during initial loads from source systems and during data transactions across the enterprise, and validation rules can be set up to check consistency. Data concurrency is a major cause of inconsistency in multi-user environments where multiple users access the same data elements at the same time; concurrency must be handled properly at the transaction level to avoid inconsistencies.
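The validation rules mentioned above can be expressed declaratively, so the same rule set runs at load time and again on each transaction. The sketch below is a hypothetical illustration (the rules, field names, and records are invented):

```python
# Consistency checks as declarative validation rules: each rule is a
# name plus a predicate over a record.

RULES = [
    ("id is required",    lambda r: r.get("id") is not None),
    ("email contains @",  lambda r: "@" in (r.get("email") or "")),
    ("credit limit >= 0", lambda r: (r.get("credit_limit") or 0) >= 0),
]

def validate(record):
    # Return the names of all rules the record violates.
    return [name for name, check in RULES if not check(record)]

good = {"id": 7, "email": "a@example.com", "credit_limit": 1000}
bad  = {"id": None, "email": "not-an-email", "credit_limit": -50}
print(validate(good))  # []
print(validate(bad))   # all three rules fail
```

Keeping rules as data rather than scattered if-statements makes it easy for data stewards to review, extend, and reuse the same consistency checks across load and transaction paths.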
Data Stewardship
Data stewardship is the process of managing master data and providing users with data of the quality the business needs. It must define guidelines and governance rules for accessing and using master data. Managing master data involves a clear understanding of each data element and its metadata structure; removing unused data; removing duplicates; ensuring that data is accessed and used according to the granted privileges and governance rules; maintaining data consistency across the enterprise; establishing proper interfaces between mapped systems; and maintaining proper data transaction audit information.
Data Transaction Auditing
Transaction auditing is a way of capturing who is doing what to which entity in the master data management solution. This type of audit is critical in many implementations, especially centralized (transaction hub) implementations where the master data, or part of it, is truly the system of record. Data auditing, in contrast, is a way of capturing the changes made to an entity. It makes it possible to query what an entity looked like at a given point in time and to investigate what changes were made to it over a given date range. MDM implementations must provide both data auditing and transaction auditing capabilities.
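The two capabilities can share one append-only log: each entry records who changed which entity (transaction audit) along with the before and after values (data audit), so historical state can be reconstructed. The sketch below is a hypothetical toy, with a simple counter standing in for timestamps:

```python
# A toy audit trail: transaction auditing records who did what to
# which entity; data auditing keeps before/after values so an entity
# can be reconstructed for any point in time.

audit_log = []
_clock = 0

def apply_change(store, entity_id, user, changes):
    # Record who (user) did what (changes) to which entity, plus the
    # prior values, so historical state can be queried later.
    global _clock
    _clock += 1
    before = dict(store.get(entity_id, {}))
    store.setdefault(entity_id, {}).update(changes)
    audit_log.append({"ts": _clock, "user": user, "entity": entity_id,
                      "before": before, "after": dict(store[entity_id])})

def as_of(entity_id, ts):
    # Data-audit query: what did the entity look like at time ts?
    state = {}
    for entry in audit_log:
        if entry["entity"] == entity_id and entry["ts"] <= ts:
            state = entry["after"]
    return state

customers = {}
apply_change(customers, "C1", "alice", {"name": "Acme", "tier": "gold"})
apply_change(customers, "C1", "bob",   {"tier": "platinum"})
print(as_of("C1", 1))                      # {'name': 'Acme', 'tier': 'gold'}
print([e["user"] for e in audit_log])      # ['alice', 'bob']
```

Production hubs store such logs in durable, queryable form and tie entries to real user identities and transaction IDs, but the point-in-time query pattern is the same.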
The term data integration has a broad range of definitions. It is a long-running process that involves all of the above activities to maintain data integrity and provide a single version of the truth across an enterprise.
IBM provides many tools to support successful MDM implementations. The IBM InfoSphere Information Server suite is a software package that bundles the Information Analyzer, DataStage, and QualityStage tools to provide integration scenarios, techniques, and interfaces for collecting and storing data to and from different systems.
InfoSphere Information Analyzer
Information Analyzer performs data profiling. It helps you understand metadata structures and relationships, enables users to assess information across a wide range of systems and data sources, supports sharing of information and results across the enterprise, and uses data quality rules for validation and pattern analysis.
InfoSphere DataStage
DataStage is an ETL tool that facilitates data integration across heterogeneous systems. Its underlying framework enables users to collect, transform, and load data with a rich set of techniques and interfaces; it provides flexibility and scalability in building and deploying master data infrastructure, and its architecture processes master data at high speed.
InfoSphere QualityStage
QualityStage is responsible for maintaining data quality throughout its use. It performs the data cleansing / data scrubbing process, identifying and correcting inaccurate and incomplete data sets during integration, and it helps users create and maintain a consistent view of data across the enterprise.