Data analytics and master data management have been among the top trending Business Intelligence implementations in the last couple of years. One of the key success criteria for these programs is maintaining good-quality data. Businesses also demand more value from the data that is maintained in the enterprise data repository. So there is a strong emphasis on maintaining high data quality, which ensures the completeness, accuracy & relevance of the information that is stored.
Informatica Data Quality (IDQ) has been a front-runner in the Data Quality (DQ) tools market. This two-part blog series provides a glimpse into the features it offers. IDQ has two variants:
- Informatica Analyst
- Informatica Developer
Two Variants of IDQ
Informatica Analyst is a web-based tool that business analysts & developers can use to analyse, profile, cleanse, standardize & scorecard data in an enterprise.
Informatica Developer is a client-based tool where developers can create mappings to implement data quality transformations/services. This tool offers an editor where objects can be built with a wide range of data quality transformations such as Parser, Standardizer, Address Validator, match-merge, etc.
Develop once & deploy anywhere – Both tools can be used to create DQ rules or mappings, which can then be deployed as web services. Once the DQ transformations are deployed as services, they can be used across the enterprise and across platforms.
The Importance of Data Profiling
Data Profiling – BI programs involve data from disparate systems, so it is essential to profile the data in order to understand its content and structure. Both tools offer data profiling capabilities.
These tools have a default profile option that shows statistics for each column of a data object (flat file, relational table, etc.). A typical column profile shows:
- Column name
- Number & percentage of unique and null values
- Data Patterns for the column
- Data type derived from the column data
- Percentage of values that match the inferred data type
- Defined data type, minimum & maximum values
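To make the statistics in the list above concrete, here is a minimal Python sketch of a column profile. It is not how IDQ computes its profiles; the function names (`value_pattern`, `infer_type`, `profile_column`), the 9/X pattern notation, and the choice to treat empty strings as nulls are all assumptions made for illustration.

```python
import re
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, ASCII letters become X."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "X", shape)


def infer_type(value):
    """Guess the narrowest type a single string value fits."""
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "string"


def profile_column(name, values):
    """Return basic profile statistics for one column of raw string values.

    Assumption of this sketch: None and empty strings count as nulls.
    """
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    null_count = total - len(non_null)
    unique_count = len(set(non_null))

    # The inferred type is the most common per-value guess.
    type_counts = Counter(infer_type(v) for v in non_null)
    inferred, matches = (
        type_counts.most_common(1)[0] if type_counts else ("unknown", 0)
    )

    return {
        "column": name,
        "null_count": null_count,
        "null_pct": 100.0 * null_count / total if total else 0.0,
        "unique_count": unique_count,
        "unique_pct": 100.0 * unique_count / total if total else 0.0,
        "patterns": Counter(value_pattern(v) for v in non_null),
        "inferred_type": inferred,
        "inferred_type_match_pct": (
            100.0 * matches / len(non_null) if non_null else 0.0
        ),
        # Values are compared as strings here, so min/max are lexical.
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


zip_profile = profile_column("zip", ["10001", "94105", "", "ABC12", "10001", None])
print(zip_profile)
```

In this made-up sample, three of the four non-null values look like integers, so the one value with the pattern `XXX99` stands out as a likely data quality issue.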
Reference tables – IDQ enables users to maintain reference tables where they can define a set of allowed values. For example, a list of country/state codes can be maintained in a reference table. When a column is profiled against the reference table, the profile shows the number & details of addresses whose values don’t match the country/state codes. Reference tables can easily be created from the list of unique values in a column profile, and the table can then be edited to add or remove values.
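The reference-table idea can be sketched in a few lines of Python. This is only an illustration of the concept, not IDQ's implementation: the helper names (`build_reference_table`, `profile_against_reference`), the case-insensitive matching, and the sample country codes are assumptions.

```python
from collections import Counter


def normalize(value):
    """Trim and upper-case a code so that 'us ' and 'US' compare equal."""
    return value.strip().upper()


def build_reference_table(values):
    """Seed a reference table from the distinct non-empty values of a column."""
    return {normalize(v) for v in values if v and v.strip()}


def profile_against_reference(values, reference):
    """Count the values missing from the reference table and list them."""
    mismatches = Counter(v for v in values if normalize(v) not in reference)
    return {
        "mismatch_count": sum(mismatches.values()),
        "mismatch_detail": dict(mismatches),
    }


# Create the table from the unique values of a profiled column ...
reference = build_reference_table(["US", "IN", "GB", "us"])
# ... then edit it by removing and adding values.
reference.discard("GB")
reference.add("DE")

result = profile_against_reference(["US", "in", "XX", "DE", "XX"], reference)
print(result)
```

Keeping the table as a plain set makes the "edit the table" step a matter of adding or discarding values; a real reference table would also need to be stored and shared, which this sketch leaves out.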
Rule – Rules are defined to validate whether the data meets a business condition. For example, a rule can be created to check whether an email address has a domain name in it. Rules can be used while profiling or in data transformations.
Scorecard – A scorecard can be generated for a column to display a graphical representation of valid values. It also presents a trend over time, so it can be used to measure the progress of a data quality initiative.
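As a rough illustration of the email example, a rule can be thought of as a function that returns true or false for each value. The sketch below is plain Python, not IDQ rule syntax. The function names and the regular expression are assumptions, and "has a domain name" is interpreted here as "has an @ followed by at least two dot-separated labels".

```python
import re

# Hypothetical rule: local part, "@", then a domain with at least one dot.
EMAIL_HAS_DOMAIN = re.compile(r"^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$")


def email_has_domain(value):
    """Return True if the value looks like an email with a domain name."""
    return bool(value) and EMAIL_HAS_DOMAIN.match(value) is not None


def apply_rule(values, rule):
    """Evaluate a rule for every value and pair each value with its result."""
    return [(v, rule(v)) for v in values]


for value, passed in apply_rule(
    ["jane@example.com", "jane@", "jane@example", None], email_has_domain
):
    print(value, "valid" if passed else "invalid")
```

Because the rule is just a function, the same check can be reused when profiling a column or inside a transformation step, which mirrors how the post describes rules being used.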
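The numbers behind a scorecard can be approximated as the percentage of valid values per run, tracked over time. The following Python sketch assumes that interpretation. The function names, the snapshot dates, and the sample data are made up for illustration and do not come from IDQ.

```python
def score(values, is_valid):
    """Percentage of values that a validity check accepts."""
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if is_valid(v)) / len(values)


def trend(snapshots, is_valid):
    """Score each dated snapshot of a column, returned in date order."""
    return [(day, score(vals, is_valid)) for day, vals in sorted(snapshots.items())]


# Hypothetical snapshots of a country-code column, keyed by ISO date.
snapshots = {
    "2024-02-01": ["US", "IN", "XX", "US"],
    "2024-01-01": ["US", "XX", "??", "IN"],
}
allowed = {"US", "IN"}

for day, pct in trend(snapshots, lambda v: v in allowed):
    print(day, f"{pct:.1f}% valid")
```

A rising score across snapshots is the kind of trend that suggests a data quality initiative is making progress.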