

Informatica Data Quality – A Peek Inside – Part 1

Data analytics and master data management (MDM) have been among the top-trending Business Intelligence (BI) implementations of the last couple of years. One of the key success criteria for these programs is maintaining good-quality data. Businesses also demand more value from the data held in the enterprise data repository, so there is a strong emphasis on maintaining high data quality, which ensures the completeness, accuracy and relevance of the stored information.

Informatica Data Quality (IDQ) has been a front runner in the Data Quality (DQ) tools market. This two-part blog series provides a glimpse into the features IDQ offers. IDQ comes in two variants:

  • Informatica Analyst
  • Informatica Developer

Two Variants of IDQ

Informatica Analyst is a web-based tool that business analysts and developers can use to analyze, profile, cleanse, standardize and scorecard data across the enterprise.

Informatica Developer is a client-based tool in which developers create mappings that implement data quality transformations and services. It offers an editor for building objects with a wide range of data quality transformations, such as Parser, Standardizer, Address Validator and match-merge.

Develop once & deploy anywhere – Both tools can be used to create DQ rules or mappings, which can then be deployed as web services. Once deployed as services, the DQ transformations can be reused across the enterprise and across platforms.

The Importance of Data Profiling

Data Profiling – BI programs typically draw on data from disparate systems, so it is essential to profile the data to understand its content and structure. Both tools offer data profiling capabilities.

Both tools have a default profile option that shows statistics for each column of a data object (flat file, relational table, etc.). A typical column profile shows:

  • Column name
  • Number and percentage of unique and null values
  • Data patterns in the column
  • Data type inferred from the column data
  • Percentage of values that match the inferred data type
  • Defined data type, and minimum and maximum values
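To make the column statistics above concrete, here is a minimal Python sketch of a column profile computed over a list of string values, such as a column read from a flat file. It is not IDQ's profiling engine or API: the function names, the treatment of blank strings as nulls, the "X"/"9" pattern notation and the simple integer/decimal/string type inference are all assumptions made for illustration. The defined data type is passed in, since it comes from the source schema rather than from the data itself.

```python
from collections import Counter
from typing import Optional


def value_pattern(value: str) -> str:
    # Assumed pattern notation: letters -> "X", digits -> "9", other characters kept.
    return "".join("X" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value)


def infer_type(value: str) -> str:
    # Classify a single value as "integer", "decimal" or "string".
    for type_name, cast in (("integer", int), ("decimal", float)):
        try:
            cast(value)
            return type_name
        except ValueError:
            pass
    return "string"


def profile_column(name: str, values: list[Optional[str]], declared_type: str) -> dict:
    total = len(values)
    # Assumption: None and blank strings both count as nulls.
    non_null = [v for v in values if v is not None and v.strip()]
    null_count = total - len(non_null)
    distinct = set(non_null)  # "unique" values counted here as distinct non-null values
    type_counts = Counter(infer_type(v) for v in non_null)
    # The most frequent per-value type becomes the inferred column type.
    inferred, matching = type_counts.most_common(1)[0] if type_counts else ("string", 0)
    # Min/max use numeric order for numeric types, text order otherwise.
    typed = [v for v in non_null if infer_type(v) == inferred]
    order = float if inferred in ("integer", "decimal") else str
    return {
        "column": name,
        "distinct_count": len(distinct),
        "distinct_pct": 100 * len(distinct) / total if total else 0.0,
        "null_count": null_count,
        "null_pct": 100 * null_count / total if total else 0.0,
        "patterns": Counter(value_pattern(v) for v in non_null),
        "inferred_type": inferred,
        "inferred_match_pct": 100 * matching / len(non_null) if non_null else 0.0,
        "declared_type": declared_type,
        "min": min(typed, key=order) if typed else None,
        "max": max(typed, key=order) if typed else None,
    }


profile = profile_column("customer_id", ["101", "102", "102", None, "", "abc"], "varchar(10)")
print(profile["inferred_type"], profile["inferred_match_pct"], profile["patterns"])
```

In this sample, three of the four non-null values look like integers, so the column is inferred as an integer column with 75% of values matching, and "abc" stands out as a different pattern worth investigating.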

Reference tables – IDQ lets users maintain reference tables that define a set of allowed values. For example, a list of country/state codes can be maintained in a reference table. When a column is profiled against the reference table, the profile shows the number and details of records (for example, addresses) whose values don't match the country/state codes. A reference table can easily be created from the list of unique values in a column profile and then edited to add or remove values.
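The reference-table workflow described above can be sketched in Python, assuming a reference table is modeled as an in-memory set of allowed codes and each record as a dictionary. The helper names and sample addresses are hypothetical; IDQ manages reference tables in its own repository, and this is not its API.

```python
from typing import Iterable, Optional


def build_reference_table(values: Iterable[Optional[str]]) -> set[str]:
    # Seed a reference table from the distinct, non-blank values of a column.
    return {v.strip() for v in values if v and v.strip()}


def find_mismatches(records: list[dict], column: str, reference: set[str]) -> list[dict]:
    # Records whose value (nulls included) is not in the reference table.
    return [r for r in records if (r.get(column) or "").strip() not in reference]


addresses = [
    {"id": 1, "city": "Austin", "state": "TX"},
    {"id": 2, "city": "Denver", "state": "CO"},
    {"id": 3, "city": "Boston", "state": "XX"},
    {"id": 4, "city": "Miami", "state": None},
]

# Seed the table from the column's unique values, then edit it by hand.
state_codes = build_reference_table(a["state"] for a in addresses)
state_codes.discard("XX")  # remove a code known to be invalid
state_codes.add("FL")      # add a valid code that never appeared in the data

mismatches = find_mismatches(addresses, "state", state_codes)
print(len(mismatches), [r["id"] for r in mismatches])
```

Seeding from the profiled values and then editing mirrors the "create from unique values, then add or remove" step; here the edited table flags the record with the bogus "XX" code and the record with no state at all.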

Rule – Rules validate whether the data meets a business condition. For example, a rule can check whether an email address contains a domain name. Rules can be applied during profiling or within data transformations.
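As an illustration, the email rule mentioned above could be expressed as a simple predicate and applied to a column the way a rule is used during profiling. The function names are hypothetical, and the definition of "has a domain name" (exactly one "@" followed by at least two non-empty dot-separated labels) is an assumption; a production rule would likely be stricter.

```python
from typing import Callable, Optional


def has_domain(email: Optional[str]) -> bool:
    # Assumed rule: exactly one "@", a non-empty local part, and a domain made of
    # at least two non-empty dot-separated labels (e.g. "example.com").
    if not email or email.count("@") != 1:
        return False
    local, domain = email.split("@")
    labels = domain.split(".")
    return bool(local) and len(labels) >= 2 and all(labels)


def apply_rule(values: list[Optional[str]], rule: Callable[[Optional[str]], bool]):
    # Split a column into the number of valid values and the list of invalid ones.
    invalid = [v for v in values if not rule(v)]
    return len(values) - len(invalid), invalid


emails = ["jane@example.com", "bob@", "sam@corp.example.org", "no-at-sign", None]
valid_count, invalid_values = apply_rule(emails, has_domain)
print(valid_count, invalid_values)
```

Keeping the rule as a standalone predicate is what lets the same check be reused both in a profile (counting and listing invalid values) and inside a transformation.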

Scorecard – A scorecard can be generated for a column to display a graphical representation of its valid values. Because it also shows the trend over time, it can be used to measure the progress of a data quality initiative.
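At its core, a scorecard is the percentage of values that pass a rule, tracked across profiling runs. The Python sketch below computes that score per run and the change from the previous run; IDQ presents this graphically, while these hypothetical helpers only return the numbers. The five-digit ZIP code rule and the sample runs are assumptions for illustration.

```python
from datetime import date
from typing import Callable, Optional


def score(values: list[Optional[str]], rule: Callable[[Optional[str]], bool]) -> float:
    # Percentage of values that pass the rule.
    return 100.0 * sum(1 for v in values if rule(v)) / len(values) if values else 0.0


def scorecard_trend(runs, rule):
    # runs: iterable of (run_date, values); returns (run_date, score, change) per run,
    # ordered by date, where change is relative to the previous run.
    trend = []
    previous = None
    for run_date, values in sorted(runs, key=lambda run: run[0]):
        current = score(values, rule)
        trend.append((run_date, current, None if previous is None else current - previous))
        previous = current
    return trend


def is_zip5(value: Optional[str]) -> bool:
    # Hypothetical rule: a US ZIP code must be exactly five digits.
    return bool(value) and len(value) == 5 and value.isdigit()


runs = [
    (date(2024, 1, 1), ["12345", "1234", None, "54321"]),
    (date(2024, 2, 1), ["12345", "67890", "1234", "54321"]),
    (date(2024, 3, 1), ["12345", "67890", "11111", "54321"]),
]
for run_date, pct, change in scorecard_trend(runs, is_zip5):
    print(run_date.isoformat(), f"{pct:.1f}%", change)
```

A rising score from run to run is the kind of signal that lets a team show a data quality initiative is making progress.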

The following screenshot shows a column profile and a list of invalid records based on a rule.

[Screenshot: Informatica Data Quality Column Profile]

Up next in this series – Basic data quality transformations…


Albert Qian

Albert Qian is a Marketing Manager at Perficient for our IBM PCS, DevOps, and Enterprise Solutions Partners focused on cloud computing technologies.
