In my last blog, I presented a brief overview on Informatica Data Quality (IDQ) tools, the significance of Data Profiling and how to use the Analyst tool to profile data. In this second blog, I will introduce a few commonly used Informatica Developer tool Data Transformations.
But, Why Data Transformations?
Long story short, Data Quality (DQ) Initiatives are executed to address the data errors and problems. Data profiling facilitates in identifying data that is inconsistent, redundant, incomplete or inaccurate. In order to maintain quality data, the problem data needs to be cleansed and transformed before populating it to the target data store. So, DQ business transformation rules needs to be defined, built and integrated with processes that interact with the data store. The Informatica Developer tool helps in building, managing and deploying the DQ transformation rules.
Informatica Developer Transformations – The Basics!
The Informatica Developer tool can be used to build DQ transformation mappings to cleanse, enhance & enrich data. It also enables users to profile data at any stage of a transformation. So we shall profile before and after transformations to demonstrate its effectiveness. A few of the basic transformations that are exclusive to the Developer tool are:
- Address validatior
Parser – This transformation reads input data, segregates it to parsed data (matching user defined pattern/token), Overflow parsed data & unparsed data. For example, A Parser can be used to get the first name from a full name. Another example is to get the domain name from an email. It can be executed to match data with token sets, regular expression, reference table entries or using data patterns with wild cards (*,+). This transformation has built-in pre-defined token sets for emails, zip codes, SSNs, phone numbers, Words, etc.
Labeler – Identifies input and tags each field with a type of character or strings. For example, a phone number can be labeled as NNN-NNN-NNNN or a full name “Thomas A Edison” can be labeled as “WORD INIT WORD”
Standardizer is used to transform/remove data format inconsistencies or errors. A typical standardizer transformation uses any one or more of following strategies:
- Replace Reference Table strings with valid value
- Replace Reference table match with custom strings
- Remove Reference table matches
- Replace custom strings
- Remove custom strings
For example, a Reference table can have values like Street, Road, Drive and valid values as ST, RD, DR respectively. The address line “6500 LEGACY DRIVE” can be standardized to “6500 LEGACY DR.”
Address validator transformation is used to evaluate an address against the address reference data provided by Informatica. This transformation provides accuracy (match score/percentage) & deliverability of the address input. It also enhances addresses by providing standardized address that matched. In case of no match or error, it provides the nature of error. The transformation can be executed in suggested line, batch & certified modes. The certified mode validates if the addresses meet postal carrier certification standards.
Integrate DQ rules with PowerCenter ETL
The DQ mappings created in the developer tool can also be exported to Informatica PowerCenter as custom objects so the batch processes that integrate different systems can also execute DQ transformation. For example, a DQ mapping with address validation can be exported to PowerCenter as a mapplet and used in a PowerCenter ETL mapping that feeds the host system. This way, the data integration/migration layer will also execute DQ rules while executing ETL processes.
IDQ offers many more transformations (like match, merge, comparison) and services that can be used in Data Quality implementations. Contact us for more information on why, where and how to apply Informatica Data Quality in your organization.