Duplicate records are occasionally found in source data. Due to primary key constraints on a target database, only one version of a duplicate source record should be loaded into the target. The following methods demonstrate some of the most efficient ways to deal with duplicate source data.
1. Aggregator Transformation
| When to use | Any time an Aggregator transformation has no major impact on performance; the source is relational, or the source is a flat file and output row order does not matter. |
| Pros | Easy to implement. Works with all types of sources. |
| Cons | Limited control over which row of each duplicate group is output. The Aggregator cache can slow performance on large loads. Duplicates are removed silently, so there is no audit trail. |
Use an Aggregator transformation and group by the key ports on which you want to deduplicate. To improve performance with presorted data, enable the Sorted Input option. Make sure the first column in the ORDER BY clause of the Source Qualifier matches the first Group By port in the Aggregator. If the source is not relational, add a Sorter transformation upstream of the Aggregator.
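Informatica mapping logic cannot be shown directly as code, but the effect of grouping by a key and emitting one row per group can be sketched in Python. The rows and the key column below are hypothetical; note how, as with the Aggregator, you get little control over which duplicate survives:

```python
# Hypothetical source rows: (customer_id, amount); customer_id is the dedup key.
rows = [(101, 50), (102, 75), (101, 60)]

# Group by the key and keep one row per group. Keeping the first row seen
# mirrors the Aggregator's behavior when no FIRST()/LAST() expression is used.
deduped = {}
for row in rows:
    deduped.setdefault(row[0], row)  # only the first row per key is stored

print(list(deduped.values()))  # [(101, 50), (102, 75)]
```

The row `(101, 60)` is dropped silently, which is exactly why this method provides no auditing of removed duplicates.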
2. Rank Transformation
| When to use | When you want a specific row from each group and the source is a flat file. |
| Pros | Easy to implement. Sorts non-relational data, giving some control over which row is output. |
| Cons | The Rank cache can slow performance on large loads. Control over the output row is limited to top/bottom by the rank port. Duplicates are removed silently, so there is no audit trail. |
No modifications are needed in the Source Qualifier, which makes this approach most useful when the source is not relational. Set Number of Ranks to 1 and use the Top/Bottom property to choose which row from each group is kept.
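As a sketch of what Rank does with Number of Ranks = 1, the Python below keeps the single top row per key, ranked on a second column. The rows, key, and timestamp column are hypothetical:

```python
# Hypothetical rows: (customer_id, last_updated). Keep the most recent row
# per key, mirroring Rank with Number of Ranks = 1 and Top/Bottom = Top.
rows = [(101, "2020-01-05"), (102, "2020-02-01"), (101, "2020-03-10")]

best = {}
for key, ts in rows:
    # Replace the stored row only if this one ranks higher (is more recent).
    if key not in best or ts > best[key][1]:
        best[key] = (key, ts)

print(sorted(best.values()))  # [(101, '2020-03-10'), (102, '2020-02-01')]
```

Unlike the plain group-by sketch above, ranking lets you state *which* duplicate wins, at the cost of maintaining a rank cache.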
3. Sorter Transformation
Send all the data to a Sorter transformation and sort by every port on which you want to deduplicate. On the Properties tab, select the Distinct option; only unique rows are passed downstream.
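The Sorter's behavior, sorting on all ports and discarding exact duplicates, can be sketched in Python (the sample rows are hypothetical):

```python
# Hypothetical rows; duplicates are exact copies across all columns.
rows = [("B", 2), ("A", 1), ("A", 1), ("B", 2), ("C", 3)]

# Sort by all ports and drop exact duplicates, like a Sorter transformation
# with the Distinct option enabled.
distinct = sorted(set(rows))

print(distinct)  # [('A', 1), ('B', 2), ('C', 3)]
```

Note this only removes rows that are identical in every column; rows that differ in any port are all kept.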
4. Select Distinct Option
At the Source Qualifier, you can enable the Select Distinct option, or write your own SQL override so that only distinct rows are selected. However, this works only for relational sources. For flat-file sources, you can deduplicate with a pre-session command:
sort abc.txt | uniq
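The SQL-override variant can be sketched with Python's built-in sqlite3 module; the table and columns below are hypothetical stand-ins for a real relational source:

```python
import sqlite3

# Hypothetical source table with duplicate rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (id INTEGER, name TEXT)")
con.executemany("INSERT INTO src VALUES (?, ?)",
                [(1, "ann"), (1, "ann"), (2, "bob")])

# A SELECT DISTINCT query, as you would write in a Source Qualifier override.
result = con.execute("SELECT DISTINCT id, name FROM src ORDER BY id").fetchall()

print(result)  # [(1, 'ann'), (2, 'bob')]
```

Like the Distinct option, this pushes deduplication down to the database, which is usually cheaper than caching rows in the mapping.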