Detect and remove duplicate or similar records

7 years 7 months ago - 7 years 7 months ago #14449 by Kurt
I am new to this, and this should be easy. My Excel file contains rows that have been duplicated, triplicated, or repeated even more times. How should I clean the data and remove the redundancies?

Code Description
100 abc
100 abc
100 abc
101 bcd
101 bcd
101 aab
101 bba
102 bbc
102 bbc
102 bbc
102 bbc

I am hoping to get the following output:
100 abc
101 bcd
101 aab
101 bba
102 bbc
Last edit: 7 years 7 months ago by Kurt.

7 years 7 months ago - 2 years 3 months ago #14458 by KevinJohn
We can detect duplicates by first declaring the columns that are supposed to be unique as the primary key, then creating an SQL query in the transformation that groups the rows by that key and flags any key that occurs more than once.
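
As a rough illustration of the SQL approach, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name `records`, the column names, and the use of SQLite are all assumptions made for this example; the SQL dialect available inside your transformation may differ. The key is taken to be both columns together, because in the sample data code 101 legitimately appears with three different descriptions.

```python
import sqlite3

# Sample rows copied from the question: (Code, Description).
rows = [
    (100, "abc"), (100, "abc"), (100, "abc"),
    (101, "bcd"), (101, "bcd"), (101, "aab"), (101, "bba"),
    (102, "bbc"), (102, "bbc"), (102, "bbc"), (102, "bbc"),
]

# Hypothetical in-memory table standing in for the imported Excel sheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (code INTEGER, description TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

# Detect: list every (code, description) key that occurs more than once.
duplicates = conn.execute(
    """
    SELECT code, description, COUNT(*) AS occurrences
    FROM records
    GROUP BY code, description
    HAVING COUNT(*) > 1
    ORDER BY code, description
    """
).fetchall()

# Remove: keep one row per key, ordered by where the key first appeared.
deduplicated = conn.execute(
    """
    SELECT code, description
    FROM records
    GROUP BY code, description
    ORDER BY MIN(rowid)
    """
).fetchall()

for code, description in deduplicated:
    print(code, description)
```

With the sample data this should print the five rows asked for in the question. If only exact duplicates matter and row order is not important, `SELECT DISTINCT code, description FROM records` would likely be enough.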

Alternatively, we can use the ETL-tools deduplicator.
Last edit: 2 years 3 months ago by admin.