check number of occurrence - how to

More
8 years 6 months ago #12694 by orlandoc
Hi,

How do I check number of occurrence? An example, I have a file DOM0000547.txt (see attached) this is piped delimited record.
I have to make sure I don't have duplicate value on column 2 (the value after the first piped)- "DOM0000547|22015031608830496"
so an example, this value "22015031608830496" occurs multiple times in this file. I want to validate if this value exist more than 1 in this file, discard this file.




File Attachment:

File Name: DOM0000547.txt
File Size:81 KB
Attachments:

Please Log in or Create an account to join the conversation.

More
8 years 6 months ago #12695 by admin
We will create a working example for you

Mike
ETL Architect

Please Log in or Create an account to join the conversation.

More
8 years 6 months ago #12697 by admin
Here is our example.

It uses demo files so it should work immediately.

The way it works is very simple

First we calculate number of occurrences for specific field within the file.
Than we calculate maximum number of occurrences within the file
(If it is more than 1 it must be rejected)
Than we join two streams of data using file name and reject the files based on number of occurrences.

I hope this information is useful for you

Mike
ETL Architect

Please Log in or Create an account to join the conversation.

More
8 years 6 months ago #12698 by admin
ATS file

Mike
ETL Architect
Attachments:

Please Log in or Create an account to join the conversation.

More
8 years 6 months ago #12699 by orlandoc
Is there a way to perform sql count technique like this when reading a file? ...

SELECT YourColumn, COUNT(*) TotalCount
FROM YourTable
GROUP BY YourColumn
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC




I already have an existing transformation & package. I don't want to mess too much with my existing transformer just to count duplicate.
Attachments:

Please Log in or Create an account to join the conversation.

More
8 years 6 months ago #12700 by admin
My understanding was that you want to reject ENTIRE FILE if any duplicates are found.
Your example rejects duplicated records only.

Can you clarify please.

Mike
ETL Architect

Please Log in or Create an account to join the conversation.