- Posts: 290
- Thank you received: 1
check number of occurrence - how to
8 years 6 months ago #12694
by orlandoc
check number of occurrence - how to was created by orlandoc
Hi,
How do I check number of occurrence? An example, I have a file DOM0000547.txt (see attached) this is piped delimited record.
I have to make sure I don't have duplicate value on column 2 (the value after the first piped)- "DOM0000547|22015031608830496"
so an example, this value "22015031608830496" occurs multiple times in this file. I want to validate if this value exist more than 1 in this file, discard this file.
How do I check number of occurrence? An example, I have a file DOM0000547.txt (see attached) this is piped delimited record.
I have to make sure I don't have duplicate value on column 2 (the value after the first piped)- "DOM0000547|22015031608830496"
so an example, this value "22015031608830496" occurs multiple times in this file. I want to validate if this value exist more than 1 in this file, discard this file.
Please Log in or Create an account to join the conversation.
8 years 6 months ago #12695
by admin
Mike
ETL Architect
Replied by admin on topic check number of occurrence - how to
We will create a working example for you
Mike
ETL Architect
Please Log in or Create an account to join the conversation.
8 years 6 months ago #12697
by admin
Mike
ETL Architect
Replied by admin on topic check number of occurrence - how to
Here is our example.
It uses demo files so it should work immediately.
The way it works is very simple
First we calculate number of occurrences for specific field within the file.
Than we calculate maximum number of occurrences within the file
(If it is more than 1 it must be rejected)
Than we join two streams of data using file name and reject the files based on number of occurrences.
I hope this information is useful for you
It uses demo files so it should work immediately.
The way it works is very simple
First we calculate number of occurrences for specific field within the file.
Than we calculate maximum number of occurrences within the file
(If it is more than 1 it must be rejected)
Than we join two streams of data using file name and reject the files based on number of occurrences.
I hope this information is useful for you
Mike
ETL Architect
Please Log in or Create an account to join the conversation.
8 years 6 months ago #12699
by orlandoc
Replied by orlandoc on topic check number of occurrence - how to
Is there a way to perform sql count technique like this when reading a file? ...
SELECT YourColumn, COUNT(*) TotalCount
FROM YourTable
GROUP BY YourColumn
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
I already have an existing transformation & package. I don't want to mess too much with my existing transformer just to count duplicate.
SELECT YourColumn, COUNT(*) TotalCount
FROM YourTable
GROUP BY YourColumn
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
I already have an existing transformation & package. I don't want to mess too much with my existing transformer just to count duplicate.
Please Log in or Create an account to join the conversation.
8 years 6 months ago #12700
by admin
Mike
ETL Architect
Replied by admin on topic check number of occurrence - how to
My understanding was that you want to reject ENTIRE FILE if any duplicates are found.
Your example rejects duplicated records only.
Can you clarify please.
Your example rejects duplicated records only.
Can you clarify please.
Mike
ETL Architect
Please Log in or Create an account to join the conversation.