ETL - Enterprise Data Processing Solution (EMail, FTP, SQL)

11 years 2 months ago #5290 by CodeGnome
I could not think of a good and applicable subject.

I do have questions, but I want to provide a little background first. We are planning to use Advanced ETL Processor Enterprise as a file processing engine for our 750+ users. These users fall into two basic categories, Clients and Vendors, each of which has distinctly different file processing requirements. Each user within a category will have different transformation needs, some of which will be shared with other users. So first I need to be able to create workflows (packages) that can accommodate these dynamics, rather than building a transformation for each user. Users will send files via FTP.

Here is how I envision this working. Users will upload a file to our FTP site; the FTP server software will move the file to an ETL home folder structure and send an email to an ETL account. ETL will then monitor this account and retrieve emails that contain the user type, user id, and uploaded file name. Based on this information, ETL will retrieve the uploaded file from the FTP site and begin processing it.
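To make the "shared transformations per user" idea concrete, here is a minimal sketch of a lookup table that combines category defaults with per-user additions. It is plain Python for illustration only, not Advanced ETL Processor syntax; the step names, the user ids, and the `steps_for` helper are all hypothetical.

```python
# Hypothetical step names; in practice each would map to a transformation or package.
CATEGORY_DEFAULTS = {
    "client": ["decompress", "validate_client_layout", "load_staging"],
    "vendor": ["decompress", "validate_vendor_layout", "load_staging"],
}

# Per-user additions, keyed by (user_type, user_id). Users without an entry
# get only their category defaults.
USER_OVERRIDES = {
    ("client", "C0042"): ["normalize_dates"],
    ("vendor", "V0007"): ["normalize_dates", "convert_currency"],
}


def steps_for(user_type: str, user_id: str) -> list[str]:
    """Return the ordered list of step names for one uploaded file."""
    key = user_type.lower()
    if key not in CATEGORY_DEFAULTS:
        raise ValueError(f"unknown user type: {user_type!r}")
    base = CATEGORY_DEFAULTS[key]
    extra = USER_OVERRIDES.get((key, user_id), [])
    # User-specific steps run after the shared steps but before the final load.
    return base[:-1] + extra + base[-1:]
```

With a table like this, adding a user means adding one row rather than building a new transformation.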

How feasible is this scenario?

Can this be built to run as dynamically as I have described?

I have looked through the help and this forum for assistance in developing the email portion and have not had any success. Any assistance is greatly appreciated.

Marshall Hughes

11 years 2 months ago #5291 by admin
Marshall

1 There are several ways of doing that, and the solution depends on a number of factors:

How big are the files?
How many files do you get per day?
How long should it take to process the data?
E.g. what is a reasonable time between the end of the user's FTP upload and the end of your own processing?

If the files are large I would suggest using FTP, and I would also make sure that all files are compressed.

For small files you can just use email.

2 Regarding extracting the information from the email:

There is a chapter in the documentation dedicated to working with variables (17.5).
The idea is very simple: every time something is executed within the package, some variables are populated.

For example, when you receive an email, variables such as <LAST EMAIL> and <SMTP Email> will be populated.
You can then access those variables from a script, e.g. parse the message body, extract the relevant parts, save them into different variables, and use those variables somewhere else.
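As a rough illustration of the parsing step, the sketch below pulls the user type, user id, and file name out of a notification body. It is plain Python rather than the product's own scripting, and the "Key: value" body layout, the field names, and the `parse_notification` helper are assumptions made for the example; the real layout is whatever the FTP server is configured to send.

```python
import re

# Assumed (hypothetical) notification body, one "Key: value" pair per line:
#   UserType: Client
#   UserId: C0042
#   FileName: orders.zip
FIELD_PATTERN = re.compile(
    r"^\s*(UserType|UserId|FileName)\s*:\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)
REQUIRED = ("usertype", "userid", "filename")


def parse_notification(body: str) -> dict[str, str]:
    """Extract the three routing fields from the body, keyed by lowercase name."""
    fields = {m.group(1).lower(): m.group(2) for m in FIELD_PATTERN.finditer(body)}
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        # Fail loudly so a malformed notification is not processed silently.
        raise ValueError(f"notification is missing: {', '.join(missing)}")
    return fields
```

The three values would then be stored in separate variables and used to pick the FTP path and the transformation to run.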

3 The latest version supports IMAP4, and in your case it would be the better option because once an email is processed you can move it into a different folder (POP3 just deletes the message).

4 I also recommend watching our online tutorials.
They will give you a good start:
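For reference, this is roughly what "move the processed email to another folder" looks like at the IMAP level, sketched with Python's standard imaplib rather than the product's own IMAP4 object. The `archive_message` helper and the "Processed" folder name are hypothetical, and the connection is assumed to be logged in with the source mailbox already selected.

```python
import imaplib


def archive_message(conn, uid: str, folder: str = "Processed") -> None:
    """Copy one message (by UID) to `folder`, then delete it from the selected mailbox."""
    status, _ = conn.uid("COPY", uid, folder)
    if status != "OK":
        # Leave the original in place if the copy did not succeed.
        raise imaplib.IMAP4.error(f"could not copy message {uid} to {folder}")
    # Flag the original as deleted; it stays in the mailbox until expunged.
    conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    # Expunge removes every message flagged \Deleted in the selected mailbox.
    conn.expunge()
```

Copying before deleting means a failed copy never loses the notification, which is the behaviour POP3 cannot offer.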

www.etl-tools.com/etl-tools/advanced-etl...online-tutorial.html
www.etl-tools.com/etl-tools/advanced-etl...online-tutorial.html

Peter

Mike
ETL Architect

11 years 2 months ago #5295 by CodeGnome
Thanks Peter, I'll take a look at the tutorials. Some of the files we receive will be large; I am currently testing with a 146,000 row file. We will receive multiple files every day. We would like the processing to happen as quickly as possible, but I must say the performance I am seeing during my testing is satisfactory. The purpose of the email is to notify ETL that a new file has landed, and ETL will then retrieve it via FTP. The entire solution must be dynamic. Thanks for your input.

11 years 2 months ago #5303 by admin
Some performance improvement tips:

Disable logging in the data validator object and do not write anything to the rejected records file.

We are looking at it from our side as well.

Mike

Mike
ETL Architect
