Automate File Processing

A Question from the Customer:

An external PLC generates txt files and those must be downloaded and converted into an SQL DB.
The PLC generates 4 files each day, at a size of 5MB pr. File = 20MB a day.
The file name is date and time named like this: 14022719.txt, the syntax is YYMMDDHH
The files are generated at the same 4 hours each day, 01, 07, 13, 19
In the files, first data row is a time stamp YYMMDDHHMMSS
It is important that one record with a specific date/time is written only once in the SQL.
The files would always be the same size.

ETL Job:

Each day, The ETL converter should copy the file to a local directory via FTP; convert the file to SQL, then delete the file from the local dir. It could be 4 times a day, maybe one hour after the txt file is completed on the PLC.
We can not delete the file from the PLC, since others should also have access to the data.
The data on the PLC will be deleted after 10 days using FIFO

The Problem:

How to I keep track of which files I have copied ?
Could create a TXT file on the local drive and write the file names, compare with what's on the FTP drive ?
But how do we do this? That is file comparison..can ETL do this ?
Is there any other way to do this ?
Keep in mind, it could be that the internet connection is offline for a day or 2, the server could be down, The PLC could be down. In some situations you will need to copy 4 files, sometimes 6 files..depends.. Or you could have gaps in the time stamps of the file names from the PLC if it has been down for a while. I know we would then also have gaps in the SQL, but that is OK.

The Solution:

There are several ways of performing this task

First of all, you need to speak with the company which provides the files. It takes time to upload files and you need to make sure that you are not downloading files while they are being uploaded. For example, you can ask them to put additional file once the process is completed. If file is there the process completed and you can download the files.

Next step is to load data into the database. That is easy, there are several tutorials about it.

Avoiding loading the same data twice.

First of all, we need to get source file name, this can be done by using a metadata transformation object

Then we need to check if the file was loaded before, we can use Validator “In List object” for that

and last step to load the data into the table and load the list of processed files into separate table

Overall logic as follows

  1. Download data from the ftp server using mask into the local folder
  2. Run transformation from the folder using mask same time populating a list of processed files and checking if the file was not processed before
  3. Copy processed files into a separate folder

For more technologies supported by our ETL Software see Advanced ETL Processor Versions and Visual Importer ETL Versions

Confused? Ask question on our ETL Forum

  • knowledgebase/automate_file_processing.txt
  • Last modified: 17/09/2018 09:43
  • by admin