Background:
This article was created as an answer to one of our ETL forum posts, we hope it will help a lot of users in a similar situation.
"The scenario is based on files landing in a monitored directory. At this stage, I am not worried about volume or frequency. But will be handling each file individually for internal reasons. Each file has a unique name but a common format of xxx-yyyymmddhhmmssnnnn.ext
"xxx" represents a client prefix (eg. abc, bcd, cde). The format of each file for the same client will be the same. But the format is different for each client. The file extension '.ext' will generally be .csv, .xml, or even .xls.
The file transformations between clients will be different. However, the output is always the same format and the target is a SQL Server database table.
I am developing each of the transformations in isolation, but with the same basic structure.
My idea is to have a single monitor looking at a single folder for *.* That then executes a single Package based on the creation of a file. The Monitor passes the file to the Package that then does some environment management, put the source file in a staging location, then based on the file prefix conditionally (Probably through a CASE action) call the associated transformation and load to the SQL server and then clean up."
Notes:
The monitor (Directory monitor) is a windows service that is constantly monitoring a directory for changes. When it happens it adds predefined action to the execution queue. Monitor passes the file name as a variable into the action, this variable name is <Directory Monitor File Name>.
Solution:
"Get ClientID" script
is a starting point, it extracts the first three characters of the file name. The Result is stored in '<ClientID>' variable.
var
FileName: String;
ClientID: String;
begin
FileName := GetVariable('<Directory Monitor File Name>');
ClientID:=UpperCase(SubString(FileName,1,3));
SetVariable('<ClientID>',ClientID);
Result:=true;
end;
Check Client directory
Checks if the client directory exists and if it does not it creates it together with subdirectories.
Every transformation has an ID. We use this ID to execute a specific transformation for a specific customer.
Table Transformations
Holds mapping between ClientID and TransformationID:
CREATE TABLE [dbo].[Transformations]
( [ClientID] [char](3) NOT NULL, [TransformationID] [int] NULL,
CONSTRAINT [PK_Transformations]
PRIMARY KEY CLUSTERED ( [ClientID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]) ON [PRIMARY]
GO
"Insert Into Transformations"
Sql script inserts a record for missing clients:
insert into Transformations(
[ClientID],
[TransformationID])
values ('<ClientID>',0)
GO
Once the file is ready for ready processing it is moved into Staging Area.
Get transformation ID
executes the following SQL:
Select TransformationID
from Transformations
where ClientID='<ClientID>'
The result is saved into <TransformationID> variable.
Execute Transformation script
begin
ExecuteObject(GetVariable('<TransformationID>'));
Result:=true;
end;
Upon completion, the file is moved into Processed Area.
Directory Structure
Variable as Mask
Error message
Acts as a default error handler