best practice for Transformations

More
9 years 4 weeks ago #11198 by thatisfun
All the sample for transformations are simple. like there are only several columns and simply rules need to be applied.

What is the best practice for large table and hundreds business need to reply. such like following cases:

Reader table: 60 columns
writer table : 80 columns

there are about 200 rules need to apply.


What is the best practice for this case? put every transformation rules in one will be huge.
  • using filter separate different case and put into different transformations?
  • change Database schema ?

What is the common best practice for this case?

Please Log in or Create an account to join the conversation.

More
9 years 4 weeks ago #11200 by admin
Thank you for posting such an interesting question.

The best practice would be to use divide and conquer strategy
Use several transformations instead of one
Use sub transformations
Use tabs inside transformations.
Use library of objects
Use version control

Another very important thing to understand that mapping is position based.
That means if you modify table structure by inserting field in the middle all your mapping will be shifted
This issue is very easy to address you can add fields to the end instead or
just insert another transformation and use auto-map function

Peter

Mike
ETL Architect
The following user(s) said Thank You: ckelsoe

Please Log in or Create an account to join the conversation.

More
9 years 2 days ago #11468 by thatisfun
Thanks a lot for your suggestions. we still have following questions.

1. reader loading procedure. let's say we load everything & process it. and write into another table. how do we update the table we read from and make sure we didn't read old record which processed already.

2. for Transformation design. what is the best practice. if the destination table schema change . how do we update the transformation node? looks like we have to redo it again. refresh we lost all the link. ideally it should only remove the link which changed.

Please Log in or Create an account to join the conversation.

More
9 years 2 days ago - 9 years 2 days ago #11469 by admin
1. reader loading procedure. let's say we load everything & process it. and write into another table. how do we update the table we read from and make sure we didn't read old record which processed already.

You can use sql as source

select * from table where updated_flag='N'

and two data writers

one will load data into target table
Anoter one will update updated_flag to 'Y'


2. for Transformation design. what is the best practice. if the destination table schema change . how do we update the transformation node? looks like we have to redo it again. refresh we lost all the link. ideally it should only remove the link which changed.

This question was already answered above

YOU DO NOT HAVE TO REDO EVERYTHING

Mapping is position based.
That means if you modify table structure by inserting field in the middle all your mapping will be shifted
This issue is very easy to address you can add fields to the end instead or
just insert another transformation and use auto-map function

Please post screeshots and follow the forum rules next time

Mike

Mike
ETL Architect
Last edit: 9 years 2 days ago by admin.

Please Log in or Create an account to join the conversation.

More
9 years 2 days ago #11470 by admin

Mike
ETL Architect

Please Log in or Create an account to join the conversation.

More
9 years 2 days ago #11473 by thatisfun
Thanks for your quick reply. however your sample is for one table. read and write to the same table. when it happens between two tables. how does ETL synchronize those processes?

I believe insert and update is linear operations. however the suggestion is parallel.

Please Log in or Create an account to join the conversation.