Slow changing dimensions

There several types of dimensions which can be used in the datawarehouse

Type 0 changing dimension:

It is not really slowly changing dimension: We just load the data and keep it as it is. Typical example of it woul be list of postcodes

Type 1 slowly changing dimension:

Overwrite the old value with the new value. This is very useful when dealing with corrections. For example the dimensions holds list of countries some of them we misspelled so we want to current it
We don’t care about the history in this case because it was incorrect at the first place.

Type 2 slowly changing dimension:

Create a new record in the dimension with a new primary key. For example when customer changes the address we will create new record. Sometimes some additional fields are populated as well such current record flag, or effective date.

Type 3 slowly changing dimension:

Overwrite the old value with the new value, and add additional data to the table such as the effective date of the change.
This type of dimensions can be used for loads when loan is fully paid.

Now lets have a look at practical ways of population dimension tables:

Dimension table Structure

Customer Dimension


Data is coming in as huge text file, which holds orders together with customer details.

Customer Dimension Data

 

Customer details are duplicated so we have to deduplicate it first

Customer Dimension Transformation

Type 0 slowly changing dimension

Check if record exist if not insert new record

Type 1 slowly changing dimension

For every incoming record
Count number of records matching the key
Than if Record Count <> 0
Update Customers table
or
Insert new record

Note:

We might be performing unnecessary updates plus for large and wide tables it could be very slow. One of the ways of avoiding doing it is using CRC(MD5) for checks. If CRC is the same no changes were mage to the record and therefore there is no need to update it

Type 2 slowly changing dimension

Deduplicate the data
Calculate Record CRC
If this CRC exist in the database discard the record
If not create new record

Type 3 slowly changing dimension:

Deduplicate the data
Calculate Record CRC
If this CRC exist in the database than do nothing
If not update record with new data

More Information about Advanced ETL Processor

Online tutorials

Visual Importer ETL  saved us 25-30 hours per month. Much of that comes from being able to build in error checking into the scripting. I've even been able to script corrections that run on the fly triggered by the error checking on incoming order files.

Gene Kovacs,
Director of Technical Business Operations,
A2B

Our customers

BP

BBC

HSBC

Orange

CAT

SEPA
emc
NHS
Australian Rugby Union
xerox
Databases we work with