Fast, powerful, flexible, feature rich ETL tools. Complete business and data transformation automation.

Text, XML, Excel, Access, DBF, Foxpro, ODBC, OLE DB, MS Sql Server, Oracle, MySql, PostgreSQL, Firebird, Interbase, SQLite, POP3, SMTP, File System, FTP, SSL, Unicode.

Built-in scheduler, business rules designer, package designer, report designer, data browser. Powerful data transformation language, regular expressions and more...

HomeArticles

Datawarehousing Articles

Calculating checksum in datawarehousing

Checksum is a calculated value that is used to determine the integrity of data. Checksum serves as a unique identifier for the data (a file, a text string, or a table row). If the data changes then so does the checksum value. This makes it easy to verify the integrity of the data.

Checksum is very useful in data warehousing especially when working with very wide slow changing dimensions. It takes less processing power to compare one checksum field than to compare all fields within dimension.

dbSL ETL software can calculate check sum for entire file, string or row of data. it supports following algorithms CRC16, CRC24, CRC32, Adler32, CRC64, eDonkey, eMule, MD4, MD5, RIPEMD160, SHA1, SHA224, SHA256, SHA384, SHA512,Whirlpool. It can return the data in HEX or BASE64 format.

Calculating checksum

 

Time dimension for PosgreSQL based data warehouse

This sql script will create an populate Time dimension for PosgreSQL based data warehouse


CREATE TABLE time_dim
(
time_key integer NOT NULL,
time_value character(5) NOT NULL,
hours_24 character(2) NOT NULL,
hours_12 character(2) NOT NULL,
hour_minutes character (2)  NOT NULL,
day_minutes integer NOT NULL,
day_time_name character varying (20) NOT NULL,
day_night character varying (20) NOT NULL,
CONSTRAINT time_dim_pk PRIMARY KEY (time_key)
)
WITH (
OIDS=FALSE
);

COMMENT ON TABLE time_dim IS 'Time Dimension';
COMMENT ON COLUMN time_dim.time_key IS 'Time Dimension PK';

insert into  time_dim

SELECT  cast(to_char(minute, 'hh24mi') as numeric) time_key,
to_char(minute, 'hh24:mi') AS tume_value,
-- Hour of the day (0 - 23)
to_char(minute, 'hh24') AS hour_24,
-- Hour of the day (0 - 11)
to_char(minute, 'hh12') hour_12,
-- Hour minute (0 - 59)
to_char(minute, 'mi') hour_minutes,
-- Minute of the day (0 - 1439)
extract(hour FROM minute)*60 + extract(minute FROM minute) day_minutes,
-- Names of day periods
case when to_char(minute, 'hh24:mi') BETWEEN '06:00' AND '08:29'
then 'Morning'
when to_char(minute, 'hh24:mi') BETWEEN '08:30' AND '11:59'
then 'AM'
when to_char(minute, 'hh24:mi') BETWEEN '12:00' AND '17:59'
then 'PM'
when to_char(minute, 'hh24:mi') BETWEEN '18:00' AND '22:29'
then 'Evening'
else 'Night'
end AS day_time_name,
-- Indicator of day or night
case when to_char(minute, 'hh24:mi') BETWEEN '07:00' AND '19:59' then 'Day'
else 'Night'
end AS day_night
FROM (SELECT '0:00'::time + (sequence.minute || ' minutes')::interval AS minute
FROM generate_series(0,1439) AS sequence(minute)
GROUP BY sequence.minute
) DQ
ORDER BY 1

Based on on information provided here

http://wiki.postgresql.org/wiki/Date_and_Time_dimensions

 

Loading current date into database field

Typical task pupulating database field with current date can be be done in several ways

First of all we can use GetSystemVariable calculation function:

GetSystemVariable(VariableName)

GetSystemVariable returns value of 'VARIABLENAME'.

Possible values for 'VARIABLENAME' are:

COMPUTERNAME
OSUSERNAME
DBUSERNAME
BLOCKNUMBER
LINENUMBER
RECORDNUMBER
SYSTEM_DATE
SOURCE_FILE_NAME
SOURCE_TABLE_NAME

GetSystemVariable('SYSTEM_DATE') function teturns date according to the regional settings of computer.
It very important to use currect date format form the regional settings screen we can see that the format is dd/mm/yyyy

Windows Vista Regional SettingsAnother way to determine date format of GetSystemVariable('SYSTEM_DATE') is to execute it against non date field

Mapping using MS SQL Server connection with no date format

GetSystemVariable-No-FormatData preview screen
GetSystemVariable-Data-Preview

Mapping using MS SQL Server connection with currect date format

GetSystemVariable-CurrectFormat

If we specify wrong date format I won’t be able to load the data
GetSystemVariable-WrongFormat

As you can see order date field is missing.
DataPreview-Null

Another way of getting data is by using GetSystemDate function

GetSystemDate(Format):String, Returns Current system date/time in format specified

Expression-Editor-GetSystemDate
And last way is to use database functions for example getdate() for sql server

GetDateNote:

It does not work for MS SQL Server Connections use ODBC connection instead.
You may get error message saying that field must have date format. Just assign any date format.

 
  •  Start 
  •  Prev 
  •  Next 
  •  End 

Page 1 of 6

Testimonials

"I've been very impressed with Advanced ETL Processor. It is extremely powerful and can validate and transform practically any data set you work with."

David Gig,
Director of Information Technology

Our customers

BP

BBC

HSBC


Databases we work with

Go to top