Comparison of data loading strategy into QliKView


We do a lot of consutancy work and we decided to introduce version control after one of our customers modified transformations beyond use. Version control helped us to restore project to it's original state. Another situation when version control is very usefull is a large project with multiple developers.
Latest ETL Software changes
+ New Version control tab was introduced
+ Version control can submit several objects now
+ Version control dialogue can delete multiple objects now
- Various bug fixes
We have two ETL products Visual Importer ETL and Advanced ETL Processor.
Which one to choose depends on the complexity of the data and now you want to work with it
This article provide some examples of complex data
If you just want to load data from table, delimited file or simple XML file choose Visual Importer ETL.
If you work with complex data and require data validation, transformation or sorting and on top of it you also want to be able to generate the data select Advanced ETL Processor.
Both Advanced ETL Processor and Visual Importer ETL have 3 editions:
Standard, Professional and Enterprise:
Standard Edition is designed for desktop user who just wants to run data load or transformation manually.
Professional Edition is designed for user who wants to execute business automation processes manually.
Enterprise Edition is designed for user who wants complete automation of business processes.
In this article we are going to show you how easy it is to get support for our products.
First of all there are several ways of getting support.
If you are stuck we can call you and connect remotely.
It is very important to provide as much information as possible first time.
Our support team is very fast and quite often they resolve the problem within one hour.
For Advanced ETL Processor click “Help” and then click “Email repository and Log Files to support” than the default email client will open and the use can enter additional information and attach more files. Using same menu the use can send email to support team or visit support forum.
Same principle works for Visual Importer ETL and Active Table Editor.


We encourage you to use our support forum first, so other users will benefit as well. If your issue is urgent contact us, We are always happy to connect remotely and assist.
We have the following situation:
Imagine that you are working for large finance organization and would like to load the year end balances into SQL Server database. The excel file look like this.

So we create a simple SSIS package using Excel connector and Ole DB destination, before loading the data we hit preview button.

For some reason last 3 lines have null values, This is because our connection string has not got IMEX=1
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\DEMO\Excel\mixed_data_types.xls;Extended Properties="EXCEL 8.0;HDR=YES";
We correct it
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\DEMO\Excel\mixed_data_types.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

That looks much better so we load the data. Later same day, the finance manager says that the balance is wrong you look at the data and there are some null values again.
It is a common problem: trying to load the data from Excel file half of the data is coming as nulls, or columns with more than 255 characters are truncated
Our source file has mixed datatypes some account codes have anly numbers and some have characters as well. We have more cells with numbers only so Excel ODBC/Jet assumes the field type is numeric and shows the rest of it as nulls.
Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted into text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.
As partially explained here
http://support.microsoft.com/kb/257819
ODBC/MS Jet scans first TypeGuessRows to determine field type
(TypeGuessRows=8 IMEX=1)
In your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the provider returns five (5) numbers and three (3) null values.
In your eight (8) scanned rows, if the column contains three (3) numeric values and five (5) text values, the provider returns three (3) null values and five (5) text values.
In your eight (8) scanned rows, if the column contains four (4) numeric values and four (4) text values, the provider returns four (4) numbers and four (4) null values.
In your eight (8) scanned rows all of them less than 255 characters the provider will truncate all data to 255 characters
In your eight (8) scanned rows, if the column contains five (5) values with more length than 255 the provider will return more than 255 characters
We can change TypeGuessRows and set it to 0. This works very well for files up to 16384 lines.
The valid range of values for the TypeGuessRows key is 0 to 16. However, if the value is 0, the number of source rows scanned is 16384.
Read this Microsoft KB article: http://support.microsoft.com/kb/281517.
For larger files it may not work again, so the best solution to the problem is not to use excel as data source but use text files instead. But this may bring another problem if user open CSV after it was exported it may loose leading zeros.
Set IMEX=1 in connection string
Close any programs that are running.
On the Start menu, click Run. Type regedit and click OK.
In the Registry Editor, expand the following key depending on the version of Excel that you are running:
Excel 97
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel
Excel 2000 and later versions
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
Select TypeGuessRows and on the Edit menu click Modify.
In the Edit DWORD Value dialogue box, click Decimal under Base.
Set the value to 1
Open Excel file
Make sure that the cells in the first line of the table have relevant data for example
This solution apply to all versions of MS Excel ODBC driver, Ole DB, MS Jet, .NET, DTS and SSIS
We have spent enormous amount of time trying to get it fixed. So far we were not able to find a better solution.
The way Excel import works makes it not possible to automate it. You have to modify most of excel files manually in order to load them.
This why we are no longer using ODBC/OleDB/Ms Jet for Excel connections. Our ETL solutions work correctly with Excel all the time. Try it yourself now.

|
|
The repository database stores all the information about connections, transformation scripts, packages, sql scripts, reports and execution logging. This is where the results of ETL designer hard is stored and obviously no one wants to lose it.
Default repository.The default repository is MS access. This type of repository works fine for development and for small production environment. From time to time we recommend to perform “compact and repair” using MS access. For heavy production environment for example when we want to run packages every minute in parallel we recommend using something else like MS SQL server or Oracle.
Note:
Information about current repository connection can be seen at the window header
Repository can be created by running script or by using repository creation wizard.
They are located in C:\Users\Public\Documents\DBSL\Repository Scripts.
To create new MS Access repository copy
C:\Users\Public\Documents\DBSL\Repository Scripts\Repository\ Repository.mdb in to different directory and connect to it.
When software is installed default repository connections are created such as MS Access, Oracle, SQL Server, MySQL, PosgreSQL. When application first stars it connects to default MS Access repository.
Provided that you are using default settings all open objects will be saved and application will connect to different repository.
If you want to create new connection to the existing repository:
Same dialogue can be used to amend exiting repository connections.
While execution scheduled tasks the size of the repository is constantly growing. From time to time it might become necessary to clear execution log tables otherwise it might have negative impact on performance.

Information above is relevant to both Advanced ETL Processor and Visual Importer ETL and some of of it to Active Table Editor
![]() |
|
|
|
|
|
|
|
|
![]() |
![]() |