Avoiding common problems when working with ETL

Introduction.

First of all, we do have documentation and recommended settings are described in the documentation, plus we have online tutorials.

Getting help fast.

The most important thing is to follow the rules and provide all necessary information first time.

Bad support request example:

Hi,
We have just downloaded your software and we have an error message.
Please help it is urgent

Which software? Which Error?

Repository.

Problem: Strange access violations when executing the packages

We do not recommend using MS Access Repository for production. It works fine for development, but over time it tends to get corrupted. Plus it has 2 GB database limit. When Access Repository  is corrupted users may get strange access valuations. If you wish to use Access as Repository do compact and repair once a week.

  1. Stop the agent
  2. Close Application
  3. Open repository in MS access
  4. Do “Compact and Repair”

Problem: Some of the data is not being saved into the repository when working with MySQL repository

Check settings for ODBC Dsn and described in the documentation
Increase max_allowed_paket_size
 
Problem: Not able to use PostreSQL repository

Check settings for ODBC Dsn and described in the documentation and make sure that they are exactly the same.

Mapping

Problem: We have added new field to the table and now our mapping is invalid and nothing work. Do we have to remap everything?

The mapping is position based to solve the problem you can

  1. Add field to the end
  2. Add another transformation before the writer and use automap

Problem: Not able to load data into Date fields.

We use standard ODBC format for date fields ‘yyyy-dd-mm hh:mm:ss’
If the data is already in this format no modifications are required.
If not use “Format Date” function in Advanced ETL Processor or apply format in Visual Importer

Avoiding loosing repository data.

  1. Make regular backups
  2. Use integrated version control.

Performance

Problem:I am working with bad data and transformation is very slow my logs are huge.

Too much logging slows down transformation.

Editing resident tables in QlikView

Release of Q-Eye allows QlikView users view and even edit residential tables.

Here are some examples:

1. Viewing Data

SUB INSPECT (T)
 LET LocalAppDataPath = GetRegistryString('HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion', 'ProgramFilesDir');
 LET QViewerPath = '$(LocalAppDataPath)\DB Software Laboratory\Q-Eye\QEye.exe';
 STORE $(T) into [$(QvWorkPath)\~$(T).qvd] (qvd);
 EXECUTE "$(QViewerPath)" "$(QvWorkPath)\~$(T).qvd";
 EXECUTE cmd /c del "$(QvWorkPath)\~$(T).qvd";
ENDSUB

2. Editing Data

SUB EDIT (T)
 LET LocalAppDataPath = GetRegistryString('HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion', 'ProgramFilesDir');
 LET QViewerPath = '$(LocalAppDataPath)\DB Software Laboratory\Q-Eye\QEye.exe';
 STORE $(T) into [$(QvWorkPath)\~$(T).qvd] (qvd);
 EXECUTE "$(QViewerPath)" "$(QvWorkPath)\~$(T).qvd";
 DROP TABLE $(T);
 LOAD * FROM [$(QvWorkPath)\~$(T).qvd] (qvd);
 EXECUTE cmd /c del "$(QvWorkPath)\~$(T).qvd";
ENDSUB


If we want to edit the table we just

CALL EDIT('Log');

Or if we want to view the table we use this code

CALL INSPECT('Log');

Note: By default Q-Eye uses partial load, to avoid problems load entire QVD file before editing it 


This example is based on:
http://bi-review.blogspot.co.uk/2012/12/how-to-look-inside-resident-tables-at.html

ETL and XSLT

Here is More XML example

XML With Attributes

To load it all we need to do is to set data source type to XML and select appropriate XML tags for "Table Tag" and "Record Tag"

Data reader settings :
XML with Attributes Datareader

Note: "Ignore Tags" is checked

There is no data in the grid, because we have to transform XML into more readable format first, using XSLT.
XML With Attributes Data

About XSLT:

XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, or other objects such as HTML for web pages, plain text or into XSL Formatting Objects which can then be converted to PDF, PostScript and PNG.

Typically, input documents are XML files, but anything from which the processor can build an XQuery and XPath Data Model can be used, for example relational database tables, or geographical information systems.

Source: Wikipedia

XSLT:

XSLT Transformation

Note: To get to this dialog: open data reader properties, click XML file, check transform XML and click Magnifying glass button

Here is the result of the XSLT transformation:
XSLT transformation Result

Note: 

More information about XSLT can be found here: 
http://www.w3schools.com/xsl/xsl_transformation.asp

Transforming XML Data

Here is Simple XML example

Simple XML Example

To load it all we need to do, is to set data source type to XML and select appropriate XML tags for "Table Tag" and "Record Tag"

Data reader settings :
Simple XML Data Reader Settings

Note: "Ignore Tags" is checked

Here is our data in the grid:
Simple XML Data

More complex XML example:

XML With Additional Tags

Note that we have here additional tag <TableName>. To load this data we need to uncheck "Ignore Tags"

Here is our data in the grid:

XML With Additional Tags Data

Adnvantages and disadvantages of working with XML

Loading data from XML can be very complex task, but the complexity of this task depends of people who design XML at the first place.In this article we will provide you with some examples of loading data from XML files and transforming it. We will also talk about things to avoid and how to make life of developers easier.

What is XML anyway?

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards.
The design goals of XML emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.

Source: Wikipedia

XML is incredibly flexible, but it has some disadvantages as well.

For example:

<CustomerOrderMessage>
<OrderNumber>1</OrderNumber>
</CustomerOrderMessage>

The XML above has only one byte of information the rest of it is metadata.Using too much metadata requires more processor power and increase network traffic(Which is great news for hardware vendors, but bad news for the people who have to pay for it)

The flexibility of XML can lead to unnecessary complexity and it can make it hard for developers to understand, therefore lead to mistakes and development time and cost increases.

In some of the cases it can be necessary to convert XML into simplified format so it can be loaded into the database.
This XML can be loaded by most of the ETL tools.

Note: CustomerTable is a "Table tag" and CustomerRecord  is a "Record Tag"

<CustomerTable>
<CustomerRecord>
<CustomerID>1</CustomerID>
<CustomerName>Peter Jones</CustomerName>
</CustomerRecord>
<CustomerRecord>
<CustomerID>2</CustomerID>
<CustomerName>Bill Watson</CustomerName>
</CustomerRecord>
</CustomerTable>

This XML may need to be transformed in the format above so it can be laoded into the database later:

<CustomerTable>
<CustomerRecord CustomerID="1" CustomerName="Peter Jones"/>
<CustomerRecord CustomerID="2" CustomerName="Bill Watson"/>
</CustomerTable>

And as a Conclusion here are some basic XML design tips:

  • Use XML when it is necessary
  • Too much metatadata is a bad thing
  • Keep tags short
  • Keep it simple and clean
  • Check ETL documentation and design XML in such a way so it can be loaded without conversion

JDBC and QlikView

In the latest version of our QlikView Connector we have introduced support for JDBC.

Note: The connector currently works with 27 datasources.

Screenshots

Click on thumbnail to see the screenshot

Read more... Read more... Read more... Read more...

JDBC drivers are supplied by database vedors. The list of JDBC drivers can be foud here and here

Read more...

ETL Stored Procedure Mapping

Great news: we have just introduced support for stored procedures as data target for all our ETL Products.

Stored procedure data transformation

Also

Performance improvements for MS SQL Server Connection
Various Bug Fixes
Documentation Update

A Stored Procedure is an important tool for working with databases. They allow greater flexibility than SQL statements, ETL developers and programmers use stored procedures for various tasks within databases.

When to use stored procedure as ETL data target?

  1. For complex calculations
  2. Loading data into multiple target tables
  3. Very large lookup transformations
  4. Validation against large list of values for example validating post codes

More Articles...