This article describes common problems of loading data into the database from different operating systems.

Typical situation:

You have a sales data warehouse. Every day there are hundreds of small files to process. They come from various locations.
ETL tool downloads the files into one folder than loads them into the Data warehouse using a mask.

One morning you get a call from the Manager saying that the load failed and there several thousands of files to process...

Loading data files from different operating systems

So what could be the problem?

Maybe the file format has changed.

You open the file in the notepad and looks the same...

Most likely the format is different but you just can't see the difference

There are only two possible causes.

  • Different encoding  for example UTF8 instead of ASCI
  • Different end of line character for example LF instead of CR+LF

Data is coming from different data sources some users use Windows, some us MAC, some use UNIX/Linux

Systems based on ASCII or a compatible character set use either LF  (Line feed, '\n', 0x0A, 10 in decimal) or CR (Carriage return, '\r', 0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF, 0x0D 0x0A). Some rare systems, such as QNX  before version 4, used the ASCII RS (record separator, 0x1E, 30 in decimal) character as the newline character.

LF:    Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS X, FreeBSD, etc.), BeOS, Amiga, RISC OS, and others
CR+LF: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M, MP/M, DOS (MS-DOS, PC-DOS, etc.), OS/2, Microsoft Windows, Symbian OS, Palm OS
CR:    Commodore 8-bit machines, TRS-80, Apple II family, Mac OS up to version 9 and OS-9
RS:    QNX pre-POSIX implementation.

The Unicode  standard defines a large number of characters that conforming applications should recognize as line terminators:

LF:    Line Feed, U+000A
FF:    Form Feed, U+000C
CR:    Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL:   Next Line, U+0085
LS:    Line Separator, U+2028
PS:    Paragraph Separator, U+2029

The Solution

Use Advanced ETL Processor

Latest Version of Advanced ETL Processor and Visual Importer ETL can automatically determine source file encoding and works with multiple Line Terminators

 

Data Reader

Direct link, no registration required.
Star InactiveStar InactiveStar InactiveStar InactiveStar Inactive
 
 
Xerox
Swiss banking
Bank Of Oklahoma
Red Cross
Alta Pacific bank
Copeinca
Gas alberta
NHS
Royal Brunei
First Oklahoma bank
Noresco
Iqvia

Testimonials

What customers say about us

  • swissbanking

    I used Advanced ETL Processor in 2 Enterprises for many business processes and Business automation (outside finance department). I did not find any other tool with so many functions and broad flexibility for that Price! If you need support for bugs or solution design you will get it very fast. Best Support I have ever seen.

    Lionel Albrecht
  • iqvia

    IQVIA and DB Software Laboratory (DBSL) partnered in 2010 and have been working in close cooperation ever since. Over this period of time, DBSL software components formed an integral part of a large number of IQVIA applications currently used by over 20 UK NHS Trusts (Hospitals).

    Dmitry Dorsky,
    Director
  • xerox

    The product is easy to learn and once a developer understands the ETL way for solving the problem at hand, the developer's productivity will increase. Even our DBAs now uses the ETL software to quickly create solutions instead of SSIS or SQL jobs.

    Daniel Fung
    Solutions Architect

Read ETL Software customers feedback

This site uses cookies. By continuing to browse the site, you are agreeing to our use of cookies