This article describes what is Dirty Data and how to Deal with it

What is Dirty Data anyway?


In reference to databases, it is data that contain errors. Dirty data can contain such mistakes as spelling or punctuation, incorrect data associated with a field, incomplete or outdated data or even data that is duplicated in the database.

 Other common causes of dirty data are:

  • Wrong fields sizes
  • Wrong and inconsistent formats
  • Logical inconsistency like typing a zip code into the phone number box
  • User Errors

Most of the problems come when working with Text or Excel Files

Life is much easier when the data source is ODBC compliant database however there are still some potential problems

Imagine that you are loading orders from different countries into your Oracle data warehouse.

Part of the data comes from text files, a part from MS Excel files and some of the data is direct ODBC connection to the source database.
Some files are a result of manual consolidation of multiple files

Data-warehouse Table Definition is

  • COUNTRY_ID INTEGER
  • ORDER_ID INTEGER
  • ORDER_DATE DATE
  • AMOUNT NUMBER(10.2)


Every country has different formats for ORDER_DATE and Amount field. This situation is far too familiar for many ETL Consultants

 

In order to load data, we need to make sure that the format of Amount and Order_Date fields is consistent.

For the amount field, we need to get rid of dollars, pounds and commas.

It could easily be done by using the replace function of Advanced ETL Processor. 

What you see is what you load

 

For ORDER_DATE field we will apply multiple date formats.

The result of the Date Format function is a string in 'YYYY-MM-DD HH:NN:SS.ZZZ' format 

What you see is what you load
 

 Full Data Transformation:

 The result of Data Transformation:

  


This is just a small example of how Advanced ETL Processor can help you to validate and transform data. 

About Advanced ETL Processor

Advanced ETL Processor is an ETL tool designed to automate extracting data from ANY database, transform, validate it and load into ANY database. Typical usage of it would be extract data from Excel File, Validate Date Formats, Sort data, deduplicate it and load it into the Oracle database, run stored procedure or SQL script, once loading is completed. Unlike Oracle SQL loader, BCP,  DTS or SSIS Advanced ETL Processor can also add new and update old records based on the primary key.

Direct link, no registration required.
Star InactiveStar InactiveStar InactiveStar InactiveStar Inactive
 
 
Xerox
Swiss banking
Bank Of Oklahoma
Red Cross
Alta Pacific bank
Copeinca
Gas alberta
NHS
Royal Brunei
First Oklahoma bank
Noresco
Iqvia

Testimonials

What customers say about us

  • swissbanking

    I used Advanced ETL Processor in 2 Enterprises for many business processes and Business automation (outside finance department). I did not find any other tool with so many functions and broad flexibility for that Price! If you need support for bugs or solution design you will get it very fast. Best Support I have ever seen.

    Lionel Albrecht
  • iqvia

    IQVIA and DB Software Laboratory (DBSL) partnered in 2010 and have been working in close cooperation ever since. Over this period of time, DBSL software components formed an integral part of a large number of IQVIA applications currently used by over 20 UK NHS Trusts (Hospitals).

    Dmitry Dorsky,
    Director
  • xerox

    The product is easy to learn and once a developer understands the ETL way for solving the problem at hand, the developer's productivity will increase. Even our DBAs now uses the ETL software to quickly create solutions instead of SSIS or SQL jobs.

    Daniel Fung
    Solutions Architect

Read ETL Software customers feedback

This site uses cookies. By continuing to browse the site, you are agreeing to our use of cookies