This Article describes what is Dirty Data and how to Deal with it

What is Dirty Data anyway?

In reference to databases, it is data that contain errors. Dirty data can contain such mistakes as spelling or punctuation, incorrect data associated with a field, incomplete or outdated data or even data that is duplicated in the database.

 Other common causes of dirty data are:

  • Wrong fields sizes
  • Wrong and inconsistent formats
  • Logical inconsistency like typing a zip code into the phone number box
  • User Errors

Most of the problems come when working with Text or Excel Files

Life is much easier when the data source is ODBC compliant database however there are still some potential problems

Imagine that you are loading orders from different countries into your Oracle data warehouse.

Part of the data comes from text files, part from MS Excel files and some of the data is direct ODBC connection to the source database.
Some files are a result of manual consolidation of multiple files

Data-warehouse Table Definition is


Every country has different formats for ORDER_DATE and Amount field. This situation is far too familiar for many ETL Consultants


In order to load data, we need to make sure that the format of Amount and Order_Date fields is consistent.

For the amount field, we need to get rid of dollars, pounds and commas.

It could easily be done by using the replace function of Advanced ETL Processor. 

What you see is what you load


For ORDER_DATE field we will apply multiple date formats.

The result of the Date Format function is a string in 'YYYY-MM-DD HH:NN:SS.ZZZ' format 

What you see is what you load

 Full Data Transformation:

 The result of Data Transformation:


This is just a small example of how Advanced ETL Processor can help you to validate and transform data. 

About Advanced ETL Processor

Advanced ETL Processor is an ETL tool designed to automate extracting data from ANY database, transform, validate it and load into ANY database. Typical usage of it would be extract data from Excel File, Validate Date Formats, Sort data, deduplicate it and load it into the Oracle database, run stored procedure or SQL script, once loading is completed. Unlike Oracle SQL loader, BCP,  DTS or SSIS Advanced ETL Processor can also add new and update old records based on the primary key.

Direct link, no registration required.

This site uses cookies. By continuing to browse the site, you are agreeing to our use of cookies