Data cleansing is the process of correcting data errors and removing invalid information.

Bad data example

Inconsistent values: to a human, "US" and "USA" mean the same country, but to a computer they are two different values. This often happens when merging data from different sources.

Missing values: the UK and Germany rows have missing values. This data is most likely incorrect and should be removed from the final dataset.

Data entry errors: "Spain" and "SPain" are treated as two different values.
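Variants that differ only in letter case or surrounding spaces can be spotted by grouping values on a normalised key. The following is a minimal Python illustration, not part of Advanced ETL Processor; the function name find_case_variants and the sample list are hypothetical.

```python
from collections import defaultdict


def find_case_variants(values):
    """Group raw values by a trimmed, case-folded key and return only the
    groups that contain more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        if value is None:
            continue  # missing values are a separate problem
        groups[value.strip().casefold()].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


countries = ["Spain", "SPain", "France", "Spain", "France "]
print(find_case_variants(countries))
# {'spain': ['SPain', 'Spain'], 'france': ['France', 'France ']}
```

Grouping on a case-folded key catches "SPain", but not "US" versus "USA": genuinely different spellings still need a lookup table that maps each variant to one correct value.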

Uniqueness: every ORDER_ID value must be unique.
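A uniqueness check comes down to counting how often each ORDER_ID occurs. A minimal Python sketch, assuming rows are held as dictionaries; duplicate_ids and the sample orders are hypothetical:

```python
from collections import Counter


def duplicate_ids(rows, key="ORDER_ID"):
    """Return the values of `key` that occur more than once, in first-seen order."""
    counts = Counter(row[key] for row in rows)
    return [value for value, n in counts.items() if n > 1]


orders = [
    {"ORDER_ID": 1001, "Country": "Spain"},
    {"ORDER_ID": 1002, "Country": "France"},
    {"ORDER_ID": 1001, "Country": "Spain"},
]
print(duplicate_ids(orders))  # [1001]
```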

Inconsistent date formats: a common problem when merging data from different countries. For example, 03/04/2023 means 3 April in the UK but March 4 in the US.

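Non-numeric characters inside numeric fields: also common when merging data from different countries (for example, currency symbols and thousands separators). They can easily be removed using the "Delete Characters" transformation function.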

Leading and trailing spaces: the invisible enemy of a data analyst. Use the "Trim" transformation function to remove them.
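In Python the equivalent of a trim is str.strip(), which removes leading and trailing whitespace (spaces, tabs, newlines). A minimal sketch, assuming each row is a dictionary; trim_row and the sample row are hypothetical, and the tool's Trim function may treat some whitespace characters differently:

```python
def trim_row(row):
    """Strip leading and trailing whitespace from every string field;
    non-string values (numbers, None) are left unchanged."""
    return {key: value.strip() if isinstance(value, str) else value for key, value in row.items()}


row = {"Country": "  Spain ", "Amount": 120, "Order Date": "2023-01-05\t"}
print(trim_row(row))
# {'Country': 'Spain', 'Amount': 120, 'Order Date': '2023-01-05'}
```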

Data Cleansing Example

Steps to follow

  • Download and install Advanced ETL Processor
  • Download and unzip the example
  • Create a new transformation and open the .ats file

  • Double-click on the Reader object and amend the source file path
  • Double-click on the Writer object and amend the target file path
  • Run the transformation by pressing the green arrow.

How the Data Validation process works

The Reader loads the Excel file into memory, and the Validator rejects rows where the Country name field is empty.
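The validation rule can be sketched in Python as a function that splits rows into accepted and rejected lists. This assumes the spreadsheet has already been read into a list of dictionaries (reading Excel is not shown), and it treats a Country value that is absent, None or only whitespace as empty, which is an assumption about how the Validator defines "empty". The name validate and the sample rows are hypothetical.

```python
def validate(rows, field="Country"):
    """Split rows into (accepted, rejected) lists.

    A row is rejected when `field` is absent, None, or a string that is
    empty or contains only whitespace (an assumed definition of "empty").
    """
    accepted, rejected = [], []
    for row in rows:
        value = row.get(field)
        is_empty = value is None or (isinstance(value, str) and not value.strip())
        (rejected if is_empty else accepted).append(row)
    return accepted, rejected


rows = [
    {"ORDER_ID": 1, "Country": "Spain"},
    {"ORDER_ID": 2, "Country": ""},
    {"ORDER_ID": 3, "Country": "   "},
    {"ORDER_ID": 4},
]
accepted, rejected = validate(rows)
print([r["ORDER_ID"] for r in accepted])  # [1]
print([r["ORDER_ID"] for r in rejected])  # [2, 3, 4]
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to review or report on them later.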

Removing Empty Values

Cleansing the data

Once the bad records have been rejected, the Transformer performs additional cleaning:

  • The Delete Characters transformation function removes dollar signs, pound signs, commas and spaces from the Amount field.
  • The Date Format transformation function converts the Order Date field into the standard ODBC date format.
  • The Lookup transformation function corrects the values in the Country field.
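The three Transformer steps can be approximated in plain Python. This is a minimal sketch, not how Advanced ETL Processor implements them: the list of input date formats and the country lookup table are hypothetical, the ODBC date format is assumed to be yyyy-mm-dd, and commas in Amount are assumed to be thousands separators (as the list of deleted characters implies).

```python
from datetime import datetime
from decimal import Decimal

# Characters deleted from Amount, following the text: dollar, pound, comma, space.
# Assumes commas are thousands separators, never decimal separators.
DELETE_CHARS = "$£, "

# Hypothetical source date formats, tried in order (day-first assumed).
INPUT_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

# Hypothetical lookup table: normalised raw value -> corrected country name.
COUNTRY_LOOKUP = {"us": "USA", "usa": "USA", "spain": "Spain", "uk": "UK"}


def delete_characters(text, chars=DELETE_CHARS):
    """Remove every occurrence of each character in `chars`."""
    return text.translate(str.maketrans("", "", chars))


def to_odbc_date(text):
    """Parse `text` with the first matching input format; return yyyy-mm-dd."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognised date: {text!r}")


def lookup_country(text):
    """Map a raw country value to its corrected form; unknown values are
    returned trimmed but otherwise unchanged."""
    return COUNTRY_LOOKUP.get(text.strip().casefold(), text.strip())


def transform(row):
    """Apply the three cleansing steps to one validated row."""
    return {
        **row,
        "Amount": Decimal(delete_characters(row["Amount"])),
        "Order Date": to_odbc_date(row["Order Date"]),
        "Country": lookup_country(row["Country"]),
    }


row = {"ORDER_ID": 1001, "Country": "SPain", "Amount": "$1,234.50", "Order Date": "05/01/2023"}
print(transform(row))
# {'ORDER_ID': 1001, 'Country': 'Spain', 'Amount': Decimal('1234.50'), 'Order Date': '2023-01-05'}
```

Looking up on a case-folded key lets one table entry cover "Spain", "SPain" and "SPAIN". Trying date formats in order cannot resolve genuinely ambiguous dates, though: with the day-first format listed, 05/01/2023 becomes 5 January, which is wrong for US-style source files. When the source format is known for each file or country, parsing with that single format is safer.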

Please contact us if you need help transforming your data.

Visit ETL Tools Forum