Running SQL against Excel file

There are several ways to connect Excel Files and run SQl against it:

Data can extracted via ODBC, OLE DB(Jet) or .Net providers

Odbc connection strings:

For Excel 2.0-2003

DRIVER={Microsoft Excel Driver (*.xls)};IMEX=1;MaxScanRows=16;DBQ=C:\Invoice.xls;

For Excel 2007

DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};IMEX=1;MaxScanRows=16;DBQ=C:\Invoice.xls;

Both Advanced ETL Processor and Visual Import ETL use odbc to connect to Excel

OLE DB connection strings:

For Excel 2.0-2003

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";

For Excel 2007

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";

"HDR=Yes;" indicates that the first row contains column names, not data. "HDR=No;" it does not

Very useful feature for the the Excel file below when you want to retrieve Invoice Number:

Invoice Excel File

Running SQL against Excel file

SQL syntax for ODBC and Ole DB is slightly different

  • For ODBC: SELECT "Column Name" FROM "Sheet One$". I.e. excel worksheet name followed by a "$" and wrapped in double quotes.
  • For Ole DB: SELECT [Column Name] FROM [Sheet One$]. I.e. excel worksheet name followed by a "$" and wrapped in "[" "]" brackets.
Extract data from single Excel worksheet
SELECT * FROM [Sheet1$]
To specify a named range of cells as your record source, simply use the defined name. For example:
SELECT * FROM MyRange

Visual   Importer - Data source properties dialog

Unnamed Range
To specify an unnamed range of cells as your record source, append standard Excel row/column notation to the end of the sheet name in the square brackets. For example:
SELECT * FROM [Sheet1$A1:B10]

Visual Importer - Extracting data from Excel

Common Excel Import problems

ADO/ODBC must determine data type for each column in your Excel worksheet or range  (This is not affected by Excel cell formatting settings.) This is done by scanning number of rows defined by registry setting TypeGuessRows (default value is 8). Quite often there are numeric values mixed with text values in the same column,

For example sorted financial coding structures often have numbers at the beginning of the list 001-999 than text AAA-XXX

Both the Jet and the ODBC Provider return the data of the majority type, but return NULL (empty) values for the minority data type. If the two types are equally mixed in the column, the provider chooses numeric over text.

Plus when first rows are less than 255 characters long it will truncate all data to 255 characters even if cell values below are longer

On of the ways of avoiding this problem is using Import Mode (IMEX=1). This forces mixed data to be converted to text. However it only works when first TypeGuessRows Rows have mixed values.
If all values are numeric than setting IMEX=1 will not convert the default datatype to Text, it will remain numeric.  

The best combination to avoid problems is TypeGuessRows=0 + IMEX=1.
Setting TypeGuessRows=0 forces driver to read all data to determine field type.
Unfortunately our own experience shows that quite often it does not work.
And when it does work it slows everything down

So the only solution is not to use mixed values and be prepared for the data being truncated to 255 characters

As you can see there is plenty room for improvement for Microsoft. The first version of Microsoft office was released very long time ago and it is still not possible to read excel file correctly.
Why do not they just add another settings to odbc driver EverhingIsText=1?
this so simple and will solve all the problems.

More about Loading data from Excel
Visit Support Forum Learn More Download Buy It
Development team listens to it's customers and implements new features in days sometimes hours.

Paul Gibb,
MPS

Our customers

BP

BBC

HSBC

Orange

CAT

SEPA
emc
NHS
Australian Rugby Union
xerox
Databases we work with