There are several ways to connect Excel Files and run SQL against it:
Odbc connection strings:
For Excel 2.0-2003
For Excel 2007+
Both Advanced ETL Processor and Visual Import ETL can use ODBC to connect to Excel
OLE DB connection strings:
For Excel 2.0-2003
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
For Excel 2007+
"HDR=Yes;" indicates that the first row contains column names, not data. "HDR=No;" it does not
A very useful feature for the Excel file below when you want to retrieve the Invoice Number:
Running SQL against Excel file
SQL syntax for ODBC and Ole DB is slightly different
- For ODBC: SELECT "Column Name" FROM "Sheet One$". I.e. excel worksheet name followed by a "$" and wrapped in double-quotes.
- For Ole DB: SELECT [Column Name] FROM [Sheet One$]. I.e. excel worksheet name followed by a "$" and wrapped in "[" "]" brackets.
Extract data from a single Excel worksheet
SELECT * FROM [Sheet1$]
To specify a named range of cells as your record source, simply use the defined name. For example:
SELECT * FROM MyRange
Unnamed Range
To specify an unnamed range of cells as your record source, append standard Excel row/column notation to the end of the sheet name in the square brackets. For example:
SELECT * FROM [Sheet1$A1:B10]
Common Excel Import problems
ADO/ODBC must determine the data type for each column in your Excel worksheet or range (This is not affected by Excel cell formatting settings.) This is done by scanning number of rows defined by the registry setting TypeGuessRows (default value is 8). Quite often there are numeric values mixed with text values in the same column,
For example, sorted financial coding structures often have numbers at the beginning of the list 001-999 than text AAA-XXX
Both the Jet and the ODBC Provider return the data of the majority type but return NULL (empty) values for the minority data type. If the two types are equally mixed in the column, the provider chooses numeric over text.
Plus when first rows are less than 255 characters long it will truncate all data to 255 characters even if cell values below are longer
One of the ways of avoiding this problem is using Import Mode (IMEX=1). This forces mixed data to be converted to text. However, it only works when the first TypeGuessRows Rows have mixed values.
If all values are numeric than setting IMEX=1 will not convert the default datatype to Text, it will remain numeric.
The best combination to avoid problems is TypeGuessRows=0 + IMEX=1.
Setting TypeGuessRows=0 forces driver to read all data to determine field type.
Unfortunately, our own experience shows that quite often it does not work.
And when it does work it slows everything down
So the only solution is not to use mixed values and be prepared for the data being truncated to 255 characters
As you can see there is plenty of room for improvement for Microsoft. The first version of Microsoft Office was released very long time ago and it is still not possible to read excel file correctly. Why do not they just add another setting to ODBC driver EverhingIsText=1? This so simple and will solve all the problems.
All our products work correctly with Excel because we do not use OleDb or ODBC