Solving problems with loading data from Excel files into databases
Common problem: when loading data from an Excel file, half of the data comes through as nulls, or columns with more than 255 characters are truncated.
The logic behind Excel mixed data types
As partially explained here
http://support.microsoft.com/kb/257819
ODBC/MS Jet scans the first TypeGuessRows rows to determine the field type
Here is how Excel ODBC/MS Jet works
(TypeGuessRows=8, IMEX=1)
In your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the provider returns five (5) numbers and three (3) null values.
In your eight (8) scanned rows, if the column contains three (3) numeric values and five (5) text values, the provider returns three (3) null values and five (5) text values.
In your eight (8) scanned rows, if the column contains four (4) numeric values and four (4) text values, the provider returns four (4) numbers and four (4) null values.
In your eight (8) scanned rows, if all values are shorter than 255 characters, the provider truncates all data in the column to 255 characters.
In your eight (8) scanned rows, if the column contains five (5) values longer than 255 characters, the provider returns more than 255 characters.
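The rules above can be sketched as a small simulation. This is only a simplified model of the behavior described in this section, not the driver's actual code. The names read_column and guess_column_type are made up for illustration, and the sketch assumes that a single value longer than 255 characters in the scanned rows is enough to avoid truncation, which the text above does not state.

```python
from typing import List, Union

Cell = Union[int, float, str, None]
TEXT_LIMIT = 255  # length threshold described in the text


def guess_column_type(sample: List[Cell]) -> str:
    """Majority vote over the scanned rows; a tie goes to 'number'."""
    numbers = sum(isinstance(v, (int, float)) for v in sample)
    texts = sum(isinstance(v, str) for v in sample)
    return "text" if texts > numbers else "number"


def read_column(values: List[Cell], type_guess_rows: int = 8) -> List[Cell]:
    """Model what the provider returns for one column."""
    sample = values[:type_guess_rows]
    if guess_column_type(sample) == "number":
        # Text cells do not fit the guessed type and come back as nulls.
        return [v if isinstance(v, (int, float)) else None for v in values]
    # Assumption: one long value in the sample lifts the 255-character limit.
    long_seen = any(isinstance(v, str) and len(v) > TEXT_LIMIT for v in sample)
    result: List[Cell] = []
    for v in values:
        if not isinstance(v, str):
            result.append(None)  # numeric cell in a text column
        elif long_seen:
            result.append(v)
        else:
            result.append(v[:TEXT_LIMIT])
    return result


print(read_column([1, 2, 3, 4, 5, "a", "b", "c"]))
# [1, 2, 3, 4, 5, None, None, None]
```

Note that the guess is made once from the scanned rows and then applied to every row in the column, which is why a long value far down the sheet is still truncated.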
NOTE:
Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.
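For reference, here is a minimal sketch of where IMEX=1 goes when the file is opened through the Jet 4.0 OLE DB provider. The helper name excel_connection_string and the file path are hypothetical, and the sketch only builds the string; it does not open a connection.

```python
def excel_connection_string(path: str, header: bool = True, imex: bool = True) -> str:
    """Build a Jet 4.0 OLE DB connection string for an .xls workbook."""
    props = ["Excel 8.0", "HDR=" + ("Yes" if header else "No")]
    if imex:
        props.append("IMEX=1")  # import mode, as described in the note above
    extended = ";".join(props)
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        f"Data Source={path};"
        f'Extended Properties="{extended}"'
    )


# Hypothetical path, for illustration only.
print(excel_connection_string(r"C:\data\book.xls"))
```

IMEX=1 belongs inside the quoted Extended Properties value, next to the Excel version and HDR settings, not at the top level of the connection string.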
Nobody wants to load half of the data; everybody wants to load the data as it is.
The only way to make import from Excel work is:
Set IMEX=1 in the connection string
Close any programs that are running.
On the Start menu, click Run. Type regedit and click OK.
In the Registry Editor, expand the following key depending on the version of Excel that you are running:
Excel 97
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel
Excel 2000 and later versions
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
Select TypeGuessRows and on the Edit menu click Modify.
In the Edit DWORD Value dialog box, click Decimal under Base.
Set the value to 1
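The registry steps above could also be scripted. The following is a rough sketch using Python's standard winreg module; the function names are made up for illustration, it must run on Windows with administrator rights, and it has not been tested against every Jet installation.

```python
import sys

# Registry subkeys under HKEY_LOCAL_MACHINE, as listed in the steps above.
JET_EXCEL_KEYS = {
    "97": r"SOFTWARE\Microsoft\Jet\3.5\Engines\Excel",
    "2000+": r"SOFTWARE\Microsoft\Jet\4.0\Engines\Excel",
}


def jet_excel_key(excel_version: str) -> str:
    """Return the subkey for Excel 97, or the Jet 4.0 key for later versions."""
    return JET_EXCEL_KEYS["97" if excel_version == "97" else "2000+"]


def set_type_guess_rows(excel_version: str, rows: int = 1) -> None:
    """Write TypeGuessRows as a DWORD value (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # standard library, Windows only

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE,
        jet_excel_key(excel_version),
        0,
        winreg.KEY_SET_VALUE,
    ) as key:
        winreg.SetValueEx(key, "TypeGuessRows", 0, winreg.REG_DWORD, rows)
```

On 64-bit Windows the Jet keys may live in the 32-bit registry view, in which case adding winreg.KEY_WOW64_32KEY to the access flags would probably be needed. Treat that as an assumption to verify on your own machine.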
Open the Excel file
Make sure that the cells in the first line of the table contain representative data, for example:
- mixed numbers and text characters for text fields
- only numbers for numeric fields
- If some of the data will be longer than 255 characters, make sure that the first-line cell has more than 255 characters; otherwise the data will be truncated
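The first-line checks above can be expressed as a small validation routine. This is only a sketch: check_first_row, the column names and the 'number' / 'text' / 'longtext' labels are invented for illustration, and the first row is assumed to have already been read into a dictionary by whatever means you prefer. The check is deliberately conservative and flags any text cell that looks purely numeric.

```python
from typing import Dict, List

TEXT_LIMIT = 255


def is_number(cell: object) -> bool:
    """True for numeric cells and for strings that parse as a number."""
    if isinstance(cell, bool):
        return False
    if isinstance(cell, (int, float)):
        return True
    try:
        float(str(cell))
        return True
    except ValueError:
        return False


def check_first_row(first_row: Dict[str, object], expected: Dict[str, str]) -> List[str]:
    """Return one message per column whose first-row cell would mislead the driver.

    expected maps a column name to 'number', 'text' or 'longtext'.
    """
    problems = []
    for column, kind in expected.items():
        cell = first_row.get(column)
        if cell is None or cell == "":
            problems.append(f"{column}: first-row cell is empty")
        elif kind == "number" and not is_number(cell):
            problems.append(f"{column}: expected a number in the first row")
        elif kind in ("text", "longtext") and is_number(cell):
            problems.append(f"{column}: first-row cell looks numeric")
        elif kind == "longtext" and len(str(cell)) <= TEXT_LIMIT:
            problems.append(f"{column}: first-row cell is not longer than {TEXT_LIMIT} characters")
    return problems


# Hypothetical first row and expectations.
row = {"id": 1, "code": "A12", "notes": "short", "zip": "02134"}
kinds = {"id": "number", "code": "text", "notes": "longtext", "zip": "text"}
for message in check_first_row(row, kinds):
    print(message)
```

A routine like this only reports which files need attention; the first row still has to be fixed by hand before loading.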
This solution applies to all versions of the MS Excel ODBC driver, OLE DB, MS Jet, .NET, DTS and SSIS.
We have spent an enormous amount of time trying to get this fixed. So far we have not been able to find a better solution.
The way Excel import works makes it impossible to automate. You have to modify most Excel files manually in order to load them.
This is why we are no longer using ODBC/OLE DB/MS Jet for Excel connections. Our ETL solutions work correctly with Excel all the time.





