SQL Server – Excel Import and Mixed Data Types

Overview

I am currently managing a project that, added to its significant development investment, requires a large amount of data gathering up-front. One component of this effort is the importing of information pertaining to each of the client’s physical clinics, parsing the data and pre-populating our database with data necessary to go live on day one. Collecting this data will take a relatively substantial time investment which, fortunately, will be undertaken by the client and not myself! I am, however, responsible for executing the import scripts to perform the data insertions and, as the first completed excel spreadsheets have begun to trickle through, I’ve discovered some issues with the process.

Problem #1 – Mixed Data Types

Loading the first spreadsheet everything appeared to execute fine, but I noticed that for one of our columns – which can be textual/numeric/alphanumeric – data is not always imported (when the column contains mixed data types). Specifically, some rows have a value, others are NULL Here is a simple sample query.

Select Name, Nickname

from OPENROWSET('Microsoft.Jet.OLEDB.4.0',

'Excel 8.0;Database=C:\015;',

'SELECT * FROM [SHEET1$]')

My investigation led to the fact that the excel connection manager scans the first 8 rows to determine the type for a specific column. In my case the first 8 rows are numeric, therefore alphanumeric rows are ignored. This is a problem because I will have hundreds of spreadsheets to import and cannot guarantee the ordering. Frankly, I would prefer if the import broke rather than providing me with seemingly good data that is actually bad…Had I had an off-day, this issue may have gone unnoticed until release-day!

The fix I found requires a couple of registry settings to be changed as well as an update to the OPENROWSET connection string.

The first registry setting is named TypeGuessRows and accepts values from 0-16. This setting is essentially an override for the 8 rows used to determine the datatype of the column. 1-16 specify the number of rows to use to determine the column type, 0 is used to scan all rows - in my case I cannot be sure that 16 rows would be a large enough sample size to see both numeric and textual content so I set the registry setting to 0.

The second setting is named ImportMixedTypes and essentially tells the driver what to do when mixed types are found – in my case I want them converted to text and set the registry value to Text. In order to read this registry setting I also set the IMEX variable in my connection string to 1 as in the example below.

32-bit

HKEY_LOCAL_MACHINE\SOFTWARE \Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows

HKEY_LOCAL_MACHINE\SOFTWARE \Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes

64-bit

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes

Select Name, Nickname

from OPENROWSET('Microsoft.Jet.OLEDB.4.0',

'Excel 8.0;Database=C:\015;IMEX=1;',

'SELECT * FROM [SHEET1$]')

Problem #2 – 64 bit OPENROWSET

After researching the issue and figuring out what needed to be done I felt pretty good – however, executing the statement against a SQL Server 2008 database on my local (x64) machine yielded the following error message:

Msg 7308, Level 16, State 1, Line 17

OLE DB provider 'Microsoft.Jet.OLEDB.4.0' cannot be used for distributed queries because the provider is configured to run in single-threaded apartment mode.

Any hopes of a quick (registry) fix on this one quickly disappeared. The issue is caused by the lack of a 64 bit JET driver and I was led to the following solution(s) on the Microsoft SQL Server forum:

http://social.msdn.microsoft.com/Forums/en-US/sqldataaccess/thread/4887d91f-6ac7-40c0-9fc8-5cdd0634e603

Option 1:
Use 32 bit SQL Server on the 64 bit machine….

Option 2:
Build a bridge out of SQL Express.  Keep your main SQL Server instance 64 bit, but also install SQL Express 32 bit Side by Side…

Option 3:
Reverse the flow of data from pull to push.  Instead of having SQL Server pull data from the source that only supports 32 bit clients, push data from that source to SQL Server (as it supports 32/64 bit clients).

Fix

The second option suggested connecting a link between the 64-bit SQL instance and a 32-bit SQL express instance. While the bulk of our development work is performed against large centralized databases, the moment I heard the phrase “registry setting change” I knew I’d be developing on my local box and then porting the data to testing (and then production). The next logical step was to install SQL Server Express (x86) on my local box and use this database to gather the initial temporary data. I will then export the data to our testing servers in order for it to be parsed and our database populated.

Comments

Popular posts from this blog

Excel - Adding an existing Pivot table to the data model

Mirth

Getting Started with Mirth (Part 1)