1-9. Handling Source Data Issues When Importing Excel Worksheets Using SSIS


CHAPTER 1 ■ Sourcing Data from MS Office Applications



2. Select the Input and Output Properties tab, and expand Output Columns. Then click the column whose length you wish to change. This is shown in Figure 1-19.

Figure 1-19. Modifying data source types in Excel

3. Select Unicode String [DT_WSTR] as the data type and enter a length (500 in this example). Of course, the columns will be those of your source data.

4. Confirm by clicking OK.

5. Add a Data Conversion task to the Data Flow pane and connect the Excel Source task to it. Then double-click the Data Conversion task to edit it.

6. Select the column that you modified in step 3 as the input column, and specify that the output data type is String [DT_STR] with the length you require.
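As a rough aid for choosing the length in step 3, here is a minimal Python sketch that suggests an output type from the longest value in a column. It assumes the SSIS maximum of 4,000 characters for DT_WSTR and falls back to Unicode Text Stream [DT_NTEXT] above that. The helper name is hypothetical, and nothing here is part of SSIS itself.

```python
# Hypothetical helper: suggest an SSIS type for a text column from its longest value.
DT_WSTR_MAX = 4000  # longest length SSIS accepts for a Unicode string [DT_WSTR]


def suggest_text_type(values):
    """Return (ssis_type, length) for a column of worksheet cell values.

    Empty cells (None) are skipped. Length is None for DT_NTEXT, which has no length.
    """
    longest = max((len(str(v)) for v in values if v is not None), default=0)
    if longest <= DT_WSTR_MAX:
        return ("DT_WSTR", max(1, longest))  # a DT_WSTR length must be at least 1
    return ("DT_NTEXT", None)  # too long for DT_WSTR: use Unicode Text Stream


print(suggest_text_type(["short", "x" * 400, None]))  # → ('DT_WSTR', 400)
```

In practice you would probably round the suggested length up to leave headroom for data that arrives later.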






How It Works

When the Excel worksheet is simple, you probably do not need to make many tweaks to an SSIS import package. However, there could be times when you need to “coerce” SSIS to import the source data correctly. Specifically, you may occasionally need to specify the length of the data in a column imported from Excel. This is because the old 255-character limit on the amount of data that an Excel cell could hold was lifted some time ago, so cells can now contain much longer text. Indeed, SSIS detects cells containing more than 255 characters, but only if they are in the first “n” rows, as specified by the TypeGuessRows Registry setting (8 by default).
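The sampling behavior can be modeled in a few lines of Python. This is a simplified sketch, not the driver's actual logic. It assumes that only the first TypeGuessRows rows are inspected, and that a column is treated as Unicode Text Stream only when a sampled cell exceeds 255 characters. The function names are hypothetical.

```python
SHORT_TEXT_LIMIT = 255  # characters; a longer sampled cell suggests a text stream


def guessed_text_type(column, type_guess_rows=8):
    """Simplified model: guess a column's type from its first rows only."""
    sample = column[:type_guess_rows]
    if any(len(cell) > SHORT_TEXT_LIMIT for cell in sample):
        return "DT_NTEXT"  # Unicode Text Stream
    return "DT_WSTR"  # Unicode String (255)


def long_cell_missed(column, type_guess_rows=8):
    """True when a long cell lies outside the sample, so the guessed type is too short."""
    return guessed_text_type(column, type_guess_rows) == "DT_WSTR" and any(
        len(cell) > SHORT_TEXT_LIMIT for cell in column
    )


# Ten short cells followed by one 300-character cell.
column = ["a"] * 10 + ["b" * 300]
print(guessed_text_type(column), long_cell_missed(column))  # → DT_WSTR True
```

In this model, raising the sample size so that it covers the long cell changes the guess to DT_NTEXT. This is why a long value buried deep in a worksheet can still cause truncation errors.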

There are occasions when you will have to adjust some of the standard settings in order to:

• Import text longer than 255 characters, by selecting Unicode Text Stream [DT_NTEXT] for the column in the Input and Output Properties of the Excel Source.
• Specify a different source data type.
• Convert Excel Unicode data into non-Unicode text data using a Data Conversion task.



Should you wish, you can override Excel’s choice of source data type to force a data type conversion. The available conversions are shown in Table 1-3.

Table 1-3. Excel to SSIS Data Type Mapping

Excel Data Type    SSIS Type Name            SSIS Data Type
Date               Date                      [DT_DATE]
Short Text         Unicode String            [DT_WSTR]
Number             Double-precision float    [DT_R8]
Long Text          Unicode Text Stream       [DT_NTEXT]
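If you script package generation or documentation, Table 1-3 can be kept as a simple lookup. The sketch below is a plain Python dictionary. The helper name and the label format are assumptions made for illustration.

```python
# Table 1-3 as a lookup: Excel data type -> (SSIS type name, SSIS data type).
EXCEL_TO_SSIS = {
    "Date": ("Date", "DT_DATE"),
    "Short Text": ("Unicode String", "DT_WSTR"),
    "Number": ("Double-precision float", "DT_R8"),
    "Long Text": ("Unicode Text Stream", "DT_NTEXT"),
}


def ssis_type_label(excel_type):
    """Return a label such as 'Unicode String [DT_WSTR]' for an Excel data type."""
    try:
        name, code = EXCEL_TO_SSIS[excel_type]
    except KeyError:
        # Only the conversions listed in Table 1-3 are available.
        raise ValueError(f"no SSIS mapping for Excel type {excel_type!r}") from None
    return f"{name} [{code}]"


print(ssis_type_label("Number"))  # → Double-precision float [DT_R8]
```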



In step 3 of this recipe, select any of these data types from the Input and Output Properties tab for the field you wish to change. Excel data is read as Unicode, and try as you might, you cannot specify otherwise (for instance, by changing the source data type). So you have to convert the data from Unicode to a non-Unicode string using the SSIS Data Conversion task. You can do this as follows:

1. Add a Data Conversion task to the Data Flow pane and connect the Excel Source task to it. Then double-click the Data Conversion task to edit it.

2. Select the input column that you modified in step 3 of the recipe, and specify that the output data type is String [DT_STR] with the length you require.

3. Confirm with OK.
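Before you run the package, it can help to find the values that will not survive the conversion. This minimal Python sketch assumes that the DT_STR column uses Windows code page 1252, so substitute the code page your package actually specifies. It flags characters that code page cannot hold, as well as values longer than the declared length. The function names are hypothetical.

```python
def unrepresentable_chars(value, code_page="cp1252"):
    """Return the characters in value that the target code page cannot encode."""
    bad = []
    for ch in value:
        try:
            ch.encode(code_page)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def check_for_dt_str(values, length, code_page="cp1252"):
    """List (row_number, problem) pairs for a prospective String [DT_STR] column."""
    problems = []
    for row_number, value in enumerate(values, start=1):
        bad = unrepresentable_chars(value, code_page)
        if bad:
            problems.append((row_number, "unrepresentable: " + "".join(bad)))
        elif len(value.encode(code_page)) > length:
            problems.append((row_number, "longer than " + str(length)))
    return problems
```

For example, a column containing Greek letters would be reported row by row. You could then fix those rows in the worksheet or route them to the error output.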



Hints, Tips, and Traps





• You will need to handle Unicode character conversion errors by configuring the error output of the Data Conversion task. At the very least, set it to ignore failures.

• It is not possible to select source data types other than those listed in Table 1-3, and attempting to do so results in a variety of errors.

• As Excel source data (at least when generated by end users) can be full of errors, it is a good idea to include some error handling. There is an introduction to this in Chapter 15.

• You cannot specify a Unicode string and merely change the length parameter; SSIS will revert to text stream.
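The error output settings can be thought of as a routing decision for each row. This Python sketch is a simplified model, not SSIS code. It assumes code page 1252 and mirrors the Fail component, Ignore failure, and Redirect row choices. It also assumes that an ignored conversion error yields NULL (None), while an ignored truncation keeps the leading characters.

```python
from enum import Enum


class Disposition(Enum):
    """The three choices offered for each error and truncation condition."""
    FAIL_COMPONENT = "Fail component"
    IGNORE_FAILURE = "Ignore failure"
    REDIRECT_ROW = "Redirect row"


def convert_rows(values, length, on_error, on_truncation, code_page="cp1252"):
    """Model a DT_WSTR -> DT_STR conversion with an error output.

    Returns (converted, redirected): values that continue down the main
    path, and original values sent to the error output.
    """
    converted, redirected = [], []
    for value in values:
        try:
            encoded = value.encode(code_page)
        except UnicodeEncodeError:
            outcome, fallback = on_error, None  # assumed: ignored error -> NULL
        else:
            if len(encoded) <= length:
                converted.append(encoded)
                continue
            outcome, fallback = on_truncation, encoded[:length]  # assumed: keep prefix
        if outcome is Disposition.FAIL_COMPONENT:
            raise RuntimeError(f"conversion failed for {value!r}")
        if outcome is Disposition.IGNORE_FAILURE:
            converted.append(fallback)
        else:  # Disposition.REDIRECT_ROW
            redirected.append(value)
    return converted, redirected
```

Redirecting rows keeps the original values intact for later inspection. Ignoring failures silently loses or shortens data, so it is a minimum measure rather than a fix.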




1-10. Pushing Access Data into SQL Server

Problem

You want to transfer some or all of the tables in an Access database into SQL Server directly from Access itself.



Solution

Use the Access Upsizing Wizard, which you can run from inside Access as follows:

1. In Access 2007, 2010, or 2013, activate the Database Tools ribbon and click SQL Server. (In Access 2000 or Access XP, click Tools ➤ Database Utilities ➤ Upsizing Wizard.)

2. Click Use Existing Database, and then Next.

3. Select an ODBC data source that you have created, or configure a new one at this point as described in Recipe 6-12, and then click OK.

4. Select the table(s) you wish to import, add them to the Export to SQL Server pane using the chevron buttons, and then click Next.

5. Uncheck all the table attributes to upsize, and select “No, never” for the “Add timestamp fields to tables” option. Then click Next.

6. Select “No application changes”. Click Next and then Finish.

7. Close the upsizing report.

8. If you now switch to SSMS, you can see the results of the upsizing process, and the real work of refactoring the database can begin!



How It Works

The Access Upsizing Wizard is a venerable tool that has been around for at least 15 years to my knowledge (possibly more, but I cannot remember exactly). Despite its simplicity and extreme slowness, it is a tried-and-trusted solution that works well for small data loads, and for RAD development where small to medium-sized data transfers from Access into SQL Server are all that is required.

Here, I am only considering using this tool to transfer data into SQL Server. I am not looking at application conversion, because this area is a matter of considerable divergence of opinion. Fortunately, many products, books, and papers exist on this subject, so I will leave you to consult them while I avoid the field completely and stick to this book’s subject matter: data ingestion into SQL Server.

That said, in my experience with upsizing Access databases, the real problem is not anything technical at all, but is all too often the lack of proper database design in the source Access database. Third normal form is frequently a distant dream in databases drawn up over time by end users and/or enthusiastic amateurs. This can be accompanied by the total lack of a coherent naming convention for source tables and fields, and by redundant, duplicated, or superfluous data. In other words, you can be dealing with vast amounts of rubbish masquerading as a database. So re-creating the same mess, only bigger and faster, misses the point, which is that you should perhaps be seizing the opportunity to redesign the database and clean up the data. However, even if this is the case, at some point you will have to transfer data from Access to SQL Server. So, to remain resolutely positive, the Upsizing Wizard can most likely help you in the following situations:





• When the source data is simple and without complex data structures.
• When the source data is not extensive.
• When you want a quick transfer of most, or all, of an Access database into SQL Server to handle the data structures and the data itself.



The Access Upsizing Wizard can fail. The keys to a successful upsizing process are to do the following:





• Work on a copy of the source database.

• Alter all table and field names in the copy of the source database to conform to SQL standards (remember to remove any special characters and possibly apostrophes), and use your SQL Server naming convention.

• Do not transfer indexes, validation rules, defaults, or referential integrity; re-create these in SQL Server instead. In my experience, these are the areas that most often cause the Upsizing Wizard to fail, mostly because of missing defaults or foreign key relationships. At the very least, re-creating them yourself lets you define constraint names using your own naming convention.
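The renaming step can be prototyped before you touch the copy of the database. This Python sketch assumes a convention of ASCII letters, digits, and underscores only. It does not check SQL Server reserved words or name collisions, and the function name and prefix are hypothetical.

```python
import re

# Hypothetical convention: ASCII letters, digits, and underscores only.
_INVALID_RUN = re.compile(r"[^A-Za-z0-9_]+")


def sql_safe_name(access_name, digit_prefix="t_"):
    """Turn an Access table or field name into a plain SQL Server identifier.

    Reserved words and name collisions are NOT checked here.
    """
    name = access_name.replace("'", "")  # drop apostrophes outright
    name = _INVALID_RUN.sub("_", name).strip("_")  # spaces and specials -> "_"
    if not name:
        raise ValueError(f"no usable characters in {access_name!r}")
    if name[0].isdigit():
        name = digit_prefix + name  # avoid an identifier that starts with a digit
    return name


for original in ["Customer's Orders", "Order #/Date", "2013 Sales"]:
    print(f"{original!r} -> {sql_safe_name(original)}")
```

Non-ASCII letters are replaced as well, so a name such as “Café” loses its accented character. Adjust the pattern if your naming convention allows them.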



The Upsizing Wizard converts Microsoft Access primary keys to Microsoft SQL Server nonclustered, unique indexes and sets them as primary keys in SQL Server. Removing the primary keys from the Access tables before upsizing lets you create the Primary Key constraint yourself in SQL Server, and specify the index type (clustered, for instance), sorting in TempDB, and other SQL Server index settings.
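Once the Access primary keys are removed, you re-create them in SQL Server. A small Python helper can generate the T-SQL for a named, clustered Primary Key constraint. The PK_ naming prefix and the helper itself are hypothetical, and the helper assumes that names are already sanitized (no closing square brackets). Review the generated statement before running it in SSMS.

```python
def primary_key_ddl(schema, table, columns, clustered=True, sort_in_tempdb=True):
    """Build T-SQL that adds a named Primary Key constraint to an upsized table.

    Assumes names are already sanitized (no closing square brackets).
    The PK_<table> constraint name is a hypothetical naming convention.
    """
    kind = "CLUSTERED" if clustered else "NONCLUSTERED"
    column_list = ", ".join(f"[{c}]" for c in columns)
    options = " WITH (SORT_IN_TEMPDB = ON)" if sort_in_tempdb else ""
    return (
        f"ALTER TABLE [{schema}].[{table}] "
        f"ADD CONSTRAINT [PK_{table}] PRIMARY KEY {kind} ({column_list}){options};"
    )


print(primary_key_ddl("dbo", "Clients", ["ClientID"]))
# → ALTER TABLE [dbo].[Clients] ADD CONSTRAINT [PK_Clients] PRIMARY KEY CLUSTERED ([ClientID]) WITH (SORT_IN_TEMPDB = ON);
```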



Hints, Tips, and Traps





• You can create a new database during the process, but for greater control over where the database files are created, and to define database properties precisely, it is probably wiser to create the destination database first.







• To upsize data from a view (a saved select query in Access), first run a make-table query in Access to create a table based on the view, and then upsize the resulting table.







• Note that you can use the Upsizing Wizard to create only the table structures, and then transfer the data using SSIS once you have tweaked and perfected the tables. This approach also lets you move tables to a schema other than dbo, the default for the Upsizing Wizard.







• AutoNumber (autoincrement) fields are not transferred as IDENTITY fields in SQL Server, but as INTs, so you have to modify your SQL Server table structure to specify identity fields.







• Upsizing an OLE Object field keeps OLE image data as an OLE object; remember, this is not the raw binary image data!







• Hyperlink fields are transferred as text fields.







• To avoid date overflow errors with SQL Server 2008 and above, ensure that you are using DATETIME2 fields.







• When upsizing data to SQL Server 2005, you frequently see overflow errors. In this case, query the source data in Access to ensure that no Access date fields contain data outside the SQL Server DATETIME range (January 1, 1753, through December 31, 9999). A good initial workaround is to use an Access query, before attempting the conversion, to set all dates greater than the upper limit (31 Dec 9999) to that limit, and all dates less than the lower limit (1 Jan 1753) to that limit.







• When importing large data sets, you can get timeouts. To resolve this, use the Registry editor to set the QueryTimeout value to 0. For Access 97 to 2003, this value is under HKEY_LOCAL_MACHINE\Software\Microsoft\Jet\4.0\Engines\ODBC. For Access 2007–2010, it is under HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\ODBC (12.0 is the Access 2007 key; Access 2010 uses 14.0 instead).
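The date-range workaround described for SQL Server 2005 can be prototyped on a sample of the data before you write the Access query. This Python sketch clamps values to the SQL Server DATETIME range quoted above. The constant and function names are hypothetical.

```python
from datetime import datetime

# SQL Server DATETIME range (January 1, 1753, through December 31, 9999).
SQL_DATETIME_MIN = datetime(1753, 1, 1)
SQL_DATETIME_MAX = datetime(9999, 12, 31, 23, 59, 59, 997000)  # DATETIME stops at .997


def clamp_to_sql_datetime(values):
    """Clamp dates into the DATETIME range; return (new_values, number_changed)."""
    clamped, changed = [], 0
    for value in values:
        if value is None:
            clamped.append(None)  # empty dates pass through unchanged
            continue
        new_value = min(max(value, SQL_DATETIME_MIN), SQL_DATETIME_MAX)
        if new_value != value:
            changed += 1
        clamped.append(new_value)
    return clamped, changed


dates = [datetime(1600, 5, 1), datetime(2020, 1, 1), None]
print(clamp_to_sql_datetime(dates)[1])  # → 1
```

Counting the changed values gives you a quick measure of how much data the workaround would alter before you commit to it.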


