2-3. Automatically Determining Data Types


Chapter 2 ■ Flat File Data Sources



1. Follow steps 1 to 4 in Recipe 2-2.

2. Click Advanced.

3. Click Suggest Types. The Suggest Column Types dialog box appears, as shown in Figure 2-6.

Figure 2-6.  Determining column types automatically in SSIS

4. Alter any options you wish to tweak (for instance, the number of rows may be far too small for an accurate sample of a large file). Click OK.



You will see that all the data types (and lengths, where appropriate) have been sampled and adjusted in the Advanced pane of the dialog box (Figure 2-5).






How It Works

You probably noticed that when importing text files, the Flat File connection manager assumes that all source columns in a delimited text file are strings with a length of 50 characters. This can be too restrictive for several reasons:

• Many fields are not strings and consequently require data type conversion as part of the SSIS package.

• The suggested length of 50 characters is insufficient for many string fields, and can cause package errors at runtime.

• Even if all the source columns can fit into 50 characters, this is frequently too wide, as it reduces the number of records that can fit into the SSIS pipeline buffer, and consequently, slows down the load process.
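The effect of column width on the pipeline buffer can be sketched with some simple arithmetic. The following Python snippet is only a rough model, assuming the default SSIS Data Flow settings of a 10 MB DefaultBufferSize and a DefaultBufferMaxRows of 10,000; the 40-column file and the 12-byte trimmed width are hypothetical.

```python
# Simplified model: the number of rows in a buffer is limited both by
# DefaultBufferSize (bytes) and by DefaultBufferMaxRows (rows).
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # assumed default: 10 MB
DEFAULT_BUFFER_MAX_ROWS = 10_000         # assumed default row cap


def rows_per_buffer(column_widths):
    """Estimate rows per buffer for columns with the given byte widths."""
    row_width = sum(column_widths)
    return min(DEFAULT_BUFFER_MAX_ROWS, DEFAULT_BUFFER_SIZE // row_width)


# Hypothetical 40-column file: every column left at the default 50 bytes,
# versus the same file with each column trimmed to 12 bytes.
default_rows = rows_per_buffer([50] * 40)
trimmed_rows = rows_per_buffer([12] * 40)
print(default_rows, trimmed_rows)
```

Under these assumptions, the default widths fit roughly half as many rows into each buffer as the trimmed widths do.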



Frequently, of course, the problem is solved by the people who provide the source files, who will have thoughtfully handed over a complete data type description of the file contents. However, there just may be times when this is not the case, and you have to deduce, discover, or guess the field (or column if you prefer) types and lengths in a source file. This can be a painful process, especially if it means a trial-and-error-based cycle of loading and reloading a text file until you have finally found, for each source column, an acceptable data type.

Consequently, a much simpler solution is to ask SSIS to guess the data types and lengths for you. Admittedly, you can open some files to look at the source data. A quick glance is rarely an accurate analytical sample, however. What is more, there will be some source flat files that are so large that they either take forever to open or simply crash your favorite text editor.
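If you would rather measure a file than glance at it, a small script can stream through it one line at a time without loading it into an editor. This is only a sketch outside SSIS, written in Python; the column names and sample rows are hypothetical, and it assumes a comma-delimited file with a header row.

```python
import csv
import io


def max_column_lengths(lines, delimiter=","):
    """Stream delimited rows and return the longest value seen per column."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    longest = [0] * len(header)
    for row in reader:
        # Ignore any extra trailing fields beyond the header's column count.
        for i, value in enumerate(row[:len(header)]):
            longest[i] = max(longest[i], len(value))
    return dict(zip(header, longest))


# Hypothetical sample standing in for a large source file. For a real file,
# pass open(path, newline="") instead; it is read one line at a time.
sample = io.StringIO("ID,Make,Price\n1,Aston Martin,44500.00\n22,MG,9250\n")
lengths = max_column_lengths(sample)
print(lengths)
```

The resulting lengths give you a starting point for the column widths to set in the Advanced pane.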

It helps to understand your options when determining the data types in the source file, as provided in Table 2-2.

Table 2-2.  Suggested Column Type Options

Option | Description

Number of rows | The number of rows to sample. There seems to be no upper limit in SQL Server 2012. Up to SSIS 2008, it was limited to 1000 records.

Suggest the smallest integer data type | For columns that contain integers only, suggest the smallest integer type that can accommodate the data without overflowing.

Suggest the smallest real data type | For columns that contain real (numeric data type) numbers, suggest the smallest numeric type that can accommodate the data without overflowing.

Identify Boolean columns using the following values | Specifies the pair of values that, when they are the only values in a column, cause the column to be interpreted as Boolean.

Pad string columns / Percent Padding | Takes the length of the longest string ([n]varchar, [n]char) element and extends the length by the percentage you specify to anticipate longer strings in future source files.
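To make the options in Table 2-2 more concrete, here is a rough Python sketch of the kind of logic they describe. It is not how SSIS implements Suggest Types: it covers only the Boolean, integer, and string padding options (real numbers fall through to strings), it considers signed integer types only, and the rounding of the padding is an assumption.

```python
# Signed SSIS integer types, smallest first: (name, min, max).
INTEGER_TYPES = [
    ("DT_I1", -2**7, 2**7 - 1),
    ("DT_I2", -2**15, 2**15 - 1),
    ("DT_I4", -2**31, 2**31 - 1),
    ("DT_I8", -2**63, 2**63 - 1),
]


def suggest_type(values, bool_values=("TRUE", "FALSE"), pad_percent=0):
    """Suggest a type name for a non-empty sample of string values."""
    if all(v.upper() in bool_values for v in values):
        return "DT_BOOL"
    try:
        numbers = [int(v) for v in values]
    except ValueError:
        numbers = None
    if numbers is not None:
        for name, low, high in INTEGER_TYPES:
            if all(low <= n <= high for n in numbers):
                return name
    # Fall back to a string wide enough for the longest value, plus padding
    # (rounded up; the rounding rule is an assumption).
    longest = max(len(v) for v in values)
    padding = (longest * pad_percent + 99) // 100
    return f"DT_STR({longest + padding})"


print(suggest_type(["1", "300"]))
print(suggest_type(["true", "FALSE"]))
print(suggest_type(["Aston Martin", "MG"], pad_percent=25))
```

Note that the result depends entirely on the rows sampled: a value of 40000 appearing after the sample would overflow a column suggested as DT_I2, which is why the number of rows matters.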



If you already know which data type each column needs, or you need to set a specific column delimiter for one or more columns, you can fine-tune these settings in the Advanced pane. Table 2-3 describes the available options.






Table 2-3.  Flat File Connection Manager Advanced Pane Options

Option | Description

Name | The column name can be set or overridden here. This is the name that is used from this point on in the SSIS data flow.

Column delimiter | The specific column delimiter for one or more columns. You can select from the list, or enter or paste the column delimiter used in the source data.

Data Type | Select the data type from the list of SSIS data types.

Output Column width | The width of the output column. This is for single-byte characters.

Text Qualified | Allows you to specify if the text in this column is qualified by using the text qualifier set in the General pane.



For example, if you wish to override the current settings for a column’s data type, you could set the data type, as shown in Figure 2-5.

Should your source data change, you do not have to redo the entire column mapping structure from scratch, as SSIS lets you add or remove columns to adjust the package to changes in the source data structure.

To add a column, do the following:

1. Click the column that precedes (or follows) the column to insert.

2. Click the arrow on the right of the New button. Select Insert Before (or Insert After). A new column is added to the mapping structure.

Clicking New or Add Column inserts a new column after the existing columns.

Removing a column is as simple as the following:

1. Select the column to remove.

2. Click Delete.



The Advanced pane of the Flat File connection manager also lets you specify, for each column, whether the column is enclosed in quotes. All you have to do is set TextQualified to True if this is the case, and False if there are no quotes.
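The difference that a text qualifier makes can be seen outside SSIS with a short Python sketch. The sample data here is hypothetical, and unlike TextQualified, which is set per column, Python's csv module applies its quoting setting to the whole file, so this only illustrates the principle.

```python
import csv
import io

# Hypothetical two-column sample: the second column is wrapped in double
# quotes because its values contain the column delimiter.
sample = io.StringIO('1,"Smith, John"\n2,"Jones, Ann"\n')

# Treat the double quote as a text qualifier.
qualified = list(csv.reader(sample, delimiter=",", quotechar='"'))

# Re-read the same data with quote handling switched off.
sample.seek(0)
unqualified = list(csv.reader(sample, delimiter=",", quoting=csv.QUOTE_NONE))

print(qualified[0])
print(unqualified[0])
```

With the qualifier honored, each row yields two fields; with it ignored, the embedded comma splits the name into two fields and the quote characters remain in the data.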



Hints, Tips, and Traps





• The details of OLEDB Connections in SSIS are explained in Recipe 1-7.

• You can use an ADO.NET destination from SQL Server 2008. From SQL Server 2008 R2 and up you have the “Use Bulk Insert whenever possible” option for this destination component to accelerate the data load.

• To have SSIS handle varying numbers of column delimiters, see Recipe 2-16.

• Adjusting the data types for an existing package generates a warning triangle on the Flat File source task. To have the warning disappear, just double-click this task and confirm that you wish for SSIS to resynchronize the metadata.










• If you are not using SSIS 2012, then the maximum number of rows to sample is 1000; in some cases, this is not a representative sample. If you are experiencing problems, then it is probably best to set large data types (I8 for integers, large lengths for character fields), and then check the maximum real data lengths in the table into which the data is imported. Data length analysis like this is described in Chapter 8.

• Personally, I find it unlikely that you will ever define Flat File connection managers at project level in SQL Server 2012 (remember that this is done in the Connection Managers folder of the Solution Explorer). This is because they are essentially “one-off” connections. A destination connection is another matter entirely; it would probably benefit from being defined at project level.

• If the source file is not Unicode, then do not attempt to set it to Unicode, or you risk losing all your existing column type definitions.

• SSIS will not verify that your modifications map to the source file’s structure, so you need to know what you are doing.

• The ways to handle column truncation and data type transformation are given in Chapter 9.

• SSIS data types are described in Appendix A.

• You might have to copy and paste unusual column separators into the Flat File Connection Manager Editor, one row at a time, because multiple pasting is not offered.



2-4. Importing Fixed-Width Text Files

Problem

You want to import a fixed-width text file into SQL Server.



Solution

Create an SSIS package and define a Data Flow using a Flat File connection manager set to accept fixed-width files. The following explains how you can do this.

1. Create an SSIS package as described in Recipe 2-2, preparing a Flat File source and an OLEDB destination.

2. Create an OLEDB connection manager connecting to CarSales_Staging named CarSales_Staging_OLEDB.

3. Create a connection manager for the flat file, either as described earlier in Recipe 2-2 step 3, or by right-clicking the Connection Managers tab and selecting New Flat File Connection, as shown in Figure 2-3.

4. Once you have specified the connection name and the source file (C:\SQL2012DIRecipes\CH02\StockFixedWidth.Txt), specify that the format be fixed width, as shown in Figure 2-7.






Figure 2-7.  Defining a fixed-width data source

5. Display the Columns pane by clicking Columns on the left of the dialog box. You should see something like Figure 2-8.
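In a fixed-width file, columns are identified by character position rather than by a delimiter. As a minimal Python sketch of that idea, the following splits a line according to a list of column widths. The column names, widths, and sample rows are hypothetical; the actual layout of StockFixedWidth.Txt is not given here.

```python
# Hypothetical layout: (column name, width in characters).
LAYOUT = [("ID", 4), ("Make", 12), ("Price", 9)]


def split_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of trimmed column values."""
    record, start = {}, 0
    for name, width in layout:
        record[name] = line[start:start + width].strip()
        start += width
    return record


# Build sample rows with explicit padding so the widths are easy to verify.
first = split_fixed_width("0001" + "Aston Martin" + "044500.00")
second = split_fixed_width("0002" + "MG".ljust(12) + "009250.00")
print(first)
print(second)
```

Getting a single width wrong shifts every column after it, which is why it is worth checking the column boundaries carefully before loading the file.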


