2-16. Pre-Parsing and Staging File Subsets in SSIS

2.	In the Flat File Connection Manager Editor, set the format to Delimited.

3.	Click the Advanced tab. Remove all but two columns (or add columns until you have two). Name the first ColIdentifier and the second ColData. Set the DataType for the first column (ColIdentifier) to string, and its ColumnDelimiter to a hyphen (-). (This is the separator used in this example; use whatever separator appears in your source file.) Set the DataType for the second column (ColData) to Unicode text stream. The dialog box should look like Figure 2-17 in Recipe 2-15.

4.	Confirm your configuration changes.
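The two-column configuration above can be illustrated outside SSIS. The following Python sketch mimics what the connection manager produces: the text before the first hyphen becomes ColIdentifier, and everything after it (including any further hyphens) becomes ColData. The sample record format is invented for illustration only.

```python
# Mimics the two-column split defined in the Flat File connection manager:
# ColIdentifier is the text before the FIRST separator; ColData is the
# remainder of the record, further separators included.
def split_record(line, separator="-"):
    col_identifier, _, col_data = line.rstrip("\r\n").partition(separator)
    return col_identifier, col_data

# Hypothetical mainframe-style record: a row prefix, then the payload.
print(split_record("INVHEADER-1000|2012-05-01|Adam Aspin"))
# → ('INVHEADER', '1000|2012-05-01|Adam Aspin')
```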


5.	Create two new Flat File connection managers, named InvoiceHeader and InvoiceLine, respectively. Configure each to point to a text file (I have imaginatively named mine InvoiceHeader.Txt and InvoiceLine.Txt). For each, click the Advanced pane and add a new column, which you set to the Unicode text stream data type (or the text stream data type). Name the column ColData. The Advanced pane will look like Figure 2-19.

Figure 2-19. Advanced pane of the Flat File connection manager



Chapter 2 ■ Flat File Data Sources


6.	Confirm your configuration changes.


7.	Add a Data Flow task. Double-click to edit it.

8.	Add a Flat File source. Configure it to use the MainFrame connection manager. It should have only two columns: ColIdentifier and ColData.

9.	Add a Conditional Split task. Configure it exactly as described in the previous recipe.

10.	Add a Flat File destination to the Data Flow window. Connect the Conditional Split task to it, using the output named InvoiceHeader. Configure this destination to use the Flat File connection manager named InvoiceHeader, and check the "Overwrite data in the file" box.

11.	Click Mappings. Map the ColData column between the available input and available destination columns.

12.	Repeat steps 10 and 11 for the InvoiceLine output and Flat File connection manager.


13.	Return to the Control Flow window. Add a Data Flow task named Import Headers. Connect this to the previous Data Flow task.

14.	Add a Data Flow task named Import Lines. Connect this to the previous Data Flow task (Import Headers). The whole data flow should look like Figure 2-20.

Figure 2-20. Importing a complex source text file while staging the data

15.	Set up the two Data Flow tasks that you just created to import the InvoiceHeader.Txt and InvoiceLine.Txt delimited files, as described in detail in Recipe 2-2. The script to create the two destination tables is:





How It Works

There may be times when you wish to process a file sourced, for example, from a mainframe or from certain enterprise resource planning (ERP) software solutions. These files can contain multiple row formats. However, you want, or need, to store the data in temporary text files as the first stage in the process. By this I mean that each record type (header and line in this example) is first separated out of the single source file into its own staging file; each record type is then loaded from the resulting staging file. There are several reasons why such an approach may be preferable or even necessary. Among them are the following:

•	The record length exceeds 4,000 characters, so the method described in the previous recipe cannot be used.

•	Relational integrity is in place on the destination tables, so they have to be loaded in the correct (parent followed by child) order.

•	Writing extended parsing code using Derived Column transformations is impractical.

•	You want to examine, debug, profile, or pre-process the separate source files before loading them.

These constraints mean that you need to use the approach given in this recipe. Essentially, it uses the row prefix to separate each type of record and send it to a staging file on disk. Each of the resulting files is then loaded by a separate Data Flow task.
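As a minimal sketch of this staging approach (outside SSIS), the following Python script routes each record to a staging file according to its row prefix, and drops unrecognized records. The prefixes, separator, file names, and sample data are all assumptions for illustration; substitute the values from your own source file.

```python
import os
import tempfile

def stage_by_prefix(source_path, routes, separator="-"):
    """Write each record's payload to the staging file mapped to its
    row prefix (the text before the first separator). Records whose
    prefix has no route are ignored, as in the recipe."""
    handles = {prefix: open(path, "w", encoding="utf-8")
               for prefix, path in routes.items()}
    try:
        with open(source_path, encoding="utf-8") as src:
            for line in src:
                prefix, _, payload = line.rstrip("\n").partition(separator)
                if prefix in handles:
                    handles[prefix].write(payload + "\n")
    finally:
        for handle in handles.values():
            handle.close()

# Build a tiny sample source file; the data is invented.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "Invoices.Txt")
with open(source, "w", encoding="utf-8") as f:
    f.write("INVHEADER-1000|2012-05-01\n"
            "INVLINE-1000|Widget|25\n"
            "INVLINE-1000|Grommet|50\n"
            "A LINE WITH NO RECOGNIZED PREFIX\n")

header_file = os.path.join(workdir, "InvoiceHeader.Txt")
line_file = os.path.join(workdir, "InvoiceLine.Txt")
stage_by_prefix(source, {"INVHEADER": header_file, "INVLINE": line_file})

# Prints the staged header payload: 1000|2012-05-01
print(open(header_file, encoding="utf-8").read())
```

Each staging file then contains only one record layout, so it can be loaded by an ordinary single-format Data Flow task.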

Hints, Tips, and Traps

•	You can set the data types (in the Flat File connection managers for both the input and output files) to string or widestring if you are certain that no record is longer than the 4,000-character maximum.

•	As this approach writes text files to disk (and then reads them back in again afterward), it will be quite a bit slower than a "pure" SSIS in-memory approach.

•	If you are loading multiple source files, do not check the "Overwrite data in the file" box; instead, use an initial File System task to delete the contents of the staging files at the start of the process.

•	Any records in the source file that do not begin with the row prefix(es) are ignored.

2-17. Handling Irregular Numbers of Columns in the Source File Using SQL Server 2012


Problem

You want to load a source file that has irregular numbers of columns.

Solution

Use SSIS in SQL Server 2012. There is nothing particular to configure when importing text files that have varying numbers of columns, as this is the default behavior as of SQL Server 2012. Merely create a Flat File connection manager and configure it to use the relevant source file.




How It Works

Fortunately, SQL Server 2012 has come to the rescue of data integration developers everywhere by including in SSIS the ability to handle flat files with varying numbers of column separators. It is worth noting that this behavior is completely different from that of previous versions. Previously, if SSIS encountered differing numbers of column separators in a source file, the row terminator would not be handled "correctly" and data would snake around into the next record, more often than not causing the data load to fail. As of SQL Server 2012, record terminators always force a new record to begin.

Hints, Tips, and Traps

•	To prevent SSIS from handling disparate numbers of columns in a source file, set the AlwaysCheckForRowDelimiters property to False. This causes SSIS to behave as it did in previous versions.

•	The row with the maximum number of column delimiters can appear anywhere in the source file. SSIS deduces the requisite number of columns from it and applies that number to the entire data load.

•	The record terminator (or row delimiter, if you prefer) must not be enclosed in quotes if it is to be recognized by SSIS.
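The behavior described above can be sketched in a few lines of Python. This is an illustration of the principle, not of SSIS internals: every row terminator starts a new record, the widest row determines the column count, and shorter rows are padded out. The sample data is invented.

```python
import csv
import io

# Invented sample: rows with two, three, and two columns respectively.
data = "1,Adam\n2,Aspin,author\n3,SSIS\n"

# Each row terminator starts a new record, however many columns the row has.
rows = list(csv.reader(io.StringIO(data)))

# The widest row defines the requisite column count for the whole load;
# shorter rows are padded out with empty columns.
width = max(len(row) for row in rows)
padded = [row + [""] * (width - len(row)) for row in rows]
print(padded)
# → [['1', 'Adam', ''], ['2', 'Aspin', 'author'], ['3', 'SSIS', '']]
```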

2-18. Handling Embedded Qualifiers in SQL Server 2012


Problem

You want to load data from a text file that has embedded qualifiers.

Solution

Use SSIS in SQL Server 2012. That version of SSIS handles embedded qualifiers automatically, with no effort needed on your part.

How It Works

Once again, SQL Server 2012 has ridden to the rescue of data integration developers by extending SSIS so that it can handle embedded text qualifiers. This means that if a character (for instance, a single quote or a double quote) is used to qualify a column, the character must be escaped if it is to be used as a literal. This is done by doubling up the qualifier character, so a quote (") inside a qualified field in a flat file must be entered as two quotes ("") to be imported correctly. Put another way, a doubled instance of the text qualifier, inside a field enclosed in that qualifier, is interpreted as a single literal instance of the character.
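This doubled-qualifier convention is the same one used by most CSV tooling. As an illustration (using Python's csv module rather than SSIS), a doubled quote inside a quoted field comes back as a single literal quote:

```python
import csv
import io

# A qualified field containing an escaped (doubled) embedded qualifier.
line = '1,"Adam ""Aspin""",author\n'
row = next(csv.reader(io.StringIO(line)))
print(row)
# → ['1', 'Adam "Aspin"', 'author']
```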

Hints, Tips, and Traps

•	This handling of embedded qualifiers is always active for text-qualified columns and files, and cannot be disabled.

•	This behavior means that the number of qualifiers must conform to expectations. For instance, a field such as "Adam "Aspin" (with an unescaped embedded quote) cannot be loaded, as an escaped double quote inside the qualified field is expected. If this is a problem, the best solution is not to use the field qualifier, and instead to use a Script component (extending the techniques described earlier) to handle the qualifier.


