2-19. Handling Irregular Numbers of Columns in the Source File in SQL Server 2005 and 2008


Chapter 2 ■ Flat File Data Sources



Column Name    Data Type                Length
ID             4-byte signed integer
InvoiceID      4-byte signed integer
StockID        String                   50
Quantity       4-byte signed integer
SalePrice      Currency
Comment1       String                   150
Comment2       String                   500



The Inputs and Outputs pane will look like the one shown in Figure 2-21.



Figure 2-21.  Creating output columns for parsing string data






8. Click Script to activate the Script pane. Set the ScriptLanguage to Microsoft Visual Basic 2010. Click Edit Script.

9. In the Script editor, type or copy the following script for the Input0_ProcessInputRow method:

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    Dim NbCols As Integer
    Dim Delimiter As String = ","
    Dim StartPosition As Integer = 0
    Dim EndPosition As Integer = 0
    ' Convert the text stream to a string so that .NET string functions can be used
    Dim RowContents As String = System.Text.Encoding.UTF8.GetString( _
        Row.SourceData.GetBlobData(0, Convert.ToInt32(Row.SourceData.Length)))

    ' Count the number of delimiters; the row holds NbCols + 1 columns
    NbCols = (RowContents.Length - RowContents.Replace(Delimiter, "").Length)

    ' Size the output array as a function of the number of columns (indexes 0 to NbCols)
    Dim OutputCols(NbCols) As String

    ' Parse the data
    For Ctr As Integer = 0 To NbCols
        If Ctr = 0 Then
            If NbCols = 0 Then
                ' No delimiter: the entire record is the only column
                OutputCols(Ctr) = RowContents
                Exit For
            Else
                ' Leftmost column
                StartPosition = 0
                EndPosition = RowContents.IndexOf(Delimiter, StartPosition)
                OutputCols(Ctr) = Left(RowContents, EndPosition)
            End If
        Else
            StartPosition = RowContents.IndexOf(Delimiter, StartPosition)
            EndPosition = RowContents.IndexOf(Delimiter, StartPosition + 1)
            If EndPosition = -1 Then
                ' Rightmost column
                OutputCols(Ctr) = RowContents.Substring(StartPosition + 1)
                Exit For
            Else
                ' Column between two delimiters
                OutputCols(Ctr) = RowContents.Substring(StartPosition + 1, _
                    (EndPosition - StartPosition) - 1)
            End If
        End If
        StartPosition = EndPosition
    Next

    ' Send parsed data to the appropriate output column.
    ' The assignments to the integer and currency columns rely on
    ' VB implicit conversion (this requires Option Strict Off).
    For ColIDOut As Integer = 0 To NbCols
        Select Case ColIDOut
            Case 0
                Row.ID = OutputCols(0)
            Case 1
                Row.InvoiceID = OutputCols(1)
            Case 2
                Row.StockID = OutputCols(2)
            Case 3
                Row.Quantity = OutputCols(3)
            Case 4
                Row.SalePrice = OutputCols(4)
            Case 5
                Row.Comment1 = OutputCols(5)
            Case 6
                Row.Comment2 = OutputCols(6)
        End Select
    Next

End Sub

10. Close the Script window. Confirm your changes to the Script component.

11. Add an OLE DB destination and connect the Script transform to it. Select the OLE DB connection manager that you defined in step 2. Select the destination table. Click Mappings. Ensure that all the fields except SourceData are mapped.



You can now run the process and import the data.



How It Works

One eventuality that you may have to face is a source text file that has a variable number of delimiters, and consequently a varying number of columns, per row. SSIS did not handle this well until SQL Server 2012 appeared, so if you are using an older version of SQL Server you have to use an SSIS Script component to parse rows containing a variable number of columns.

The script that parses each row works like this:





• First, the text stream is converted to a string; otherwise .NET string functions such as Replace will not work.

• Then, the number of delimiters is determined by comparing the original row length to the length of the row once all separators have been removed. The row contains one more column than it has delimiters.

• Next, the leftmost column is handled (or the entire record if there are no other columns) and placed in an array.

• Then, all other columns except the rightmost column are determined (using the start and end positions between separators) and placed in the array.

• Next, the rightmost column is handled and placed in the array.

• Finally, all the contents of the array are mapped to the SSIS output columns.
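The steps above can also be expressed more compactly with the .NET String.Split method. The following stand-alone console sketch is not the recipe's script: the ParseRow function, the MaxCols constant, and the sample rows are hypothetical names and data introduced for illustration, and it assumes a single-character delimiter with no quoted fields.

```vbnet
Imports System

Module ParseRowSketch

    ' Hypothetical upper limit: the seven output columns defined in this recipe
    Const MaxCols As Integer = 7

    ' Splits one row on the delimiter and returns exactly MaxCols entries.
    ' Columns missing from a short row are left as Nothing;
    ' columns beyond MaxCols are ignored.
    Function ParseRow(ByVal rowContents As String, ByVal delimiter As Char) As String()
        Dim parts() As String = rowContents.Split(delimiter)
        Dim outputCols(MaxCols - 1) As String
        For i As Integer = 0 To Math.Min(parts.Length, MaxCols) - 1
            outputCols(i) = parts(i)
        Next
        Return outputCols
    End Function

    Sub Main()
        ' A short row (five columns) and a full row (seven columns)
        Dim sampleRows() As String = New String() { _
            "1,10,ABC,2,9.99", _
            "2,11,DEF,1,5.00,first,second"}
        For Each sampleRow As String In sampleRows
            Dim cols() As String = ParseRow(sampleRow, ","c)
            ' Missing columns appear as empty fields between the separators
            Console.WriteLine(String.Join("|", cols))
        Next
    End Sub

End Module
```

Inside a Script component, the same idea would replace the position-tracking loop, with each array entry then assigned to its output column as in the recipe. Either way, the mapping remains strictly left to right.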






This technique nonetheless requires you to define the maximum number of possible columns in the source file, and it maps the columns only on a simple left-to-right basis. It cannot guess which source column should map to which output column if column separators are missing. It is possible to hard-code the input parsing to the output columns (which you have to do for SSIS output), but that would be less elegant. Of course, you can change the delimiter character used by the Delimiter variable.

If you are using a Unicode text source, you will need to do the following:

• In the General pane of the Flat File Connection Manager Editor, check the Unicode box.

• In the Advanced pane of the Flat File Connection Manager Editor, set the DataType for the SourceData column to Unicode text stream (DT_NTEXT), because the script reads the column with GetBlobData.







• In the SSIS Script component, replace the line that handles UTF8 encoding with the following:

    Dim RowContents As String = System.Text.Encoding.Unicode.GetString( _
        Row.SourceData.GetBlobData(0, Convert.ToInt32(Row.SourceData.Length)))
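To see why the decoder has to match the encoding of the source, consider the following stand-alone sketch. It is only an illustration: the sample string and the variable names are hypothetical, and the byte array stands in for what GetBlobData would return for a Unicode (UTF-16) text stream.

```vbnet
Imports System
Imports System.Text

Module EncodingSketch

    Sub Main()
        Dim original As String = "1,10,ABC"

        ' Simulate the bytes of a Unicode (UTF-16 little-endian) text stream
        Dim blob() As Byte = Encoding.Unicode.GetBytes(original)

        ' Decode the same bytes with the matching and the mismatched decoder
        Dim decodedAsUnicode As String = Encoding.Unicode.GetString(blob)
        Dim decodedAsUtf8 As String = Encoding.UTF8.GetString(blob)

        ' The matching decoder returns the original text
        Console.WriteLine(decodedAsUnicode = original)

        ' The mismatched decoder turns every zero byte into a NUL character,
        ' so the resulting string is longer than the original
        Console.WriteLine(decodedAsUtf8.Length > original.Length)
    End Sub

End Module
```

The delimiter count would still come out right with the wrong decoder in this simple case, but the parsed column values would contain stray NUL characters.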



Hints, Tips, and Traps





• Remember that (in versions 2005 and 2008) you can copy connection managers between packages to save redefining them.

• If your source file has very short records, you can avoid text streams entirely by setting the DataType for the SourceData column (in the Advanced pane of the Flat File Connection Manager Editor) to String, with a length of 8000, or Unicode string, with a length of 4000. You can then either set the RowContents variable simply to the source column (RowContents = Row.SourceData.ToString) or avoid using a variable at all and refer to the source column directly in the script.



2-20. Determining the Number of Columns in a Source File

Problem

You want to determine the number of columns in a flat file.



Solution

Use a custom SSIS script to analyze the source file. The following explains how to do this.

1. Set up an SSIS package with the Flat File connection manager, as described in Recipe 2-2.

2. Add a Data Flow task and a Flat File source, also as described in Recipe 2-2.

3. Add a Script component connected to the Flat File source using the techniques described in Recipe 2-19. Do not create any output columns. Add the following script:

Public Class ScriptMain
    Inherits UserComponent

    ' Highest column count found so far across all rows
    Dim MaxNbCols As Integer = 0

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        ' Display the result once all rows have been processed
        MsgBox(MaxNbCols)
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim NbCols As Integer = 0
        Dim Delimiter As String = ","
        Dim ColText As String = ""

        ' ANSI text
        ColText = System.Text.Encoding.UTF8.GetString( _
            Row.SingleCol.GetBlobData(0, Convert.ToInt32(Row.SingleCol.Length)))

        ' Columns in this row = number of delimiters + 1
        NbCols = (ColText.Length - ColText.Replace(Delimiter, "").Length) + 1

        If NbCols > MaxNbCols Then
            MaxNbCols = NbCols
        End If
    End Sub

End Class

4. Close the Script component. Confirm with OK.



How It Works

This simple script counts the number of delimiters (set using the Delimiter variable) in each record in the data flow and adds one to obtain the column count for that record. When you run the package, a message box displays the maximum number of columns in the source file. Clearly this approach is very “quick and dirty” because it uses a message box to return the column count. It is therefore more a development technique than anything else, but nonetheless a useful tool.
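The counting logic can be tried outside SSIS with a stand-alone console sketch such as the following. The CountColumns function and the sample rows are hypothetical; dividing by the delimiter length is an addition that is not in the recipe, made so that a delimiter longer than one character would also be counted correctly.

```vbnet
Imports System

Module ColumnCountSketch

    ' Number of columns in one row = number of delimiters + 1.
    ' The difference in length is divided by the delimiter length so that
    ' a multi-character delimiter is counted once per occurrence.
    Function CountColumns(ByVal rowText As String, ByVal delimiter As String) As Integer
        Dim removedChars As Integer = rowText.Length - rowText.Replace(delimiter, "").Length
        Return removedChars \ delimiter.Length + 1
    End Function

    Sub Main()
        ' Hypothetical rows with three, five, and one column
        Dim sampleRows() As String = New String() {"1,10,ABC", "2,11,DEF,1,5.00", "3"}

        ' Track the widest row, as the recipe does with MaxNbCols
        Dim maxNbCols As Integer = 0
        For Each rowText As String In sampleRows
            maxNbCols = Math.Max(maxNbCols, CountColumns(rowText, ","))
        Next

        Console.WriteLine(maxNbCols)   ' prints 5
    End Sub

End Module
```

Like the recipe's script, this sketch treats every occurrence of the delimiter as a column boundary, so a delimiter inside a quoted field would inflate the count.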



Note: You should never use SSIS packages with message boxes in a production environment.



Hints, Tips, and Traps





• Alternatively, you can use LogParser to do this, as described in the next recipe.

• For more advanced techniques on profiling source files, see Chapter 10.

• To be somewhat purist, you should also create an integer variable and then add a Row Count transform that uses this variable, connecting the Script component to the new Row Count transform so that it acts as the destination.


