9-9. Concatenating Data Using SSIS


Chapter 9 ■ Data Transformation



Figure 9-11.  Defining the input columns for a Script transform

7. Select Output 0.

8. Click Inputs and Outputs on the left, then click Output Columns and add two new columns: ListOutput and GroupOutput. Set both as DT_STR and of suitable length.

9. Select Output 0 and set the SynchronousInputID to None. This pane should look like Figure 9-12.






Figure 9-12.  Configuring a Script task output to be asynchronous

10. Click Script, select Microsoft Visual Basic 2010 as the script language, followed by Edit Script.

11. In the Script window, add the following directive to the Imports region:

    Imports System.Text

12. Replace the ScriptMain code with the following script (C:\SQL2012DIRecipes\CH09\SSISConcatenation.vb):

    Public Class ScriptMain
        Inherits UserComponent

        Dim InitValue As String = ""          ' Grouping value of the group being built
        Dim ControlValue As String = ""       ' Grouping value of the current row
        Dim CurrentElement As String = ""     ' Value to concatenate from the current row
        Dim ConcatValue As New StringBuilder  ' Concatenated list for the current group
        Dim ConcatCharacter As String = ","   ' Separator

        Public Overrides Sub PreExecute()
            MyBase.PreExecute()
        End Sub

        Public Overrides Sub PostExecute()
            MyBase.PostExecute()
        End Sub

        Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)

            While Buffer.NextRow()
                Input0_ProcessInputRow(Buffer)
            End While

            If Buffer.EndOfRowset Then

                ' Write the final group, unless the input contained no rows
                If ConcatValue.Length > 0 Then
                    Output0Buffer.AddRow()
                    Output0Buffer.GroupOutput = ControlValue
                    Output0Buffer.ListOutput = ConcatValue.Remove(ConcatValue.Length - 1, 1).ToString
                End If
                Output0Buffer.SetEndOfRowset()

            End If

        End Sub

        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

            ' If first record - initialise variables
            If InitValue = "" Then
                InitValue = Row.ClientID.ToString
                ControlValue = Row.ClientID.ToString
            End If

            CurrentElement = Row.InvoiceNumber.ToString
            ControlValue = Row.ClientID.ToString

            ' Process all records
            If InitValue = ControlValue Then

                ConcatValue.Append(CurrentElement).Append(ConcatCharacter)

            Else
                ' Write grouping element and concatenated string to new outputs
                Output0Buffer.AddRow()
                Output0Buffer.ListOutput = ConcatValue.Remove(ConcatValue.Length - 1, 1).ToString
                Output0Buffer.GroupOutput = InitValue

                ' Start the next group with the current row
                InitValue = Row.ClientID.ToString
                ConcatValue.Remove(0, ConcatValue.Length).Append(CurrentElement).Append(ConcatCharacter)
            End If

        End Sub

    End Class

13. Close the Script window, and click OK to close the Script Task Editor.

14. Add an OLEDB destination task and configure it to use the CarSales_Staging_OLEDB connection manager.

15. Click New to create a destination table, which you can rename if you wish. Click OK to confirm the table creation. Next, click Mappings and map the source to destination columns. Click OK to complete the destination task.



You can now output the data from the Script task directly to a Data Flow destination, or continue to process the data with further transforms.



How It Works

You may also need to concatenate strings in SSIS. This can be done in a few ways, but I will stick to explaining a custom SSIS script task, which achieves the desired result quickly and painlessly.

The script performs essentially one task: it processes every record in the input and checks whether the “grouping” field (here the ClientID) has changed from the previous record. If this field has not changed, then the value to be concatenated (InvoiceNumber) is appended to the ConcatValue variable. If it has changed, a new record is added to the output buffer, and its two fields (ListOutput containing the concatenated values, and GroupOutput containing the “grouping” field) are populated. The ConcatValue variable is then reset to begin the next group.
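
The control-break logic described above can be tried outside SSIS. The following is a minimal sketch, not the recipe's script: the module name ConcatenationSketch, the helper ConcatenateByGroup, and the sample ClientID and invoice values are all hypothetical, and it assumes that the rows are already sorted by the grouping value.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module ConcatenationSketch

    ' Hypothetical helper: collapses rows that are already sorted by key
    ' (the "grouping" value) into one concatenated list per key.
    Function ConcatenateByGroup(ByVal rows As IEnumerable(Of KeyValuePair(Of String, String)),
                                ByVal separator As String) As List(Of KeyValuePair(Of String, String))

        Dim result As New List(Of KeyValuePair(Of String, String))
        Dim currentGroup As String = Nothing
        Dim concatValue As New StringBuilder
        Dim groupIsEmpty As Boolean = True

        For Each row As KeyValuePair(Of String, String) In rows

            ' Grouping value changed: emit the finished group and reset
            If currentGroup IsNot Nothing AndAlso row.Key <> currentGroup Then
                result.Add(New KeyValuePair(Of String, String)(currentGroup, concatValue.ToString()))
                concatValue.Clear()
                groupIsEmpty = True
            End If

            ' Separator goes before every element except the first in a group
            If Not groupIsEmpty Then concatValue.Append(separator)
            concatValue.Append(row.Value)
            groupIsEmpty = False
            currentGroup = row.Key
        Next

        ' Emit the last group, which no later change of key will trigger
        If currentGroup IsNot Nothing Then
            result.Add(New KeyValuePair(Of String, String)(currentGroup, concatValue.ToString()))
        End If

        Return result
    End Function

    Sub Main()
        ' Hypothetical sample data: (ClientID, InvoiceNumber), sorted by ClientID
        Dim rows As New List(Of KeyValuePair(Of String, String)) From {
            New KeyValuePair(Of String, String)("1", "INV001"),
            New KeyValuePair(Of String, String)("1", "INV002"),
            New KeyValuePair(Of String, String)("2", "INV003")
        }

        For Each item As KeyValuePair(Of String, String) In ConcatenateByGroup(rows, ",")
            Console.WriteLine(item.Key & ": " & item.Value)
        Next
    End Sub

End Module
```

Run as a console application, this should print 1: INV001,INV002 followed by 2: INV003. Unlike the recipe's script, the sketch adds the separator before each element after the first, so there is no trailing separator to remove.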

Here I have used a comma as the separator. You may choose the separator used in your source data, and modify the code accordingly. Should the data not be sorted before it is processed by the script task, you will have to add a sort task and ensure that the data is ordered by (at the very least) the “grouping” field (ClientID in this example).

The script uses the ProcessInputRow overridden method, which you are probably used to, as it is used to process all the records in the input buffer as they flow through the script task. You might be less familiar with the ProcessInput overridden method. This is required for an asynchronous transform: it detects the end of the recordset being processed, and then finalizes any required processing.

You may prefer to use a string rather than a StringBuilder (hence the need to reference System.Text) for concatenation. Be warned, however, that StringBuilder can be much faster and more efficient. This is because it is not destroyed and re-created in memory each time that it is changed, as is the case with a string. However, its efficiency will depend on the number of concatenations and the length of the result. So feel free to compare and test the two possibilities.
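
To compare the two approaches on your own machine, a rough timing harness is enough. This is a sketch under assumptions: the module name ConcatTimingSketch, the iteration count, and the INV-prefixed sample values are hypothetical, and the timings that it prints will vary with hardware and data.

```vb
Imports System
Imports System.Diagnostics
Imports System.Text

Module ConcatTimingSketch

    Sub Main()
        ' Hypothetical workload size; adjust to resemble your own data
        Const Iterations As Integer = 20000

        ' Plain string concatenation: each &= creates a new string
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim plain As String = ""
        For i As Integer = 1 To Iterations
            plain &= "INV" & i.ToString() & ","
        Next
        watch.Stop()
        Console.WriteLine("String:        " & watch.ElapsedMilliseconds.ToString() & " ms")

        ' StringBuilder: appends into a growing internal buffer
        watch = Stopwatch.StartNew()
        Dim builder As New StringBuilder
        For i As Integer = 1 To Iterations
            builder.Append("INV").Append(i.ToString()).Append(",")
        Next
        Dim built As String = builder.ToString()
        watch.Stop()
        Console.WriteLine("StringBuilder: " & watch.ElapsedMilliseconds.ToString() & " ms")

        ' Both approaches should produce the same text
        Console.WriteLine("Same result:   " & (plain = built).ToString())
    End Sub

End Module
```

Try several values of Iterations, since the gap between the two approaches depends on how many concatenations are performed and how long the result becomes.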



Note: This is an asynchronous transform, and so it will not only block the data flow until it has finished, but it will also add considerable memory pressure to the whole SSIS package.






9-10. Duplicating Columns

Problem

You need to create duplicate columns in your data.



Solution

Alias the same column multiple times in T-SQL or use the SSIS Derived Column transform.

In T-SQL, creating a duplicate column can be done using the following code snippet:

    SELECT ClientName, ClientName AS EsteemedVisitor
    FROM   CarSales.dbo.Client;

In SSIS, you can add a Derived Column transform to a data flow and connect it to the Source task (or any task in the data flow). Double-click the Derived Column transform and add a new column name to the Derived Column Name field in the grid. Then expand Columns in the top left of the dialog box and drag the appropriate column name to the Expression field. You can then click OK to close the dialog box. The new column will appear in the data flow from this task onward.



How It Works

At the risk of stating the obvious, you can duplicate any input column as part of a data transformation with majestic ease. In T-SQL, all you need to do is alias the column. In SSIS, you use the Derived Column transform as part of a data flow. Remember that in the latter case, you can replace an existing column or add a new column to the data flow. You cannot change the data type, however, and will have to do this using a separate Data Conversion transform.



9-11. Converting Strings to Uppercase or Lowercase

Problem

You wish to convert all or part of a string to uppercase or lowercase as part of a data flow or after data has been loaded into an SQL Server table.



Solution

Use the UPPER and LOWER functions in T-SQL and SSIS to perform character case conversion.






How It Works

The solution is fortunately extremely simple. You convert strings to uppercase or lowercase with the following functions:



                        T-SQL                            SSIS
Uppercase conversion    UPPER(Column or Column alias)    UPPER(Column name in dataflow)
Lowercase conversion    LOWER(Column or Column alias)    LOWER(Column name in dataflow)



In SSIS, converting text to uppercase or lowercase can be carried out using a Derived Column transform. If you take the example given in Recipe 9-10, you can apply the SSIS UPPER or LOWER function to the column selected as the Expression.

While this may be considered a first step on the primrose path to the everlasting bonfire of data cleansing (a subject that I wish in large part to avoid, because it could easily fill a separate tome that I have no intention of attempting to write), it is nonetheless worth a short detour into the arena of simple character transformation, if only to clarify basic techniques. So that is the UPPER and LOWER functions dealt with.



9-12. Converting Strings to Title Case

Problem

You wish to convert all or part of a string to title case, which is where Each First Character Is In Uppercase.



Solution

Title case in SSIS can be generated as part of a script transform. In T-SQL, you are probably best using a CLR function. Next, I explain how to use both of these options. I presume in this recipe that an SSIS package with a data source and destination already exists.



Title Case Using SSIS

1. Add a Script component to the Data Flow pane. Select Transformation as the type of operation. Double-click to edit.

2. Click Input Columns in the left-hand pane. Select the column(s) to be converted to Proper Case. In this example, it will be the InvoiceNumber column.

3. Click Inputs and Outputs in the left-hand pane. Click Output Columns, and then click Add Column to add a column. Name it ProperOut. Ensure that the data type corresponds to the type and length of the input column.

4. Click Script in the left-hand pane. Select Microsoft Visual Basic 2010 as the script language, followed by Edit Script.


