10-13. Performing Multiple Domain Analyses


Chapter 10 ■ Data Profiling

Figure 10-18.  Multiple domain analysis using the Aggregate transform


Confirm with OK.


Add a new Derived Column task to the Data Flow pane. Connect this to the Aggregate task. Add the following columns:


Set the expression as Marques


Set the expression as Stock.Txt


Set the expression as (DT_WSTR,260)Marques





Add an OLEDB destination task to the Data Flow pane. Connect the Derived Column task you just created to it. If there are multiple outputs suggested, ensure that you select the Marques output.

Double-click the OLEDB destination task and configure it to connect to an SQL Server database. Map to the DataProfilingDomainAnalysis table. Map the available input columns to the available output columns in the Mappings pane, as was done previously (but using the Marques derived column rather than Product_Type). Click OK.

Your data flow should look like Figure 10-19.

Figure 10-19.  Data flow of multiple domain analyses

How It Works

Carrying out multiple domain analyses as part of the same data flow adds a little complexity. Essentially, you extend the Aggregate task used for the domain analysis so that it provides multiple outputs, one for each domain for which you wish to return profile data.
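As a rough illustration outside SSIS, the idea of one aggregation step feeding several outputs can be sketched in Python: a single pass over the rows keeps a separate set of value counts for each domain. The function name `domain_analysis` and the sample rows are hypothetical; the names Marques and Product_Type simply echo the recipe.

```python
from collections import Counter


def domain_analysis(rows, domains):
    """Count distinct values per domain in one pass over the rows.

    Returns one Counter per domain, much as the Aggregate task
    exposes one output per domain.
    """
    profiles = {domain: Counter() for domain in domains}
    for row in rows:
        for domain in domains:
            profiles[domain][row[domain]] += 1
    return profiles


# Invented sample rows, for illustration only.
rows = [
    {"Marques": "Jaguar", "Product_Type": "Saloon"},
    {"Marques": "Jaguar", "Product_Type": "Coupe"},
    {"Marques": "Bentley", "Product_Type": "Saloon"},
]

profiles = domain_analysis(rows, ["Marques", "Product_Type"])
for domain, counts in profiles.items():
    print(domain, dict(counts))
```

Each entry in `profiles` corresponds to one domain output that would be written to the profiling table.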

Hints, Tips, and Traps

This way you can add as many domain analyses as you wish to a data flow. All will execute in parallel to the attribute analysis.

It now becomes easy to profile data at the same time that you load it (or stage it). As you already have a Multicast task in the data flow, all you have to do is use another output from this to load and transform your data.

In this recipe, and in Recipes 10-11 and 10-12, you can load the data as well as profile it. All you need to do is add an OLEDB destination task, configure it to use a destination table that maps to the input data columns, and connect the Multicast transform to it.
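The load-while-profiling idea can be sketched outside SSIS as well. In this minimal Python sketch, a hypothetical `multicast` helper hands every source row to several consumers, one standing in for the destination table and one for a domain profile. The helper, the variable names, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter


def multicast(rows, *consumers):
    """Send every row to each consumer, as a Multicast transform does with its outputs."""
    for row in rows:
        for consume in consumers:
            consume(row)


# Hypothetical stand-ins: a list plays the destination table,
# a Counter plays the domain profile.
staged_rows = []
make_profile = Counter()

# Invented sample rows, for illustration only.
source_rows = [
    {"Make": "Jaguar", "Model": "XK8"},
    {"Make": "Aston Martin", "Model": "DB9"},
    {"Make": "Jaguar", "Model": "XJ6"},
]

multicast(
    source_rows,
    staged_rows.append,                              # output 1: load (or stage) the row
    lambda row: make_profile.update([row["Make"]]),  # output 2: profile the Make domain
)
```

The source is read once, and both the load and the profile see every row.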




10-14. Pattern Profiling in a Data Flow


You want to produce pattern profiles as part of a data flow.


Use an SSIS Script transformation task to profile patterns in a data flow. A pattern profile shows how the letters and numbers in a field are represented and formatted, without looking at the exact text or numbers used. The following steps explain how.


1. Create a new SSIS package. Add a Data Flow task to the Control Flow pane.

2. Add an OLEDB connection manager named CarSales_OLEDB, which you configure to connect to the CarSales source database.

3. Add an OLEDB connection manager named CarSales_Staging_OLEDB, which you configure to connect to the CarSales_Staging destination database.

4. Double-click the Data Flow task to edit it (or click the Data Flow tab).

5. Add a Data Source task. Configure it to use the CarSales_OLEDB connection manager. Connect to the Stock table.

6. Add an SSIS Script transformation task to the Data Flow pane of your SSIS package. Connect it to the data source.

7. Double-click the Script transformation task. Select the column(s) that you will be profiling. I will use Make in this example.

8. Click Inputs and Outputs in the left-hand column. Expand Output 0 and click Add Column.

9. Rename the column appropriately (I am calling it Car here). Set its data type to String in the Data Type Properties.

10. Select Script in the left-hand column. Click the Design Script button to enter the Script editor.

11. Add the following line to the Imports region:

Imports System.Text.RegularExpressions


12. Replace the Input0_ProcessInputRow method with the following code:

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    ' Assumption: letters are matched regardless of case (RegexOptions.IgnoreCase).
    Dim regexTxt As New Regex("[A-Z]", RegexOptions.IgnoreCase)
    Dim regexNum As New Regex("[0-9]")
    Dim CarRow As String

    ' Skip NULL values; the output column then stays NULL.
    If Row.Make_IsNull Then Return

    ' Make is the input column selected for profiling.
    CarRow = CStr(Row.Make)

    ' Replace every letter with X, then every digit with N.
    CarRow = regexTxt.Replace(CarRow, "X")
    CarRow = regexNum.Replace(CarRow, "N")

    ' Car is the output column added under Output 0.
    Row.Car = CarRow

End Sub

13. Close the Script editor. Click OK to close the Script Transformation Editor.

14. Add an Aggregate transform task onto the Data Flow pane. Connect the Script task to it. Double-click the Aggregate transform task to edit it.

15. Select the pattern column that you added to the Script transformation output in step 8. Make sure that Group By is selected as the Operation.

16. Click OK to close the Aggregate transform task.

17. Add a destination task onto the Data Flow pane. Connect the Aggregate transform task to it. Configure this task to output the profile pattern data using the CarSales_Staging_OLEDB connection manager. Create a new table named CarPatternProfile by clicking New in the OLEDB destination editor, using the suggested columns. The final SSIS data flow should look like Figure 10-20.

Figure 10-20.  The final SSIS data flow for pattern profiling
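To see what this data flow produces, the same masking and grouping logic can be sketched in Python outside SSIS. This is only a minimal sketch: the function names `to_pattern` and `pattern_profile` and the sample values are hypothetical, and case-insensitive letter matching is an assumption.

```python
import re
from collections import Counter

# Assumption: letters are matched regardless of case.
LETTER = re.compile(r"[A-Z]", re.IGNORECASE)
DIGIT = re.compile(r"[0-9]")


def to_pattern(value):
    """Mask every letter as X and every digit as N, leaving other characters alone."""
    masked = LETTER.sub("X", str(value))
    return DIGIT.sub("N", masked)


def pattern_profile(values):
    """Group the masked values and count each pattern, like the Group By aggregation."""
    return Counter(to_pattern(value) for value in values)


# Invented sample values, for illustration only.
values = ["XK8", "DB9", "Aston Martin", "TR6"]
profile = pattern_profile(values)
for pattern, count in profile.most_common():
    print(pattern, count)
```

Values that share a layout collapse into one pattern, so a pattern that appears only rarely is a quick pointer to inconsistently formatted data.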


