9-29. Handling Type 4 Slowly Changing Dimensions with SSIS


Chapter 9 ■ Data Transformation



Town VARCHAR(50) NULL,

County VARCHAR(50) NULL,

Address1 VARCHAR(50) NULL,

Address2 VARCHAR(50) NULL,

ClientType VARCHAR(20) NULL,

ClientSize VARCHAR(10) NULL

);

GO

2. Create the Type 4 history table and the Client_SCD4 Type 1 table, both described in Recipe 9-28.

3. Carry out steps 1 to 10 of the previous recipe, but use the following code for the temporary table in step 4 (C:\SQL2012DIRecipes\CH09\tblTmpSessionSCD4.sql):

CREATE TABLE ##Tmp_SCD4

(

ID INT NOT NULL,

ClientName VARCHAR(150) NULL,

Country VARCHAR(50) NULL,

Town VARCHAR(50) NULL,

County VARCHAR(50) NULL,

Address1 VARCHAR(50) NULL,

Address2 VARCHAR(50) NULL,

ClientType VARCHAR(20) NULL,

ClientSize VARCHAR(10) NULL

);



4. Ensure that the variable name for the temp table in step 2 is Tmp_SCD4.

5. Add an OLEDB Destination and connect the Lookup transform to it using the Lookup NoMatch output. Configure it as follows:

OLEDB Connection Manager:   CarSales_Staging_OLEDB
Data Access Mode:           Table or View – Fast Load
Name of Table or View:      dbo.Client_SCD4



6. Once you have ensured that the columns are mapped, confirm with OK.

7. Add a Derived Column transform, connect the Multicast transform to it, and add a derived column, like this:

Derived Column Name:   ValidFrom
Expression:            @[User::ValidTo]






8. Add an OLEDB Destination and connect the Derived Column transform to it. Configure it as follows:

OLEDB Connection Manager:   CarSales_Staging_OLEDB
Data Access Mode:           Table or View – Fast Load
Name of Table or View:      dbo.Client_SCD4_History

9. Once you have ensured that the columns are mapped using the columns from the dimension table (this is very important), confirm with OK. The column mapping should look like Figure 9-25 (the source ID is the business key in this example).



Figure 9-25.  Dimension column mapping

10. Add an OLEDB Destination and connect the Derived Column transform to it. Configure it as follows:

OLEDB Connection Manager:   CarSales_Staging_OLEDB
Data Access Mode:           Table or View – Fast Load
Name of Table or View:      Tmp_SCD4






11. Once you have ensured that the columns are mapped using the data from the data source (this, too, is extremely important), confirm with OK.

12. Return to the Control Flow pane and add an Execute SQL task. Name it Update Dimension table. Connect the previous Data Flow task to it, and configure it as follows:

Connection:   CarSales_Staging_OLEDB

SQL Statement:

UPDATE    DIM
SET       DIM.ValidFrom = YEAR(DATEADD(d,-1,GETDATE())) * 100000
                          + MONTH(DATEADD(d,-1,GETDATE())) * 1000
                          + DAY(DATEADD(d,-1,GETDATE()))
          ,DIM.ClientName = TMP.ClientName
          ,DIM.Country = TMP.Country
          ,DIM.Town = TMP.Town
          ,DIM.County = TMP.County
          ,DIM.Address1 = TMP.Address1
          ,DIM.Address2 = TMP.Address2
          ,DIM.ClientType = TMP.ClientType
          ,DIM.ClientSize = TMP.ClientSize
FROM      dbo.Client_SCD4 DIM
          INNER JOIN Tmp_SCD4 TMP
          ON DIM.ClientID = TMP.ID
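
The ValidFrom expression in the UPDATE statement builds an integer from the year, month, and day of yesterday's date. As a quick check of the arithmetic, the following sketch evaluates the same expression for a fixed date and sets it beside two common ways of producing a YYYYMMDD integer. The variable @Yesterday and the column aliases are illustrative assumptions, and whether ValidFrom is meant to hold a YYYYMMDD key is also an assumption, since the table definition is not shown here.

```sql
-- Fixed date used purely for illustration (the recipe uses DATEADD(d,-1,GETDATE())).
DECLARE @Yesterday DATE = '20120615';

SELECT
    -- The multipliers exactly as written in the recipe's UPDATE statement.
    YEAR(@Yesterday) * 100000 + MONTH(@Yesterday) * 1000 + DAY(@Yesterday) AS RecipeExpression,    -- 201206015
    -- Multipliers that yield a conventional YYYYMMDD integer.
    YEAR(@Yesterday) * 10000 + MONTH(@Yesterday) * 100 + DAY(@Yesterday)   AS YyyyMmDdKey,         -- 20120615
    -- The same YYYYMMDD value obtained through CONVERT style 112.
    CAST(CONVERT(CHAR(8), @Yesterday, 112) AS INT)                         AS YyyyMmDdViaStyle112; -- 20120615
```

If the column is expected to sort and compare as a YYYYMMDD date key, it may be worth checking which of these forms the rest of your package assumes.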



How It Works

SCDs of types 2 and 4 are the tougher ones to handle, and they require some adroit SSIS programming to be both efficient (especially with large dimensions) and maintainable. The approaches given in this recipe and the previous one attempt to balance these two conflicting demands. Nonetheless, do not assume that these are the only ways of dealing with SCDs in SSIS, and feel free to extend and tweak them to suit your specific requirements.

Using SSIS to maintain a Type 4 slowly changing dimension is largely an extension of the techniques described earlier (in Recipe 9-26) to maintain a Type 2 SCD, so I have not described every step, but merely highlighted the differences. Figure 9-26 shows what the finished package looks like.






Figure 9-26.  High-level data flow for an SSIS Type 4 SCD

And the Data Flow task looks like Figure 9-27.



Figure 9-27.  Data flow detail for an SSIS Type 4 SCD






The package does the following:

It gets the source data, matches it to the dimension table using the business key, and sends any unmatched (new) records directly to the dimension table.

It detects any records that already exist in the dimension table but differ from the source records. As both source and existing dimension data are in the data flow, the existing (historical) versions of changed records can be sent directly to the dimension history table.

It sends the new (changed) data for existing dimension records to a temporary table.

Finally, it updates the dimension table with the new data for existing records.
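
The four stages listed above can also be expressed as set-based T-SQL, which may help when testing what the package ought to produce. This is only a sketch under stated assumptions, not the recipe's implementation: the tables #Src, #Dim, #Hist, and #Changed, the reduced column list, the sample rows, and the ArchivedOn column are all hypothetical stand-ins.

```sql
-- Hypothetical, cut-down stand-ins for the source, dimension, and history tables.
CREATE TABLE #Src  (ID INT NOT NULL, ClientName VARCHAR(150) NULL, Town VARCHAR(50) NULL);
CREATE TABLE #Dim  (ClientID INT NOT NULL, ClientName VARCHAR(150) NULL, Town VARCHAR(50) NULL);
CREATE TABLE #Hist (ClientID INT NOT NULL, ClientName VARCHAR(150) NULL, Town VARCHAR(50) NULL,
                    ArchivedOn DATE NOT NULL);  -- assumed audit column

INSERT INTO #Dim (ClientID, ClientName, Town)
VALUES (1, 'Client A', 'Stoke'), (2, 'Client B', 'London');
INSERT INTO #Src (ID, ClientName, Town)
VALUES (1, 'Client A', 'Leeds'), (2, 'Client B', 'London'), (3, 'Client C', 'York');

-- Stage 1: business keys not yet in the dimension are inserted directly.
INSERT INTO #Dim (ClientID, ClientName, Town)
SELECT S.ID, S.ClientName, S.Town
FROM   #Src S
WHERE  NOT EXISTS (SELECT 1 FROM #Dim D WHERE D.ClientID = S.ID);

-- Stage 3 (done early here): new values for changed records go to a working
-- table, which plays the role of Tmp_SCD4. EXCEPT gives a NULL-safe comparison.
SELECT S.ID, S.ClientName, S.Town
INTO   #Changed
FROM   #Src S
       INNER JOIN #Dim D ON D.ClientID = S.ID
WHERE  EXISTS (SELECT S.ClientName, S.Town EXCEPT SELECT D.ClientName, D.Town);

-- Stage 2: the existing dimension version of each changed record is archived.
INSERT INTO #Hist (ClientID, ClientName, Town, ArchivedOn)
SELECT D.ClientID, D.ClientName, D.Town, CAST(GETDATE() AS DATE)
FROM   #Dim D
       INNER JOIN #Changed C ON C.ID = D.ClientID;

-- Stage 4: the dimension is overwritten with the new values.
UPDATE D
SET    D.ClientName = C.ClientName,
       D.Town       = C.Town
FROM   #Dim D
       INNER JOIN #Changed C ON C.ID = D.ClientID;

SELECT ClientID, ClientName, Town FROM #Dim ORDER BY ClientID;
SELECT ClientID, ClientName, Town, ArchivedOn FROM #Hist;

DROP TABLE #Src, #Dim, #Hist, #Changed;
```

With the sample rows above, the dimension ends up holding clients 1 to 3 with client 1 now in Leeds, and the history table holds the single earlier version of client 1 (Stoke). The SSIS package reaches the same end state, but it streams the rows through the data flow rather than running a sequence of set-based statements.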



Once the package has run, and you are happy with it, you can do the following:

Change the variable value for TempTable to ##Tmp_SCD4.

Alter the reference in the Update Dimension table task so that the temp table used is the session-scoped temporary table. The code needs to be tweaked to use:

INNER JOIN ##Tmp_SCD4 TMP

Delete the dbo.Tmp_SCD4 table in the CarSales database.
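
For the final cleanup step, a guarded drop avoids an error if the working table has already been removed. This is a minimal sketch that assumes you run it in whichever database actually holds dbo.Tmp_SCD4.

```sql
-- Drop the persistent working table only if it still exists.
-- (OBJECT_ID with type 'U' looks for a user table of that name.)
IF OBJECT_ID('dbo.Tmp_SCD4', 'U') IS NOT NULL
    DROP TABLE dbo.Tmp_SCD4;
```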



9-30. Cleansing Data As Part of an ETL Process

Problem

You wish to cleanse data as part of an ETL process using SSIS.



Solution

Use SQL Server Data Quality Services in SSIS 2012, as follows.

1. Ensure that Data Quality Services is installed and running on an SQL Server 2012 instance.

2. Create a new SSIS package and add the following two connection managers (at project or package level):

Name                     Type    Data Source        Comments
CarSales_Staging_OLEDB   OLEDB   CarSales_Staging   The connection for the source data.
CarSales_OLEDB           OLEDB   CarSales           The connection for the destination database.

3. Add a new Data Flow task and switch to the Data Flow pane.






4. Add an OLEDB source and configure as follows:

Name:                     Car Sales
Connection Manager:       CarSales_Staging_OLEDB
Data Access Mode:         Table or view
Name of Table or View:    CarColoursForDQSInSSIS



5. Add a DQS Cleansing transform, name it DQS Cleansing, and connect the data source that you just created to it. Double-click to edit.

6. Click New to create a Data Quality connection manager. Select the DQS server name from the pop-up list of available DQS servers, and then click OK.

7. Select the DQS Knowledge Base containing the domain that you wish to use. The dialog box should look like Figure 9-28.



Figure 9-28.  Configuring the DQS connection manager


