9-23. Denormalizing Data by Referencing Lookup Tables in SSIS


Chapter 9 ■ Data Transformation



5.  Add a Data Flow task and switch to the Data Flow pane.

6.  Add a Flat File source component and rename it ClientList. Double-click to edit.

7.  Configure it to point to the Flat File connection manager ClientList. Click OK to close the Flat File Source Editor.

8.  Add a Sort task and connect the Flat File source component to it. Name it Sort Clients. Double-click to edit. In Available Input Columns, check ID to sort on the ID. Click OK to close the Sort Transformation Editor.

9.  Add a Lookup task and connect the Sort Clients task to it. Name it Lookup Country Name. Double-click to edit.



10. In the General tab, set the following options:

    Cache Mode:                                            Full Cache
    Connection Type:                                       OLEDB Connection Manager
    Specify how to handle rows with no matching entries:   Redirect Rows to NoMatch output

11. In the Connection tab, set the following options:

    OLEDB Connection Manager:                              CarSales_OLEDB
    Use the results of an SQL Query:

        SELECT    CountryID, CountryName_EN
        FROM      Countries
        ORDER BY  CountryID

12. In the Columns tab, drag ClientCountry from the Available Input Columns to link to CountryID in the available Lookup columns, as shown in Figure 9-18.




Figure 9-18.  Lookup column in the Lookup Transformation Editor

13. Click OK to confirm your modifications.

14. Add a Derived Column task and connect the Lookup task to it. When prompted, select the Lookup NoMatch output. Double-click to edit the Derived Column task.

15. In the grid in the lower half of the Derived Column Transformation Editor dialog box, add a Derived column named CountryName_EN. Set its Expression to N/A (in double-quotes). Click OK to confirm your modifications.

16. Add a Merge task to the Data flow pane. Connect the Lookup Country Name task to it. When prompted, ensure that the Input is Merge Input 1 and that the Output is Lookup Match Output. Click OK to confirm this.

17. Connect the Derived Column transform to the Merge task.

18. Add an OLEDB Destination task. Name it Client With Country. Double-click to edit and configure as follows:




    OLEDB Connection Manager:   CarSales_Staging
    Data Access Mode:           Table or View – Fast Load
    Name of Table or View:      dbo.ClientWithCountry

19. Click Mappings on the left and ensure that each source field is mapped to the corresponding destination field with the same name. Note that ClientCountry (the source country ID field) is not mapped.

20. Click OK to finish your modifications. The data flow should look like Figure 9-19.



Figure 9-19.  The completed data flow when denormalizing data using a Lookup task

You can now run the package and import and denormalize the data.



How It Works

This process takes a reference column (ClientCountry in the source file) and uses it to look up the corresponding country in a reference table (CarSales.dbo.Countries). Because a lookup on its own would exclude any source records without a corresponding reference element, the NoMatch output is also used, and is merged with the “match” output to ensure that all records are sent to the destination. In effect, this process replaces a country ID with the country name.




Hints, Tips, and Traps





•   When mapping a SQL Server TINYINT column in SSIS, you must set the SSIS data type to single-byte unsigned integer (DT_UI1).



9-24. Processing Type 1 Slowly Changing Dimensions (SCDs) Using T-SQL

Problem

You need to ensure that all the source data in your process is updated in the destination table and that any new records are added.



Solution

Use the T-SQL MERGE command to both add new records and update existing records. The following steps describe how to do this.

1.  Suppose that you have a destination table containing all the values required, a business key (the client ID from the source data), and a surrogate key that will be used for data warehousing. The table’s DDL follows (C:\SQL2012DIRecipes\CH09\tblClient_SCD1.sql):

    CREATE TABLE CarSales_Staging.dbo.Client_SCD1
    (
        ClientID INT IDENTITY(1,1) NOT NULL,
        BusinessKey INT NOT NULL,
        ClientName VARCHAR(150) NULL,
        Country VARCHAR(50) NULL,
        Town VARCHAR(50) NULL,
        County VARCHAR(50) NULL,
        Address1 VARCHAR(50) NULL,
        Address2 VARCHAR(50) NULL,
        ClientType VARCHAR(20) NULL,
        ClientSize VARCHAR(10) NULL
    ) ;
    GO



2.  You can then run this snippet of T-SQL from the CarSales_Staging database (C:\SQL2012DIRecipes\CH09\SCD1.sql):

    USE CarSales_Staging;
    GO

    MERGE   CarSales_Staging.dbo.Client_SCD1 AS DST
    USING   CarSales.dbo.Client AS SRC
    ON      (SRC.ID = DST.BusinessKey)

    -- New business key: insert the record (ClientID is generated by the IDENTITY)
    WHEN NOT MATCHED THEN
        INSERT (BusinessKey, ClientName, Country, Town, County, Address1, Address2,
                ClientType, ClientSize)
        VALUES (SRC.ID, SRC.ClientName, SRC.Country, SRC.Town, SRC.County, SRC.Address1,
                SRC.Address2, SRC.ClientType, SRC.ClientSize)

    -- Existing business key: overwrite only if at least one attribute differs
    WHEN MATCHED
        AND (
               ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
            OR ISNULL(DST.Country,'') <> ISNULL(SRC.Country,'')
            OR ISNULL(DST.Town,'') <> ISNULL(SRC.Town,'')
            OR ISNULL(DST.County,'') <> ISNULL(SRC.County,'')
            OR ISNULL(DST.Address1,'') <> ISNULL(SRC.Address1,'')
            OR ISNULL(DST.Address2,'') <> ISNULL(SRC.Address2,'')
            OR ISNULL(DST.ClientType,'') <> ISNULL(SRC.ClientType,'')
            OR ISNULL(DST.ClientSize,'') <> ISNULL(SRC.ClientSize,'')
            )
    THEN UPDATE
    SET
         DST.ClientName = SRC.ClientName
        ,DST.Country = SRC.Country
        ,DST.Town = SRC.Town
        ,DST.County = SRC.County
        ,DST.Address1 = SRC.Address1
        ,DST.Address2 = SRC.Address2
        ,DST.ClientType = SRC.ClientType
        ,DST.ClientSize = SRC.ClientSize
    ;
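
One consequence of the ISNULL(…,'') comparisons in the MERGE is that a NULL and an empty string are treated as equal, so a change from one to the other is not detected. If that distinction matters, one possible alternative (a sketch only, not the recipe's method) is to compare the column lists with EXCEPT, which treats two NULLs as equal but NULL and '' as different. The table variables, their reduced column set, and the sample rows are hypothetical.

```sql
-- Hypothetical, cut-down versions of the destination and source tables.
DECLARE @Dst TABLE (BusinessKey INT NOT NULL, ClientName VARCHAR(150) NULL, Town VARCHAR(50) NULL);
DECLARE @Src TABLE (ID INT NOT NULL, ClientName VARCHAR(150) NULL, Town VARCHAR(50) NULL);

INSERT INTO @Dst (BusinessKey, ClientName, Town)
VALUES (1, 'Client A', NULL), (2, 'Client B', '');

INSERT INTO @Src (ID, ClientName, Town)
VALUES (1, 'Client A', NULL), (2, 'Client B', NULL);

MERGE   @Dst AS DST
USING   @Src AS SRC
ON      (SRC.ID = DST.BusinessKey)
-- EXCEPT returns a row only when the source values differ from the
-- destination values, comparing NULLs as equal to each other.
WHEN MATCHED AND EXISTS
        (
            SELECT SRC.ClientName, SRC.Town
            EXCEPT
            SELECT DST.ClientName, DST.Town
        )
THEN UPDATE
SET      DST.ClientName = SRC.ClientName
        ,DST.Town = SRC.Town
;

-- Inspect the destination after the merge.
SELECT   BusinessKey, ClientName, Town
FROM     @Dst
ORDER BY BusinessKey;
```

With this sample data, business key 1 is left untouched (NULL compared with NULL counts as no change), while the Town for business key 2 is updated from an empty string to NULL. The ISNULL-based predicate would have skipped both rows.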



How It Works

Ever since SQL Server 2008 introduced the MERGE command, it has been possible to carry out UPSERTs (inserts and updates, with deletes too if required) using a single command. It is also worth noting that the upsert techniques used to handle slowly changing dimensions are not restricted to the world of data warehousing. A Type 1 SCD is nothing other than an UPSERT, and so is useful in a wide range of scenarios.

Of all the SCD types, a Type 1 slowly changing dimension is by far the easiest to handle, as it consists of a simple in-place update of existing data, with no attempt to track the evolution of the changes.

The SQL snippet used joins the two tables on the business key. It also does the following:

•   Inserts a new record into the destination table if the business key is not already present (WHEN NOT MATCHED). The auto-incremented surrogate key is added at the same time.

•   Updates the other fields if any of them are different between the source and destination tables (WHEN MATCHED AND . . .).
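
To see which of these two branches fired for each row, an OUTPUT clause can be added to a MERGE. The following is a minimal sketch under stated assumptions: the table variables are hypothetical, cut-down stand-ins for Client_SCD1 and the source Client table, and the sample rows are invented for illustration.

```sql
-- Hypothetical, cut-down versions of the destination and source tables.
DECLARE @Dst TABLE
(
    ClientID INT IDENTITY(1,1) NOT NULL,
    BusinessKey INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Town VARCHAR(50) NULL
);
DECLARE @Src TABLE (ID INT NOT NULL PRIMARY KEY, ClientName VARCHAR(150) NULL, Town VARCHAR(50) NULL);

INSERT INTO @Dst (BusinessKey, ClientName, Town)
VALUES (1, 'Client A', 'Leeds'), (2, 'Client B', 'Paris');

INSERT INTO @Src (ID, ClientName, Town)
VALUES (1, 'Client A', 'Leeds'),   -- unchanged
       (2, 'Client B', 'Lyon'),    -- town changed
       (3, 'Client C', NULL);      -- new business key

MERGE   @Dst AS DST
USING   @Src AS SRC
ON      (SRC.ID = DST.BusinessKey)
WHEN NOT MATCHED THEN
    INSERT (BusinessKey, ClientName, Town)
    VALUES (SRC.ID, SRC.ClientName, SRC.Town)
WHEN MATCHED
    AND (
           ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
        OR ISNULL(DST.Town,'') <> ISNULL(SRC.Town,'')
        )
THEN UPDATE
SET      DST.ClientName = SRC.ClientName
        ,DST.Town = SRC.Town
-- $action reports INSERT or UPDATE for each row the MERGE touched.
OUTPUT  $action AS MergeAction, inserted.ClientID, inserted.BusinessKey
;
```

With this sample data, the output reports an UPDATE for business key 2 and an INSERT for business key 3. Business key 1 does not appear at all, because the extra predicate on WHEN MATCHED filters out unchanged rows.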



I am taking the dbo.Client table from the CarSales database as the source data, and updating the data, suitably transformed, in the CarSales_Staging.dbo.Client_SCD1 table. This example presumes that the business key is a unique primary key. I am presuming that we will not be using WHEN NOT MATCHED BY SOURCE to delete dimension data, and will only look at UPSERTing data, that is, inserting and updating dimension data.
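
The uniqueness assumption matters because MERGE raises an error if more than one source row matches the same destination row in an update. If the source key is not enforced by a constraint, a check along the following lines could be run first. This is a sketch only: the table variable and its rows are hypothetical, and in practice the query would be pointed at the real source table.

```sql
-- Hypothetical source data containing a duplicated business key.
DECLARE @Src TABLE (ID INT NOT NULL, ClientName VARCHAR(150) NULL);

INSERT INTO @Src (ID, ClientName)
VALUES (1, 'Client A'), (2, 'Client B'), (2, 'Client B Ltd');

-- List every business key that appears more than once in the source.
SELECT   ID, COUNT(*) AS RowsPerKey
FROM     @Src
GROUP BY ID
HAVING   COUNT(*) > 1;
```

Any row returned identifies a business key that needs to be de-duplicated before the MERGE is run. Here the query returns ID 2, which appears twice.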


