9-28. Handling Type 4 Slowly Changing Dimensions Using T-SQL


Chapter 9 ■ Data Transformation



    Town VARCHAR(50) NULL,
    County VARCHAR(50) NULL,
    Address1 VARCHAR(50) NULL,
    Address2 VARCHAR(50) NULL,
    ClientType VARCHAR(20) NULL,
    ClientSize VARCHAR(10) NULL,
    ValidTo INT,
    HistoricalVersion INT
);
GO



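CREATE TABLE CarSales_Staging.dbo.Client_SCD4
(
    ClientID INT IDENTITY(1,1) NOT NULL,
    BusinessKey INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Country VARCHAR(50) NULL,
    Town VARCHAR(50) NULL,
    County VARCHAR(50) NULL,
    Address1 VARCHAR(50) NULL,
    Address2 VARCHAR(50) NULL,
    ClientType VARCHAR(20) NULL,
    ClientSize VARCHAR(10) NULL
);
GO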

2.



Run the following code (C:\SQL2012DIRecipes\CH09\SCD4.sql):

USE CarSales_Staging;
GO

-- Define the dates used in validity - assume whole 24-hour cycles
DECLARE @Yesterday INT = CAST(CAST(YEAR(DATEADD(dd,-1,GETDATE())) AS CHAR(4))
    + RIGHT('0' + CAST(MONTH(DATEADD(dd,-1,GETDATE())) AS VARCHAR(2)),2)
    + RIGHT('0' + CAST(DAY(DATEADD(dd,-1,GETDATE())) AS VARCHAR(2)),2) AS INT);
DECLARE @Today INT = CAST(CAST(YEAR(GETDATE()) AS CHAR(4))
    + RIGHT('0' + CAST(MONTH(GETDATE()) AS VARCHAR(2)),2)
    + RIGHT('0' + CAST(DAY(GETDATE()) AS VARCHAR(2)),2) AS INT);

-- Drop temp table, if it exists
IF OBJECT_ID('Tempdb..#Tmp_Client') IS NOT NULL DROP TABLE #Tmp_Client;

CREATE TABLE #Tmp_Client
(
    BusinessKey INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Country VARCHAR(50) NULL,
    Town VARCHAR(50) NULL,
    County VARCHAR(50) NULL,
    Address1 VARCHAR(50) NULL,
    Address2 VARCHAR(50) NULL,
    ClientType VARCHAR(20) NULL,
    ClientSize VARCHAR(10) NULL
);








-- Outer insert for Type 4 records: if an existing record changes, the output from
-- the MERGE statement (the previous values) is inserted into the temp table
INSERT INTO #Tmp_Client (BusinessKey, ClientName, Country, Town, County, Address1,
    Address2, ClientType, ClientSize)
SELECT BusinessKey, ClientName, Country, Town, County, Address1, Address2, ClientType,
    ClientSize
FROM
(
    -- Merge statement
    MERGE CarSales_Staging.dbo.Client_SCD4 AS DST
    USING CarSales.dbo.Client AS SRC
    ON (SRC.ID = DST.BusinessKey)

    WHEN NOT MATCHED THEN
        INSERT (BusinessKey, ClientName, Country, Town, County, Address1, Address2,
            ClientType, ClientSize)
        VALUES (SRC.ID, SRC.ClientName, SRC.Country, SRC.Town, SRC.County, SRC.Address1,
            SRC.Address2, SRC.ClientType, SRC.ClientSize)

    WHEN MATCHED
        AND (
            ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
            OR ISNULL(DST.Country,'') <> ISNULL(SRC.Country,'')
            OR ISNULL(DST.Town,'') <> ISNULL(SRC.Town,'')
            OR ISNULL(DST.County,'') <> ISNULL(SRC.County,'')
            OR ISNULL(DST.Address1,'') <> ISNULL(SRC.Address1,'')
            OR ISNULL(DST.Address2,'') <> ISNULL(SRC.Address2,'')
            OR ISNULL(DST.ClientType,'') <> ISNULL(SRC.ClientType,'')
            OR ISNULL(DST.ClientSize,'') <> ISNULL(SRC.ClientSize,'')
        )
    THEN UPDATE
        SET
            DST.ClientName = SRC.ClientName
            ,DST.Country = SRC.Country
            ,DST.Town = SRC.Town
            ,DST.County = SRC.County
            ,DST.Address1 = SRC.Address1
            ,DST.Address2 = SRC.Address2
            ,DST.ClientType = SRC.ClientType
            ,DST.ClientSize = SRC.ClientSize

    -- DELETED returns the values as they were before the update
    OUTPUT DELETED.BusinessKey, DELETED.ClientName, DELETED.Country, DELETED.Town,
        DELETED.County, DELETED.Address1, DELETED.Address2, DELETED.ClientType,
        DELETED.ClientSize, $Action AS MergeAction
) AS MRG
WHERE MRG.MergeAction = 'UPDATE'
;








-- Set an end date on any history records for the changed clients that do not have one yet
UPDATE TP4
SET TP4.ValidTo = @Yesterday
FROM CarSales_Staging.dbo.Client_SCD4_History TP4
    INNER JOIN #Tmp_Client TMP
    ON TP4.BusinessKey = TMP.BusinessKey
WHERE TP4.ValidTo IS NULL;

-- Add the previous versions to the history table, recording the date they ceased
-- to be current and the next version number for each business key
INSERT INTO CarSales_Staging.dbo.Client_SCD4_History
(
    BusinessKey
    ,ClientName
    ,Country
    ,Town
    ,County
    ,Address1
    ,Address2
    ,ClientType
    ,ClientSize
    ,ValidTo
    ,HistoricalVersion
)
SELECT
    BusinessKey
    ,ClientName
    ,Country
    ,Town
    ,County
    ,Address1
    ,Address2
    ,ClientType
    ,ClientSize
    ,@Today
    ,(SELECT ISNULL(MAX(HistoricalVersion),0) + 1
      FROM CarSales_Staging.dbo.Client_SCD4_History
      WHERE BusinessKey = Tmp.BusinessKey)
FROM #Tmp_Client Tmp;
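The two DECLARE statements at the start of the script build YYYYMMDD integer date keys by concatenating the year, month, and day. A shorter alternative, sketched here rather than taken from the recipe, is CONVERT with style 112, which formats a date as yyyymmdd; it should produce the same values. @Now is a hypothetical helper variable that reads the clock only once, so both keys come from the same instant.

```sql
-- Sketch: YYYYMMDD integer date keys via CONVERT style 112 (yyyymmdd)
-- @Now is a hypothetical helper that reads GETDATE() only once
DECLARE @Now DATETIME = GETDATE();
DECLARE @Yesterday INT = CONVERT(INT, CONVERT(CHAR(8), DATEADD(dd, -1, @Now), 112));
DECLARE @Today INT = CONVERT(INT, CONVERT(CHAR(8), @Now, 112));

SELECT @Yesterday AS Yesterday, @Today AS Today;
```

Because the style-112 string is always eight digits, the cast to INT yields the same key as the concatenation approach, without the need to pad month and day manually.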



How It Works

The final variation on this theme that I show here is the Type 4 SCD. In essence, it keeps history as a Type 2 SCD does, but stores the previous versions of the data in a separate history table. Type 4 is a multitable split: the most recent data is stored in an “active” table, and older data in a history table.
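Because the active table and the history table share the business key, the full history of a client can be read back by combining them. The following sketch uses hypothetical temporary tables, #Client_SCD4 and #Client_SCD4_History, cut down to two attribute columns and filled with made-up rows, as stand-ins for the recipe's tables. It lists every version of one client, oldest first, with the current version from the active table last.

```sql
-- Hypothetical stand-ins for Client_SCD4 (active) and Client_SCD4_History
IF OBJECT_ID('tempdb..#Client_SCD4') IS NOT NULL DROP TABLE #Client_SCD4;
IF OBJECT_ID('tempdb..#Client_SCD4_History') IS NOT NULL DROP TABLE #Client_SCD4_History;

CREATE TABLE #Client_SCD4
(
    BusinessKey INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Town VARCHAR(50) NULL
);

CREATE TABLE #Client_SCD4_History
(
    BusinessKey INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Town VARCHAR(50) NULL,
    ValidTo INT,
    HistoricalVersion INT
);

-- Made-up data: client 1 has changed town twice, client 2 has never changed
INSERT INTO #Client_SCD4 (BusinessKey, ClientName, Town)
VALUES (1, 'Acme', 'Leeds'), (2, 'Bolt', 'York');

INSERT INTO #Client_SCD4_History (BusinessKey, ClientName, Town, ValidTo, HistoricalVersion)
VALUES (1, 'Acme', 'Bath', 20120301, 1), (1, 'Acme', 'Hull', 20120615, 2);

-- All versions of client 1: history rows by version number, then the current row
SELECT BusinessKey, ClientName, Town, HistoricalVersion, ValidTo, 0 AS IsCurrent
FROM #Client_SCD4_History
WHERE BusinessKey = 1
UNION ALL
SELECT BusinessKey, ClientName, Town, NULL, NULL, 1
FROM #Client_SCD4
WHERE BusinessKey = 1
ORDER BY IsCurrent, HistoricalVersion;
```

The IsCurrent column is only there to sort the active row after the history rows, since ORDER BY in a UNION query can only use columns from the select list.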






This recipe requires:

•	A destination table containing all the required values, a business key (the client ID from the source data), and a surrogate key that will be used for data warehousing. This table is identical in all respects to the Client_SCD1 table described earlier; here, however, I have duplicated it and named it Client_SCD4 for clarity. Once again, all the destination tables are in the CarSales_Staging database, but they could be in the same database as the source, or even on another server.

•	A destination table for the historical versions of the dimension data. This table has the same columns as the dimension table, plus two additional fields: ValidTo and HistoricalVersion. The former stores the date when the data ceased to be current; the latter provides a version number that can help to track how frequently the data evolves.
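As an illustration of how HistoricalVersion and ValidTo can be used, the following sketch counts how often each client has changed and when its most recent change took place. #History is a hypothetical temporary stand-in for Client_SCD4_History, holding made-up rows and only the columns the query needs.

```sql
-- Hypothetical stand-in for Client_SCD4_History (only the columns used here)
IF OBJECT_ID('tempdb..#History') IS NOT NULL DROP TABLE #History;

CREATE TABLE #History
(
    BusinessKey INT NOT NULL,
    ValidTo INT,
    HistoricalVersion INT
);

-- Made-up data: client 1 has three archived versions, client 2 has one
INSERT INTO #History (BusinessKey, ValidTo, HistoricalVersion)
VALUES (1, 20120110, 1), (1, 20120301, 2), (1, 20120615, 3), (2, 20120420, 1);

-- Each change adds one history row numbered MAX + 1, so the highest
-- version number equals the number of archived versions
SELECT BusinessKey,
       MAX(HistoricalVersion) AS TimesChanged,
       MAX(ValidTo) AS LastChangeDate
FROM #History
GROUP BY BusinessKey
ORDER BY TimesChanged DESC;
```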



The SQL snippet maps the source and main destination tables on the business key, taking the source data from the CarSales database. It also does the following:

•	Inserts a new record if one does not exist for the business key.
•	If one exists and any of its attributes have changed:
    •	Copies the previous version of the record to the history table.
    •	Updates the record in the main destination table with the new values.



Note the use of the DELETED table to get the previous values of the data in the Client_SCD4 table, rather than the new values, which are available from the INSERTED table. Also, rather than using the OUTPUT clause to return data from the MERGE command to a session-scoped temporary table, you can insert the data into a table variable (using OUTPUT ... INTO) as part of the MERGE statement. However, temporary tables can be more efficient (especially if you index them) for larger data sets, so I prefer to use them in ETL processes unless I can be reasonably sure that no more than a few hundred records will be output.
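The table-variable alternative mentioned above might look like the following sketch. #DemoTarget and #DemoSource are hypothetical, cut-down stand-ins for Client_SCD4 and CarSales.dbo.Client, and @Changes is a hypothetical table variable. Unlike the composable INSERT ... SELECT used in the recipe, the OUTPUT clause has no WHERE option, so every action is captured and the updated rows are filtered afterward.

```sql
-- Hypothetical, cut-down stand-ins for Client_SCD4 and CarSales.dbo.Client
IF OBJECT_ID('tempdb..#DemoTarget') IS NOT NULL DROP TABLE #DemoTarget;
IF OBJECT_ID('tempdb..#DemoSource') IS NOT NULL DROP TABLE #DemoSource;

CREATE TABLE #DemoTarget
(
    BusinessKey INT NOT NULL PRIMARY KEY,
    ClientName VARCHAR(150) NULL,
    Town VARCHAR(50) NULL
);

CREATE TABLE #DemoSource
(
    ID INT NOT NULL PRIMARY KEY,
    ClientName VARCHAR(150) NULL,
    Town VARCHAR(50) NULL
);

INSERT INTO #DemoTarget (BusinessKey, ClientName, Town)
VALUES (1, 'Acme', 'Bath'), (2, 'Bolt', 'York');

-- Made-up changes: client 1 has moved, client 2 is unchanged, client 3 is new
INSERT INTO #DemoSource (ID, ClientName, Town)
VALUES (1, 'Acme', 'Leeds'), (2, 'Bolt', 'York'), (3, 'Crux', 'Hull');

-- Table variable that receives the MERGE output
DECLARE @Changes TABLE
(
    MergeAction NVARCHAR(10) NOT NULL,
    BusinessKey INT NULL,
    ClientName VARCHAR(150) NULL,
    Town VARCHAR(50) NULL
);

MERGE #DemoTarget AS DST
USING #DemoSource AS SRC
ON (SRC.ID = DST.BusinessKey)
WHEN NOT MATCHED THEN
    INSERT (BusinessKey, ClientName, Town)
    VALUES (SRC.ID, SRC.ClientName, SRC.Town)
WHEN MATCHED AND (ISNULL(DST.ClientName, '') <> ISNULL(SRC.ClientName, '')
               OR ISNULL(DST.Town, '') <> ISNULL(SRC.Town, '')) THEN
    UPDATE SET DST.ClientName = SRC.ClientName,
               DST.Town = SRC.Town
-- DELETED holds the pre-update values; its columns are NULL for inserted rows
OUTPUT $action, DELETED.BusinessKey, DELETED.ClientName, DELETED.Town
    INTO @Changes (MergeAction, BusinessKey, ClientName, Town);

-- Keep only the previous versions of the rows that were updated
SELECT BusinessKey, ClientName, Town
FROM @Changes
WHERE MergeAction = 'UPDATE';
```

With this data, @Changes receives one INSERT row (client 3, all DELETED columns NULL) and one UPDATE row holding client 1's old town, Bath; client 2 produces no output because it matches the change test on no column.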



9-29. Handling Type 4 Slowly Changing Dimensions with SSIS

Problem

As part of an SSIS data flow, you need to store current data in one table and historical data in a second table.



Solution

Use the SSIS Multicast and Conditional Split transforms to process a Type 4 SCD. The following steps explain how.

1.



Create a table on disk (for testing and debugging) that will eventually be replaced by a temporary table, as was done in Recipe 9-28. The DDL for this is (C:\SQL2012DIRecipes\CH09\tblTmpSCD4.sql):

CREATE TABLE CarSales_Staging.dbo.Tmp_SCD4
(
    ID INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Country VARCHAR(50) NULL,
    Town VARCHAR(50) NULL,
    County VARCHAR(50) NULL,
    Address1 VARCHAR(50) NULL,
    Address2 VARCHAR(50) NULL,
    ClientType VARCHAR(20) NULL,
    ClientSize VARCHAR(10) NULL
);
GO

2.



Create the Type 4 history table (Client_SCD4_History) and the Client_SCD4 Type 1 table, both as described in Recipe 9-28.



3.



Carry out steps 1 to 10 of the previous recipe, except that the code for the temporary table in step 4 is as follows (C:\SQL2012DIRecipes\CH09\tblTmpSessionSCD4.sql):

CREATE TABLE ##Tmp_SCD4
(
    ID INT NOT NULL,
    ClientName VARCHAR(150) NULL,
    Country VARCHAR(50) NULL,
    Town VARCHAR(50) NULL,
    County VARCHAR(50) NULL,
    Address1 VARCHAR(50) NULL,
    Address2 VARCHAR(50) NULL,
    ClientType VARCHAR(20) NULL,
    ClientSize VARCHAR(10) NULL
);



4.



Ensure that the variable name for the temp table in step 2 is Tmp_SCD4.



5.



Add an OLE DB Destination and connect the Lookup transform to it using the Lookup No Match Output. Configure it as follows:

OLE DB Connection Manager:   CarSales_Staging_OLEDB
Data Access Mode:            Table or View – Fast Load
Name of Table or View:       dbo.Client_SCD4



6.



Once you have ensured that the columns are mapped, confirm with OK.



7.



Add a Derived Column transform, connect the Multicast transform to it, and add a derived column, like this:

Derived Column Name:   ValidFrom
Expression:            @[User::ValidTo]


