9-21. Normalizing Data into Multiple Relational Tables Using SSIS


Chapter 9 ■ Data Transformation



4. Add an OLEDB source task and configure it as follows:

   Name:                      Denormalized Source
   OLEDB Connection Manager:  CarSales_Staging_OLEDB
   Data Access Mode:          SQL Command
   SQL Command Text:

       SELECT   ClientName, Country, Town
       FROM     dbo.DenormalisedSales
       GROUP BY ClientName, Country, Town
       ORDER BY ClientName, Country, Town;



5. Click OK to confirm your modifications.



6. Right-click the OLEDB source task and select Show Advanced Editor. Select the Input and Output Properties tab. Set the IsSorted property of the OLEDB Source Output to True. Expand OLEDB Source Output/Output Columns. Set the SortKeyPosition for the columns that make up the unique key (ClientName, Country, Town) to 1, 2, and 3 respectively. Confirm your changes with OK.



7. Add a Lookup transform onto the Data Flow pane, and connect the OLEDB source task to it. Configure it as follows:

   General
       Cache Mode:                      Full Cache
       Connection Type:                 OLEDB Connection
       Rows with no matching entries:   Redirect to NoMatch output

   Connection
       Connection Manager:              CarSales_OLEDB
       Use the results of an SQL query:

           SELECT   ClientName, Country, Town, ID
           FROM     dbo.Client
           ORDER BY ClientName, Country, Town;

   Columns:                             Map the columns ClientName, Country, Town
   Available Lookup Columns:            ID



8. Confirm your changes with OK.



9. Add an OLEDB destination task to the Data Flow pane. Connect the Lookup task to it using the NoMatch output. Double-click to edit and configure as follows:

   OLEDB Connection Manager:  CarSales_OLEDB
   Data Access Mode:          Table or view – Fast Load
   Name of Table or View:     dbo.Client






10. Confirm your changes using OK.

11. Return to the Control Flow pane and add a second Data Flow task. Name it Invoice and double-click to edit.



12. Repeat steps 4 to 10, but with the following modifications:

    OLEDB source task (step 4):
        SQL Command Text:

            SELECT   InvoiceNumber, TotalDiscount, DeliveryCharge,
                     ClientName, Country, Town
            FROM     dbo.DenormalisedSales
            GROUP BY InvoiceNumber, TotalDiscount, DeliveryCharge,
                     ClientName, Country, Town
            ORDER BY ClientName, Country, Town;

    Lookup task (step 7):
        Rows with no matching entries:   Fail Component
        Connection:                      Use the results of an SQL query:

            SELECT   ClientName, Country, Town, ID
            FROM     dbo.Client
            ORDER BY ClientName, Country, Town;

        Columns:                         Map the columns ClientName, Country, Town
        Available Lookup Columns:        ID – output alias ClientID

    OLEDB destination task (step 9):
        Name of Table or View:           dbo.Invoice



13. Return to the Control Flow pane and add a third Data Flow task. Name it Invoice_Lines and double-click to edit.

14. Repeat steps 4 to 10, but with the following modifications:

    OLEDB source task (step 4):
        SQL Command Text:

            SELECT   InvoiceNumber, SalePrice, StockID
            FROM     dbo.DenormalisedSales
            ORDER BY InvoiceNumber;






    Lookup task (step 7):
        Rows with no matching entries:   Fail Component
        Connection:                      Use the results of an SQL query:

            SELECT   ID, InvoiceNumber
            FROM     dbo.Invoice
            ORDER BY InvoiceNumber;

        Columns:                         Map the column InvoiceNumber
        Available Lookup Columns:        ID – output alias InvoiceID

    OLEDB destination task (step 9):
        Name of Table or View:           dbo.Invoice_Lines



The final package should look like Figure 9-17.



Figure 9-17.  An SSIS data normalization package
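
Once the package has run, one quick way to check the result is to look for rows whose foreign key points nowhere. The following is a minimal T-SQL sketch of such a check and is not part of the original recipe. It uses table variables with made-up rows so that it runs in any database; the ClientID and InvoiceID column names are assumptions taken from the Lookup output aliases in steps 12 and 14.

```sql
-- Hypothetical, cut-down stand-ins for dbo.Client, dbo.Invoice and dbo.Invoice_Lines.
-- ClientID and InvoiceID are assumed column names (from the Lookup output aliases).
DECLARE @Client        TABLE (ID INT, ClientName VARCHAR(50));
DECLARE @Invoice       TABLE (ID INT, InvoiceNumber VARCHAR(20), ClientID INT);
DECLARE @Invoice_Lines TABLE (ID INT, InvoiceID INT, StockID INT);

-- Sample rows: invoice 11 and invoice line 101 are deliberately orphaned.
INSERT INTO @Client (ID, ClientName)
VALUES (1, 'Client A');

INSERT INTO @Invoice (ID, InvoiceNumber, ClientID)
VALUES (10, 'INV001', 1), (11, 'INV002', 99);

INSERT INTO @Invoice_Lines (ID, InvoiceID, StockID)
VALUES (100, 10, 5), (101, 12, 6);

-- Invoices whose ClientID matches no client row.
SELECT  I.ID, I.InvoiceNumber, I.ClientID
FROM    @Invoice I
        LEFT OUTER JOIN @Client C
        ON C.ID = I.ClientID
WHERE   C.ID IS NULL;

-- Invoice lines whose InvoiceID matches no invoice row.
SELECT  L.ID, L.InvoiceID, L.StockID
FROM    @Invoice_Lines L
        LEFT OUTER JOIN @Invoice I
        ON I.ID = L.InvoiceID
WHERE   I.ID IS NULL;
```

To run the same check against the real tables, replace the table variables with dbo.Client, dbo.Invoice, and dbo.Invoice_Lines. After a successful load, both queries would be expected to return no rows.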



How It Works

SSIS can perform essentially the same sequence of operations as T-SQL to normalize a denormalized data source. As with T-SQL, this means starting from the highest-level entity (Client in this example) and working down through the related entities (here, via Invoice to Invoice_Lines). Normalization always means understanding how you will break down the source data into a relational structure and how you will map the relationships between the tables. When using SSIS, it is extremely important to have this clear in your mind before you start to create a package, because it underpins the way that you will use Lookup transforms to find the foreign keys for each table.






At the top (Client table) level, we detect existing clients by using a Lookup transform to read the existing client data, and we add only those clients that do not already exist by using the Lookup NoMatch output. If you are certain that all the source data is new, then this step is not necessary.

As in the T-SQL solution described previously, this process starts with the highest-level table in the relational hierarchy (Client) and works down through Invoice to Invoice_Lines. First, the client data is extracted by selecting unique client records using a GROUP BY clause (step 4). Any new clients are added to the destination table (Client). New records are detected by using a Lookup transform where only records that are not found are allowed through into the destination table. The second Data Flow task isolates the invoice data and uses a Lookup task to deduce the ClientID, which will be the foreign key in the relational schema. Finally, a similar process is applied to Invoice_Lines, only here it is the InvoiceID that is found using the Lookup task.
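
For comparison, the three data flows can be approximated in set-based T-SQL. What follows is only a sketch under stated assumptions, not the T-SQL solution from the earlier recipe: the table variables, their column types, and the sample rows are all hypothetical, and the ClientID and InvoiceID foreign key names are inferred from the Lookup output aliases.

```sql
-- Hypothetical, cut-down schema: table variables stand in for the real tables.
DECLARE @DenormalisedSales TABLE
(
    InvoiceNumber VARCHAR(20), TotalDiscount MONEY, DeliveryCharge MONEY,
    ClientName VARCHAR(50), Country VARCHAR(50), Town VARCHAR(50),
    SalePrice MONEY, StockID INT
);
DECLARE @Client TABLE
(
    ID INT IDENTITY(1,1), ClientName VARCHAR(50), Country VARCHAR(50), Town VARCHAR(50)
);
DECLARE @Invoice TABLE
(
    ID INT IDENTITY(1,1), InvoiceNumber VARCHAR(20), TotalDiscount MONEY,
    DeliveryCharge MONEY, ClientID INT
);
DECLARE @Invoice_Lines TABLE
(
    ID INT IDENTITY(1,1), InvoiceID INT, SalePrice MONEY, StockID INT
);

-- Made-up source rows: two lines for INV001, one line for INV002.
INSERT INTO @DenormalisedSales
    (InvoiceNumber, TotalDiscount, DeliveryCharge, ClientName, Country, Town, SalePrice, StockID)
VALUES
    ('INV001', 50, 100, 'John Smith',  'UK',     'Stoke', 15000, 1),
    ('INV001', 50, 100, 'John Smith',  'UK',     'Stoke', 22000, 2),
    ('INV002', 0,  75,  'Jean Dupont', 'France', 'Lyon',  9500,  3);

-- A client that already exists, to show that it is not inserted a second time.
INSERT INTO @Client (ClientName, Country, Town)
VALUES ('John Smith', 'UK', 'Stoke');

-- 1. Client: the NOT EXISTS test plays the role of the Lookup NoMatch output.
INSERT INTO @Client (ClientName, Country, Town)
SELECT   S.ClientName, S.Country, S.Town
FROM     @DenormalisedSales S
WHERE    NOT EXISTS
         (
         SELECT 1
         FROM   @Client C
         WHERE  C.ClientName = S.ClientName
                AND C.Country = S.Country
                AND C.Town = S.Town
         )
GROUP BY S.ClientName, S.Country, S.Town;

-- 2. Invoice: the join finds the ClientID foreign key.
--    Unlike a Lookup set to Fail Component, an INNER JOIN silently drops unmatched rows.
INSERT INTO @Invoice (InvoiceNumber, TotalDiscount, DeliveryCharge, ClientID)
SELECT   S.InvoiceNumber, S.TotalDiscount, S.DeliveryCharge, C.ID
FROM     @DenormalisedSales S
         INNER JOIN @Client C
         ON  C.ClientName = S.ClientName
         AND C.Country = S.Country
         AND C.Town = S.Town
GROUP BY S.InvoiceNumber, S.TotalDiscount, S.DeliveryCharge, C.ID;

-- 3. Invoice_Lines: the join finds the InvoiceID foreign key.
INSERT INTO @Invoice_Lines (InvoiceID, SalePrice, StockID)
SELECT   I.ID, S.SalePrice, S.StockID
FROM     @DenormalisedSales S
         INNER JOIN @Invoice I
         ON I.InvoiceNumber = S.InvoiceNumber;

SELECT COUNT(*) AS ClientRows      FROM @Client;
SELECT COUNT(*) AS InvoiceRows     FROM @Invoice;
SELECT COUNT(*) AS InvoiceLineRows FROM @Invoice_Lines;
```

With these sample rows, the script leaves two clients, two invoices, and three invoice lines. The order of the three INSERT statements matters for the same reason that the order of the Data Flow tasks does: each level needs the keys generated by the level above it.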



Hints, Tips, and Traps





• You cannot reuse a single lookup cache of client IDs both for detecting existing clients and for looking up client IDs for invoices, because the client data changes once new clients are added, which makes the cache outdated.



9-22. Denormalizing Data by Referencing Lookup Tables in T-SQL

Problem

You want to ensure that reference (or lookup) tables are used correctly when denormalizing data in a table that has already been loaded into SQL Server.



Solution

Make careful use of basic JOINs to ensure that lookup tables are used properly in the data source.

Here I take Recipe 9-21 as an example. If you look at step 4 and presume that the Country field must be used to obtain the ID of the country from the Countries table, the following T-SQL snippet can be used to replace the “simple” SELECT that looks up only Client information (C:\SQL2012DIRecipes\CH09\LookupNormalisation.sql):

SELECT DISTINCT   DNS.ClientName, DNS.Country, DNS.Town, C.ID AS CountryID
FROM              CarSales_Staging.dbo.DenormalisedSales DNS
                  INNER JOIN CarSales_Staging.dbo.Countries C
                  ON DNS.Country = C.CountryCode
WHERE NOT EXISTS
                  (
                  SELECT ID
                  FROM   dbo.Client
                  WHERE  ClientName = DNS.ClientName
                         AND Country = DNS.Country
                         AND Town = DNS.Town
                  );



How It Works

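Using this T-SQL instead of the original code in Recipe 9-21 resolves each country code against the Countries lookup table while the source data is being read, so the value taken from the lookup table (CountryID in this snippet) is imported along with the client data. Returning a descriptive column, such as the country name, instead of the ID works in exactly the same way. This is a fairly “classic” requirement when you want to avoid excessive normalization and the overuse of lookup tables because the data architecture does not require such a fine-grained level of normalization.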
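
If it is the country name rather than a code or ID that you want to carry into the output, the same join can return a descriptive column from the lookup table. The following is a minimal, self-contained sketch under that assumption. The CountryName_EN column is hypothetical here (the name is borrowed from the destination table in Recipe 9-23), as are the sample rows, and table variables stand in for the real tables.

```sql
-- Hypothetical stand-ins for DenormalisedSales and Countries.
-- CountryName_EN is an assumed column name, not confirmed for the Countries table.
DECLARE @DenormalisedSales TABLE
(
    ClientName VARCHAR(50), Country CHAR(3), Town VARCHAR(50)
);
DECLARE @Countries TABLE
(
    ID INT, CountryCode CHAR(3), CountryName_EN NVARCHAR(50)
);

INSERT INTO @DenormalisedSales (ClientName, Country, Town)
VALUES ('Client A', 'GBR', 'Stoke'),
       ('Client A', 'GBR', 'Stoke'),
       ('Client B', 'FRA', 'Lyon');

INSERT INTO @Countries (ID, CountryCode, CountryName_EN)
VALUES (1, 'GBR', N'United Kingdom'),
       (2, 'FRA', N'France');

-- The country code in the source is replaced by the name held in the lookup table.
SELECT DISTINCT   DNS.ClientName, C.CountryName_EN AS Country, DNS.Town
FROM              @DenormalisedSales DNS
                  INNER JOIN @Countries C
                  ON DNS.Country = C.CountryCode;
```

The DISTINCT keyword removes the duplicate source row, so each client appears once with the country name in place of the code.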






Hints, Tips, and Traps





• Should you need to allow NULLs, then use a LEFT OUTER JOIN instead of an INNER JOIN.

• You can perform multiple lookups in a single T-SQL statement, of course. However, be aware that too many of them will slow down the process.
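
To illustrate the first hint, here is a minimal sketch of the LEFT OUTER JOIN variant. The table variables and sample rows are hypothetical stand-ins for DenormalisedSales and Countries. One client has no country at all, and another has a code that is missing from the lookup table.

```sql
-- Hypothetical stand-ins for DenormalisedSales and Countries.
DECLARE @DenormalisedSales TABLE
(
    ClientName VARCHAR(50), Country CHAR(3), Town VARCHAR(50)
);
DECLARE @Countries TABLE
(
    ID INT, CountryCode CHAR(3)
);

INSERT INTO @DenormalisedSales (ClientName, Country, Town)
VALUES ('Client A', 'GBR', 'Stoke'),
       ('Client B', NULL,  'Lyon'),   -- no country supplied
       ('Client C', 'XXX', 'Bonn');   -- code absent from the lookup table

INSERT INTO @Countries (ID, CountryCode)
VALUES (1, 'GBR'), (2, 'FRA');

-- LEFT OUTER JOIN keeps every source row; unmatched rows get a NULL CountryID.
-- An INNER JOIN here would return Client A only.
SELECT DISTINCT   DNS.ClientName, DNS.Country, DNS.Town, C.ID AS CountryID
FROM              @DenormalisedSales DNS
                  LEFT OUTER JOIN @Countries C
                  ON DNS.Country = C.CountryCode;
```

Whether a NULL CountryID is acceptable depends on the destination table. If the column does not allow NULLs, the unmatched rows need to be handled before loading.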



9-23. Denormalizing Data by Referencing Lookup Tables in SSIS

Problem

You want to ensure that lookup tables are properly referenced during an ETL process using SSIS to avoid excessive normalization of data.



Solution

Use the Lookup component as part of the data flow, and map the columns appropriately so that the lookup data is used in place of the reference code in the source. The following is an example.

1. Create a new SSIS package.

2. Add two new OLEDB connection managers: one named CarSales_OLEDB, which you configure to connect to the CarSales database, and the other named CarSales_Staging_OLEDB, which you configure to connect to the CarSales_Staging database.

3. Add a new Flat File connection manager that you configure to connect to the C:\SQL2012DIRecipes\CH09\ClientList.Csv file. Name it ClientList. In the Advanced tab, set the data types for the three columns as follows:

   ID:             Four-byte signed integer
   ClientName:     string [DT_STR] – length of 50
   ClientCountry:  Single-byte unsigned integer

4. Create the following destination table (in the CarSales_Staging database):

   CREATE TABLE dbo.ClientWithCountry
   (
   ID NUMERIC(20, 0) NULL,
   ClientName VARCHAR(50) NULL,
   CountryName_EN NVARCHAR(50) NULL
   );
   GO


