11-7. Detecting Data Changes with Limited Source Database Access


Chapter 11 ■ Delta Data Management



1. Create a new SSIS package. Add two OLEDB connection managers named CarSales_Staging_OLEDB (which connects to the CarSales_Staging database) and CarSales_OLEDB (which connects to the CarSales database). Set the RetainSameConnection property to True for the CarSales_Staging_OLEDB connection manager.



2. Add the following variables:



Variable Name     Type     Value
UpdateCounter     INT16
InsertCounter     INT16
DeleteTable       String   ##TMP_Deletes
InsertTable       String   ##TMP_Inserts
UpdateTable       String   ##TMP_Updates
DeltaInserts      Object
DeltaUpdates      Object
InsertSQL         String   SELECT ID, InvoiceID, StockID, SalePrice, HashData FROM dbo.Invoice_Lines
UpdateSQL         String   SELECT SRC.ID, SRC.InvoiceID, SRC.StockID, SRC.SalePrice, SRC.HashData FROM dbo.Invoice_Lines SRC
UpdateDataTable   String   ##Invoice_Lines_Updates

3. Add an Execute SQL task. Name it Create Temp tables on Destination. Set the Connection to CarSales_Staging_OLEDB. Set the SQL Statement to the following (C:\SQL2012DIRecipes\CH11\CreateTempTablesOnDestination.Sql):

IF OBJECT_ID('TempDB..##TMP_DELETES') IS NULL
    CREATE TABLE TempDB..##TMP_DELETES (ID INT);

IF OBJECT_ID('TempDB..##Invoice_Lines_Updates') IS NULL
    CREATE TABLE ##Invoice_Lines_Updates
    (
        ID INT NOT NULL,
        InvoiceID INT NULL,
        StockID INT NULL,
        SalePrice NUMERIC(18, 2) NULL,
        VersionStamp VARBINARY(8) NULL,
        HashData VARCHAR(50) NULL
    );






4. Add a new Data Flow task and connect the previous task (Create Temp tables on Destination) to it. Configure it so that it looks like Figure 11-16. See Recipe 11-1 for details on how to do this.



Figure 11-16.  Data flow for detecting changes at the destination database

5. Add two Row Count transforms. Connect one to the “Lookup matching IDs” Lookup transform, using the No Match output, and name it Insert Count. Configure it to use the User::InsertCounter variable. Connect the other to the Detect Hash Deltas Conditional Split transform, using the HashData Differences output. Configure it to use the User::UpdateCounter variable, and name it Update Count.



6. Add a Recordset destination to the data flow and connect the Update Count Row Count transform to it. Double-click to edit it, and set the variable name to User::DeltaUpdates. Click the Input Columns tab and select the ID column. This will direct the IDs to the ADO recordset.



7. Add a second Recordset destination to the data flow and connect the Insert Count Row Count transform to it. Double-click to edit it, and set the variable name to User::DeltaInserts. Click the Input Columns tab and select the ID column. This will direct the IDs to the ADO recordset. The Delta Detection Data Flow task should now look like Figure 11-17.



Figure 11-17.  Data flow when returning changes to the source database






8. The Data Flow task prepares all three data manipulation processes (Insert, Update, and Delete) by isolating the IDs that each of these operations requires from the source data. The following steps show how they are used.



9. Add an Execute SQL task named Delete Data, and connect the “Delta Detection” Data Flow task to it. Configure it to use the CarSales_Staging_OLEDB connection manager. Set the SQLStatement to the following:

DELETE
FROM      dbo.Invoice_Lines
WHERE     ID IN (
          SELECT    DST.ID
          FROM      dbo.Invoice_Lines DST
                    LEFT OUTER JOIN ##TMP_Deletes TMP
                    ON DST.ID = TMP.ID
          WHERE     TMP.ID IS NULL
          )

10. Set the BypassPrepare property to True.



11. Add a Sequence container. Name it Inserts and connect the Delete Data task to it. Inside this container, add a Script task named Set SELECT for Inserts. Set the read-only variables to User::DeltaInserts and the read-write variables to User::InsertSQL.



12. Click the Edit Script button.



13. Add the following Imports directives:

Imports System.Data.OleDb
Imports System.Text



14. Replace the Main method with the following script (C:\SQL2012DIRecipes\CH11\11-7_WhereClauseInserts.vb):

Public Sub Main()
    ' Build a comma-separated list of the IDs held in the DeltaInserts object variable.
    Dim SB As New StringBuilder
    Dim DA As New OleDbDataAdapter
    Dim DT As New DataTable
    Dim RW As DataRow

    ' Load the ADO recordset written by the Recordset destination into a DataTable.
    DA.Fill(DT, Dts.Variables("DeltaInserts").Value)

    For Each RW In DT.Rows
        SB.Append(RW(0).ToString & ",")
    Next

    ' Append the WHERE clause to the base SELECT statement held in InsertSQL.
    Dts.Variables("InsertSQL").Value = Dts.Variables("InsertSQL").Value.ToString & _
        " WHERE ID IN (" & SB.ToString.TrimEnd(","c) & ")"

    Dts.TaskResult = ScriptResults.Success
End Sub






15. Close the script and click OK.



16. Add a Data Flow task, name it Insert Data, and connect the Script task to it. Double-click the precedence constraint and configure it like Figure 11-18.



Figure 11-18.  The Precedence Constraint dialog box

17. Edit the Data Flow task. Add an OLEDB source and configure it like this:

OLEDB Connection Manager:   CarSales_OLEDB
Data Access Mode:           SQL Command from Variable
Variable Name:              InsertSQL



18. Set the ValidateExternalMetadata property to False. Click Columns and select all the source columns.



19. Add an OLEDB destination, connect the OLEDB source to it, and configure it to use the CarSales_Staging_OLEDB connection manager and to load data into the dbo.Invoice_Lines table. Map all the columns.



20. Add a Sequence container, name it Updates, and connect the Inserts Sequence container to it. Inside this container, add a Script task named Set SELECT for Updates. Set the read-only variables to User::DeltaUpdates and the read-write variables to User::UpdateSQL. Click the Edit Script button.






21. Add the following to the Imports region:

Imports System.Data.OleDb
Imports System.Text



22. Replace the Main procedure with the following code (C:\SQL2012DIRecipes\CH11\11-7_WhereClauseDeletes.vb):

Public Sub Main()
    ' Build a comma-separated list of the IDs held in the DeltaUpdates object variable.
    Dim SB As New StringBuilder
    Dim DA As New OleDbDataAdapter
    Dim DT As New DataTable
    Dim RW As DataRow

    ' Load the ADO recordset written by the Recordset destination into a DataTable.
    DA.Fill(DT, Dts.Variables("DeltaUpdates").Value)

    For Each RW In DT.Rows
        SB.Append(RW(0).ToString & ",")
    Next

    ' Append the WHERE clause to the base SELECT statement held in UpdateSQL.
    Dts.Variables("UpdateSQL").Value = Dts.Variables("UpdateSQL").Value.ToString & _
        " WHERE ID IN (" & SB.ToString.TrimEnd(","c) & ")"

    Dts.TaskResult = ScriptResults.Success
End Sub

23. Close the script and click OK.



24. Add a Data Flow task, name it Update Data, and connect the Script task to it. Double-click the precedence constraint and configure it as in step 16, but with the expression @UpdateCounter > 0.



25. Edit the Data Flow task. Add an OLEDB source and configure it like this:

OLEDB Connection Manager:   CarSales_OLEDB
Data Access Mode:           SQL Command from Variable
Variable Name:              UpdateSQL



26. Set the ValidateExternalMetadata property to False. Click Columns and select all the source columns.



27. Add an OLEDB destination. Configure it as follows:

Connection Manager:   CarSales_Staging_OLEDB
Data Access Mode:     Table name or view name variable
Variable Name:        UpdateDataTable






28. Click Columns and map all the columns.



29. Return to the Control Flow pane and add an Execute SQL task. Name it Carry out updates and configure it as follows:

Connection Type:   OLEDB
Connection:        CarSales_Staging_OLEDB
SQL Statement:

UPDATE DST
SET
     DST.InvoiceID = UPD.InvoiceID
    ,DST.SalePrice = UPD.SalePrice
    ,DST.StockID = UPD.StockID
    ,DST.HashData = UPD.HashData
FROM
     dbo.Invoice_Lines DST
     INNER JOIN ##Invoice_Lines_Updates UPD
     ON DST.ID = UPD.ID

30. Set the ValidateExternalMetadata property to False. Click OK to finish. The completed package should look like Figure 11-19.






Figure 11-19.  The complete process to isolate deltas at the destination and return queries to the source for delta data
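The two set-based statements in the procedure, the Delete Data anti-join and the Carry out updates UPDATE, can be checked with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server: the tables tmp_deletes and invoice_lines_updates are hypothetical substitutes for ##TMP_Deletes and ##Invoice_Lines_Updates (SQLite has no ## global temp tables), the data is invented, and the UPDATE ... FROM ... INNER JOIN form is swapped for equivalent correlated subqueries. Note that, as written, the DELETE removes destination rows whose IDs are absent from ##TMP_Deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Invoice_Lines (
        ID INTEGER PRIMARY KEY, InvoiceID INTEGER, StockID INTEGER,
        SalePrice REAL, HashData TEXT);
    -- Hypothetical stand-ins for ##TMP_Deletes and ##Invoice_Lines_Updates.
    CREATE TABLE tmp_deletes (ID INTEGER);
    CREATE TABLE invoice_lines_updates (
        ID INTEGER PRIMARY KEY, InvoiceID INTEGER, StockID INTEGER,
        SalePrice REAL, HashData TEXT);

    INSERT INTO Invoice_Lines VALUES
        (1, 100, 7, 10.0, 'h1'), (2, 100, 8, 20.0, 'h2'), (3, 101, 9, 30.0, 'h3');
    -- IDs still present at the source: destination row 3 has no match.
    INSERT INTO tmp_deletes VALUES (1), (2);
    -- Fresh source data for the changed row.
    INSERT INTO invoice_lines_updates VALUES (2, 100, 8, 25.0, 'h2b');
    """
)

# Same anti-join shape as the Delete Data task.
conn.execute(
    """
    DELETE FROM Invoice_Lines
    WHERE ID IN (
        SELECT DST.ID
        FROM Invoice_Lines DST
        LEFT OUTER JOIN tmp_deletes TMP ON DST.ID = TMP.ID
        WHERE TMP.ID IS NULL)
    """
)

# One set-based UPDATE; correlated subqueries replace SQL Server's UPDATE ... FROM ... JOIN.
conn.execute(
    """
    UPDATE Invoice_Lines
    SET InvoiceID = (SELECT U.InvoiceID FROM invoice_lines_updates U WHERE U.ID = Invoice_Lines.ID),
        StockID   = (SELECT U.StockID   FROM invoice_lines_updates U WHERE U.ID = Invoice_Lines.ID),
        SalePrice = (SELECT U.SalePrice FROM invoice_lines_updates U WHERE U.ID = Invoice_Lines.ID),
        HashData  = (SELECT U.HashData  FROM invoice_lines_updates U WHERE U.ID = Invoice_Lines.ID)
    WHERE ID IN (SELECT ID FROM invoice_lines_updates)
    """
)
conn.commit()

rows = conn.execute(
    "SELECT ID, SalePrice, HashData FROM Invoice_Lines ORDER BY ID"
).fetchall()
print(rows)
```

Row 3 disappears because its ID is not in the stand-in for ##TMP_Deletes, and row 2 picks up the values staged for it, all without any row-by-row processing.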



How It Works

There are times when no permissions are given on the source database except the right to read the source table(s) from which you will be extracting data. This effectively rules out using temporary tables on the source server, and probably also passing a table-valued parameter to a stored procedure as described in the preceding recipe. So, are there any valid alternatives to extracting a huge dataset to a staging table and performing the delta comparison in a staging database?

Fortunately, there is one viable alternative, but I must stress that it is only of any real use when dealing with relatively small deltas. This technique consists of detecting the delta identifiers at the destination, and then passing them back as part of the WHERE clause of the SELECT statement used for data extraction.

As this is largely an extension of the technique described in Recipe 11-6, I will be succinct when describing the techniques covered more fully there, in order to concentrate on the different way of passing the delta IDs back to the source database. This take on the problem uses temporary tables on the destination server to hold the IDs needed for the delete operation, and all the data for records that need updating. Delta data is detected using a hash column. However, whereas the IDs for deletes are used to populate the ##TMP_Deletes table, the IDs for inserts and updates each populate an ADO recordset. These IDs are then used to build the WHERE clause of the SELECT statement sent back to the source server to get the records for inserts (which are loaded directly into the destination table) and for updates (which are loaded into the temp table ##Invoice_Lines_Updates and then used to update the destination table in a single set-based operation).
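The three-way split summarized above can be sketched in plain Python as a language-neutral illustration (this is not SSIS code). It assumes, hypothetically, that the ID and HashData values from the source and from the destination have each been read into a dict; classify_deltas is an illustrative name. The delete list follows the Delete Data statement: destination IDs that no longer exist at the source.

```python
from typing import Dict, List, Tuple


def classify_deltas(
    source: Dict[int, str], destination: Dict[int, str]
) -> Tuple[List[int], List[int], List[int]]:
    """Return (insert_ids, update_ids, delete_ids).

    Both arguments map a row ID to its HashData value.
    """
    # No match in the destination: a new row (the Lookup's No Match output).
    inserts = sorted(i for i in source if i not in destination)
    # Matched, but the hashes differ: a changed row (the HashData Differences output).
    updates = sorted(
        i for i in source if i in destination and source[i] != destination[i]
    )
    # Present in the destination but gone from the source: a row to delete.
    deletes = sorted(i for i in destination if i not in source)
    return inserts, updates, deletes


source_rows = {1: "a1", 2: "b2", 3: "c9", 5: "e5"}
destination_rows = {1: "a1", 2: "b2", 3: "c3", 4: "d4"}

insert_ids, update_ids, delete_ids = classify_deltas(source_rows, destination_rows)
# Equivalents of the InsertCounter and UpdateCounter Row Count variables.
insert_counter, update_counter = len(insert_ids), len(update_ids)
print(insert_ids, update_ids, delete_ids, insert_counter, update_counter)
# prints: [5] [3] [4] 1 1
```

Only the narrow ID and hash columns need to cross the network for this comparison; the full rows are fetched later, and only for the IDs in the insert and update lists.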

Figure 11-20 illustrates a process overview.



Figure 11-20.  Process to detect delta data at the destination and return selective requests to the source

The approach described in this recipe is best used in the following circumstances:

• When you have nothing other than SELECT permissions on the source data table.
• When you expect a very small set of delta data.
• When the source data set is large.



The core of the approach is that, in both the Insert and Update processes, an SSIS Object variable is used to fill an ADO.NET DataTable. Each row in this DataTable holds an ID that must be loaded from the source data, so each ID is appended to a StringBuilder to build a comma-separated list that becomes part of the WHERE clause of the SQL statement sent to the source data table.
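The string that the Set SELECT scripts build can be mirrored in a short Python sketch. The function name build_delta_select is hypothetical; like the VB script, it appends a WHERE ID IN (...) list to the base SELECT held in InsertSQL or UpdateSQL. Because an empty list would yield WHERE ID IN (), which SQL Server rejects as a syntax error, the sketch returns None when there are no IDs, which is the job the Row Count counters and precedence constraints do in the package.

```python
from typing import Iterable, Optional

INSERT_SQL = "SELECT ID, InvoiceID, StockID, SalePrice, HashData FROM dbo.Invoice_Lines"


def build_delta_select(base_sql: str, ids: Iterable[int]) -> Optional[str]:
    """Append a WHERE ID IN (...) list to base_sql, or return None if there are no IDs."""
    # int() ensures only integer IDs ever reach the SQL text.
    id_list = ",".join(str(int(i)) for i in ids)
    if not id_list:
        # An empty IN () list is invalid T-SQL, so skip the extract entirely.
        return None
    return f"{base_sql} WHERE ID IN ({id_list})"


print(build_delta_select(INSERT_SQL, [5, 8, 13]))
# prints: SELECT ID, InvoiceID, StockID, SalePrice, HashData FROM dbo.Invoice_Lines WHERE ID IN (5,8,13)
```

Joining the pieces once, rather than concatenating inside the loop, is the Python counterpart of using a StringBuilder in the VB script.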



Hints, Tips, and Traps





• As the package described in this recipe uses temporary tables, you have to use the “scaffolding” approach described in Recipe 11-5. That is, while creating the SSIS package, create normal tables in the destination database to stand in for the temp tables ##TMP_Deletes, ##TMP_Inserts, ##TMP_Updates, and ##Invoice_Lines_Updates. Then replace the references to the real tables with references to the temp tables in the SQL and variables, and delete the tables from the destination database.

• It is important to use the counters to control the Insert and Update flows. Otherwise, when there are no inserts or no updates, the script generates an empty WHERE ID IN () list, which is invalid SQL, and the process fails when this statement is sent to the source database.

• To repeat, this approach is not suited to enormous deltas. Fast and efficient as the StringBuilder is (at least compared to concatenating string variables), it becomes slow and heavy when it has to handle large numbers of IDs in an ETL process.



11-8. Detecting and Loading Delta Data Using T-SQL and a Linked Server When MERGE Is Not Practical

Problem

You want to detect delta data at a linked source server and transfer only the delta data to the destination without using MERGE, for instance because MERGE turns out to be slower than a table-based solution.






Solution

Detect delta identifiers at the source, compare them at the destination, and then request only the delta data for upserts. This is how it is done, presuming that you have a linked server named ADAMREMOTE containing the CarSales database and the dbo.Invoice_Lines table. The code is in C:\SQL2012DIRecipes\CH11\TableMergeReplacement.Sql.

IF OBJECT_ID('TempDB..#Upsert') IS NOT NULL DROP TABLE #Upsert

SELECT    ID, VersionStamp
INTO      #Upsert
FROM      ADAMREMOTE.CarSales.dbo.Invoice_Lines

-- Inserts
;
WITH Inserts_CTE
AS
(
SELECT    ID
FROM      #Upsert U
WHERE     ID NOT IN (
          SELECT ID FROM dbo.Invoice_Lines WITH (NOLOCK))
)
INSERT INTO dbo.Invoice_Lines
(
 ID
,InvoiceID
,SalePrice
,StockID
,VersionStamp
)
SELECT
 SRC_I.ID
,InvoiceID
,SalePrice
,StockID
,VersionStamp
FROM      ADAMREMOTE.CarSales.dbo.Invoice_Lines SRC_I WITH (NOLOCK)
          INNER JOIN Inserts_CTE CTE_I
          ON SRC_I.ID = CTE_I.ID

-- Updates
;
WITH Updates_CTE
AS


