12-1. Detecting Source Table Changes with Little Overhead and No Custom Framework


Chapter 12 ■ Change Tracking and Change Data Capture



4. Add the extended property LAST_SYNC_VERSION to the table that you are tracking (dbo.Client in this example). It is used to store the number of the last synchronized version of the data. You can do this with the following T-SQL snippet, which can only be run once because the property can only be added once (C:\SQL2012DIRecipes\CH11\LastSynchVersionProperty.Sql):

    USE CarSales;
    GO

    EXECUTE sys.sp_addextendedproperty
         @level0type = N'SCHEMA'
        ,@level0name = N'dbo'
        ,@level1type = N'TABLE'
        ,@level1name = N'Client'
        ,@name = N'LAST_SYNC_VERSION'
        ,@value = 0
    ;
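Because the property can only be added once, a rerunnable variant of this step can be handy. The following is a minimal sketch, not part of the recipe's script files, that assumes the same CarSales database and dbo.Client table. It adds the property only if it is missing and then reads the stored value back.

```sql
USE CarSales;
GO

-- Add the property only if it does not exist yet, so the script can be rerun safely
IF NOT EXISTS
(
    SELECT 1
    FROM   sys.extended_properties
    WHERE  class = 1                                 -- object or column
           AND major_id = OBJECT_ID(N'dbo.Client')
           AND minor_id = 0                          -- the table itself, not one of its columns
           AND name = N'LAST_SYNC_VERSION'
)
BEGIN
    EXECUTE sys.sp_addextendedproperty
         @level0type = N'SCHEMA'
        ,@level0name = N'dbo'
        ,@level1type = N'TABLE'
        ,@level1name = N'Client'
        ,@name = N'LAST_SYNC_VERSION'
        ,@value = 0;
END;

-- Read the stored value back to confirm it
SELECT name, CAST(value AS BIGINT) AS LastSyncVersion
FROM   sys.extended_properties
WHERE  major_id = OBJECT_ID(N'dbo.Client')
       AND name = N'LAST_SYNC_VERSION';
```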



5. Run the following T-SQL code to apply the relevant Inserts, Deletes, and Updates to the destination table (C:\SQL2012DIRecipes\CH11\ChangeTrackingProcess.Sql):

USE CarSales;
GO

BEGIN TRY

    DECLARE @LAST_SYNC_VERSION BIGINT
    DECLARE @CURRENT_VERSION BIGINT

    SET @CURRENT_VERSION = CHANGE_TRACKING_CURRENT_VERSION()

    -- Read the version number stored by the previous synchronization
    SELECT  @LAST_SYNC_VERSION = CAST(value AS BIGINT)
    FROM    sys.extended_properties
    WHERE   major_id = OBJECT_ID('dbo.Client')
            AND name = N'LAST_SYNC_VERSION'



    -- Check that the change history still reaches back to the last synchronized
    -- version, so that the update will not be missing data
    DECLARE @MIN_VALID_VERSION BIGINT
    SELECT @MIN_VALID_VERSION = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Client'));

    IF @LAST_SYNC_VERSION >= @MIN_VALID_VERSION
    BEGIN

        -- Inserts
        INSERT INTO CarSales_Staging.dbo.Client_CT
                    (ID,ClientName,Country,Town,County,Address1,Address2,ClientType,ClientSize)
        SELECT      SRC.ID,ClientName,Country,Town,County,Address1,Address2,ClientType,ClientSize
        FROM        dbo.Client SRC
        INNER JOIN  CHANGETABLE(CHANGES Client, @LAST_SYNC_VERSION) AS CT
                    ON SRC.ID = CT.ID
        WHERE       CT.SYS_CHANGE_OPERATION = 'I'



        -- Deletes
        DELETE FROM DST
        FROM        CarSales_Staging.dbo.Client_CT DST
        INNER JOIN  CHANGETABLE(CHANGES Client, @LAST_SYNC_VERSION) AS CT
                    ON CT.ID = DST.ID
        WHERE       CT.SYS_CHANGE_OPERATION = 'D'



        -- Updates
        UPDATE      DST
        SET         DST.ClientName = SRC.ClientName
                    ,DST.Country = SRC.Country
                    ,DST.Town = SRC.Town
                    ,DST.County = SRC.County
                    ,DST.Address1 = SRC.Address1
                    ,DST.Address2 = SRC.Address2
                    ,DST.ClientType = SRC.ClientType
                    ,DST.ClientSize = SRC.ClientSize
        FROM        CarSales_Staging.dbo.Client_CT DST
        INNER JOIN  dbo.Client SRC
                    ON DST.ID = SRC.ID
        INNER JOIN  CHANGETABLE(CHANGES Client, @LAST_SYNC_VERSION) AS CT
                    ON SRC.ID = CT.ID
        WHERE       CT.SYS_CHANGE_OPERATION = 'U'



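        -- After the upserts and deletes, store the version that has now been synchronized.
        -- This runs inside the IF block so that the stored version only advances
        -- when the changes were actually applied to the destination table.
        EXECUTE sys.sp_updateextendedproperty
             @level0type = N'SCHEMA'
            ,@level0name = N'dbo'
            ,@level1type = N'TABLE'
            ,@level1name = N'Client'
            ,@name = N'LAST_SYNC_VERSION'
            ,@value = @CURRENT_VERSION
        ;

    END

END TRY
BEGIN CATCH
    -- Add your error logging here
END CATCH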
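The script above applies changes only when the stored version is still covered by the change history, and it says nothing about the opposite case. As a sketch of one way to make that case visible (the error text and the decision to raise an error are my assumptions, not part of the recipe), you could run a check such as this before synchronizing:

```sql
USE CarSales;
GO

DECLARE @LAST_SYNC_VERSION BIGINT;
DECLARE @MIN_VALID_VERSION BIGINT;

-- Version stored by the previous synchronization
SELECT @LAST_SYNC_VERSION = CAST(value AS BIGINT)
FROM   sys.extended_properties
WHERE  major_id = OBJECT_ID(N'dbo.Client')
       AND name = N'LAST_SYNC_VERSION';

-- Oldest version for which change history is still available
SET @MIN_VALID_VERSION = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(N'dbo.Client'));

-- A NULL on either side means the property is missing or the table is not tracked
IF @LAST_SYNC_VERSION IS NULL
   OR @MIN_VALID_VERSION IS NULL
   OR @LAST_SYNC_VERSION < @MIN_VALID_VERSION
BEGIN
    -- Hypothetical message: the change history no longer covers the last synchronization
    RAISERROR(N'Change history does not reach back to LAST_SYNC_VERSION. A full resynchronization is required.', 16, 1);
END;
```

Raising an error here lets the surrounding TRY...CATCH logging report the problem instead of leaving the destination table silently out of date.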






How It Works

Fortunately for SQL Server developers and DBAs alike, Microsoft introduced a lightweight solution to the problem of delta data management in SQL Server 2008. This solution is called Change Tracking, and it does not require any triggers or handcrafted tables. The only requirement is a primary key on the source and destination tables.

Its advantages are as follows:
• It is a lightweight solution, with little extra overhead on the server and minor disk space requirements.
• There are no changes required to the source table.
• It is a nearly real-time solution.
• It is largely self-managing in that you can set a retention period for delta data to be tracked, and SQL Server handles cleansing this historical data. This does not mean, however, that data synchronization is automatic, just that managing the Change Tracking objects is largely handled for you.
• It is available in all editions of SQL Server.



There are several limitations, however. The main one is that Change Tracking does not support “pull” data synchronization from a destination. If you try, the following error message informs you:
    Msg 22106, Level 16, State 1, Line 3
    The CHANGETABLE function does not support remote data sources.
So this requires some tweaking, specifically stored procedures on the source server to isolate and supply the data to Insert, Delete, and Update. Other disadvantages include:





• Change Tracking must be disabled before a PRIMARY KEY constraint can be dropped on the source table.
• Data type changes of nonprimary key columns are not tracked.
• The index that enforces the primary key cannot be dropped or disabled.
• Truncating the source table requires re-initialization of Change Tracking. This essentially means that Change Tracking must be disabled, and then re-enabled for the source table.
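To illustrate the stored-procedure workaround mentioned for the remote-source limitation, here is a minimal sketch. The procedure name dbo.pr_GetClientChanges and its parameter are hypothetical; the table and column names are those used in this recipe. The procedure runs on the source server, so CHANGETABLE is always called locally.

```sql
USE CarSales;
GO

-- Hypothetical procedure: returns every change made since the version passed in
CREATE PROCEDURE dbo.pr_GetClientChanges
    @LastSyncVersion BIGINT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT      CT.ID
                ,CT.SYS_CHANGE_OPERATION      -- 'I', 'U' or 'D'
                ,CT.SYS_CHANGE_VERSION
                ,SRC.ClientName, SRC.Country, SRC.Town, SRC.County
                ,SRC.Address1, SRC.Address2, SRC.ClientType, SRC.ClientSize
    FROM        CHANGETABLE(CHANGES dbo.Client, @LastSyncVersion) AS CT
    LEFT OUTER JOIN dbo.Client AS SRC         -- outer join: deleted rows no longer exist in the source
                ON SRC.ID = CT.ID;
END;
GO

-- Local test call; 0 asks for every change still held in the history
EXECUTE dbo.pr_GetClientChanges @LastSyncVersion = 0;
```

A destination server could then execute this procedure through a linked server, provided that remote procedure calls are enabled for that linked server.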



■■Note  Microsoft advises you to enable Snapshot Isolation, which means that TempDB will be used heavily, so you need to ensure that your TempDB is configured correctly. There are alternatives, however, to using Snapshot Isolation. See Books Online (BOL) for full details.
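As a sketch of what enabling and using Snapshot Isolation can look like (assuming the CarSales database from this recipe, and reading changes only), the version number and the change data below are read inside one snapshot transaction so that they are consistent with each other:

```sql
-- Allow snapshot transactions in the source database (a one-off setting)
ALTER DATABASE CarSales SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

USE CarSales;
GO

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

    -- Both statements see the same point-in-time view of the database
    DECLARE @CURRENT_VERSION BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

    SELECT  CT.ID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_VERSION
    FROM    CHANGETABLE(CHANGES dbo.Client, 0) AS CT;

COMMIT TRANSACTION;

-- Return the session to the default isolation level
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```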

Nonetheless, Change Tracking is a robust solution that is well suited to many delta data situations. It is best used in the following circumstances:
• You wish to avoid triggers or user tables when isolating delta data.
• The (admittedly low) overhead that it adds to the source system is acceptable.



I will not be providing an exhaustive description of all that can be done using Change Tracking, nor will I detail how it is managed in a production environment. Should you need this information, BOL does a good job of describing it, and there are many excellent references to this subject online and in print. After all, this book is resolutely ETL-focused.
Setting up Change Tracking is easy. First, Change Tracking must be enabled on the source database, and then it must be enabled on any tables whose DML you wish to follow. Once this is done, SQL Server can detect changed records, as well as the type of change (Insert, Update, or Delete), and you can carry out any required data integration processes based on these changed records.
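As a reminder of what those two setup levels look like in T-SQL, here is a short sketch. The two-day retention period is only an illustrative value, so choose one that suits your synchronization schedule.

```sql
-- Database level: switch Change Tracking on and choose how long history is kept
ALTER DATABASE CarSales
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

USE CarSales;
GO

-- Table level: track DML on the source table (it must have a primary key)
ALTER TABLE dbo.Client ENABLE CHANGE_TRACKING;
GO

-- Confirm which tables in the database are tracked
SELECT  OBJECT_NAME(object_id) AS TableName
        ,is_track_columns_updated_on
        ,min_valid_version
FROM    sys.change_tracking_tables;
```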

Once you have set up Change Tracking, you can start to use it. What you need to know is that you will have to:
• Get the version number of the last DML operation that was successfully applied to the destination database.
• Check that your Change Tracking history still holds the changes made since that version number (using the function CHANGE_TRACKING_MIN_VALID_VERSION).
• Perform the required Insert, Update, and Delete operations on the destination table.
• Log (here to the extended properties of the table) the latest successful version number for the DML operations that have been sent to the destination table.
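Before applying anything to the destination table, it can help to look at what Change Tracking has recorded. This sketch assumes the LAST_SYNC_VERSION extended property used in this recipe and simply lists the pending changes.

```sql
USE CarSales;
GO

DECLARE @LAST_SYNC_VERSION BIGINT;

-- Version stored by the previous synchronization
SELECT  @LAST_SYNC_VERSION = CAST(value AS BIGINT)
FROM    sys.extended_properties
WHERE   major_id = OBJECT_ID(N'dbo.Client')
        AND name = N'LAST_SYNC_VERSION';

-- One row per changed primary key, with the net operation since that version
SELECT  CT.ID
        ,CT.SYS_CHANGE_OPERATION            -- 'I' = insert, 'U' = update, 'D' = delete
        ,CT.SYS_CHANGE_VERSION
        ,CT.SYS_CHANGE_CREATION_VERSION
FROM    CHANGETABLE(CHANGES dbo.Client, @LAST_SYNC_VERSION) AS CT
ORDER BY CT.SYS_CHANGE_VERSION;
```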



In this recipe, I am using two databases on the same server. Of course, you may use linked servers to perform this operation. Just remember to use the correct four-part notation if this is how you decide to implement Change Tracking.

There could well be cases where you wish to add or update data in a destination database only when the contents of certain columns change. For instance, when dealing with clients in a financial reporting application, changes in address could be irrelevant, but changes in credit rating could be fundamental. So, to update data only when the contents of relevant columns have changed, Change Tracking also lets you detect which columns have been modified. This makes the technique altogether more subtle, as thus far any change, including potentially irrelevant ones, has been notified by Change Tracking. It follows that if you track only changes to specific columns, Change Tracking becomes less of a blunt instrument. Consequently, the overhead caused by irrelevant updates on a destination server can be avoided because you will only be performing essential DML operations.
So, if you wish to track changes only to certain columns . . .

1. First, disable Change Tracking if it is enabled, using:
    ALTER TABLE CarSales.dbo.Client DISABLE CHANGE_TRACKING

2. Then, re-enable it using:
    ALTER TABLE CarSales.dbo.Client
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON)

3. Next, alter any Update statements (you could also apply this to Insert and Delete statements, but this is more rare) to add a condition like the following to the WHERE clause:
    AND CHANGE_TRACKING_IS_COLUMN_IN_MASK(
            COLUMNPROPERTY(OBJECT_ID('CarSales.dbo.Client'), 'ClientName', 'ColumnId'),
            CT.SYS_CHANGE_COLUMNS) = 1



Note that you must supply the object ID and column ID for each column that you want to track individually, and using the COLUMNPROPERTY function is the easiest way to obtain the column ID. This code snippet will detect changes to the ClientName column of the tracked table dbo.Client. You then run the T-SQL code shown in step 5 of this recipe to port only the changes from source to destination where the ClientName column has been modified. Indeed, the operation will be even more efficient if you tweak the T-SQL UPDATE so that only the relevant column (ClientName) is updated.
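Put together, a column-aware version of the Update from step 5 might look like the following sketch. It assumes that Change Tracking was re-enabled with TRACK_COLUMNS_UPDATED = ON and that only ClientName matters to the destination.

```sql
USE CarSales;
GO

DECLARE @LAST_SYNC_VERSION BIGINT;

-- Version stored by the previous synchronization
SELECT  @LAST_SYNC_VERSION = CAST(value AS BIGINT)
FROM    sys.extended_properties
WHERE   major_id = OBJECT_ID(N'dbo.Client')
        AND name = N'LAST_SYNC_VERSION';

-- Update only ClientName, and only for rows where ClientName was flagged as changed
UPDATE      DST
SET         DST.ClientName = SRC.ClientName
FROM        CarSales_Staging.dbo.Client_CT DST
INNER JOIN  dbo.Client SRC
            ON DST.ID = SRC.ID
INNER JOIN  CHANGETABLE(CHANGES dbo.Client, @LAST_SYNC_VERSION) AS CT
            ON SRC.ID = CT.ID
WHERE       CT.SYS_CHANGE_OPERATION = 'U'
            AND CHANGE_TRACKING_IS_COLUMN_IN_MASK(
                    COLUMNPROPERTY(OBJECT_ID('CarSales.dbo.Client'), 'ClientName', 'ColumnId'),
                    CT.SYS_CHANGE_COLUMNS) = 1;
```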






■■Note  Although Change Tracking does not add a lot of extra overhead to the source server, exactly how much it adds, and whether this is acceptable, will depend on each set of circumstances. For a server with a very high volume of transactions, even the slight overhead that Change Tracking adds might prove excessive. You have to test Change Tracking in your specific environment to decide if it is the correct solution for your requirements.



Hints, Tips, and Traps





• The reason for storing a property called LAST_SYNC_VERSION is that when detecting deltas, you need to tell SQL Server how far back in the history of DML modifications to go when performing data upserts and deletes. Since Change Tracking uses sequential numbering to identify every DML operation on the tracked table, you will need to know, for every data extraction operation, what the last version number used was. I suggest storing this as an extended property, although if you wish, you can store it in a logging table.
• Change Tracking version numbering begins with 0.
• The specified retention period must be at least as long as the maximum time between data synchronizations.
• If you set AUTO_CLEANUP to OFF, then you will have to reset it to ON at some point to clear down the change history. Cleaning up Change Tracking cannot be done any other way.
• There is an inevitable trade-off between the retention period for delta tracking (set using CHANGE_RETENTION) and the efficiency of the database. The longer the retention period, the easier it is to go back in time to detect changes, but the larger and slower the database will be.
• I prefer to wrap the DML operations that execute Change Tracking in a transaction, as this will ensure that the Insert, Delete, and Update commands are carried out atomically: completely, or not at all. Of course, you should add sufficient error trapping and logging to ensure that any errors are bubbled up to the DBA in time to correct any bugs before the change retention period is past. Otherwise, only a complete resynchronization of the source and destination tables will ensure that both data sets are identical. However, transactions that work perfectly across databases on the same server will require MSDTC when implemented using linked servers.
• To resynchronize the source and destination tables, simply truncate the destination table and INSERT...SELECT the data (or use any other table load technique that you prefer) from the source table into it. This will have no effect on Change Tracking. You must then log (or store in the extended property) the current Change Tracking version, as returned by CHANGE_TRACKING_CURRENT_VERSION(), as the LAST_SYNC_VERSION, so that only DML from this point is detected when the two tables are next synchronized.
• To disable Change Tracking, simply use the following T-SQL snippets:
    ALTER TABLE dbo.Client DISABLE CHANGE_TRACKING;
  and then
    ALTER DATABASE CarSales SET CHANGE_TRACKING = OFF;
• Note that you must disable Change Tracking for all tables before you can disable it at the database level.
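The transaction and resynchronization hints above can be combined. The following is a sketch only: it assumes that both databases are on the same server, that the destination ID column is not an IDENTITY column, and that nothing modifies dbo.Client while the copy runs (otherwise, consider Snapshot Isolation).

```sql
USE CarSales;
GO

SET XACT_ABORT ON;   -- any run-time error dooms the whole transaction

BEGIN TRY
    BEGIN TRANSACTION;

    -- Capture the version first; changes made after this point are picked up by the next synchronization
    DECLARE @CURRENT_VERSION BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

    -- Reload the destination table completely
    TRUNCATE TABLE CarSales_Staging.dbo.Client_CT;

    INSERT INTO CarSales_Staging.dbo.Client_CT
            (ID,ClientName,Country,Town,County,Address1,Address2,ClientType,ClientSize)
    SELECT  ID,ClientName,Country,Town,County,Address1,Address2,ClientType,ClientSize
    FROM    dbo.Client;

    -- Store the version that the destination table now reflects
    EXECUTE sys.sp_updateextendedproperty
         @level0type = N'SCHEMA'
        ,@level0name = N'dbo'
        ,@level1type = N'TABLE'
        ,@level1name = N'Client'
        ,@name = N'LAST_SYNC_VERSION'
        ,@value = @CURRENT_VERSION;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    -- Add your error logging here, then re-raise the error for the caller
    THROW;
END CATCH;
```

Because the stored version is updated inside the same transaction as the reload, a failure leaves both the destination table and LAST_SYNC_VERSION as they were.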






12-2. Pulling Changes into a Destination Table with Change Tracking

Problem

Using Change Tracking, you want to pull rather than push data changes detected into a destination table.



Solution

Apply “pull” delta data using Change Tracking. The following T-SQL can do this when the source data is on a linked server named R2.

1. Run steps 1 through 4 in Recipe 12-1 to enable Change Tracking on the source server (R2) and place the destination table on the local server.

2. Run the following T-SQL to synchronize the data from the remote server to the local server (C:\SQL2012DIRecipes\CH11\PullChangeTracking.Sql):



DECLARE @LAST_SYNC_VERSION BIGINT
DECLARE @CURRENT_VERSION BIGINT
DECLARE @MIN_VALID_VERSION BIGINT

-- To get the LAST_SYNC_VERSION
SELECT      @LAST_SYNC_VERSION = CAST(SEP.value AS BIGINT)
FROM        R2.CarSales.sys.tables TBL
INNER JOIN  R2.CarSales.sys.schemas SCH
            ON TBL.schema_id = SCH.schema_id
INNER JOIN  R2.CarSales.sys.extended_properties SEP
            ON TBL.object_id = SEP.major_id
WHERE       SCH.name = 'dbo'
            AND TBL.name = 'Client'
            AND SEP.name = 'LAST_SYNC_VERSION'

-- Gets maximum version in CHANGETABLE - so available for updating
-- (use instead of CHANGE_TRACKING_CURRENT_VERSION)
SELECT @CURRENT_VERSION = MaxValidVersion FROM OPENQUERY(R2,
'
SELECT
    CASE WHEN MAX(SYS_CHANGE_CREATION_VERSION) > MAX(SYS_CHANGE_VERSION) THEN MAX(SYS_CHANGE_CREATION_VERSION)
         ELSE MAX(SYS_CHANGE_VERSION)
    END AS MaxValidVersion
FROM CHANGETABLE(CHANGES Carsales.dbo.Client, 0) AS CT
')

-- Gets minimum version. This one works over a linked server!
-- The value is assigned to the variable and filtered to dbo.Client so that the test below can use it.
SELECT      @MIN_VALID_VERSION = CTT.min_valid_version
FROM        R2.CarSales.sys.change_tracking_tables CTT
INNER JOIN  R2.CarSales.sys.tables TBL
            ON CTT.object_id = TBL.object_id
INNER JOIN  R2.CarSales.sys.schemas SCH
            ON TBL.schema_id = SCH.schema_id
WHERE       SCH.name = 'dbo'
            AND TBL.name = 'Client'

IF @LAST_SYNC_VERSION >= @MIN_VALID_VERSION
BEGIN






    -- Get all data for INSERTS/UPDATES/DELETES into temp tables

    -- Deletes
    DECLARE @DeleteSQL VARCHAR(8000) =
    'SELECT ID
     FROM OPENQUERY(R2, ''SELECT ID
                         FROM   CHANGETABLE(CHANGES Carsales.dbo.Client, ' + CAST(@LAST_SYNC_VERSION AS VARCHAR(20)) + ') AS CT
                         WHERE  CT.SYS_CHANGE_OPERATION = ''''D'''''')'

    IF OBJECT_ID('tempdb..#ClientDeletes') IS NOT NULL
        DROP TABLE #ClientDeletes

    CREATE TABLE #ClientDeletes (ID INT)

    INSERT INTO #ClientDeletes EXEC (@DeleteSQL)

    -- Inserts
    DECLARE @InsertsSQL VARCHAR(8000) =
    'SELECT ID,ClientName,Country,Town,County,Address1,Address2,ClientType,ClientSize
     FROM OPENQUERY(R2,
     ''SELECT     SRC.ID,ClientName,Country,Town,County,Address1,Address2,ClientType,ClientSize
       FROM       Carsales.dbo.Client SRC
       INNER JOIN CHANGETABLE(CHANGES Carsales.dbo.Client, ' + CAST(@LAST_SYNC_VERSION AS VARCHAR(20)) + ') AS CT
                  ON SRC.ID = CT.ID
       WHERE      CT.SYS_CHANGE_OPERATION = ''''I'''''')'

    IF OBJECT_ID('tempdb..#ClientInserts') IS NOT NULL
        DROP TABLE #ClientInserts

    CREATE TABLE #ClientInserts
    (
        ID INT NOT NULL,
        ClientName VARCHAR(150) NULL,
        Country VARCHAR(50) NULL,
        Town VARCHAR(50) NULL,
        County VARCHAR(50) NULL,
        Address1 VARCHAR(50) NULL,
        Address2 VARCHAR(50) NULL,
        ClientType VARCHAR(20) NULL,
        ClientSize VARCHAR(10) NULL
    )


