11-10. Detecting Differences in Rowcounts, Metadata, and Column Data


Chapter 11 ■ Delta Data Management



Metadata and Rowcounts

The following code, run in a Command window, will return the rowcounts for the two tables and indicate whether the metadata is identical for the two tables. (The caret at the end of each line continues the command onto the next line.)

"C:\Program Files\Microsoft SQL Server\110\COM\Tablediff.exe" ^
-sourceuser Adam ^
-sourcepassword Me4B0ss ^
-sourceserver ADAM02 ^
-sourcedatabase CarSales ^
-sourceschema dbo ^
-sourcetable Sales ^
-destinationuser Adam ^
-destinationpassword Me4B0ss ^
-destinationserver ADAM02 ^
-destinationdatabase CarSales_Staging ^
-destinationschema dbo ^
-destinationtable Sales ^
-q

The -q (quick) parameter tells TableDiff to compare only rowcounts and metadata. If the two tables have differing columns, TableDiff will return a message saying that they “have different schemas and cannot be compared.”
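The following minimal sketch makes the idea behind the quick comparison concrete. It is written in Python and uses the standard-library sqlite3 module as a stand-in for SQL Server. It is not TableDiff and does not reproduce TableDiff's output. It only mirrors the logic described above: compare the column metadata first, and count rows only if the schemas match. The function name quick_compare, the table names, and the decision to compare column names, declared types, and primary-key positions are all assumptions made for illustration.

```python
import sqlite3


def quick_compare(conn, source, destination):
    """Compare only column metadata and rowcounts, in the spirit of TableDiff -q.

    Returns (same_schema, source_rows, destination_rows). The counts are None
    when the schemas differ. Table names are interpolated directly into SQL,
    so they must come from a trusted source.
    """
    def columns(table):
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        return [(name, col_type.upper(), pk)
                for _cid, name, col_type, _notnull, _default, pk
                in conn.execute(f"PRAGMA table_info({table})")]

    if columns(source) != columns(destination):
        # Different schemas: stop here, as the quick comparison does.
        return False, None, None

    def count(table):
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    return True, count(source), count(destination)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Sales (ID INTEGER PRIMARY KEY, Make TEXT);
        CREATE TABLE Sales_Staging (ID INTEGER PRIMARY KEY, Make TEXT);
        INSERT INTO Sales VALUES (1, 'Rolls'), (2, 'Bentley');
        INSERT INTO Sales_Staging VALUES (1, 'Rolls');
    """)
    print(quick_compare(conn, "Sales", "Sales_Staging"))  # (True, 2, 1)
```

Against the sample tables it prints (True, 2, 1). The schemas match, and the source holds one more row than the destination.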



Row Differences

To get details on any records that differ between the two tables (without the -q parameter and preferably sending the output to an SQL Server table rather than the Command window), add the following:

-dt -et DataList

DataList is the name of the table (in the destination database) that will hold the list of anomalies. The -dt parameter drops this table first if it already exists.



Column Differences

To see all data differences at column level, add the -c parameter.

The resulting table has the columns listed in Table 11-3.

Table 11-3. DataDiff Output

Column Name | Possible Values | Comments
ID | | Contains the PK, GUID, or unique ID.
MSdifftool_ErrorCode | 0, 1, or 2 | 0 for a data mismatch. 1 for a record in the destination table, but not the source. 2 for a record in the source table, but not the destination.
MSdifftool_OffendingColumns | | The name of the column where the data differs between the two tables.


Create a T-SQL Script to Make the Destination Table Identical to the Source Table

To make the tables identical, run the T-SQL script generated by the -f [filename] parameter against the destination database. This is where TableDiff is usually most useful: when source and destination are out of sync and you want to “reset” them so that one of the delta data detection techniques can resume.

The generated SQL works “row by row.” For faster (set-based) synchronization, you can use the data in the output table together with dynamic SQL to write a less verbose script.
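The following sketch shows that set-based alternative. It is Python code that uses sqlite3 in place of SQL Server and, for simplicity, keeps both tables in one database. From a column list and a TableDiff-style output table, it builds three statements dynamically: one DELETE, one INSERT, and one UPDATE, each driven by the MSdifftool_ErrorCode values listed in Table 11-3. The function set_based_sync and the sample tables are hypothetical. Against SQL Server, you would use three-part names for CarSales and CarSales_Staging. You would also need SET IDENTITY_INSERT on the destination table if the key is an identity column.

```python
import sqlite3


def set_based_sync(conn, source, destination, diff_table, key, columns):
    """Apply a TableDiff-style difference list with three set-based statements.

    diff_table is assumed to hold (key, MSdifftool_ErrorCode) rows, where
    0 = data mismatch, 1 = only in destination, 2 = only in source.
    Identifiers are interpolated into SQL, so they must be trusted.
    """
    keys_with_code = f"SELECT {key} FROM {diff_table} WHERE MSdifftool_ErrorCode = "
    col_list = ", ".join([key] + columns)
    set_list = ", ".join(
        f"{c} = (SELECT s.{c} FROM {source} AS s WHERE s.{key} = {destination}.{key})"
        for c in columns)
    statements = [
        # Rows that exist only in the destination are removed.
        f"DELETE FROM {destination} WHERE {key} IN ({keys_with_code}1)",
        # Rows that exist only in the source are copied across.
        f"INSERT INTO {destination} ({col_list}) "
        f"SELECT {col_list} FROM {source} WHERE {key} IN ({keys_with_code}2)",
        # Rows present in both tables but with different data are overwritten.
        f"UPDATE {destination} SET {set_list} WHERE {key} IN ({keys_with_code}0)",
    ]
    for sql in statements:
        conn.execute(sql)
    conn.commit()
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Sales (ID INTEGER PRIMARY KEY, Make TEXT);
        CREATE TABLE Sales_Staging (ID INTEGER PRIMARY KEY, Make TEXT);
        CREATE TABLE DataList (ID INTEGER, MSdifftool_ErrorCode INTEGER);
        INSERT INTO Sales VALUES (1, 'Rolls'), (2, 'Bentley'), (4, 'Aston');
        INSERT INTO Sales_Staging VALUES (1, 'Rolls'), (2, 'Bentlee'), (3, 'Jaguar');
        INSERT INTO DataList VALUES (2, 0), (3, 1), (4, 2);
    """)
    set_based_sync(conn, "Sales", "Sales_Staging", "DataList", "ID", ["Make"])
    print(conn.execute("SELECT ID, Make FROM Sales_Staging ORDER BY ID").fetchall())
```

Each error code identifies a disjoint set of keys, so the order of the three statements does not affect the result. What makes the approach set-based is that each statement processes all the affected rows at once instead of one row at a time.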



How It Works

It is difficult to discuss delta data management without at least a passing reference to TableDiff.exe. This Microsoft tool was originally designed to compare tables in a replication scenario, but it can also be used to:

• Very quickly detect whether the metadata of two tables is identical.
• Get rowcounts from two tables.
• Isolate rows present in either of the two tables, but not in both.
• Isolate all data changes between records in the two tables.
• Create an SQL script that applies the changes needed to make the tables identical.



TableDiff.exe is a command-line tool that requires a substantial set of parameters. However, once these are saved in a .cmd file, it becomes easier to handle and can be run from a scheduler, an SSIS Execute Process task, or an SQL Server Agent job. You will need a primary key, an identity column, a rowguid column, or a unique key column in at least one of the tables, and preferably in both. Incidentally, TableDiff is installed by default with SQL Server.
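The classification that TableDiff performs can be modeled in a few lines. The sketch below is plain Python over in-memory dictionaries and does not describe how TableDiff works internally. It assumes that each table is represented as a mapping from a unique key to its column values, which is why a primary key or other unique column is needed. The function name table_diff is hypothetical. Returning all the offending columns for a row as one tuple is a simplification, not TableDiff's exact output format.

```python
def table_diff(source, destination):
    """Classify differences between two keyed tables, mirroring the
    MSdifftool_ErrorCode values in Table 11-3:
      0 = data mismatch, 1 = row only in destination, 2 = row only in source.

    source and destination map a unique key to a dict of column values.
    Returns (key, error_code, offending_columns) tuples, sorted by key.
    """
    results = []
    for key in sorted(source.keys() | destination.keys()):
        if key not in source:
            results.append((key, 1, ()))
        elif key not in destination:
            results.append((key, 2, ()))
        else:
            src, dst = source[key], destination[key]
            # Column-level comparison, similar in spirit to the -c parameter.
            offending = tuple(sorted(col for col in src.keys() | dst.keys()
                                     if src.get(col) != dst.get(col)))
            if offending:
                results.append((key, 0, offending))
    return results


if __name__ == "__main__":
    source = {1: {"Make": "Rolls", "Price": 100},
              2: {"Make": "Bentley", "Price": 90},
              4: {"Make": "Aston", "Price": 80}}
    destination = {1: {"Make": "Rolls", "Price": 100},
                   2: {"Make": "Bentlee", "Price": 95},
                   3: {"Make": "Jaguar", "Price": 70}}
    for row in table_diff(source, destination):
        print(row)
```

Identical rows produce no output, which matches the idea that the output table lists only anomalies.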



Hints, Tips, and Traps

• If the fast comparison parameter (-q) is used, then no output table will be created.
• The output table will not be overwritten if the two tables are identical.
• The output table structure varies according to the input parameters: the MSdifftool_OffendingColumns column only appears if there are column-level differences.
• There are other parameters for TableDiff.exe, and I suggest that you consult Books Online (BOL) should you need them.



Summary

This chapter presented many ways of detecting data changes at source and applying them to a destination database using SQL Server. Table 11-4 presents my take on the various methods, including their advantages and disadvantages.






Table 11-4. Advantages and Disadvantages of Delta Data Management Techniques in This Chapter

Technique | Advantages | Disadvantages
Full data transfer with temporary tables used for update and delete management | Much faster than using the OLEDB task. | Complex to set up.
Linked server and MERGE | Faster than transferring all the data before using MERGE. Easy to set up. | Requires a linked server.
Detecting delta data at source, transferring the Delta Flag column first in SSIS | Can be considerably faster than transferring all the data before detecting deltas. | Complex to set up. Requires a Delta Flag column in the source data. Needs temporary tables on the source.
Detecting delta data at the source and passing back a TVP of IDs | Can be considerably faster than transferring all the data before detecting deltas. | Complex to set up. Requires a Delta Flag column in the source data. Only suitable for tiny delta sets.
Detecting delta data at the source and passing back custom SELECT statements | Can be considerably faster than transferring all the data before detecting deltas. | Complex to set up. Requires a Delta Flag column in the source data. Only suitable for tiny delta sets.
Detecting delta data at source, transferring the Delta Flag column first using T-SQL | Can be considerably faster than transferring all the data before detecting deltas. | Complex to set up. Requires a Delta Flag column in the source data. Needs temporary tables on the source. Necessitates a linked server.
Trigger-based delta data tracking | Can be considerably faster than full data transfer. | Requires source database modification. Necessitates management of tracking tables.
TableDiff | | Only suitable in certain circumstances. Requires learning the utility.



Delta data management is a horrendously complex challenge, and there are few easy solutions. Some readers may find the techniques elaborated in this chapter too complicated to apply. Others may find them too simple. Yet others may be facing challenges that seem to make any solution impossible. However, with patience and correct analysis, most delta data issues can be resolved. The art and science lie in finding the correct approach. So do not hesitate to test various solutions, and to “mix and match” techniques taken from the various recipes in this chapter. If you do not have “flag” columns in the source table, try to get them added. If you cannot add such columns, then try to use hashing techniques to identify changed rows. Do not be afraid of using temporary or persisted tables as a staging technique. Above all, take your time when defining the appropriate solution, and test it thoroughly.
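A minimal sketch of the hashing approach follows, written in Python with the standard hashlib module. It assumes that you keep the hash of each row's non-key columns from the previous load and compare it with a freshly computed hash. In SQL Server, the hashes would more likely be computed with HASHBYTES. The function names row_hash and detect_changes, and the column names in the sample data, are illustrative only.

```python
import hashlib


def row_hash(row, columns):
    """SHA-256 of a row's non-key columns, taken in a fixed column order."""
    # repr() keeps None distinct from '' and escapes the \x1f separator inside values.
    payload = "\x1f".join(repr(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(source_rows, stored_hashes, key, columns):
    """Compare fresh row hashes with the hashes stored after the previous load.

    Returns (inserted, updated, deleted, new_hashes); the key lists are sorted.
    new_hashes should be persisted for the next run.
    """
    new_hashes = {row[key]: row_hash(row, columns) for row in source_rows}
    inserted = sorted(k for k in new_hashes if k not in stored_hashes)
    deleted = sorted(k for k in stored_hashes if k not in new_hashes)
    updated = sorted(k for k in new_hashes
                     if k in stored_hashes and new_hashes[k] != stored_hashes[k])
    return inserted, updated, deleted, new_hashes


if __name__ == "__main__":
    first = [{"ID": 1, "Make": "Rolls", "Price": 100},
             {"ID": 2, "Make": "Bentley", "Price": 90}]
    _, _, _, hashes = detect_changes(first, {}, "ID", ["Make", "Price"])
    second = [{"ID": 2, "Make": "Bentley", "Price": 95},
              {"ID": 3, "Make": "Jaguar", "Price": 70}]
    print(detect_changes(second, hashes, "ID", ["Make", "Price"])[:3])
```

Storing one hash per row keeps the comparison to a single column, at the cost of recomputing hashes over the whole source table on each run.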




Chapter 12



Change Tracking and Change Data Capture

This chapter focuses on two core techniques that allow you to detect modifications to source data and then update a destination table with the corresponding changes. Put in the simplest terms, these techniques are:

• Change Tracking: Detects that a row has changed. Lets a process return the latest version of the data from the source table.
• Change Data Capture: Detects that a row has changed. Stores all the intermediate changes, as well as the final state of a record.



The former uses a tracking table and simple versioning, while the latter reads the SQL Server transaction log and provides sophisticated version tracking. Both techniques impose relatively little overhead on the source system, and both are largely self-managing. Either can be implemented as a T-SQL-based solution or using SSIS. Neither requires any schema changes to be made to the source table(s). Both allow you to decide when the updates to the destination table(s) are applied, and so can easily be integrated into a regular ETL process. In my experience, both are extremely robust and reliable. So if these are the similarities, what are the differences, at least at a simple level suited to handling data integration challenges?





• Change Tracking is a synchronous process that detects that rows were changed, but not what data was modified. It is available in all editions of SQL Server.
• Change Data Capture is an asynchronous process that reads the SQL Server transaction log to detect changes to the tables that you are tracking. It also lets you see the history of all changes to the data in a table. It is only available in the Enterprise edition of SQL Server.
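Change Tracking records only which keys changed, and at which version. A toy in-memory model can therefore illustrate its mechanics. The Python sketch below is an illustration built on assumptions, not SQL Server's implementation. The class TrackedTable, its methods, and the synchronize function are all hypothetical. The current_version and min_valid_version attributes stand in for CHANGE_TRACKING_CURRENT_VERSION() and CHANGE_TRACKING_MIN_VALID_VERSION(). The changes_since method loosely imitates the key-and-operation rows that CHANGETABLE(CHANGES ...) returns. The destination is refreshed with the latest source data, because intermediate values are not kept.

```python
from dataclasses import dataclass


@dataclass
class ChangeInfo:
    creation_version: int  # version at which the key was inserted (0 = unknown/older)
    change_version: int    # version of the most recent change to the key
    deleted: bool = False


class TrackedTable:
    """Toy in-memory model of Change Tracking semantics; not SQL Server's implementation."""

    def __init__(self):
        self.rows = {}              # primary key -> current row data
        self.info = {}              # primary key -> ChangeInfo
        self.current_version = 0    # stands in for CHANGE_TRACKING_CURRENT_VERSION()
        self.min_valid_version = 0  # stands in for CHANGE_TRACKING_MIN_VALID_VERSION()

    def _next_version(self):
        self.current_version += 1
        return self.current_version

    def insert(self, key, data):
        v = self._next_version()
        self.rows[key] = data
        self.info[key] = ChangeInfo(creation_version=v, change_version=v)

    def update(self, key, data):
        v = self._next_version()
        self.rows[key] = data
        # If earlier change information was cleaned up, treat the row as pre-existing.
        self.info.setdefault(key, ChangeInfo(0, v)).change_version = v

    def delete(self, key):
        v = self._next_version()
        del self.rows[key]
        entry = self.info.setdefault(key, ChangeInfo(0, v))
        entry.change_version, entry.deleted = v, True

    def cleanup(self, new_min_version):
        """Discard change information older than new_min_version (retention period)."""
        self.info = {k: i for k, i in self.info.items()
                     if i.change_version >= new_min_version}
        self.min_valid_version = new_min_version

    def changes_since(self, last_sync_version):
        """One (key, operation) pair per changed key; no column data is returned."""
        changes = []
        for key, i in sorted(self.info.items()):
            if i.change_version <= last_sync_version:
                continue
            if i.deleted:
                changes.append((key, "D"))
            elif i.creation_version > last_sync_version:
                changes.append((key, "I"))
            else:
                changes.append((key, "U"))
        return changes


def synchronize(source, destination, last_sync_version):
    """Apply net changes to the destination dict and return the new sync version."""
    sync_version = source.current_version  # read before collecting changes
    if last_sync_version < source.min_valid_version:
        # The change information needed has been purged: reload everything.
        destination.clear()
        destination.update(source.rows)
        return sync_version
    for key, operation in source.changes_since(last_sync_version):
        if operation == "D":
            destination.pop(key, None)
        else:
            destination[key] = source.rows[key]  # only the latest data is available
    return sync_version


if __name__ == "__main__":
    src, dest = TrackedTable(), {}
    src.insert(1, "Aldo")
    src.insert(2, "Bea")
    version = synchronize(src, dest, 0)
    src.update(2, "Beatrice")
    src.delete(1)
    version = synchronize(src, dest, version)
    print(version, dest)
```

When a client's last synchronized version falls below the minimum valid version (for instance, after the retention period has expired), the sketch falls back to a full reload. This corresponds to the minimum-valid-version test performed in recipe 12-1 before changes are applied.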



While these two approaches are extremely efficient, they can seem a little complex to implement at first. After spending a little time using them, I hope you will agree that this is only an impression. To dispel any initial apprehension, I advise you to read a recipe thoroughly before you start to implement a solution, and to take a good look at the process flow diagrams to get a clear view of how the technique can be applied.

The sample files for this chapter are available on the book’s companion web site, and once installed, will be in the C:\SQL2012DIRecipes\CH11 folder.






Chapter 12 ■ Change Tracking and Change Data Capture



12-1. Detecting Source Table Changes with Little Overhead and No Custom Framework

Problem

You want a low-overhead solution that can detect changes in the source data and apply them to a destination table without using triggers, altering the source table DDL, or any complex processes.



Solution

Activate Change Tracking on the source table, and then isolate the changes and apply them to the destination. The following steps explain how you do this.

1. Activate Change Tracking in the source database (which will be CarSales in this example) using the following T-SQL snippet (C:\SQL2012DIRecipes\CH11\ActivateChangeTracking.Sql):

USE CarSales;
GO
ALTER DATABASE CarSales SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE CarSales SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 3 DAYS, AUTO_CLEANUP = ON);



2. Create the destination table in the destination database (which will be CarSales_Staging in this example). This is the table that will be updated using Change Tracking in this recipe. The DDL to create the table is as follows (C:\SQL2012DIRecipes\CH11\tblClient_CT.Sql):

CREATE TABLE CarSales_Staging.dbo.Client_CT
(
ID INT IDENTITY(1,1) NOT NULL,
ClientName NVARCHAR(150) NULL,
Country TINYINT NULL,
Town VARCHAR(50) NULL,
County VARCHAR(50) NULL,
Address1 VARCHAR(50) NULL,
Address2 VARCHAR(50) NULL,
ClientType VARCHAR(20) NULL,
ClientSize VARCHAR(10) NULL
);
GO



3. Enable Change Tracking for the table(s) that you wish to track, using the following T-SQL snippet (C:\SQL2012DIRecipes\CH11\ChangeTrackingClient.Sql). Change Tracking can only be enabled on a table that has a primary key.

USE CarSales;
GO
ALTER TABLE CarSales.dbo.Client
ENABLE CHANGE_TRACKING;
GO






4. Add the extended property LAST_SYNC_VERSION to the table that you are tracking (dbo.Client in this example). This property will store the number of the last synchronized version of the data. You can add it with the following T-SQL snippet; note that the property can only be added once (C:\SQL2012DIRecipes\CH11\LastSynchVersionProperty.Sql):

USE CarSales;
GO
EXECUTE sys.sp_addextendedproperty
@level0type = N'SCHEMA'
,@level0name = N'dbo'
,@level1type = N'TABLE'
,@level1name = N'Client'
,@name = N'LAST_SYNC_VERSION'
,@value = 0
;



5. Run the following T-SQL code to apply the relevant Inserts, Deletes, and Updates to the destination table (C:\SQL2012DIRecipes\CH11\ChangeTrackingProcess.Sql):

USE CarSales;
GO
BEGIN TRY
DECLARE @LAST_SYNC_VERSION BIGINT
DECLARE @CURRENT_VERSION BIGINT
SET @CURRENT_VERSION = CHANGE_TRACKING_CURRENT_VERSION()
SELECT      @LAST_SYNC_VERSION = CAST(value AS BIGINT)
FROM        sys.extended_properties
WHERE       major_id = OBJECT_ID('dbo.Client')
            AND name = N'LAST_SYNC_VERSION'

-- Check that the change information this sync needs has not been cleaned up
DECLARE @MIN_VALID_VERSION BIGINT
SELECT @MIN_VALID_VERSION = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Client'));
IF @LAST_SYNC_VERSION >= @MIN_VALID_VERSION
BEGIN
-- Inserts
INSERT INTO CarSales_Staging.dbo.Client_CT
(ID,ClientName,Country,Town,County,Address1,Address2,
ClientType,ClientSize)


