15-20. Linking the SSIS Catalog to T-SQL Logging


Chapter 15 ■ Logging and Auditing



How It Works

Unfortunately, there does not seem to be a way to write to the SSIS catalog tables, and thus to centralize logging, when using stored procedures called from SSIS. However, if you log the steps and other information from sprocs (as described in previous recipes), and pass in the two essential variables that allow you to link the stored procedure to the SSIS logging (@SSISTaskGuid and @ExecutionID), you can then reconstitute the whole process from a logging perspective, because your custom logging table(s) can then map back to SSISDB.catalog.executables.
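As a minimal sketch of that mapping, the query below joins a custom log table to SSISDB.catalog.executables. The columns SSISTaskGuid and ExecutionID on CarSales_Logging.Log.EventDetail are assumptions made for illustration (substitute whatever your own logging procedures write); execution_id, executable_guid, executable_name, and package_name are columns of the catalog view.

```sql
-- Assumption: Log.EventDetail stores the two values passed in from SSIS
-- (@SSISTaskGuid and @ExecutionID) in columns of the same names, with the
-- GUID held as text in the same format that the catalog uses.
SELECT
     L.ExecutionID
    ,X.package_name
    ,X.executable_name
    ,L.Step
FROM       CarSales_Logging.Log.EventDetail AS L
INNER JOIN SSISDB.catalog.executables AS X
           ON  X.execution_id    = L.ExecutionID
           AND X.executable_guid = L.SSISTaskGuid
ORDER BY   L.ExecutionID, X.executable_name;
```

Each row then ties one custom log entry to the SSIS task that called the stored procedure.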



15-21. Baselining ETL Processes

Problem

You want to track the evolution of key metrics for a regular ETL process.



Solution

Extend your custom logging framework to include tables and stored procedures to enable baselining. The following shows how to create and use the objects that you will need:

1. Create the RunHistory table described previously, to provide a RunID.

2. Create the following three reference tables using DDL (C:\SQL2012DIRecipes\CH15\ReferenceTables.Sql):

CREATE TABLE CarSales_Logging.Log.RefTables
(
    ID INT IDENTITY(1,1) NOT NULL,
    DatabaseName VARCHAR(150) NULL,
    SchemaName VARCHAR(150) NULL,
    TableName VARCHAR(150) NULL,
    TableMaxThreshold BIGINT NULL,
    TableMinThreshold BIGINT NULL,
    TablePercentAcceptableVariation NUMERIC(8, 4) NULL
);

CREATE TABLE CarSales_Logging.Log.RefBaselineProcessess
(
    ID INT IDENTITY(1,1) NOT NULL,
    ProcessName VARCHAR(150) NULL,
    ProcessMaxThreshold BIGINT NULL,
    ProcessMinThreshold BIGINT NULL,
    ProcessPercentAcceptableVariation NUMERIC(8, 4) NULL
);

CREATE TABLE CarSales_Logging.Log.RefBaselineCounters
(
    ID INT IDENTITY(1,1) NOT NULL,
    CounterName VARCHAR(150) NULL,
    CounterComments VARCHAR(4000) NULL,
    CounterMinThreshold INT NULL,
    CounterMaxThreshold INT NULL,
    CounterPercentAcceptableVariance INT NULL
);






3. Add the three baseline tables using the following DDL (C:\SQL2012DIRecipes\CH15\BaselineTables.Sql):

CREATE TABLE CarSales_Logging.Log.TableSize
(
    ID INT IDENTITY(1,1) NOT NULL,
    TableSchema VARCHAR(50) NULL,
    TableName VARCHAR(50) NULL,
    SpaceUsedKB BIGINT NULL,
    SpaceReservedKB BIGINT NULL,
    Rowcounts BIGINT NULL,
    RunID INT NULL,
    DateUpdated DATE NULL
);

CREATE TABLE CarSales_Logging.Log.ProcessCounterBaseline
(
    ID INT IDENTITY(1,1) NOT NULL,
    RunID INT NULL,
    CounterName VARCHAR(150) NULL,
    CounterNumber BIGINT NULL
);

CREATE TABLE CarSales_Logging.Log.ProcessBaseline
(
    ID INT IDENTITY(1,1) NOT NULL,
    RunID INT NULL,
    ProcessName VARCHAR(150) NULL,
    ProcessDuration INT NULL
);



4. Create the three stored procedures that update the baseline tables using the following DDL (C:\SQL2012DIRecipes\CH15\BaselineSprocs.Sql):

-- pr_AuditCounters
-- CREATE PROCEDURE does not accept a database prefix, so switch database first.
USE CarSales_Logging
GO

CREATE PROCEDURE Log.pr_AuditCounters
AS

DECLARE @RunID INT
SELECT @RunID = MAX(RunID) FROM CarSales_Logging.Log.RunHistory

-- Note: ProcessCounterHistory is not among the tables created in step 3;
-- it must exist (or the INSERT must be retargeted) before this will run.
INSERT INTO CarSales.Log.ProcessCounterHistory (RunID, CounterName, CounterNumber)
SELECT TOP 1000
     @RunID
    ,D.CounterName
    ,D.CounterNumber
FROM       CarSales_Logging.Log.ProcessCounterBaseline AS D
INNER JOIN CarSales_Logging.Log.RefBaselineCounters AS S   -- the reference table that holds CounterName
           ON D.CounterName = S.CounterName
WHERE      D.RunID = @RunID
GO










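-- pr_AuditEvents
-- CREATE PROCEDURE does not accept a database prefix, so switch database first.
USE CarSales_Logging
GO

CREATE PROCEDURE Log.pr_AuditEvents
AS

DECLARE @RunID INT
SELECT @RunID = MAX(RunID) FROM CarSales_Logging.Log.RunHistory

-- Note: ProcessHistory is not among the tables created in step 3;
-- it must exist (or the INSERT must be retargeted) before this will run.
INSERT INTO CarSales_Logging.Log.ProcessHistory (RunID, StageName, StageDuration)
SELECT TOP 1000
     @RunID
    ,S.ProcessName          -- RefBaselineProcessess holds the name in ProcessName
    ,D.DurationInSeconds
FROM       CarSales_Logging.Log.EventDetail AS D
INNER JOIN CarSales_Logging.Log.RefBaselineProcessess AS S
           ON D.Step = S.ProcessName
WHERE      D.RunID = @RunID
GO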



-- pr_AuditTableSize
-- CREATE PROCEDURE does not accept a database prefix, so switch database first.
USE CarSales_Logging
GO

CREATE PROCEDURE Log.pr_AuditTableSize
AS

DECLARE @RunID INT
SELECT @RunID = MAX(RunID) FROM CarSales_Logging.Log.RunHistory

-- Remove any rows already captured for this run so that the procedure can be rerun
DELETE FROM CarSales_Logging.Log.TableSize WHERE RunID = @RunID

INSERT INTO CarSales_Logging.Log.TableSize
    (TableName, SpaceUsedKB, SpaceReservedKB, Rowcounts, RunID)
SELECT DISTINCT
     SO.name AS TableName
    ,DPS.used_page_count * 8 AS SpaceUsedKB          -- pages are 8 KB each
    ,DPS.reserved_page_count * 8 AS SpaceReservedKB
    ,DPS.row_count AS RowCounts
    ,@RunID
FROM       CarSales.sys.dm_db_partition_stats DPS
INNER JOIN CarSales.sys.indexes SIX
           ON  DPS.object_id = SIX.object_id
           AND DPS.index_id = SIX.index_id
INNER JOIN CarSales.sys.objects SO
           ON DPS.object_id = SO.object_id
WHERE      SIX.type_desc = N'CLUSTERED'              -- heaps are therefore not captured
           AND SO.name IN (SELECT TableName FROM CarSales_Logging.Log.RefTables)
GO






How It Works

If you have patiently set up a logging infrastructure and captured a multitude of events that occur during an ETL process, you have certainly given yourself a useful toolkit to check that your process runs and to find the source of any errors, should they occur. Yet this mountain of data can also quickly and easily become a source of useful information for tracking the evolution of regular processes. With relatively little effort, you can extract the key metrics from your log tables, store them separately (which allows you to prune the main log tables frequently to avoid excessive growth), and then track the evolution of the main process timings and row counts that make up your ETL process. This helps you foresee which parts of a data load and transformation could become problems that make your ETL job extend beyond the agreed time window.
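As a hedged illustration of tracking that evolution, the query below reports run-over-run row count changes from the TableSize baseline table defined in this recipe. It is only a sketch: it relies on the LAG window function (available from SQL Server 2012) and assumes that RunID increases with each run.

```sql
-- Row count change per table between consecutive runs.
-- Assumption: a higher RunID always means a later run.
SELECT
     TableName
    ,RunID
    ,Rowcounts
    ,Rowcounts - LAG(Rowcounts) OVER (PARTITION BY TableName ORDER BY RunID) AS RowcountChange
    ,SpaceUsedKB
FROM     CarSales_Logging.Log.TableSize
ORDER BY TableName, RunID;
```

RowcountChange is NULL for the first run captured for each table, because there is no earlier run to compare with.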

Once again, I am suggesting a simple framework of a few tables, views, and stored procedures that allow you to extract the main information from your log tables. Feel free to extend this and use the front-end tool of your choice to present the results. I am the first to admit that none of this is difficult, but I still prefer to include this extension to the logging infrastructure to provide a starter kit for baselining and to prove that logging is worthwhile beyond mere error tracking.

I suggest storing three main elements:

• Timings for core processes
• Counters for key steps
• Table sizes for essential tables



Now, as this baselining infrastructure needs to be simple but easy to extend, I am going to avoid any hardcoding of the processes, counters, and tables that you will be tracking. So as well as the “baselining” tables that will store the essential data culled from the log tables, I suggest three “reference” tables that will hold the names of the processes, counters, and tables whose key data will be stored independently of the logs and will become the basis for the baselining and tracking.

As you can see in the three reference tables (RefBaselineCounters, RefBaselineProcessess, and RefTables), I have included fields for threshold levels and percentages so that you can define limits for the acceptable counters, timings (in seconds), and table sizes, respectively, and set up alerts if you wish.

Once the DDL is run, you need to add any of the following to the reference tables:

• Table names
• Process names (the name used inside the stored procedure that will be logged, or the name of the SSIS task that will be executed)
• Core counter names
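Populating the reference tables might look like the sketch below. Every value shown (the Client table, the 'Load Client' process, the 'ClientRowsInserted' counter, and all thresholds) is hypothetical and only illustrates the shape of the data; the table thresholds are assumed here to be row counts and the process thresholds to be seconds.

```sql
-- Hypothetical sample rows; replace names and limits with your own.
INSERT INTO CarSales_Logging.Log.RefTables
    (DatabaseName, SchemaName, TableName,
     TableMaxThreshold, TableMinThreshold, TablePercentAcceptableVariation)
VALUES
    ('CarSales', 'dbo', 'Client', 500000, 1000, 10.0);

INSERT INTO CarSales_Logging.Log.RefBaselineProcessess
    (ProcessName, ProcessMaxThreshold, ProcessMinThreshold, ProcessPercentAcceptableVariation)
VALUES
    ('Load Client', 600, 5, 25.0);

INSERT INTO CarSales_Logging.Log.RefBaselineCounters
    (CounterName, CounterComments,
     CounterMinThreshold, CounterMaxThreshold, CounterPercentAcceptableVariance)
VALUES
    ('ClientRowsInserted', 'Rows inserted into the Client table', 1, 100000, 20);
```

The names must match exactly what your logging writes, since the baseline stored procedures join on them.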



So, nothing very complicated here. Yet this simple structure can be extremely useful because it lets you obtain a high-level view of process evolution over time. All you have to do is ensure that you have added to the reference tables the names of all the processes (or process steps), counters, and tables you wish to track, and then ensure that the three stored procedures run at the end of your ETL process. Then you can (for instance) link Excel to the baseline tables, and produce a simple dashboard of how long your processes take and how the main counters evolve while you see the increase in table sizes. To add a final flourish, you can set thresholds for all the timings, counters, and table sizes, and use these to set up proactive alerts should a threshold ever be breached.
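A threshold check of that kind could be sketched as follows, using the ProcessBaseline and RefBaselineProcessess tables from this recipe. It assumes that the thresholds are expressed in the same unit as ProcessDuration (seconds); how the result is turned into an alert (e-mail, dashboard, job failure) is left open.

```sql
-- List processes from the latest run whose duration falls outside
-- the limits held in the reference table.
DECLARE @RunID INT;
SELECT @RunID = MAX(RunID) FROM CarSales_Logging.Log.RunHistory;

SELECT
     B.RunID
    ,B.ProcessName
    ,B.ProcessDuration
    ,R.ProcessMinThreshold
    ,R.ProcessMaxThreshold
FROM       CarSales_Logging.Log.ProcessBaseline AS B
INNER JOIN CarSales_Logging.Log.RefBaselineProcessess AS R
           ON R.ProcessName = B.ProcessName
WHERE      B.RunID = @RunID
           AND (B.ProcessDuration > R.ProcessMaxThreshold
                OR B.ProcessDuration < R.ProcessMinThreshold);
```

An empty result means that every tracked process stayed within its limits; rows with a NULL threshold are never reported, because the comparisons evaluate to unknown.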



15-22. Auditing an ETL Process

Problem

You want to check that an ETL process has run not only successfully, but also with no overruns or indicators of potential future problems.






Solution

Audit the log data and isolate key metrics that you define.

1. In all your T-SQL stored procedures, add the following snippet to every statement that inserts data (both SELECT...INTO and INSERT INTO):

...
,GETDATE() AS DATE_PROCESSED
...

2. For all T-SQL updates, add:

...
,DATE_PROCESSED = GETDATE()
...
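In context, the two snippets might be used as in the sketch below. The tables dbo.Client_Source and dbo.Client_Staging and their columns are hypothetical; only the DATE_PROCESSED column and the use of GETDATE() come from the steps above.

```sql
-- Insert: stamp every new row with the time it was loaded.
INSERT INTO dbo.Client_Staging (ClientName, Country, DATE_PROCESSED)
SELECT
     ClientName
    ,Country
    ,GETDATE() AS DATE_PROCESSED
FROM dbo.Client_Source;

-- Update: refresh the stamp whenever a row is modified.
UPDATE D
SET    D.Country = S.Country
      ,D.DATE_PROCESSED = GETDATE()
FROM       dbo.Client_Staging AS D
INNER JOIN dbo.Client_Source AS S
           ON S.ClientName = D.ClientName;
```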



To add the processing date and time inside an SSIS data flow, you will need to add a Derived Column task to the Data Flow pane between the data source and destination tasks.

3. Join the Derived Column task to the preceding and following tasks.

4. Double-click the Derived Column task to edit it, add a name (how about Last_Processed), and set the expression as GETDATE(). SSIS will set the datatype to DT_DBTIMESTAMP automatically.

5. Map the derived column to the LAST_PROCESSED column in the destination table in the Destination task.



6. You will need a table to store the most recent audit data. Suggested DDL is (C:\SQL2012DIRecipes\CH15\tblTableAuditData.Sql):

CREATE TABLE dbo.TableAuditData
(
    ID INT IDENTITY(1,1) NOT NULL,
    QualifiedTableName VARCHAR(500) NULL,
    LastUpdatedDate DATETIME NULL,
    LastRunID INT,
    LastRecordCount BIGINT
)



7. The DDL for the stored procedure that captures the audit data is (C:\SQL2012DIRecipes\CH15\pr_AuditETL.Sql):

CREATE PROCEDURE pr_AuditETL
AS

DECLARE @SQL AS VARCHAR(MAX)
DECLARE @TableToAudit AS VARCHAR(150)
DECLARE @SchemaToAudit AS VARCHAR(150)
DECLARE @DatabaseToAudit AS VARCHAR(150)




