13-11. Creating Self-Optimizing Parallel Bulk Inserts


Chapter 13 ■ Organising and Optimizing Data Loads



    SSISVarName NVARCHAR (50) NULL,
    SSISVarValue NVARCHAR (50) NULL,
    LastUpdated DATETIME NULL,
    CONSTRAINT PK_SSISVariables PRIMARY KEY CLUSTERED
    (
        ID ASC
    )
) ;
GO

2. Add one record to the SSISVariables table, using T-SQL like the following:

INSERT INTO dbo.SSISVariables (SSISPackageName, SSISVarName, SSISVarValue)
VALUES ('ParallelBulkInsertFile.dtsx', 'RecordRange', '1000000')



3. Create an ADO.NET connection manager named CarSales_Staging_ADONET and configure it to connect to the database where the dbo.SSISVariables table is located (CarSales_Staging).



4. Add a new Execute SQL task named Get Range of Records. Configure it as follows:

   Connection Type:   ADO.NET
   Connection:        CarSales_Staging_ADONET
   SQL Statement:     SELECT  @RecordRange = CAST(SSISVarValue AS INT)
                      FROM    dbo.SSISVariables
                      WHERE   SSISVarName = 'RecordRange'

5. Click ParameterMapping and add a parameter, as follows:

   Variable Name:     User::RecordRange
   Direction:         Output
   Type:              Int64
   Parameter Name:    @RecordRange

   Confirm all your modifications.

6. Connect the new task to the existing task, “Prepare destination table”.



7. Add a new Execute SQL task named Redefine Data Ranges. Connect the Sequence container to it. Configure it as follows:

   Connection Type:   ADO.NET
   Connection:        CarSales_Staging_ADONET
   SQL Statement:     UPDATE  dbo.SSISVariables
                      SET     SSISVarValue = FLOOR(@RecordRange / @FileCounter)
                      WHERE   SSISVarName = 'RecordRange'


8. Click ParameterMapping and add a parameter, as follows:

   Variable Name:     User::RecordRange
   Direction:         Input
   Type:              Int64
   Parameter Name:    @RecordRange

9. Confirm all of your modifications.



The final package should look like Figure 13-25.



Figure 13-25.  Bulk loading data with dynamic ranges of records
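To make the read/update cycle of the Get Range of Records and Redefine Data Ranges tasks concrete, here is a minimal Python sketch that uses the standard sqlite3 module as a stand-in for SQL Server; it is an illustration, not part of the package. The column list is an assumption, since the start of the dbo.SSISVariables DDL is not shown in this excerpt, and file_counter is a hypothetical stand-in for @FileCounter, whose source is also not shown. SQLite only provides FLOOR() in builds compiled with its math functions, so the sketch relies on integer division instead, which gives the same result for positive counts.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """In-memory SQLite stand-in for dbo.SSISVariables, seeded as in step 2."""
    conn = sqlite3.connect(":memory:")
    # Assumed definition: the start of the original DDL is not shown in the excerpt.
    conn.execute(
        """CREATE TABLE SSISVariables (
               ID INTEGER PRIMARY KEY,
               SSISPackageName NVARCHAR(50),
               SSISVarName NVARCHAR(50),
               SSISVarValue NVARCHAR(50),
               LastUpdated DATETIME)"""
    )
    conn.execute(
        "INSERT INTO SSISVariables (SSISPackageName, SSISVarName, SSISVarValue)"
        " VALUES ('ParallelBulkInsertFile.dtsx', 'RecordRange', '1000000')"
    )
    conn.commit()
    return conn


def get_record_range(conn: sqlite3.Connection) -> int:
    """Mirror of Get Range of Records: read the stored text value as an integer."""
    row = conn.execute(
        "SELECT CAST(SSISVarValue AS INTEGER) FROM SSISVariables"
        " WHERE SSISVarName = 'RecordRange'"
    ).fetchone()
    return row[0]


def redefine_data_ranges(conn: sqlite3.Connection, record_range: int,
                         file_counter: int) -> None:
    """Mirror of Redefine Data Ranges: store record_range / file_counter, rounded down.

    Both values are bound as integers, so SQLite performs integer division,
    which matches FLOOR() for the positive counts used here. file_counter is a
    hypothetical stand-in for @FileCounter.
    """
    conn.execute(
        "UPDATE SSISVariables SET SSISVarValue = :record_range / :file_counter"
        " WHERE SSISVarName = 'RecordRange'",
        {"record_range": record_range, "file_counter": file_counter},
    )
    conn.commit()
```

With the seeded value of 1000000 and a file_counter of 4, get_record_range returns 250000 after the update.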




How It Works

Should you face source files that vary slightly, or even moderately, in size, you can calculate and store the row count of the most recent load and use it as the basis for calculating the range of records to be loaded by each Bulk Insert task. This is done, of course, in an attempt to produce the nirvana of all software development: a self-managing program! The range of records to load is nothing more than the row count for the most recent load divided by the number of load threads.
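The arithmetic described above can be sketched as follows. plan_ranges is a hypothetical helper, not part of SSIS: it divides the previous load's row count by the number of threads (rounding down) and turns the result into 1-based first/last row pairs, the kind of values you might feed to each Bulk Insert task's first-row and last-row options. It assumes the files have no header row, and letting the final task read to the end of the file (None here) is also an assumption of this sketch; it means rows added since the previous load are still picked up.

```python
from typing import List, Optional, Tuple


def plan_ranges(last_row_count: int, threads: int) -> List[Tuple[int, Optional[int]]]:
    """Split a load into per-thread (first_row, last_row) pairs, numbered from 1.

    record_range = last_row_count // threads, i.e. the previous load's row count
    divided by the number of load threads, rounded down. The final pair uses
    None for last_row, meaning "read to the end of the file" (an assumption).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    record_range = last_row_count // threads
    if record_range == 0:
        # Fewer rows than threads: a single task reading the whole file is enough.
        return [(1, None)]
    ranges = []
    for i in range(threads):
        first_row = i * record_range + 1
        last_row = None if i == threads - 1 else (i + 1) * record_range
        ranges.append((first_row, last_row))
    return ranges
```

With a previous load of 1,000,000 rows and four threads, this yields (1, 250000), (250001, 500000), (500001, 750000) and (750001, None).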



13-12. Loading Files in Controlled Batches

Problem

You want to load multiple varying files in specified batches within a defined time frame.



Solution

Use SSIS to create a batch control framework to load data in batches, and parameterize the input directory and

extension, as follows.

1. Create an SQL Server table (BatchFileLoad) to store the batch metadata using the following DDL (C:\SQL2012DIRecipes\CH13\tblBatchFileLoad.Sql):

CREATE TABLE CarSales_Staging.dbo.BatchFileLoad
(
    ID int IDENTITY(1,1) NOT NULL,
    FileName VARCHAR (250) NULL,
    IsToload BIT NULL,
    IsLoaded BIT NULL,
    FileSize BIGINT NULL,
    CreationTime DATETIME NULL,
    FileExtension VARCHAR (5) NULL,
    DirectoryName VARCHAR (250) NULL,
    LastWriteTime DATETIME NULL
) ;
GO



2. Create a new SSIS package and name it BatchFileLoad. Add two connection managers: an ADO.NET connection manager named CarSales_Staging_ADONET and an OLEDB connection manager named CarSales_Staging_OLEDB, both of which connect to the database where you will be both loading the data and persisting the metadata.



3. Add a new Flat File connection manager named Data Source File. Configure it to use any of the files in the source directory.




4. Create the following variables:

   Variable Name       Scope    Type     Value                      Comments
   ADOTable            Package  Object   n/a                        The object variable that will contain the list of files to process.
   BatchQuantity       Package  Int32    50                         The quantity of files to process per batch.
   CreateList          Package  Boolean  True                       A flag to indicate whether the list of files is to be dropped and re-created or not.
   FileFilter          Package  String   *.CSV                      The file extension for all the files to be processed.
   FileSource          Package  String   C:\SQL2012DIRecipes\CH13   The source directory for the source files.
   IsFinished          Package  Boolean  False                      The flag used to indicate that the process has finished.
   ListConn            Package  String   CarSales_Staging_ADONET    The connection manager name.
   MaxFilesToProcess   Package  Int64    5000                       The upper threshold for the maximum number of files to process per batch.
   MaxProcessDuration  Package  Int32    7200                       The upper threshold for the maximum number of seconds to run the batch before ceasing processing.
   ProcessFile         Package  String   n/a                        The file currently being processed.
   SortElement         Package  String   FileSize                   The indicator of how the list is sorted.
   TotalFilesLoaded    Package  Int64    0                          The process counter for the number of files processed in the batch.
