10-8. Using the SSIS Data Profiling Task on non-SQL Server Data Sources


Chapter 10 ■ Data Profiling



Solution

Use a linked server as the source of the data to get around the SSIS limitation of using ADO.NET connection managers only. The following steps explain how to do it.

1. Use a linked server to connect to another data source (be it a CSV file or an Oracle database, for instance).

2. In SSMS, create a view using the linked server as the source of the data.

3. Connect the SSIS Data Profiling task to the view, which becomes the data source (as described in Recipe 10-6, step 4).

4. Continue with Recipe 10-6 (for a quick profile) or 10-7 (for a custom profile) to create the XML profile output.
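Steps 1 and 2 might look something like the following for a folder of CSV files. This is only a minimal sketch: the linked server name (CSV_SOURCE), the folder (C:\Data\Csv), the file (Stock.csv), and the view name (dbo.vw_Stock_Csv) are all hypothetical, and it assumes that the Microsoft ACE OLE DB provider is installed on the server. Placing the view in CarSales_Staging is also an assumption; any SQL Server database that the Data Profiling task can reach will do.

```sql
-- Hypothetical names throughout: CSV_SOURCE, C:\Data\Csv, Stock.csv, dbo.vw_Stock_Csv.
-- Step 1: a linked server over a folder of text files
-- (assumes the Microsoft ACE OLE DB provider is installed).
EXEC sp_addlinkedserver
    @server     = N'CSV_SOURCE',
    @srvproduct = N'ACE',
    @provider   = N'Microsoft.ACE.OLEDB.12.0',
    @datasrc    = N'C:\Data\Csv',   -- the folder, not the file
    @provstr    = N'Text';
GO

-- Step 2: a view in a SQL Server database; the Data Profiling task
-- connects to this view rather than to the CSV file itself.
USE CarSales_Staging;
GO
CREATE VIEW dbo.vw_Stock_Csv
AS
SELECT *
FROM   CSV_SOURCE...[Stock#csv];    -- Stock.csv is addressed as Stock#csv
GO
```

For an Oracle or other third-party source, only the linked server definition changes; the view and the profiling steps stay the same.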



How It Works

I realize that all the documentation states that the SSIS Data Profiling task will only work with an ADO.NET connection and an SQL Server data source. And, what is more, the documentation is right. However, you can “cheat” to some extent this way, because the task only ever sees the view, and the view lives in an SQL Server database.

Be aware that this will probably be the slowest Data Profiling task that you have ever seen. I have tested this on CSV files that took approximately five times longer to profile than the same data stored in SQL Server. So you might not want to do this in production, but you may want to do it as part of an initial data analysis step when faced with a new data source.



■■Note As creating linked servers to Microsoft Office sources, text files, and third-party RDBMSs was covered in Chapters 1, 2, and 4, respectively, I refer you to those chapters for the connection specifics.



10-9. Reading Profile Data

Problem

You want to read the XML profile data created using the SSIS Data Profiling task.



Solution

Use the Data Profile Viewer installed with SSIS, as follows.

1. Run the following: "C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Binn\DataProfileViewer.exe"

2. Click Open in the toolbar. Navigate to, and load, the profile XML file that you created as the output destination when configuring the Profiling task.

3. Select the profile element corresponding to the profile that you want to analyze (see Figure 10-4). The Data Profile Viewer will show all the profile requests in a tree view on the left.




Figure 10-4.  The Profile Viewer



How It Works

Rather than wading through reams of cryptic XML, you will probably prefer to run the Data Profile Viewer to see the results of the data profiling. You will need to expand the tree and navigate down from Server, via Database, to the table (or view) whose elements you wish to analyze.
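If you ever do need to peek at the raw XML, for instance to check a single figure without opening the viewer, a few lines of T-SQL can pull values out of the file. The following is only a sketch under stated assumptions: the file path is hypothetical, and the element and attribute names (DataProfile, DataProfileOutput, Profiles, ColumnNullRatioProfile, ProfileRequestID, NullCount) are assumed to match your output file, so check them against your own XML first. The namespace wildcard (*:) is used so that the query does not depend on the exact namespace URI.

```sql
-- Hypothetical path: replace with the output file set in the Data Profiling task.
DECLARE @Profile XML;

-- SINGLE_BLOB lets the XML parser honor the encoding declared in the file.
SELECT @Profile = CAST(BulkColumn AS XML)
FROM   OPENROWSET(BULK N'C:\Temp\CarSalesProfile.xml', SINGLE_BLOB) AS ProfileFile;

-- One row per null ratio profile request (element names are assumptions).
SELECT P.Prof.value('@ProfileRequestID', 'NVARCHAR(255)') AS ProfileRequestID,
       P.Prof.value('(*:NullCount)[1]', 'INT')            AS NullCount
FROM   @Profile.nodes('/*:DataProfile/*:DataProfileOutput/*:Profiles/*:ColumnNullRatioProfile') AS P(Prof);
```

For anything beyond a quick check, the Data Profile Viewer remains the more comfortable way to explore the results.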



Hints, Tips, and Traps





• You can also run the Data Profile Viewer directly from inside an SSIS package by editing the Data Profiling task and clicking the Open Profile Viewer button. This will automatically open the correct XML file, assuming that you have run the Data Profiling task.

• Another way of running the Data Profile Viewer is to click Start ➤ All Programs ➤ Microsoft SQL Server 2012 ➤ Integration Services ➤ Data Profile Viewer.



■■Note The output file might contain sensitive data about your database and the data that it contains. If this is the case, you should consider storing it in a suitably protected folder.




10-10. Storing SSIS Profile Data in a Database

Problem

You want to store the profile data created using the SSIS Profiling task in an SQL Server database.



Solution

Use a custom XML task to read the XML file produced by the SSIS Profiling task and shred the profile data into SQL Server tables. The following explains one way to profile the data from the CarSales sample database and store the results in the CarSales_Staging database.

1. Create the following tables in the CarSales_Staging database (C:\SQL2012DIRecipes\CH10\SSISProfileTables.Sql):

USE CarSales_Staging;
GO

CREATE TABLE dbo.DataProfiling_ColumnValueDistribution
(
    ProfileRequestID NVARCHAR(255) NULL,
    NumberOfDistinctValues INT NULL,
    ColumnValueDistributionProfile_Id BIGINT NOT NULL,
    DateAdded DATETIME NOT NULL DEFAULT GETDATE(),
    ID INT IDENTITY(1,1) NOT NULL
) ;
GO

CREATE TABLE dbo.DataProfiling_ColumnStatistics
(
    ProfileRequestID NVARCHAR(255) NULL,
    MinValue DECIMAL(28, 10) NULL,
    MaxValue DECIMAL(28, 10) NULL,
    Mean DECIMAL(28, 10) NULL,
    StdDev DECIMAL(28, 10) NULL,
    ID INT IDENTITY(1,1) NOT NULL,
    DateAdded DATETIME NOT NULL DEFAULT GETDATE()
) ;
GO

CREATE TABLE dbo.DataProfiling_ColumnNulls
(
    ProfileRequestID NVARCHAR(255) NULL,
    -- INT rather than TINYINT: a column can easily contain more than 255 NULLs
    NullCount INT NULL,
    ID INT IDENTITY(1,1) NOT NULL,
    DateAdded DATETIME NOT NULL DEFAULT GETDATE()
) ;
GO




CREATE TABLE dbo.DataProfiling_ColumnLength
(
    ProfileRequestID NVARCHAR(255) NULL,
    ColumnLengthDistributionProfile_ID BIGINT NULL,
    MinLength TINYINT NULL,
    MaxLength TINYINT NULL,
    ID INT IDENTITY(1,1) NOT NULL,
    DateAdded DATETIME NOT NULL DEFAULT GETDATE()
) ;
GO

CREATE TABLE dbo.DataProfiling_ValueDistributionItem
(
    Value NVARCHAR(255) NULL,
    Count INT NULL,
    ValueDistribution_Id BIGINT NOT NULL
) ;
GO

CREATE TABLE dbo.DataProfiling_LengthDistributionItem
(
    Length TINYINT NOT NULL,
    Count BIGINT NULL,
    LengthDistribution_Id BIGINT NOT NULL
) ;
GO

CREATE TABLE dbo.DataProfiling_Join_ValueDistribution
(
    ValueDistribution_Id BIGINT NOT NULL,
    ColumnValueDistributionProfile_Id BIGINT NOT NULL
) ;
GO

CREATE TABLE dbo.DataProfiling_Join_LengthDistribution
(
    LengthDistribution_Id BIGINT NOT NULL,
    ColumnLengthDistributionProfile_Id BIGINT NOT NULL
) ;
GO

2. Create a new SSIS package. Add two ADO.NET connection managers: one named CarSales_ADONET, which should point to the CarSales database, and one named CarSales_Staging_ADONET, which should point to the CarSales_Staging database.



3. Add an Execute SQL task. Configure it to use the connection manager for the profile data (CarSales_Staging_ADONET). Name it Prepare Tables and set the SQL Statement to (C:\SQL2012DIRecipes\CH10\TruncateSSISProfileTables.Sql):

TRUNCATE TABLE dbo.DataProfiling_ColumnLength;
TRUNCATE TABLE dbo.DataProfiling_ColumnNulls;
TRUNCATE TABLE dbo.DataProfiling_ColumnStatistics;


