8-14. Obtaining SQL Server Metadata Using .NET


Chapter 8 ■ Metadata



TableSchema VARCHAR(50) NULL,

TableName VARCHAR(50) NULL

) ;

CREATE TABLE dbo.ADOSQLServerMetadataIndexColumns

(

ID INT IDENTITY(1,1) NOT NULL,

ConstraintCatalog VARCHAR(50) NULL,

ConstraintSchema VARCHAR(50) NULL,

ConstraintName VARCHAR(50) NULL,

TableCatalog VARCHAR(50) NULL,

TableSchema VARCHAR(50) NULL,

TableName VARCHAR(50) NULL,

ColumnName VARCHAR(50) NULL,

OrdinalPosition VARCHAR(50) NULL,

KeyType VARCHAR(50) NULL,

IndexName VARCHAR(50) NULL

);

CREATE TABLE dbo.ADOSQLServerMetadataColumns

(

ID INT IDENTITY(1,1) NOT NULL,

TableCatalog VARCHAR(50) NULL,

TableName VARCHAR(50) NULL,

TableSchema VARCHAR(50) NOT NULL,

ColumnName VARCHAR(50) NULL,

OrdinalPosition INT NULL,

ColumnDefault VARCHAR(50) NULL,

IsNullable VARCHAR(5) NULL,

DataType VARCHAR(50) NULL,

CharacterMaximumLength VARCHAR(50) NULL,

CharacterOctetLength VARCHAR(50) NULL,

NumericPrecision VARCHAR(50) NULL,

NumericPrecisionRadix VARCHAR(50) NULL,

NumericScale VARCHAR(50) NULL,

DateTimePrecision VARCHAR(50) NULL,

CharacterSetCatalog VARCHAR(50) NULL,

CharacterSetSchema VARCHAR(50) NULL,

CharacterSetName VARCHAR(50) NULL,

CollationCatalog VARCHAR(50) NULL,

IsFilestream VARCHAR(5) NULL,

IsSparse VARCHAR(5) NULL,

IsColumnSet VARCHAR(5) NULL

) ;

2. Create a new SSIS package, and configure an ADO.NET connection manager that points to the source SQL Server database.

3. Add a Data Flow task, and switch to the Data Flow pane.

4. Add a Script Component to the Data Flow pane.

5. Select Source as the Data Flow type, and click OK.






6. Double-click the Script Component to edit it. Make sure that Inputs and Outputs is selected in the left-hand pane. Rename Output 0 as MetadataOutput. Select Output Columns.

7. Click Add Column to add a new column. In the properties pane on the right of the dialog box, rename the column TableCatalog. Set its data type to String, with a length of 50.



8. Add the following columns, with the given data types and lengths:

Column          DataType and Length
TableCatalog    String, 50
TableSchema     String, 50
TableName       String, 50
TableType       String, 50



The dialog box should look something like that in Figure 8-14.



Figure 8-14.  The Script Transform Editor for metadata






9. Click Connection Managers and add a connection manager named ADONETSource. Select the ADO.NET connection manager that you created earlier.

10. Click Script, set the script language to Visual Basic 2010, and then click the Design Script button.

11. Right-click References in the left-hand pane. Select Add Reference. Select System.Data.SqlClient from the available .NET components. Click OK.

12. Add the following code (C:\SQL2012DIRecipes\CH08\GetschemaCode.vb):

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Dim ConnMgr As IDTSConnectionManager100
    Dim SQLConn As SqlConnection
    Dim SQLReader As SqlDataReader
    Dim DSTable As DataTable

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
        ' Get the connection manager added in step 9 and obtain a SqlConnection from it
        ConnMgr = Me.Connections.ADONETSource
        SQLConn = CType(ConnMgr.AcquireConnection(Nothing), SqlConnection)
    End Sub

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Load the "Tables" schema collection into a DataTable
        DSTable = SQLConn.GetSchema("Tables")
    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
    End Sub

    Public Overrides Sub CreateNewOutputRows()
        ' Write one output row for each row of schema metadata
        For Each Row As DataRow In DSTable.Rows
            MetadataOutputBuffer.AddRow()
            MetadataOutputBuffer.TableCatalog = Row("TABLE_CATALOG").ToString
            MetadataOutputBuffer.TableSchema = Row("TABLE_SCHEMA").ToString
            MetadataOutputBuffer.TableName = Row("TABLE_NAME").ToString
            MetadataOutputBuffer.TableType = Row("TABLE_TYPE").ToString
        Next
    End Sub

End Class






13. Close the Script screen.

14. Click OK to close the Script Component dialog box.

15. Add an OLEDB destination component to the Data Flow pane. Link the Script Component to the OLEDB destination and configure it to use the ADOSQLServerMetadataTables table, which was created as part of the prerequisites.

16. Map the columns. You will note that the destination table columns correspond to the columns that you added to the Script Component output buffer.

17. Repeat steps 3-16 for the other tables. The column definitions are provided in Table 8-10.



Table 8-10.  Column Definitions for Returning GetSchema Metadata in SSIS

Table Name                        Column                  DataType and Length
ADOSQLServerMetadataViews         TableCatalog            String, 50
ADOSQLServerMetadataViews         TableSchema             String, 50
ADOSQLServerMetadataViews         TableName               String, 50
ADOSQLServerMetadataViews         CheckOption             String, 50
ADOSQLServerMetadataViews         IsUpdatable             String, 5
ADOSQLServerMetadataColumns       TableCatalog            String, 50
ADOSQLServerMetadataColumns       TableSchema             String, 50
ADOSQLServerMetadataColumns       TableName               String, 50
ADOSQLServerMetadataColumns       ColumnName              String, 50
ADOSQLServerMetadataColumns       OrdinalPosition         four-byte signed integer [DT_I4]
ADOSQLServerMetadataColumns       ColumnDefault           String, 50
ADOSQLServerMetadataColumns       IsNullable              String, 5
ADOSQLServerMetadataColumns       DataType                String, 50
ADOSQLServerMetadataColumns       CharacterMaximumLength  four-byte signed integer [DT_I4]
ADOSQLServerMetadataColumns       CharacterOctetLength    four-byte signed integer [DT_I4]
ADOSQLServerMetadataColumns       NumericPrecision        four-byte signed integer [DT_I4]
ADOSQLServerMetadataColumns       NumericPrecisionRadix   four-byte signed integer [DT_I4]
ADOSQLServerMetadataColumns       NumericScale            four-byte signed integer [DT_I4]
ADOSQLServerMetadataColumns       DateTimePrecision       String, 50
ADOSQLServerMetadataColumns       CharacterSetCatalog     String, 50
ADOSQLServerMetadataColumns       CharacterSetSchema      String, 50
ADOSQLServerMetadataColumns       CharacterSetName        String, 50
ADOSQLServerMetadataColumns       CollationCatalog        String, 50
ADOSQLServerMetadataColumns       IsFilestream            String, 5
ADOSQLServerMetadataColumns       IsSparse                String, 5
ADOSQLServerMetadataColumns       IsColumnSet             String, 5
ADOSQLServerMetadataViewColumns   ViewCatalog             String, 50
ADOSQLServerMetadataViewColumns   ViewSchema              String, 50
ADOSQLServerMetadataViewColumns   ViewName                String, 50
ADOSQLServerMetadataViewColumns   TableCatalog            String, 50
ADOSQLServerMetadataViewColumns   TableSchema             String, 50
ADOSQLServerMetadataViewColumns   TableName               String, 50
ADOSQLServerMetadataViewColumns   ColumnName              String, 50
ADOSQLServerMetadataIndexes       ConstraintCatalog       String, 50
ADOSQLServerMetadataIndexes       ConstraintSchema        String, 50
ADOSQLServerMetadataIndexes       ConstraintName          String, 50
ADOSQLServerMetadataIndexes       TableCatalog            String, 50
ADOSQLServerMetadataIndexes       TableSchema             String, 50
ADOSQLServerMetadataIndexes       TableName               String, 50
ADOSQLServerMetadataIndexColumns  ConstraintCatalog       String, 50
ADOSQLServerMetadataIndexColumns  ConstraintSchema        String, 50
ADOSQLServerMetadataIndexColumns  ConstraintName          String, 50
ADOSQLServerMetadataIndexColumns  TableCatalog            String, 50
ADOSQLServerMetadataIndexColumns  TableSchema             String, 50
ADOSQLServerMetadataIndexColumns  TableName               String, 50
ADOSQLServerMetadataIndexColumns  ColumnName              String, 50
ADOSQLServerMetadataIndexColumns  OrdinalPosition         four-byte signed integer [DT_I4]
ADOSQLServerMetadataIndexColumns  KeyType                 String, 50
ADOSQLServerMetadataIndexColumns  IndexName               String, 50






You will need to set the appropriate script for each source. The definitions are as follows (all are in the file
C:\SQL2012DIRecipes\CH08\GetSchemaCodeProcessing.vb):

'Views:

DSTable = SQLConn.GetSchema("Views")

MetadataOutputBuffer.TableCatalog = Row("TABLE_CATALOG").ToString

MetadataOutputBuffer.TableSchema = Row("TABLE_SCHEMA").ToString

MetadataOutputBuffer.TableName = Row("TABLE_NAME").ToString

MetadataOutputBuffer.IsUpdatable = Row("IS_UPDATABLE").ToString

MetadataOutputBuffer.CheckOption = Row("CHECK_OPTION").ToString

'Table Columns:

DSTable = SQLConn.GetSchema("Columns")

MetadataOutputBuffer.TableCatalog = Row("TABLE_CATALOG").ToString

MetadataOutputBuffer.TableSchema = Row("TABLE_SCHEMA").ToString

MetadataOutputBuffer.TableName = Row("TABLE_NAME").ToString

MetadataOutputBuffer.ColumnName = Row("COLUMN_NAME").ToString

MetadataOutputBuffer.OrdinalPosition = Row("ORDINAL_POSITION").ToString

MetadataOutputBuffer.ColumnDefault = Row("COLUMN_DEFAULT").ToString

MetadataOutputBuffer.IsNullable = Row("IS_NULLABLE").ToString

MetadataOutputBuffer.DataType = Row("DATA_TYPE").ToString

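' Integer (DT_I4) output columns: skip NULL source values, which cannot be converted from an empty string
If Not IsDBNull(Row("CHARACTER_MAXIMUM_LENGTH")) Then MetadataOutputBuffer.CharacterMaximumLength = CInt(Row("CHARACTER_MAXIMUM_LENGTH"))
If Not IsDBNull(Row("CHARACTER_OCTET_LENGTH")) Then MetadataOutputBuffer.CharacterOctetLength = CInt(Row("CHARACTER_OCTET_LENGTH"))
If Not IsDBNull(Row("NUMERIC_PRECISION")) Then MetadataOutputBuffer.NumericPrecision = CInt(Row("NUMERIC_PRECISION"))
If Not IsDBNull(Row("NUMERIC_PRECISION_RADIX")) Then MetadataOutputBuffer.NumericPrecisionRadix = CInt(Row("NUMERIC_PRECISION_RADIX"))
If Not IsDBNull(Row("NUMERIC_SCALE")) Then MetadataOutputBuffer.NumericScale = CInt(Row("NUMERIC_SCALE"))
MetadataOutputBuffer.DateTimePrecision = Row("DATETIME_PRECISION").ToString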

MetadataOutputBuffer.CharacterSetCatalog = Row("CHARACTER_SET_CATALOG").ToString

MetadataOutputBuffer.CharacterSetSchema = Row("CHARACTER_SET_SCHEMA").ToString

MetadataOutputBuffer.CharacterSetName = Row("CHARACTER_SET_NAME").ToString

MetadataOutputBuffer.CollationCatalog = Row("COLLATION_CATALOG").ToString

MetadataOutputBuffer.IsFilestream = Row("IS_FILESTREAM").ToString

MetadataOutputBuffer.IsSparse = Row("IS_SPARSE").ToString

MetadataOutputBuffer.IsColumnSet = Row("IS_COLUMN_SET").ToString

'View Columns:

DSTable = SQLConn.GetSchema("ViewColumns")

MetadataOutputBuffer.ViewCatalog = Row("VIEW_CATALOG").ToString

MetadataOutputBuffer.ViewSchema = Row("VIEW_SCHEMA").ToString

MetadataOutputBuffer.ViewName = Row("VIEW_NAME").ToString

MetadataOutputBuffer.TableCatalog = Row("TABLE_CATALOG").ToString

MetadataOutputBuffer.TableSchema = Row("TABLE_SCHEMA").ToString

MetadataOutputBuffer.TableName = Row("TABLE_NAME").ToString

MetadataOutputBuffer.ColumnName = Row("COLUMN_NAME").ToString






'Indexes:

DSTable = SQLConn.GetSchema("Indexes")

MetadataOutputBuffer.TableCatalog = Row("TABLE_CATALOG").ToString

MetadataOutputBuffer.TableSchema = Row("TABLE_SCHEMA").ToString

MetadataOutputBuffer.TableName = Row("TABLE_NAME").ToString

MetadataOutputBuffer.ConstraintCatalog = Row("CONSTRAINT_CATALOG").ToString

MetadataOutputBuffer.ConstraintSchema = Row("CONSTRAINT_SCHEMA").ToString

MetadataOutputBuffer.ConstraintName = Row("CONSTRAINT_NAME").ToString

'Index Columns:

DSTable = SQLConn.GetSchema("IndexColumns")

MetadataOutputBuffer.TableCatalog = Row("TABLE_CATALOG").ToString

MetadataOutputBuffer.TableSchema = Row("TABLE_SCHEMA").ToString

MetadataOutputBuffer.TableName = Row("TABLE_NAME").ToString

MetadataOutputBuffer.ConstraintCatalog = Row("CONSTRAINT_CATALOG").ToString

MetadataOutputBuffer.ConstraintSchema = Row("CONSTRAINT_SCHEMA").ToString

MetadataOutputBuffer.ConstraintName = Row("CONSTRAINT_NAME").ToString

MetadataOutputBuffer.ColumnName = Row("COLUMN_NAME").ToString

MetadataOutputBuffer.OrdinalPosition = Row("ORDINAL_POSITION").ToString

MetadataOutputBuffer.KeyType = Row("KEYTYPE").ToString

MetadataOutputBuffer.IndexName = Row("INDEX_NAME").ToString

18. The final package will probably look something like Figure 8-15.



Figure 8-15.  SSIS package to return SQL Server metadata

Run the package, and the selected metadata will be loaded into the SQL Server tables.






How It Works

Another way to obtain metadata is to use the .NET GetSchema method. This technique is more complex to
implement than the techniques that you have seen so far in this chapter. It is, however, considerably more
extensible and powerful than using INFORMATION_SCHEMA views. The example in this recipe returns metadata for
the following:





    Tables
    Views
    Table columns
    View columns
    Indexes
    Index columns



Metadata for each of these object types is stored in its own specific output table. These tables can then be

joined and queried (a bit like the INFORMATION_SCHEMA views in SQL Server) to get an overview of the source

metadata.

The package looks much more complex than it really is. All it does is use a script to obtain the source data
from each of the metadata schema collections, which requires creating multiple output columns for each one.
For each of the six metadata sources, you create a Script Component data source that queries a specific set of
metadata using the GetSchema method. Each Script Component has the set of output columns that map to the
output of GetSchema. The columns added to the data flow are then sent to an OLEDB destination, and so into a
database table.
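Outside SSIS, the core of what each Script Component does can be sketched as a small console program. This is only an illustration under stated assumptions, not the recipe's code: the module name is hypothetical, and the connection string (server localhost, database CarSales, Windows authentication) is a placeholder to replace with your own.

```vb
Imports System
Imports System.Data
Imports System.Data.SqlClient

Module GetSchemaSketch

    Sub Main()
        ' Assumed connection string: replace the server and database with your own.
        Dim connString As String =
            "Data Source=localhost;Initial Catalog=CarSales;Integrated Security=SSPI;"

        Using conn As New SqlConnection(connString)
            conn.Open()

            ' GetSchema() without arguments lists the schema collections the provider exposes.
            Dim collections As DataTable = conn.GetSchema()
            For Each row As DataRow In collections.Rows
                Console.WriteLine("Collection: {0}", row("CollectionName"))
            Next

            ' The same call as in the recipe: one DataRow per entry in the Tables collection.
            Dim tables As DataTable = conn.GetSchema("Tables")
            For Each row As DataRow In tables.Rows
                Console.WriteLine("{0}.{1}.{2} ({3})",
                                  row("TABLE_CATALOG"), row("TABLE_SCHEMA"),
                                  row("TABLE_NAME"), row("TABLE_TYPE"))
            Next
        End Using
    End Sub

End Module
```

Running something like this first can be a quick way to check which collections and column names are available before you define the output columns in the Script Component.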

To make the SSIS package slightly more reusable, you may want to supply the table name and/or the connection
string as variables. You can also choose exactly which columns you wish to store in SQL Server. The
important thing is to ensure that you have defined all the output columns that you need before writing the
code that maps the metadata source columns to them. Note that the column references used when reading from
the data table (names, as in this recipe's code, or index numbers if you prefer them) are those of the source
data, not of the buffer columns that you created in the SSIS package.

Adding a data viewer to the data flow can also help tremendously. To do this, right-click the connection

between the Script Component data source and the OLEDB destination, and select Data Viewers. Now when you

run the package, the viewer is displayed for you to see the data as it is processed.



Hints, Tips, and Traps





The Script Component can define the source metadata. Notably, you can:

    Specify the type of connection that you wish to use.
    Specify the schema collection that you wish to store.
    Add any restrictions that you require.
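As a hedged illustration of the restrictions mentioned above, the sketch below first lists the restrictions that the Tables collection accepts and then filters on two of them. It is not taken from the recipe: the module name is hypothetical, the connection string is a placeholder, and the dbo schema is an assumed example value.

```vb
Imports System
Imports System.Data
Imports System.Data.SqlClient

Module GetSchemaRestrictionsSketch

    Sub Main()
        ' Assumed connection string: replace the server and database with your own.
        Dim connString As String =
            "Data Source=localhost;Initial Catalog=CarSales;Integrated Security=SSPI;"

        Using conn As New SqlConnection(connString)
            conn.Open()

            ' The Restrictions collection describes the filters each collection accepts, in order.
            Dim restrictions As DataTable = conn.GetSchema("Restrictions")
            For Each row As DataRow In restrictions.Select("CollectionName = 'Tables'")
                Console.WriteLine("{0}: {1}", row("RestrictionNumber"), row("RestrictionName"))
            Next

            ' For SQL Server, the Tables restrictions are catalog, owner (schema), table name, and table type.
            ' Nothing in a position means "do not filter on this restriction".
            Dim filter() As String = {Nothing, "dbo", Nothing, "BASE TABLE"}
            Dim tables As DataTable = conn.GetSchema("Tables", filter)
            For Each row As DataRow In tables.Rows
                Console.WriteLine("{0}.{1}", row("TABLE_SCHEMA"), row("TABLE_NAME"))
            Next
        End Using
    End Sub

End Module
```

In the Script Component, the equivalent change would be to pass the restriction array as the second argument of the GetSchema call in PreExecute.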



Summary

This chapter contains many ways to profile source data using SQL Server. To give you a clearer overview,

Table 8-11 describes the various methods which we have looked at in this chapter and their advantages and

disadvantages.






Table 8-11.  Advantages and Disadvantages of Techniques Used in This Chapter

Technique                     Advantages                        Disadvantages
sp_columns_ex / sp_tables_ex  Simple to use.                    Only work with linked servers. Somewhat limited metadata.
System dictionary             Extremely complete.               Complex to query. Requires experience and practice. Can require ad hoc query rights.
INFORMATION_SCHEMA views      Simple to access. Easy to query.  Limited metadata. Not available for all databases. Can vary between databases.
GetSchema                     Easy to use once configured.      Requires coding. Initially complex.



Clearly, the approach that you take to obtain metadata depends on the source data itself, as you cannot

gather metadata from Access in the same way that you can from flat files. Even when dealing with relational

databases, you may be required to apply very different approaches. Equally, the type of connectivity available can

shape your choice. For a linked server, sp_columns_ex may suffice. If you are using other means to connect to the

data source, then this approach is simply not available.

Finally, the level of detail in the metadata that you want to retrieve can be a deciding factor. For an
eminently reasonable amount of metadata, INFORMATION_SCHEMA views could suffice. For truly profound
levels of detail, you may have to query the system catalog of the source database.

So it is now up to you to decide the level of metadata to return—if any—and which is the most appropriate

method of doing this. Hopefully, this chapter has given you some ideas and approaches to try out. I hope even

more that they help you to build robust and trouble-free ETL solutions with SQL Server.






Chapter 9



Data Transformation

As we all know, the “T” in ETL stands for “transformation”. Yes, you may be able to connect to a data source, and

yes, you may be able to pour data into an SQL Server destination database. Frequently, however, many changes

must be applied to the data as it progresses from source to destination. Consequently, this chapter will look at

some of the major data transformation challenges that face the ETL developer.

Inevitably, it is impossible to foresee every data transformation requirement and every twist and turn that

data can, or must, take as it flows through an ETL process. So, equally inevitably, this chapter cannot foresee

every need for data transformation nor provide every solution that could be required. Nonetheless, it will attempt

to provide an overview of the classic solutions to standard data transformation problems. These will include the

following:

Data deduplication—or the art of removing duplicates.

Denormalizing data—or unpivoting (also called transposing) columns of data

into rows.

Pivoting data.

Subsetting data—from fixed- and variable-length data into multiple columns.

Concatenating data.

Merging data.

Character-level data transformation—to apply the required UPPER, lower, or TitleCase.

A look at the main types of SCD (slowly changing dimensions).

Plus a few other tools for your ETL armory.

A few elements have been included that may seem far too elementary. Yet I felt that it was best to be

exhaustive (albeit briefly), and use this as an opportunity to provide a rapid overview of the fundamental set of

SSIS transforms that give the product its power and versatility. I will also stick to the philosophy of this book and

describe parallel T-SQL solutions where appropriate, as I am a firm believer in using the appropriate tool for the

job, and not shoehorning everything into SSIS.

Equally, I have steered clear of data cleansing except where Data Quality Services in SQL Server 2012

are concerned. Data cleansing is a heinously complex subject, which frequently goes beyond simple data

transformation, and requires, all too often, intensive manual labor, or third-party products—and so really is a

stand-alone subject outside the scope of this book.






Chapter 9 ■ Data Transformation



The examples used in this chapter are available on the book’s companion web site, and can be found in

the C:\SQL2012DIRecipes\CH09 directory once you have downloaded them. Please note also that I will not be

explaining many times over how to use OLEDB connection managers to point to source data in SSIS since this

has been explained in detail throughout many of the recipes in Chapters 1–7. Please refer to the recipes in the

first part of the book—and specifically Chapters 4 (for SQL Server destinations) and 7 (for SQL Server sources)

to revise complete details on SSIS data source and data destination connections and tasks.



9-1. Converting Data Types

Problem

You need to convert between data types as part of an ETL process to ensure that source data types do not cause

the process to fail.



Solution

Use the SSIS Data Conversion task to change data types in the data flow and ensure that the destination data

types can accept the source data. The procedure is as follows:

1. Create or open an SSIS package and add a Data Flow task. Switch to the Data Flow pane.

2. Add an OLEDB connection manager and configure it to connect to the CarSales database.

3. Add an OLEDB Source task and configure it to use the OLEDB connection manager defined in step 2 and the Clients table.

4. Add a Data Conversion task to your SSIS package.

5. Connect the Data Source task to it.

6. Select the column(s) you wish to modify.

7. For each column whose data type needs altering, choose the new data type, and if necessary, its length in the grid in the lower part of the dialog box. It should look like Figure 9-1.


