3-14. Validating an XML Document Against a Schema File in SSIS


Chapter 3 ■ XML Data Sources



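        ' Validate the document; each schema violation calls ValidationEventHandler
        XMLDoc.Validate(EvtHdl)

        If IsValidateFailure Then
            ' Remove any log left over from a previous run
            If File.Exists(LogFile) Then
                File.Delete(LogFile)
            End If
            ' Using closes the writer, which flushes the error list to disk
            Using FlOut As New System.IO.StreamWriter(LogFile)
                FlOut.Write(SW.ToString)
            End Using
        End If

        ' Return the flag to the package so that the precedence constraints can test it
        Dts.Variables("IsValidateFailure").Value = IsValidateFailure
        Dts.TaskResult = ScriptResults.Success
    End Sub

    Private Sub ValidationEventHandler(ByVal sender As Object, ByVal e As ValidationEventArgs)
        ' Record the error message and raise the failure flag
        SW.WriteLine(e.Message)
        IsValidateFailure = True
    End Sub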

6. Close the Script window and confirm your changes with OK.

7. Add a Data Flow task (if you are loading the XML file using SSIS) or an Execute Process task (if you are loading the XML file using SQLXML Bulk Load) to the Control Flow pane, and connect the Script task to it.



8. Double-click the constraint (the connector) and set it to

   Evaluation Operation:  Expression And Constraint
   Value:                 Success
   Expression:            @IsValidateFailure == False



9. Configure the XML load as described earlier in this chapter in Recipe 3-4.

10. Add a Send Mail task to the Control Flow pane, and connect the Script task to it.



11. Double-click the constraint (the connector) and set it to

   Evaluation Operation:  Expression And Constraint
   Expression:            @IsValidateFailure == True

12. Configure the Send Mail task to alert you to the good news that the XML cannot be validated. The package should look like Figure 3-11.






Figure 3-11.  XML validation in SSIS
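
The routing set up by the two precedence constraints in steps 8 and 11 can be summarized in a few lines. This is only a minimal Python sketch of the decision logic, not anything that SSIS itself runs. The function names are invented for illustration, and because the recipe does not state a Value for the second constraint, the sketch assumes that it is left at Success.

```python
def constraint_allows(task_result, required_result, expression_value):
    """Expression And Constraint: the path is followed only when the
    task result matches the Value AND the expression is true."""
    return task_result == required_result and expression_value


def next_task(task_result, is_validate_failure):
    """Return which downstream task runs after the Script task."""
    # Step 8: the load path needs Success and @IsValidateFailure == False
    if constraint_allows(task_result, "Success", not is_validate_failure):
        return "load"
    # Step 11: the mail path needs @IsValidateFailure == True
    # (Value assumed to be Success, as it is not given in the recipe)
    if constraint_allows(task_result, "Success", is_validate_failure):
        return "send mail"
    # Neither constraint is satisfied, for example if the script itself fails
    return None
```

Since the Script task always reports Success, the flag alone decides between the two paths. If the script were to fail outright, neither downstream task would run under these assumptions.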



How It Works

This fairly simple SSIS Script task verifies an XML document against a schema file, and (in this example) directs the process flow according to the result of the validation. That is, the data will load if validation is successful, and will stop (and send an e-mail) if it is not. Moreover, a log file of all errors encountered will be created should validation fail. It works in this way:





• First, references are set to the XML and System.IO libraries.

• Then the files to be processed (XML and XSD) are assigned to variables in the Script task.

• The source file is read and validated. Each error is caught by the ValidationEventHandler and written to a StringWriter, and the error flag is set.

• Finally, the error flag is returned to SSIS, and the list of errors is written to a file (which is deleted first if it already exists).
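
The flow described above (validate, collect each error through a callback, set a flag, then replace the log file) can also be sketched outside SSIS. What follows is a minimal Python sketch of that pattern, not the recipe's VB script. The Python standard library has no XSD validator, so a stand-in check plays the role of the schema: a hypothetical list of element names that every record must contain. The function name, parameters, and message format are all assumptions made for illustration.

```python
import os
import xml.etree.ElementTree as ET


def validate_and_log(xml_text, required_children, log_path):
    """Run a stand-in validation and return True if it failed.

    required_children is a hypothetical substitute for an XSD: every
    child of the root element must contain each of these element names.
    """
    errors = []  # plays the role of the StringWriter in the recipe

    def on_error(message):
        # Called once per problem found, like ValidationEventHandler
        errors.append(message)

    root = ET.fromstring(xml_text)
    for position, record in enumerate(root, start=1):
        for name in required_children:
            if record.find(name) is None:
                on_error(f"Record {position}: missing element '{name}'")

    is_validate_failure = bool(errors)
    if is_validate_failure:
        # Delete any previous log first, then write the new error list
        if os.path.exists(log_path):
            os.remove(log_path)
        with open(log_path, "w", encoding="utf-8") as log_file:
            log_file.write("\n".join(errors) + "\n")
    return is_validate_failure
```

The returned value corresponds to the IsValidateFailure variable that the Script task hands back to the package. As in the recipe, the log file is written only when at least one error was collected.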



Hints, Tips, and Traps





• Clearly, if the XML file and the schema file are massively out of sync, then the log file will be not only very large, but also extremely repetitive.

• Of course, you do not need to use a Send Mail task; indeed, you can do nothing whatsoever to indicate failure to validate the XML source data.



Summary

As this chapter has tried to demonstrate, a wide variety of methods are available to take XML source files and load them into SQL Server. In some cases, the choice will depend on your objectives: if you want to load the file “as is” without shredding the data into its component parts, then OPENROWSET (BULK) could be the best solution. If, however, the source file is being used as a medium for data transfer, then you have a wider set of options available. If you are basing your ETL process around T-SQL, you could find that using SQL Server’s XQuery support is the way to go. If, on the other hand, you are more “SSIS-centric,” then the SSIS XML task can be an excellent solution in many cases. For really large source files, or where speed is of the essence, SQLXML Bulk Loader is possibly the only viable option.






All these techniques have their drawbacks as well as their positive points, however. So here, then, in Table 3-4 is a concise overview of these various methods, along with their advantages and disadvantages.

Table 3-4. Advantages and Disadvantages of Approaches Used in This Chapter



Technique                   Advantages                                            Disadvantages
--------------------------  ----------------------------------------------------  -------------------------------
Store XML files directly    Fast and easy.                                        Only LOB storage, not shredded.
into SQL Server tables

OPENXML                     Fast to use and easy to set up.                       Memory-intensive.
                            Handles complex XML shredding.                        2GB file size limit.

SQL Server XQuery           Efficient and can handle even the most complex XML.   More XQuery than T-SQL.
                            XML can be queried using XQuery in SQL Server.
                            No memory limitations when using OPENROWSET (BULK).

SSIS XML Task               Easy for less complex files.                          Can get complex.
                                                                                  Requires normalization.

SQLXML Bulk Load            Extremely fast.                                       Complex XSD definition.
                            Multiple parallel table loads.
                            Built-in XML verification.




Chapter 4



SQL Databases

An awful lot of the data that you will be called upon to load into SQL Server is probably already stored in a relational database. While you can certainly export from most of the currently available commercial SQL-based databases into a text or XML file, and from there into SQL Server, there are also standard ways of connecting directly to most RDBMSs (relational database management systems) and then extracting data from them into the Microsoft RDBMS. In this chapter, therefore, we will be looking at some of the available ways of ingesting and linking to data from the following:





• Oracle
• DB2
• MySQL
• Sybase
• Teradata
• PostgreSQL



As you will see, the techniques used to ingest data are very similar for all the databases that we will cover. They include (depending on the source database):





• SSIS
• Ad hoc connections
• Linked servers



The one thing that they all have in common is that you need to get a provider for the source database installed and working on the SQL Server destination. In most cases, you can use either an OLEDB or an ODBC provider (or for SSIS in some cases, a .NET provider). From then on, the differences are essentially minor. So we will look at each database, and show you how to connect to the source database and to load data from there into SQL Server.

Once we have established these basic approaches, there are a few other interesting things to look at, notably, how to use the SQL Server Migration Assistant (SSMA) for Oracle, MySQL, and Sybase.

As you are probably aware, the real world of interdatabase data connectivity has its fair share of challenges, and you will have to deal with some or all of the following:





• Installing and configuring OLEDB and ODBC providers
• Network and firewall issues
• Database security
• Data type mapping









In an ideal universe, these elements are clear and well-documented. In the reality that most of us inhabit, things are somewhat murkier, and you could end up requiring knowledge of multiple IT systems, or at least a precise knowledge of certain extremely focused aspects of several systems. This is always a challenge, and can require much delving into various sets of documentation.



Preamble: Installing and Configuring OLEDB and ODBC Providers

If you are going to import data directly from another relational database, the essential thing is to have a working and proven provider in place. This is easy to say, but it can be by far the hardest part of the data load, for several reasons. Consider the following questions that you will have to answer:





• Which type of provider do you install, and from which vendor?
• What level of database compatibility does it provide?
• Is it available in both 64-bit and 32-bit versions?
• What level of support is available?



Now, which provider you use, which type (OLEDB, ODBC, or .NET), and which supplier you prefer are entirely up to you. SQL Server comes with the following OLEDB providers to facilitate connection to certain other relational databases:





• The Microsoft OLEDB provider for Oracle (MSDAORA.1), which is part of a SQL Server installation.

• The Microsoft OLEDB provider for DB2 (provided with the Feature Pack, but requires the Enterprise edition of SQL Server).

• The Attunity OLEDB provider for Oracle (requires the Enterprise edition of SQL Server).

• The Attunity OLEDB provider for Teradata (requires the Enterprise edition of SQL Server).



Then, of course, you have the OLEDB and .NET providers from the database vendors themselves. At the time of writing, and for the data sources referred to in this chapter, these are some of the available providers:





• The Oracle Provider for OLE DB 11.2.0.3.0 from Oracle Corporation.
• The Oracle Data Provider for .NET 4 11.2.0.3.0 from Oracle Corporation.
• IBM DB2 for I5/OS IBMDA400 OLEDB Provider
• IBM DB2 for I5/OS IBMDARLA OLEDB Provider
• IBM DB2 for I5/OS IBMDASQL OLEDB Provider
• The IBM OLEDB Provider (IBMDADB2)
• The IBM OLE DB .NET 7 Data Provider
• The Sybase ASE OLEDB Provider
• The Sybase ASE ODBC Provider
• The Sybase ASE .NET Provider
• The MySQL ODBC Provider
• The PostgreSQL Native OLEDB Provider (PGNP)






Given the constant evolution in this area, I will not specify which version you should be using. Clearly, the latest version is preferable in virtually all cases. Also, it is up to you to ensure that you have complied with any licensing requirements if you are using these drivers in a production environment.

Of course, there are many other commercially available providers from many different sources, and I can only advise you to search the Web for them. Equally varied are the claims of superiority made for each provider. I will not be passing any judgments here; I will explain only those providers available from either Microsoft or the suppliers of the RDBMSs that we are looking at in this chapter. This is in no way a criticism or a judgment, merely a voluntary limitation on the scope of this chapter. Once downloaded and installed, these providers should be visible both in the SSIS connection manager list of OLEDB sources, and also in SQL Server Management Studio when you expand Server Objects ➤ Linked Servers ➤ Providers.



Network and Firewall Issues

It is imperative to ensure that you make friends with the network architects in your organization, as you will need their help. Either that, or obtain full documentation about the network architecture. On reading that last sentence, I imagine that most developers and DBAs will emit a hollow laugh, and come to the conclusion that charm will be needed to ensure that their SQL Server can at least see the other database hosts, as this is the starting point for establishing server-to-server database connectivity.



Database Security

Once you have made friends with the infrastructure people, the next charm offensive will doubtless concern the source system DBAs. You will need logon and SELECT permissions (at a bare minimum) for the source databases, and more wide-ranging permissions will be necessary if you are to examine the source database metadata. Indeed, this is a prerequisite for using SSMA, as you will see.



Data Type Mapping

Fortunately, the SQL Server development team has, over the years, defined a robust set of data type correlations for the major competitor databases. Many of these are given in Appendix A. SSMA also has predefined (and configurable) data type mapping schemata that you can use not only when loading data with this tool, but also as a reference for suggested mappings. In my experience, these suggestions are very robust and make difficulties in this area the exception rather than the rule. When a problem does occur, you will probably have little choice but to refer to the documentation of the source system.

Before starting on the recipes in this chapter, there is one thing that I have to make clear immediately. I realize that talking about half a dozen major databases and ways of connecting them to SQL Server has the potential to be a vast subject. Consequently, I am going to be extremely selective about which products and which connection methods I discuss. As it is impossible to discuss all aspects of all the ways of loading data from all the relational databases in the known universe, I have chosen to be deliberately succinct in this chapter, and concentrate on the major players in the RDBMS market whose products I have had the pleasure of grappling with over the years. Inevitably, much will not be covered, but I am afraid that a line has to be drawn somewhere.

Moreover, I will always use the sample databases supplied with each of the RDBMSs that we are looking at. If there is no standard sample database for a data source, I will use INFORMATION_SCHEMA data, or any tables found as standard in the source database.






■ Note  The difficulty with importing data from other SQL RDBMSs is that in some situations you need to have a certain level of basic knowledge about the database from which you are sourcing data. In other cases, SQL knowledge is sufficient. In this chapter, therefore, I am presuming basic familiarity with the external database, or at least a level of initial understanding that can be acquired fairly rapidly.

I will presume that you have downloaded the example files for this chapter from the book’s companion web site, and installed them in the C:\SQL2012DIRecipes\CH02\ directory. Similarly, if you are following the examples, you will need to have created the CarSales and CarSales_Staging databases as described in Appendix B.



4-1. Configuring Your Server to Connect to Oracle

Problem

You want to be certain that the SQL Server into which you will be importing Oracle data is correctly configured and able to connect to an Oracle database.



Solution

Install either the 32-bit Oracle client on a 32-bit SQL Server, or a 64-bit Oracle client on a 64-bit SQL Server. This is how to do it:

1. Download the Oracle 11g full client. Install it by following Oracle guidelines. Make sure that you install the Oracle OLEDB, .NET, and ODBC providers.

2. Configure Oracle access by editing the TNSNames.ora file. I explain this in the “How It Works” section.

3. Reboot the SQL Server on which the Oracle drivers are installed.



How It Works

As it is the database with the largest market share on the planet, we will begin by looking at how to connect to Oracle databases. Despite the fact that the instructions given in this recipe presume only a very basic knowledge of Oracle, you might need input from an Oracle DBA when it comes to establishing connectivity to an Oracle server. This is simply due to the wide range of potential scenarios that you could face when connecting to Oracle databases. The subject is so vast that it precludes a detailed description here, so I am only showing how to use TNS (Transparent Network Substrate) connectivity.

Note that in the TNSNames.ora file, the name at the top of each entry (the Address, which Oracle itself calls the net service name) identifies the connection. You will need it when connecting later. So, using a purely hypothetical example, the Address name for the following TNSNAMES entry is MyOracle:

MyOracle =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = aa.calidra.co.uk)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = MyOracleReally)
    )
  )
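
To check which names a TNSNames.ora file actually defines, you can list its top-level entries. The following is a minimal Python sketch that does this by tracking parenthesis depth. The function name is invented for illustration, and the sketch assumes a simple file: one name per entry, with `#` starting a comment. Any other top-level `name = value` line would also be reported as a name.

```python
def tns_entry_names(tnsnames_text):
    """Return the top-level entry names found in TNSNames.ora text."""
    names = []
    depth = 0      # current parenthesis nesting level
    current = ""   # characters read outside any parentheses
    for raw_line in tnsnames_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        for char in line:
            if char == "(":
                depth += 1
            elif char == ")":
                depth -= 1
            elif depth == 0 and char == "=":
                # An '=' outside all parentheses ends an entry name
                if current.strip():
                    names.append(current.strip())
                current = ""
            elif depth == 0:
                current += char
    return names
```

Applied to the entry shown above, the function returns a list containing only MyOracle, because every other `=` sign sits inside at least one pair of parentheses.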

However, I can only advise you to consider installing both the 32-bit client and the 64-bit client on a 64-bit server, as this will allow you not only to connect to Oracle (in a 64-bit environment) but also to develop, tweak, finalize, and test in SSDT (SQL Server Data Tools)/BIDS (Business Intelligence Development Studio), both of which, remember, run as 32-bit applications. The steps to do this are as follows:

1. Remove any existing Oracle clients. Reboot.

2. Download the Oracle 11g full client. Install the 32-bit client by following Oracle guidelines. Make sure that you install the Oracle OLEDB, .NET, and ODBC providers. Carefully define which Oracle home and directory path it is using. It could be something like C:\Oracle\product\11.2.0\Client32.

3. Select the Oracle Windows Interfaces 11.x.x component for OLEDB in Available Product Components. You can add the .NET provider too, if you wish.

4. Reboot the server.

5. Download the Oracle 11g full client. Install the 64-bit client by following Oracle guidelines. Make sure that you install the Oracle OLEDB, .NET, and ODBC providers. Define which Oracle home it is using. The directory path could be something like C:\Oracle\product\11.2.0\Client64.

6. Select the Oracle Windows Interfaces 11.x.x component for OLEDB in Available Product Components. You can add the .NET provider too, if you wish.

7. Reboot the server.

8. Configure Oracle access by editing TNSNames.ora for both the 32-bit and 64-bit environments. This means the TNSNames.ora file in both the C:\Oracle\product\11.2.0\Client32\Network\Admin and C:\Oracle\product\11.2.0\Client64\Network\Admin directories.
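
The reason for keeping two clients is that a process can only load a client of its own bitness: the 32-bit design tools need the 32-bit client, and a 64-bit SQL Server needs the 64-bit one. The following minimal Python sketch illustrates the idea of matching a process to its client. The Oracle home paths are the hypothetical ones suggested in the steps above, and the function names are assumptions made for illustration.

```python
import struct

# Hypothetical Oracle homes, following the example paths in the steps above
ORACLE_HOMES = {
    32: r"C:\Oracle\product\11.2.0\Client32",
    64: r"C:\Oracle\product\11.2.0\Client64",
}


def process_bitness():
    """Return 32 or 64 for the running process (pointer size in bits)."""
    return struct.calcsize("P") * 8


def tnsnames_path(bitness):
    """Build the TNSNames.ora path for the client of the given bitness."""
    return ORACLE_HOMES[bitness] + r"\Network\Admin\TNSNames.ora"
```

Calling `tnsnames_path(process_bitness())` gives the file that a process of the current bitness would be expected to read under these assumptions, which is why both copies of TNSNames.ora need the same entries.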



Hints, Tips, and Traps





• When removing any old Oracle clients, you can only delete the old directory once the server has been rebooted.

• The simplest way to test client connectivity is to run SQL*Plus (Start ➤ All Programs ➤ Your Oracle Home ➤ Application Development ➤ SQL Plus) and enter the Oracle username and password, possibly like this: TheUserName/PasswordHere@TheDatabase.



4-2. Importing Data from Oracle As a Regular Process

Problem

You want to import Oracle data on a regular basis and can connect to the Oracle source database over your network.


