3-8. Importing XML Data from Very Large Files, Putting a Priority on Speed

Chapter 3 ■ XML Data Sources



1. Download and install SQLXML 4.0, unless it is already installed (see the upcoming note on this).

2. Locate an XML source data file. In this example, it is the C:\SQL2012DIRecipes\CH03\Clients_Simple.Xml file.

3. Create a destination table. The following is the one to use in this example (C:\SQL2012DIRecipes\CH03\tbl Client_XMLBulkLoad.Sql):

CREATE TABLE CarSales_Staging.dbo.Client_XMLBulkLoad
(
    ID INT NULL,
    ClientName NVARCHAR(1000) NULL,
    Address1 NVARCHAR(1000) NULL,
    Town NVARCHAR(1000) NULL,
    County NVARCHAR(1000) NULL,
    Country NUMERIC(18, 0) NULL
);
GO

4. Create an XML schema file. Note the extensions that are part of the Microsoft mapping schema (C:\SQL2012DIRecipes\CH03\SQLXMLBulkLoadImport_Simple.Xsd):


… xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
… type = "xsd:integer" sql:field = "ID" />
… type = "xsd:string" sql:field = "Address1" />
… type = "xsd:string" sql:field = "Town" />
… type = "xsd:string" sql:field = "County" />
… type = "xsd:decimal" sql:field = "Country" />

5. Create the VBScript used to invoke SQLXML Bulk Load and load the data, as follows (name this file C:\SQL2012DIRecipes\CH03\SQLXMLBulkload.vbs):

' Create the SQLXML Bulk Load COM object
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
' The connection string must stay on a single line
objBL.ConnectionString = "provider=SQLOLEDB;data source=MySQLServer;database=CarSales_Staging;integrated security=SSPI"
' Optional, but very useful for debugging
objBL.ErrorLogFile = "C:\SQL2012DIRecipes\CH03\SQLXMLBulkLoadImporterror.log"
' The schema (.xsd) file is the first argument, the XML data file the second
objBL.Execute "C:\SQL2012DIRecipes\CH03\SQLXMLBulkLoadImport_Simple.xsd", _
    "C:\SQL2012DIRecipes\CH03\Clients_Simple.xml"
' Release the COM object
Set objBL = Nothing

6. Double-click the C:\SQL2012DIRecipes\CH03\SQLXMLBulkload.vbs file to run the bulk load. If all goes well, you should be able to open the Client_XMLBulkLoad table and see the data from the XML source file correctly loaded.

How It Works

SQLXML Bulk Load may not be a standard part of the XML toolkit when using SQL Server, but it is a hidden gem among the resources available to the developer. It allows you to import extremely large XML files into SQL Server, into multiple tables if need be, preserving referential integrity if this is required, and at remarkable speed. Technically, it is a COM (Component Object Model) object that loads semistructured XML data into SQL Server tables.

Note: While SQLXML 4.0 is installed by default in editions of SQL Server up to and including 2005, for SQL Server editions from 2008 onward it needs to be downloaded and installed separately (www.microsoft.com/en-us/download/details.aspx?id=3522).

Despite these advantages, many SQL Server developers either overlook this superb tool or fail to implement it because it is poorly explained in the SQL Server documentation. Perhaps as a consequence, it is unfairly perceived as hard to implement. From SQL Server 2008 onward, it even ceased to be part of the standard installation, and has to be installed as part of the Feature Pack.

So this recipe attempts to set the record straight. This tool is best used in the following situations:

• When the source XML file is large. What counts as large? If you were planning on using OPENXML, it means anything over 2 gigabytes. The same upper limit applies to XML loaded into a variable and shredded using XQuery. With SSIS, it depends on the memory available. I have loaded XML files of tens of gigabytes using SQLXML Bulk Load.

• When the XML source is relatively simple XML (essentially tables and fields, but nothing too complex). The source data cannot be too complex (in XML terms), or it will not load. It is not for nothing that this is called “semistructured” XML data.

• When you wish to load multiple tables from the same source file.

• When load speed is important. In my tests, SQLXML Bulk Load loaded data at about 90 percent of (native) BCP speeds for separate tables without relational links, which is fast by any standards!

The core of getting SQLXML Bulk Load to load data lies in the XSD file. As you can see, it contains (apart from the reference to the Microsoft mapping schema) a few extra attributes that allow it to perform its job so efficiently. The main mapping attributes are shown in Table 3-1.

Table 3-1. SQLXML Mapping Attributes

XML Attribute   Explanation                                                         Use
sql:relation    The table into which the data is loaded.                            Table-level
sql:field       The field in the destination table into which the data is loaded.   Field-level

Essentially, you have to extend the schema (suitably hand-crafted or initially created using one of the methods described in Recipe 7-12) with the attributes that allow SQLXML Bulk Load to channel the source data into the correct table(s) and fields. This is the hard part of this XML loading technique, and will probably be where you spend the most time, so it is worth making sure that you understand the XSD extensions before attempting a complex data load. You may even find that practicing on a simple XML file to start with pays dividends.

The .vbs file that invokes SQLXML Bulk Load can be extremely simple, and needs, at a minimum, the following:

• The object creation statement, to invoke the SQLXML Bulk Load COM object.
• A connection string, containing, at a minimum:
    • The server and, if required, instance name (Data Source).
    • The destination database name (CarSales_Staging in this example).
    • The SQL Server security information (Windows integrated security or SQL Server security).
• The Execute command, which provides paths to the XSD and XML files.
• A Set command to dispose of the COM object (Set objBL = Nothing).
• An ErrorLog file. While this is not strictly essential, it is more than useful.
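
The connection string elements listed above can also be assembled programmatically, which helps when the same load has to run against several servers. The following is only a sketch in Python, not part of the recipe itself: the function name build_connection_string is hypothetical, and the keywords are the ones used in this recipe's script plus the usual User ID and Password keywords for SQL Server security.

```python
def build_connection_string(server, database, user=None, password=None):
    """Assemble a SQLOLEDB connection string for SQLXML Bulk Load.

    Windows integrated security is used unless a SQL Server login is given.
    For a named instance, pass the server as r"MyServer\MyInstance".
    """
    parts = [
        "Provider=SQLOLEDB",
        f"Data Source={server}",
        f"Database={database}",
    ]
    if user is None:
        # Windows integrated security, as in this recipe
        parts.append("Integrated Security=SSPI")
    else:
        # SQL Server security
        parts.append(f"User ID={user}")
        parts.append(f"Password={password}")
    return ";".join(parts)


print(build_connection_string("MySQLServer", "CarSales_Staging"))
# prints: Provider=SQLOLEDB;Data Source=MySQLServer;Database=CarSales_Staging;Integrated Security=SSPI
```

The resulting string can be pasted into the ConnectionString line of the .vbs file.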



You will know that the load has worked if there is no ErrorLog file, or if the old ErrorLog file has been removed (assuming that you have requested a log file). Oh, and by the fact that the data loaded correctly.

Should the data not load, then your first port of call is the ErrorLog file that SQLXML Bulk Load created (assuming that you used the objBL.ErrorLogFile = "Log File and path" parameter). So, while creating this file is not compulsory, it is a practical necessity if you want to debug a data load operation, unless everything works perfectly the first time, and every time. Fortunately, these files are very explicit, and will doubtless prove to be an invaluable source of debugging information.
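
If the load is run unattended, the "is there an ErrorLog file?" test can be automated. The sketch below is in Python and relies only on the behavior described above (a log file is present after a failed load). The function names are hypothetical, and deleting any old log before the run is an extra precaution of mine rather than something SQLXML Bulk Load requires.

```python
import os


def clear_stale_log(error_log_path):
    """Delete a log left by an earlier run, so that a log found
    after the load can only have come from this load."""
    if os.path.isfile(error_log_path):
        os.remove(error_log_path)


def load_left_error_log(error_log_path):
    """Return True if a non-empty error log exists after the load."""
    return os.path.isfile(error_log_path) and os.path.getsize(error_log_path) > 0
```

Call clear_stale_log before launching the .vbs file and load_left_error_log once it has finished; a True result means that the log should be opened and read.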

There are, however, some classic things to watch out for when creating the XSD file if you want to avoid errors. In my experience, it is the schema file that is the most common source of problems, because it involves a certain amount of hand-crafting. Points to check include:

• Making sure that there are no spaces immediately inside any of the double quotes used in the xsd:element definitions.

• Ensuring that you do not inadvertently close elements such as xsd:sequence and xsd:complexType, or even some of the xsd:element definitions.

• Respecting case-sensitivity. In the preceding example, having an element called “Country” and a schema mapping element named “country” would cause the process to fail.
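
The case-sensitivity trap in particular lends itself to an automated check before the load is run. The following Python sketch compares the name attributes of the xsd:element definitions with the element names found in the XML and reports pairs that differ only by letter case. The function name and the sample schema and data are hypothetical (the Client element name is an assumption). Because it parses the whole document into memory, it should be run against a small extract of the source file rather than a multi-gigabyte original.

```python
import xml.etree.ElementTree as ET

XSD_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"


def case_mismatches(xsd_text, xml_text):
    """Return (schema_name, xml_name) pairs that differ only by case."""
    schema_names = {
        el.get("name")
        for el in ET.fromstring(xsd_text).iter(XSD_ELEMENT)
        if el.get("name")
    }
    xml_names = {el.tag for el in ET.fromstring(xml_text).iter()}
    mismatches = []
    for schema_name in sorted(schema_names):
        if schema_name in xml_names:
            continue  # exact match, nothing to report
        for xml_name in sorted(xml_names):
            if schema_name.lower() == xml_name.lower():
                mismatches.append((schema_name, xml_name))
    return mismatches


# Hypothetical sample: the schema says "country", the data says "Country"
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="Client" sql:relation="Client_XMLBulkLoad">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ID" type="xsd:integer" sql:field="ID" />
        <xsd:element name="country" type="xsd:decimal" sql:field="Country" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

SAMPLE_XML = "<Client><ID>1</ID><Country>3</Country></Client>"

print(case_mismatches(SAMPLE_XSD, SAMPLE_XML))
# prints: [('country', 'Country')]
```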



This simple script assumes that the destination table already exists (you can, however, create the table as part of the load, as shown in the next recipe). It also presumes integrated security. Somewhat counterintuitively, the XML schema (.xsd) file is passed as the first parameter to SQLXML Bulk Load; the XML file itself is passed as the second parameter.
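
Because the parameter order is easy to get wrong, it can help to wrap the call in a small function with named arguments. The Python sketch below is an illustration only: run_bulk_load and FakeLoader are hypothetical names, and the stand-in class exists purely so that the call order can be shown without COM. On Windows, a real loader object could presumably be obtained with the third-party pywin32 package, as noted in the docstring; that route is an assumption and is not covered by this recipe.

```python
def run_bulk_load(loader, connection_string, schema_path, xml_path,
                  error_log_path=None):
    """Configure a SQLXML Bulk Load object and run the load.

    `loader` is the COM object. Assumption: with pywin32 installed it
    could be created with
    win32com.client.Dispatch("SQLXMLBulkLoad.SQLXMLBulkload.4.0").
    """
    loader.ConnectionString = connection_string
    if error_log_path:
        loader.ErrorLogFile = error_log_path
    # The schema (.xsd) comes first, the XML data file second
    loader.Execute(schema_path, xml_path)


class FakeLoader:
    """Stand-in for the COM object; it only records what it was given."""

    def Execute(self, schema_path, xml_path):
        self.executed = (schema_path, xml_path)


demo = FakeLoader()
run_bulk_load(
    demo,
    "provider=SQLOLEDB;data source=MySQLServer;"
    "database=CarSales_Staging;integrated security=SSPI",
    schema_path=r"C:\SQL2012DIRecipes\CH03\SQLXMLBulkLoadImport_Simple.xsd",
    xml_path=r"C:\SQL2012DIRecipes\CH03\Clients_Simple.xml",
)
print(demo.executed[0])  # the .xsd path, passed first
```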



Hints, Tips, and Traps





• This example uses XML elements, but attribute-centric XML source data can also be loaded with SQLXML Bulk Load.

• A complex XML source file can be “flattened” using XSLT to produce a file amenable to processing by SQLXML Bulk Load. In SSIS, the XML task that executes the XSLT could precede the Bulk Load itself.

• Inline schemas are not supported. Indeed, if you have an inline schema in the source XML document, XML Bulk Load will ignore it.

• An XML document is checked for being well-formed, but it is not validated. If the XML document is not well-formed, processing is cancelled.
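
Since a document that is not well-formed cancels the load, it can be worth checking a very large file before handing it to SQLXML Bulk Load. The following Python sketch streams the file through the standard library's expat parser without building a tree, so memory use stays flat whatever the file size. The function name is hypothetical, and a different parser may not agree with the one used by SQLXML Bulk Load in every edge case, so treat this as a first-pass check only.

```python
import xml.parsers.expat


def check_well_formed(stream):
    """Stream a binary file object through expat.

    Returns (True, "well-formed") or (False, description of first error).
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.ParseFile(stream)  # reads the stream in chunks
    except xml.parsers.expat.ExpatError as exc:
        reason = xml.parsers.expat.ErrorString(exc.code)
        return False, f"line {exc.lineno}, column {exc.offset}: {reason}"
    return True, "well-formed"


# Usage with a file on disk (path taken from this recipe):
# with open(r"C:\SQL2012DIRecipes\CH03\Clients_Simple.xml", "rb") as f:
#     print(check_well_formed(f))
```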



3-9. Loading Multiple Tables at Once from a Single XML Source File

Problem

You want to load data from an XML source file that contains data for multiple unrelated tables.



Solution

Use SQLXML Bulk Load and a suitably crafted XSD file.

You can load multiple tables from a single XML file as follows.

1. Locate your XML file, which contains data for multiple tables. This recipe’s example uses C:\SQL2012DIRecipes\CH03\SQLXMLSourceDataMultipleTables.xml, which holds data for the Invoice and Invoice_Lines tables. The contents look like this:

3
AA/1/2014-07-25
250

3
5000

3
12500

2. Create the tables required to hold the data in SQL Server (C:\SQL2012DIRecipes\CH03\tblInvoiceMulti.Sql):

CREATE TABLE CarSales_Staging.dbo.Invoice_XML_Multi
(
    ID INT NULL,
    InvoiceNumber VARCHAR(50) NULL,
    DeliveryCharge SMALLMONEY NULL
);
GO

CREATE TABLE CarSales_Staging.dbo.Invoice_Lines_XML_Multi
(
    InvoiceID INT NULL,
    SalePrice MONEY NULL
);
GO

3. Create an XSD file (stored as C:\SQL2012DIRecipes\CH03\SQLXMLBulkLoadImportMultipleTables.xsd):


… xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
… type = "xsd:integer" sql:field = "ID" />
… type = "xsd:string" sql:field = "InvoiceNumber" />
… sql:field = "DeliveryCharge" />