3-3. Shredding an XML File into an SQL Server Table


Chapter 3 ■ XML Data Sources



data using the .nodes() method will not require sp_xml_preparedocument to instantiate an XML object. This

technique is best used in the following situations:

• When you want to load all or part of the contents of the source document into table columns.

• When setting up an SSIS package to shred the data is overkill.



In this recipe, we loaded the source file into a variable and then shredded it into a table using the
.nodes() method of the XML data type. The .value() method extracted the individual values from the XML
instance.
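As a minimal sketch of the staging-variable pattern just described, the following uses an inline XML literal instead of a file so that it can be run on its own. The element names follow the CarSales/Client structure used in this recipe; the sample values are invented purely for illustration and are not taken from the book's sample file.

```sql
-- Staging variable holding a small, made-up document (hypothetical data)
DECLARE @XMLSource XML = N'
<CarSales>
  <Client><ID>1</ID><ClientName>Client A</ClientName><Town>Stoke</Town><County>Staffs</County><Country>1</Country></Client>
  <Client><ID>2</ID><ClientName>Client B</ClientName><Town>Lyon</Town><County>Rhone</County><Country>3</Country></Client>
</CarSales>';

-- .nodes() returns one row per Client element;
-- .value() extracts a scalar of the requested SQL type from each row
SELECT
 SRC.Client.value('ID[1]', 'INT') AS ID
,SRC.Client.value('ClientName[1]', 'VARCHAR(50)') AS ClientName
,SRC.Client.value('Town[1]', 'VARCHAR(50)') AS Town
,SRC.Client.value('County[1]', 'VARCHAR(50)') AS County
,SRC.Client.value('Country[1]', 'INT') AS Country
FROM @XMLSource.nodes('CarSales/Client') AS SRC (Client);
```

Prefixing the SELECT with an INSERT INTO statement would load the rows into a destination table in the same way as the recipe's code.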

The “staging variable” approach used here requires a lot of memory for large files. If you would rather
not stage the file in a variable, you can avoid it with a judicious application of CROSS APPLY. Using the
same XML source file as earlier in this recipe, the code for this approach is as follows
(C:\SQL2012DIRecipes\CH03\ShredXMLUsingOPENROWSETBulkAndCrossApply.Sql):

INSERT INTO XmlImport_Clients (ID, ClientName, Town, County, Country)
SELECT
SRC.Client.value('ID[1]', 'INT') AS ID
,SRC.Client.value('ClientName[1]', 'VARCHAR(50)') AS ClientName
,SRC.Client.value('Town[1]', 'VARCHAR(50)') AS Town
,SRC.Client.value('County[1]', 'VARCHAR(50)') AS County
,SRC.Client.value('Country[1]', 'INT') AS Country
FROM
(
SELECT CAST(XMLSource AS XML)
FROM OPENROWSET(BULK 'C:\SQL2012DIRecipes\CH03\Clients_Simple.xml', SINGLE_BLOB)
AS X (XMLSource)
) AS X (XMLSource)
CROSS APPLY XMLSource.nodes('CarSales/Client') AS SRC (Client);







This second CROSS APPLY approach is slightly more complex, but shreds the data directly into the

destination table.

Because we are in the world of T-SQL, the code we have used so far can be extended to filter and sort
the data that you are loading. For example, to load only the records where the country is “3” and to
order them by the Town element, use the following code
(C:\SQL2012DIRecipes\CH03\ShredXMLUsingOPENROWSETBulkAndCrossApplyWithFilter.Sql):

INSERT INTO XmlImport_Clients (ID, ClientName, Town, County, Country)

SELECT

SRC.Client.value('ID[1]', 'INT') AS ID

,SRC.Client.value('ClientName[1]', 'VARCHAR(50)') AS ClientName

,SRC.Client.value('Town[1]', 'VARCHAR(50)') AS Town

,SRC.Client.value('County[1]', 'VARCHAR(50)') AS County

,SRC.Client.value('Country[1]', 'INT') AS Country

FROM

(

SELECT CAST(XMLSource AS XML)

FROM OPENROWSET(BULK 'C:\SQL2012DIRecipes\CH03\Clients_Simple.xml', SINGLE_BLOB) 

AS X (XMLSource)

) AS X (XMLSource)






CROSS APPLY XMLSource.nodes('CarSales/Client') AS SRC (Client)

WHERE SRC.Client.value('Country[1]', 'INT') = 3

ORDER BY SRC.Client.value('Town[1]', 'VARCHAR(50)');

This technique may be familiar, but it can rapidly become unusably slow for large files. Of course, this
depends on each person’s definition of “large,” but if you are waiting several minutes, or even hours, for
SQL Server to finish the process, you might be willing to consider another approach. There is one solution
that requires a little upfront work but can reduce hours of processing to seconds, and apparently reduces
memory pressure too. The trick is to use a typed XML column in a staging table to hold the source file (or
files), and then apply XML indexes to this column. The staging table must have a clustered primary key
before you can create the XML indexes.



■■Note Thanks to Dan on Stack Overflow (http://stackoverflow.com) for describing how to use XML indexes to

accelerate XML loads this way.

In this example, I have added secondary XML indexes too. These may not be completely necessary in many

cases, but as they only take a short time to generate, even for large files, I would suggest applying them in any

case. Once the XML data is loaded into the staging table and has been indexed, you use the indexed table as the

source for the XQuery and use CROSS APPLY to shred the data

(C:\SQL2012DIRecipes\CH03\ShredXMLWithIndexes.Sql):

-- Apply an XSD

DECLARE @XSD XML;

SELECT @XSD = CONVERT(XML, XSDDef) FROM OPENROWSET 

(BULK 'C:\SQL2012DIRecipes\CH03\Clients_Simple.xsd', SINGLE_BLOB) AS XSD (XSDDef)

CREATE XML SCHEMA COLLECTION XML_XSD AS @XSD;

GO

-- Staging table to hold XML data

CREATE TABLE dbo.Tmp_XMLLoad(

ID int IDENTITY(1,1) NOT NULL,

XMLData xml(CONTENT dbo.XML_XSD) NULL,

CONSTRAINT PK_Tmp_XMLLoad PRIMARY KEY CLUSTERED

(

ID ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON)

);

GO

-- XML index

CREATE PRIMARY XML INDEX XX_XMLData ON Tmp_XMLLoad (XMLData)

CREATE XML INDEX SX_XMLData_Property ON Tmp_XMLLoad (XMLData) 

USING XML INDEX XX_XMLData FOR PROPERTY

CREATE XML INDEX SX_XMLData_Value ON Tmp_XMLLoad (XMLData) 

USING XML INDEX XX_XMLData FOR VALUE

CREATE XML INDEX SX_XMLData_Path ON Tmp_XMLLoad (XMLData) 

USING XML INDEX XX_XMLData FOR PATH






-- Load data into table

INSERT INTO Tmp_XMLLoad (XMLData)

SELECT CAST(XMLSource AS XML) AS XMLSource

FROM OPENROWSET(BULK 'C:\SQL2012DIRecipes\CH03\Clients_Simple.xml', SINGLE_BLOB) 

AS X (XMLSource)

-- and to output:

INSERT INTO XmlImport_Clients (ID, ClientName, Town, Country)

SELECT

SRC.Client.value('ID[1]', 'INT') AS ID

,SRC.Client.value('ClientName[1]', 'VARCHAR(50)') AS ClientName

,SRC.Client.value('Town[1]', 'VARCHAR(50)') AS Town

,SRC.Client.value('Country[1]', 'INT') AS Country

FROM Tmp_XMLLoad

CROSS APPLY XMLData.nodes('CarSales/Client') AS SRC (Client) ;
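After running the script, you may want to confirm that the XML indexes were created and, once the load is finished, remove the staging objects. The following is only a sketch under the assumption that the table and schema collection were created in the dbo schema with the names used above; the sys.xml_indexes catalog view lists the primary and secondary XML indexes on a table.

```sql
-- List the XML indexes on the staging table
-- (assumes the table lives in dbo, as in the CREATE TABLE statement above)
SELECT name, type_desc, secondary_type_desc
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID('dbo.Tmp_XMLLoad');

-- Optional cleanup once the data has been shredded.
-- The table must be dropped first, because its typed XML column
-- references the schema collection.
DROP TABLE dbo.Tmp_XMLLoad;
DROP XML SCHEMA COLLECTION dbo.XML_XSD;
```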

To finish, and hopefully to start you on the road to XQuery, the following is an example of how you can
filter data inside the XQuery expression itself rather than with the T-SQL WHERE clause used previously.
Note that this particular predicate selects the Client element whose ID is 3; a predicate on Country
would mirror the earlier WHERE clause:

CROSS APPLY XMLSource.nodes('CarSales/Client[ID = 3]') AS SRC (Client);

So, if you know XQuery, you can use it to filter the data. This is not the time or place for an
intensive XQuery course, but at least you know that it can be done.
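To make the XQuery predicate concrete, here is a small self-contained sketch that filters on Country, which corresponds to a WHERE clause testing Country = 3. The inline XML and its values are invented for illustration, and the @CountryID variable is a hypothetical name; sql:variable() is the SQL Server XQuery extension function that exposes a T-SQL variable inside the expression.

```sql
-- Made-up sample data using the CarSales/Client structure from this recipe
DECLARE @XMLSource XML = N'
<CarSales>
  <Client><ID>1</ID><ClientName>Client A</ClientName><Town>Stoke</Town><Country>1</Country></Client>
  <Client><ID>2</ID><ClientName>Client B</ClientName><Town>Lyon</Town><Country>3</Country></Client>
  <Client><ID>3</ID><ClientName>Client C</ClientName><Town>Paris</Town><Country>3</Country></Client>
</CarSales>';

-- Filter with a literal value inside the XQuery predicate
SELECT
 SRC.Client.value('ID[1]', 'INT') AS ID
,SRC.Client.value('Town[1]', 'VARCHAR(50)') AS Town
FROM @XMLSource.nodes('CarSales/Client[Country = 3]') AS SRC (Client);

-- Same filter, driven by a T-SQL variable (hypothetical name) via sql:variable()
DECLARE @CountryID INT = 3;

SELECT
 SRC.Client.value('ID[1]', 'INT') AS ID
,SRC.Client.value('Town[1]', 'VARCHAR(50)') AS Town
FROM @XMLSource.nodes('CarSales/Client[Country = sql:variable("@CountryID")]') AS SRC (Client);
```

Both queries should return only the Client elements whose Country is 3. Filtering in the predicate means that .nodes() produces fewer rows to begin with, rather than shredding every Client and discarding rows afterward.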



Hints, Tips, and Traps





• I am presuming that you have an XSD file to work with. If not, you can use SSIS to create one from the
data, as described in Recipe 3-4.

• If you are testing the @XMLSource variable used in the examples, do not worry if it appears only in a
truncated form, for instance when you use the PRINT statement. PRINT displays only the first 8000
characters (4000 for NVARCHAR) of the 2 billion that the variable can handle. To reassure yourself, you can
always output DATALENGTH(@XMLSource) to confirm that all the data is in the variable. Incidentally, if
you do wish to print the @XMLSource variable from T-SQL, you will need to cast it back to VARCHAR or
NVARCHAR.
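The checks described in the second hint could look something like the following sketch. The inline XML is a made-up stand-in for the file contents; only the variable name @XMLSource comes from the recipe.

```sql
-- Hypothetical, tiny stand-in for the loaded document
DECLARE @XMLSource XML =
N'<CarSales><Client><ID>1</ID><ClientName>Client A</ClientName></Client></CarSales>';

-- Size in bytes of the stored XML: a quick way to check that the
-- variable holds more than PRINT is able to show
SELECT DATALENGTH(@XMLSource) AS XMLSizeInBytes;

-- PRINT does not accept the XML type directly, so cast to a string first;
-- the printed output is still truncated for long documents
PRINT CAST(@XMLSource AS NVARCHAR(MAX));
```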



3-4. Importing XML Data as Part of an ETL Process

Problem

You want to import (and shred) XML data as part of a structured and controlled ETL process.



Solution

Use SSIS and the XML Source task as the data source.

1. Open SSIS, create a new package, and add a Data Flow task onto the Control Flow pane.

2. Double-click the Data Flow task to open it. Add an XML Source task onto the Data Flow pane.
Double-click the XML Source task to open it.






3. Leaving the data access mode as XML File Location, browse to the XML source file
(C:\SQL2012DIRecipes\CH03\ComplexNested.XML in this recipe), then click Generate XSD. Either keep the
proposed file name (which will be the same as the XML file, but with an XSD extension) or enter the name
you require (C:\SQL2012DIRecipes\CH03\ComplexNested.Xsd here). Click Save. The dialog box should look
something like Figure 3-1 (except with your file names if you are not using the examples from this book).



Figure 3-1.  The XML Source Editor dialog box

4. Click Columns. You see a table represented for each node in the XML hierarchy, something like
Figure 3-2.


