2-6. Mapping a Source File

Chapter 2 ■ Flat File Data Sources



[Invoices2.txt]
Format=CSVDelimited
CharacterSet=OEM
ColNameHeader=False
Col1=ID Integer
Col2=InvoiceNumber Char Width 255
Col3=ClientID Integer
Col4=TotalDiscount Currency
Col5=DeliveryCharge Currency



How It Works

If the source data does not have a header record, or if you wish to provide data type information to the text driver, then you may need to provide a schema information file to tell the Microsoft text driver how best to deal with the source data structure. A schema information file is a text file, always named Schema.ini, and always kept in the same directory as the text data source files. It can contain schema information for multiple source files, provided that each section is identified by the name of the file it describes.

A schema information file can specify the following:

• The text file name (as there is only one possible Schema.ini file, this can contain information for multiple source files, each identified by the source file name in square brackets)
• The file format
• The character set
• Special data type conversions
• Column header indicators
• The field names, widths, and types



However, a Schema.ini file does not exist only to add column names. You can also use it to override the existing column names and alter data types when reading the source data, as shown in the following part of a schema information file. This snippet of C:\SQL2012DIRecipes\CH02\Schema.ini overrides the data types and column names of the text file (C:\SQL2012DIRecipes\CH02\Invoices.Txt) that we used in Recipe 2-5.

[Invoices.txt]
Format=CSVDelimited
CharacterSet=OEM
ColNameHeader=True
Col1=ID Integer
Col2=BillNumber Char Width 255
Col3=IDClient Integer
Col4=TotalDiscount Integer
Col5=DeliveryCharge Currency

These examples show only a few of a schema information file’s possibilities. Table 2-4 provides a fuller description of many of the available parameters.






Table 2-4. Schema.ini File Options

Specifier                      Comments
Format=CSVDelimited            Indicates to the driver that it is processing a CSV file.
Format=TabDelimited            Indicates to the driver that the records are in a tab-delimited format.
Format=Delimited(delimiter)    Lets you specify a custom delimiter.
Format=FixedLength             Indicates to the driver that this is a fixed-length data file.
CharacterSet=ANSI              Tells the driver that the file consists of ANSI characters.
CharacterSet=OEM               Tells the driver that the file consists of non-ANSI (OEM) characters.
ColNameHeader=True             Tells the driver that the first row contains column headers.



There are further options available, but in my experience, they are rarely used. Should you need this information, you can find it at http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx.

In this example, I named the schema information file C:\SQL2012DIRecipes\CH02\Schema.ini. OPENROWSET automatically applies the schema information file to every source file that has a section in it.
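
As a minimal sketch of how the schema information file comes into play, the following query reads Invoices.txt from the directory that holds Schema.ini. It assumes that the ACE OLE DB provider is installed and that ad hoc distributed queries are enabled on the server. The provider string is an assumption for illustration and is not taken from Recipe 2-5. The column names are those defined in the [Invoices.txt] section shown earlier.

```sql
-- Sketch only: assumes the ACE provider is installed and that
-- Schema.ini sits in the same directory as Invoices.txt.
SELECT  ID, BillNumber, IDClient, TotalDiscount, DeliveryCharge
FROM    OPENROWSET(
            'Microsoft.ACE.OLEDB.12.0',
            'Text;Database=C:\SQL2012DIRecipes\CH02;',
            'SELECT * FROM Invoices.txt'
        );
```

If the driver picks up the schema file, the query can select BillNumber and IDClient rather than the names found in the header row of the source file.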

Rather than handcraft a schema information file, you can create one using the ODBC Data Source Administrator to save time and minimize the potential for error. This can be done at the same time you create a System DSN, like this:

1. Open the ODBC Data Source Administrator (Control Panel ➤ Administrative Tools ➤ Data Sources). Click System DSN. The dialog box looks like that shown in Figure 2-10.



Figure 2-10.  Creating a Schema.ini File using the ODBC Source Administrator






2. Click Add. Select Microsoft Text Driver (as shown in Figure 2-11).



Figure 2-11.  Selecting the Microsoft text driver

3. Click Finish, then Options in the ODBC Text setup dialog box.

4. Enter a data source name.

5. Uncheck Use Current Directory. Click Select Directory to browse to the directory where the source text files are located.

6. Uncheck Default and select or enter the required file extension if it is not in the existing list. The dialog box should look something like Figure 2-12.






Figure 2-12.  Creating the Schema.ini file

7. Click Define Format. Then click each file you wish to create full schema information for, followed by Guess.

8. Click OK. Confirm any error message. Cancel out of the ODBC Data Source Administrator.



■ Note On 64-bit machines where you are running the 32-bit version of the Microsoft text driver, you must run the 32-bit version of Odbcad32.exe (located in %systemdrive%\Windows\SysWoW64) for this recipe’s solution to work. Otherwise, you will not see the (32-bit) Microsoft text driver.

You will find a Schema.ini file containing base elements for all the files of the selected type, as well as full column specifications for all files where you asked the ODBC Data Source Administrator to guess the data structures. You may now modify this file to suit your precise requirements. The resulting file is very dense and generally difficult to read, so you may want to edit it to remove any unwanted file specifications and to clean it up in general.






Hints, Tips, and Traps





• Whether the column headers exist in the first row or not, you must refer to the columns as Col1, Col2 … Coln, and so forth in the Schema.ini file.

• I have read on various web postings that there is a limit of 255 columns in a Schema.ini file. However, as I have never had to import files this wide using OPENROWSET, I cannot say that it has ever been a problem for me.



2-7. Importing Data Using T-SQL in Anticipation of Using a Linked Server

Problem

You eventually want to use a linked server to import a text file. Until this is set up, you want to write your T-SQL with the eventual linked server in mind so that you don’t have to rewrite too much code once the linked server is in place.



Solution

Use OPENDATASOURCE. The following shows the code to do it (C:\SQL2012DIRecipes\CH02\OpendatasourceAndDestinationTable.sql).

1. Create a suitably structured destination table:

CREATE TABLE Text_OpenrowsetInsert
(
 ID INT  -- data type missing in the source listing; INT assumed from Col1 = ID Integer in Schema.ini
,InvoiceNumber INT
,ClientID INT
,TotalDiscount NUMERIC (18,2)
,DeliveryCharge NUMERIC (18,2)
);
GO



2. Run the following code snippet, which loads the C:\SQL2012DIRecipes\CH02\Invoices.Txt source file:

INSERT INTO Text_OpenrowsetInsert (ID, InvoiceNumber, ClientID, TotalDiscount, DeliveryCharge)
SELECT      F1, F2, F3, F4, F5
FROM        OPENDATASOURCE(
                'Microsoft.ACE.OLEDB.12.0',
                'Data Source=C:\SQL2012DIRecipes\CH02;Extended Properties="Text;HDR=NO;"'
            )...Invoices#txt;
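
OPENDATASOURCE is an ad hoc connection, so it runs only if the Ad Hoc Distributed Queries server option is enabled. The recipe does not show this step. The following is a minimal sketch of how the option is usually switched on, assuming that you have permission to change server configuration and that your security policy allows it:

```sql
-- Sketch only: enables ad hoc access for OPENROWSET and OPENDATASOURCE.
-- Requires ALTER SETTINGS permission on the server.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
```

Run this once before the INSERT statement. The setting applies to the whole instance.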






How It Works

If you are considering a linked server to connect to your text file but want to “test the waters” first, you can use OPENDATASOURCE to read your flat file.

You might want to use OPENDATASOURCE in the following circumstances:

• When you have a consistently structured delimited CSV or text file.
• When you wish to specify parameters such as the header rows, or specify columns to select even if there are no column names.
• When the source data file is reliably and consistently structured (or at least perceived as error-free by the Microsoft text driver).



I use the ACE driver with OPENDATASOURCE whenever possible. There are a couple of good reasons for this:

• It gracefully handles the absence of column headers in the first row by naming the columns F1, F2 … Fn, and so forth.
• There are generally few 32-bit/64-bit issues.



For the Jet driver (and so by definition in a 32-bit environment), the code is:

INSERT INTO Text_OpenrowsetInsert (ID, InvoiceNumber, ClientID, TotalDiscount, DeliveryCharge)
SELECT      F1, F2, F3, F4, F5
FROM        OPENDATASOURCE(
                'Microsoft.Jet.OLEDB.4.0',
                'Data Source=C:\SQL2012DIRecipes\CH02;Extended Properties="Text;HDR=YES;"'
            )...Invoices#txt;



You can easily specify whether there are header rows by setting the extended property flag to either “Text;HDR=YES;” or “Text;HDR=NO;”.





■ Note You have to use a hash/pound sign (#) in the file name instead of a dot/period.

If there are no column names in the first row, the ACE driver will name the columns F1, F2, and so forth. These names can then be used to query the data, as shown in the following. In this case, NULLs are accepted in the first row.

SELECT      F1
FROM        OPENDATASOURCE(
                'Microsoft.Jet.OLEDB.4.0',
                'Data Source=C:\SQL2012DIRecipes\CH02;Extended Properties="Text;HDR=NO;"'
            )...Invoices#txt
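
Because the generated names F1, F2, and so on carry no meaning, it can help to alias them in the SELECT list so that downstream code sees stable column names. The following is a sketch under the assumption that the ACE provider is installed and that the file has the five columns used in this recipe. The aliases are simply the destination column names from the Solution.

```sql
-- Sketch only: maps the driver-generated names to meaningful aliases.
SELECT  F1 AS ID
       ,F2 AS InvoiceNumber
       ,F3 AS ClientID
       ,F4 AS TotalDiscount
       ,F5 AS DeliveryCharge
FROM    OPENDATASOURCE(
            'Microsoft.ACE.OLEDB.12.0',
            'Data Source=C:\SQL2012DIRecipes\CH02;Extended Properties="Text;HDR=NO;"'
        )...Invoices#txt;
```

Aliasing in one place means that only the FROM clause needs to change when the query is later pointed at a linked server.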



If you have created a System DSN (described in Recipe 4-8), then you can use this with OPENDATASOURCE, as follows:

SELECT      ID, InvoiceNumber, ClientID, TotalDiscount, DeliveryCharge
FROM        OPENDATASOURCE('SQLOLEDB', 'DSN=MyDSN;')...Invoices#txt


