4-17. Sourcing Data from PostgreSQL
Chapter 4 ■ SQL Databases
Figure 4-44. Configuring a PostgreSQL ODBC driver
Click Add. Select the PostgreSQL ODBC driver.
Click Finish. The ODBC Connector dialog box will appear. Configure the PostgreSQL
ODBC driver so that it contains the elements shown in Figure 4-44. You will use your
own specific parameters, of course.
Save your changes.
How It Works
There is an excellent and functional ODBC driver available to download from the PostgreSQL web site
(www.postgresql.org), which, once configured, allows you to use SSIS, linked servers, OPENROWSET, and OPENQUERY
without any difficulties. As was the case with DB2 and MySQL, no client software is required.
So, to avoid fruitless repetition, and assuming that you have downloaded the latest version of this driver, all you
have to do is create a DSN as described for MySQL, only configured as in Figure 4-44 (step 4 for DSN setup).
The configuration elements are largely self-explanatory, but nonetheless a concise description is given
in Table 4-2.
Table 4-2. PostgreSQL ODBC Configuration
Element        Description
Data Source    The name you choose to identify the ODBC DSN.
Database       The database you are connecting to.
Server         The server hosting the database.
User Name      The user with the required access rights to the data source.
Description    An optional description of the DSN.
SSL Mode       The SSL mode used. SSL disabled works perfectly fine.
Port           The PostgreSQL port (here, the default is used).
Password       The user password.
Once you have configured the DSN, you can use SSIS, linked servers, OPENROWSET, or OPENQUERY to import
data from PostgreSQL. This can be done exactly as described for MySQL—only using the DSN name that you gave
to the PostgreSQL driver, of course; so I will not repeat it all here, but refer you back to Recipe 4-9 for the details.
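As a minimal sketch of the ad hoc route, an OPENROWSET query through the Microsoft OLE DB Provider for ODBC Drivers (MSDASQL) could look like the following. The DSN name PostgreSQLCarSales, the table public.client, and its columns are hypothetical names chosen for illustration. The sketch also assumes that the credentials are stored in the DSN and that ad hoc distributed queries are enabled on the SQL Server instance.

```sql
-- Hypothetical DSN, table, and column names; substitute your own.
-- MSDASQL is the OLE DB provider that wraps ODBC drivers.
SELECT Pg.*
FROM OPENROWSET('MSDASQL',
                'DSN=PostgreSQLCarSales;',
                'SELECT id, clientname FROM public.client') AS Pg;
```

The third parameter is passed to PostgreSQL as-is, so it must be written in PostgreSQL's SQL dialect rather than T-SQL.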
This chapter demonstrated many ways of importing source data from SQL databases. The subject is broad, and—
as I wrote initially—not everything can be covered given the enormous scope of the subject. However, in this
chapter you saw how to download data from many of the major relational databases that are currently available.
Specifically, you saw examples involving the following:
Table 4-3 gives you my take on the various methods outlined in this chapter, listing their advantages and
disadvantages.
Table 4-3. Comparison of the Methods Used in This Chapter
More complex to install and configure.
Easier to install and configure.
Fast data load.
Longer time to set up a package.
Easy to use for querying external databases.
Can be complex to configure.
Requires greater permissions.
SQL Server Migration
Rapid acquisition of source metadata.
Can only open entire tables and datasets.
Can build up a data load project over time.
I have to be fair, and warn you that cross-database data migration can truly be a minefield. All too often
it can be a “minor” detail about the source database that can hold you up for hours until the issue is resolved.
Even more frequently the source database DBA can prove reluctant to share their knowledge. Nevertheless, if
you are patient, and above all do not rush things, then there is nothing to stop you migrating source data from
the databases that we have looked at, and/or connecting to them to define and create a truly heterogeneous data
extraction and load process. All I am trying to say is that a little calm and some charm can be your greatest allies
in this particular corner of the ETL battlefield.
You will notice that I have not discussed the SQL Server Import Wizard in this chapter. Quite simply, if you
have configured the provider and/or client for an external database, then using the SQL Server Import Wizard
is exactly as described in Recipe 1-2 (among others). All you have to do is use an OLEDB or ODBC connection
(depending on the source database and your specific preferences) as the data source. So I will not waste time
here on pointless repetition, and let you use the SQL Server Import Wizard if you so desire.
SQL Server Sources
In this book, we look at importing data from several relational databases, and some of the ways in which they
can be used as data sources for SQL Server. Yet there is one relational database we have not talked about, and
that is SQL Server itself. So to continue our “data source tour,” let’s look at some of the ways in which you can
transfer data between SQL Server databases.
This overview includes:
Ad hoc querying of external SQL Server instances
SQL Server linked servers
Bulk loading of data from one SQL Server database to another SQL Server database
Loading data from older versions of SQL Server into SQL Server 2005, 2008, and 2012
Backup using COPY_ONLY
Copying and pasting tiny amounts of data between databases
Loading data into SQL Server Azure
The choice of SQL Server as a data source may seem surprising, but in many enterprises, there are dozens—
if not hundreds—of SQL Servers, often running different versions of the Microsoft RDBMS. So you may well need
to know what your options are as far as getting data between versions of SQL Server is concerned.
It is not possible to discuss every aspect of data transfer between SQL Server versions, and there are
inevitably certain technologies that fall outside the scope of this book. As my focus is on data integration,
with a strong emphasis on ETL, I will not be examining any of the many High Availability options for SQL Server, nor
anything touching on Service Broker. Neither will I mention the Import/Export Wizard, as this has been covered
extensively in Chapters 1, 2 and 4.
I will look at migrating data to SQL Server Azure in this chapter, however. While the Microsoft database “in
the cloud” will doubtless replace onsite databases from many vendors, it seems most fruitful and comprehensible
to discuss it as a logical destination for data from Microsoft databases.
There are a few points to note as far as following the examples given in this chapter is concerned. First, you
will need another SQL Server 2012 instance for many of the examples. This can be either a separate networked
server or a second installation of SQL Server with a defined instance name. I am using the ADAM02\AdamRemote
instance in the examples. You will need to replace this with the name of the server (and possibly the instance) that you are using.
You will also need to deploy the CarSales example database onto this second instance. All examples presume
that you are using the CarSales database unless another database is indicated. Any sample files used in this
chapter are found in the C:\SQL2012DIRecipes\CH05 directory—assuming that you have downloaded the
samples from the book’s companion web site and installed them as described in Appendix B.
Chapter 5 ■ SQL Server Sources
5-1. Loading Ad Hoc Data from Other SQL Server Instances
You want to load data on an ad hoc basis from another SQL Server instance quickly and easily.
Use OPENROWSET and OPENDATASOURCE. This allows you to connect quickly to the source data and select any data
subsets using T-SQL.
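This is the code for OPENROWSET (C:\SQL2012DIRecipes\CH05\OpenRowset.Sql):
-- Assumption: CarSales.dbo.Client as the source table, and the ORDER BY clause.
SELECT Lnk.ClientName
FROM OPENROWSET('SQLNCLI', 'Server=ADAM02;Trusted_Connection=yes;',
'SELECT ClientName FROM CarSales.dbo.Client ORDER BY ClientName') AS Lnk;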
You can use OPENDATASOURCE like this (C:\SQL2012DIRecipes\CH05\OpenDataSource.Sql):
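-- Assumption: Integrated Security=SSPI and CarSales.dbo.Client complete the truncated statement.
SELECT ID, ClientName, Town
FROM OPENDATASOURCE('SQLNCLI',
'Data Source=ADAM02\AdamRemote;Integrated Security=SSPI').CarSales.dbo.Client;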
The source data is loaded into the destination table in both cases.
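To make the load into a destination table explicit, a sketch along the following lines could be used. The destination table dbo.Client_AdHoc and the source table CarSales.dbo.Client are assumed names for illustration only. SELECT...INTO creates the destination table, so the table must not already exist.

```sql
-- Assumed names: dbo.Client_AdHoc (destination), CarSales.dbo.Client (source).
SELECT Lnk.ID, Lnk.ClientName, Lnk.Town
INTO dbo.Client_AdHoc                 -- the table is created by this statement
FROM OPENROWSET('SQLNCLI',
                'Server=ADAM02\AdamRemote;Trusted_Connection=yes;',
                'SELECT ID, ClientName, Town FROM CarSales.dbo.Client') AS Lnk;
```

If the destination table already exists, INSERT INTO ... SELECT with the same FROM clause is the equivalent pattern.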
How It Works
Should you want to import data as a “one-off,” then a quick connection to another SQL Server instance is,
fortunately, extremely easy. There are, as for most external relational sources, two ways of establishing the
connection. They are:
OPENROWSET: for occasional queries.
OPENDATASOURCE: for occasional queries that could evolve into linked servers one day.
The following are the relevant prerequisites.
An OLEDB provider must be installed on every external SQL Server instance. Admittedly,
this is normally part of an SQL Server installation, but I prefer to state the obvious.
An OLEDB provider must be installed on every SQL Server that is part of a cluster.
Ad hoc distributed queries must be enabled on the server from which you are
running the query. This is done using the following T-SQL snippet:
EXECUTE master.dbo.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXECUTE master.dbo.sp_configure 'ad hoc distributed queries', 1;
RECONFIGURE;
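Once the option has been set (and RECONFIGURE has been run so that the change takes effect), the current state can be checked with a query such as the following. This is a small verification sketch and not part of the recipe's sample files.

```sql
-- value_in_use = 1 means that ad hoc distributed queries are currently enabled.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'Ad Hoc Distributed Queries';
```

A difference between value and value_in_use usually indicates that the option was changed but RECONFIGURE has not yet been run.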
For an occasional ad hoc query, you may find that OPENROWSET is the easiest solution. To clarify, the
parameters for OPENROWSET are essentially in three parts:
The OLEDB provider
A provider string, containing server and security parameters
A T-SQL query to retrieve the data
As the provider string only specifies the server, you are probably best advised to use three-part notation to
specify the database, schema, and table or view from which you wish to source data. If the login defaults to that
database and schema, then of course, you will have no problems; but I advise this as a best practice habit. If you
wish to use SQL Server security rather than a trusted connection, then replace Trusted_Connection=yes with
logon and password details like this:
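As a hedged sketch, a provider string using SQL Server security might look like the following. The login name MyLogin, the password placeholder, and the table CarSales.dbo.Client are illustrative assumptions, not values taken from the recipe.

```sql
-- UID and PWD take the place of Trusted_Connection=yes; both values are placeholders.
SELECT Lnk.ClientName
FROM OPENROWSET('SQLNCLI',
                'Server=ADAM02\AdamRemote;UID=MyLogin;PWD=<password>;',
                'SELECT ClientName FROM CarSales.dbo.Client') AS Lnk;
```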
Note that the security information is all part of the second parameter, and the parameter elements are
separated by a semicolon. Also, at the risk of stating the obvious, leaving security information in clear text like
this is extremely risky. If you have no other choices, then you should consider wrapping the SELECT statement in
a stored procedure created using the WITH ENCRYPTION option, which hides the text of the stored procedure from
many—but not all—prying eyes. Alternatively, the stored procedure could reside on the remote server.
In this case, it would need to be created by the team that administers that server. A stored procedure is generally
the better option for security because you would not be passing details of your schema over the network.
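The local wrapping approach can be sketched as follows. The procedure name dbo.pr_GetRemoteClients, the login, the password placeholder, and the source table are all hypothetical. WITH ENCRYPTION only obfuscates the stored definition, so this should be treated as a last resort and not as real protection for credentials.

```sql
-- Hypothetical procedure name and placeholder credentials.
CREATE PROCEDURE dbo.pr_GetRemoteClients
WITH ENCRYPTION   -- obfuscates the procedure text stored in the database
AS
BEGIN
    SET NOCOUNT ON;

    SELECT Lnk.ClientName
    FROM OPENROWSET('SQLNCLI',
                    'Server=ADAM02\AdamRemote;UID=MyLogin;PWD=<password>;',
                    'SELECT ClientName FROM CarSales.dbo.Client') AS Lnk;
END;
GO

-- Callers run the procedure without ever seeing the connection details.
EXECUTE dbo.pr_GetRemoteClients;
```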
Remember that you are using pure T-SQL, and so can extend the SELECT clauses (both that passed to the
external server and the code wrapping the OPENROWSET command) to include WHERE, ORDER BY, and GROUP BY
clauses, as well as column aliases. These techniques are described in greater detail in Recipes 1-4 and 1-5.
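As an illustration of both levels, the sketch below filters in the query sent to the remote server, then aliases and aggregates in the wrapping query. The table CarSales.dbo.Client is an assumed name; the columns are those used in the OPENDATASOURCE example.

```sql
-- The pass-through query filters on the remote server;
-- the outer query aliases, aggregates, and sorts locally.
SELECT Lnk.Town AS ClientTown, COUNT(*) AS NumberOfClients
FROM OPENROWSET('SQLNCLI',
                'Server=ADAM02\AdamRemote;Trusted_Connection=yes;',
                'SELECT ClientName, Town
                 FROM CarSales.dbo.Client
                 WHERE Town IS NOT NULL') AS Lnk
GROUP BY Lnk.Town
ORDER BY NumberOfClients DESC;
```

Filtering inside the pass-through query generally reduces the number of rows sent across the network, which is why the WHERE clause is placed there in this sketch.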
If you are using OPENDATASOURCE, you can use SQL Server security, with all the caveats that leaving passwords
in clear text implies. Here is a snippet to show it:
'Data Source=ADAM02\AdamRemote;User ID=Adam;Password=<password>'
Hints, Tips, and Traps
If you suspect that an ad hoc query may have to become part of something more
permanent one day, then setting it up using OPENDATASOURCE allows you to make the
change to a linked server more easily. Its use of the SQL Server four-part notation allows
you to replace the SQL snippet with a linked server reference at a later date.
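A sketch of that migration path, assuming a source table named CarSales.dbo.Client and integrated security (neither is confirmed by the recipe text), could look like this. Only the prefix in front of the database, schema, and table names changes.

```sql
-- Ad hoc version: OPENDATASOURCE stands in for the server part of the name.
SELECT ID, ClientName, Town
FROM OPENDATASOURCE('SQLNCLI',
     'Data Source=ADAM02\AdamRemote;Integrated Security=SSPI').CarSales.dbo.Client;
GO

-- Later: register the remote instance as a linked server (run once).
EXECUTE master.dbo.sp_addlinkedserver
        @server = N'ADAM02\AdamRemote',
        @srvproduct = N'SQL Server';
GO

-- Linked server version: the same query with a four-part name.
SELECT ID, ClientName, Town
FROM [ADAM02\AdamRemote].CarSales.dbo.Client;
```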