pg_hba.conf – host-based network configuration

Understanding the PostgreSQL Transaction Log

pg_ident.conf – ident authentication

The pg_ident.conf file can be used in conjunction with the pg_hba.conf file to
configure ident authentication.

pg_multixact – multi-transaction status data

The multi-transaction-log manager is here to handle shared row locks efficiently.
There are no replication-related practical implications of this directory.

pg_notify – LISTEN/NOTIFY data

In this directory, the system stores information about LISTEN/NOTIFY (the async
backend interface). There are no practical implications related to replication.

pg_serial – information about committed serializable transactions

Information about serializable transactions is stored here. We have to store
information about commits of serializable transactions on disk to ensure that
long-running transactions will not bloat memory. An SLRU (simple least-recently-used)
buffer structure is used internally to keep track of those transactions.

pg_snapshots – exported snapshots

This directory holds information needed by the PostgreSQL snapshot manager.
In some cases, snapshots have to be exported to disk, for example so that other
sessions can import them. After a crash, those exported snapshots will be cleaned
out automatically.

pg_stat_tmp – temporary statistics data

Temporary statistical data is stored in this directory. This information is needed for
most pg_stat_* system views (and therefore also for the underlying functions providing
the raw data).

pg_subtrans – subtransaction data

In this directory, we store information about subtransactions. The pg_subtrans
(and pg_clog) directories provide permanent (on-disk) storage of transaction-related
information. Only a limited number of pages of each is kept in memory, so in many
cases there is no need to actually read from disk. However, if there is a long-running
transaction or a backend sitting idle with an open transaction, it may be necessary
to read this information from disk and write it back. These directories also allow
the information to persist across server restarts.
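The idea of keeping "a limited number of pages in memory" and only touching disk on a miss can be illustrated with a small LRU page buffer. This is a minimal sketch in the spirit of SLRU, not PostgreSQL's actual implementation (the real code uses a fixed set of shared buffers and an approximate LRU counter). The class name `SimpleLruBuffer`, the 8 kB `PAGE_SIZE`, and the dict standing in for the on-disk files are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 8192  # illustrative page size (assumption)


class SimpleLruBuffer:
    """Toy LRU buffer of fixed-size pages in front of slower storage.

    `disk` is a plain dict standing in for the on-disk files
    (pageno -> bytes). Only `capacity` pages are held in memory at once.
    """

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk
        self.pages = OrderedDict()  # pageno -> bytearray, least recently used first
        self.dirty = set()          # pages modified since they were loaded
        self.misses = 0             # lookups that were not satisfied from memory
        self.disk_writes = 0        # dirty pages written back on eviction

    def _get_page(self, pageno):
        if pageno in self.pages:
            self.pages.move_to_end(pageno)  # mark as most recently used
            return self.pages[pageno]
        self.misses += 1
        if len(self.pages) >= self.capacity:
            victim, data = self.pages.popitem(last=False)  # evict the LRU page
            if victim in self.dirty:
                self.disk[victim] = bytes(data)  # write it back before dropping it
                self.dirty.discard(victim)
                self.disk_writes += 1
        # Read the page from "disk", or start with a zeroed page if it is new.
        page = bytearray(self.disk.get(pageno, bytes(PAGE_SIZE)))
        self.pages[pageno] = page
        return page

    def read(self, pageno, offset):
        return self._get_page(pageno)[offset]

    def write(self, pageno, offset, value):
        self._get_page(pageno)[offset] = value
        self.dirty.add(pageno)
```

With a capacity of two pages, writing to pages 0, 1 and 2 forces page 0 out to the dict that plays the role of disk; reading page 0 again is a miss that brings it back from there. As long as the working set fits into memory, no disk access is needed at all, which is exactly why long-running or idle-in-transaction sessions are the interesting case.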

Chapter 2

pg_tblspc – symbolic links to tablespaces

The pg_tblspc directory is a highly important one. In PostgreSQL, a tablespace is
simply an alternative storage location, which is represented by a directory holding
the data.
The important point here is this: if a database instance is fully replicated, we simply
cannot assume that all servers in the cluster use the very same disk layout and the
very same storage hardware. There can easily be scenarios in which a master needs
a lot more I/O power than a slave, which might only be around to serve as a backup
or standby. To allow users to handle different disk layouts, PostgreSQL places
symlinks into the pg_tblspc directory. The database blindly follows those symlinks
to find the tablespaces, regardless of where they are.
This gives end users enormous power and flexibility. Controlling storage is essential
both to replication and to performance in general. Keep in mind that those symlinks
can only be changed ex post, that is, after the tablespace already exists, so any such
change should be carefully thought over.
We recommend using the trickery outlined in this section only when
it is really needed. For most setups, we strongly recommend using the
same filesystem layout on the master and on the slave.
This can greatly reduce complexity.
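To see where a given instance actually keeps its tablespaces, it is enough to follow the symlinks in pg_tblspc. The helper below is a hypothetical sketch (the function name and the data directory path you pass in are assumptions); it relies only on the fact that each entry in pg_tblspc is a symlink named after a tablespace's OID.

```python
import os


def list_tablespace_links(data_dir):
    """Return {entry name: symlink target} for every symlink in data_dir/pg_tblspc.

    Each entry is named after a tablespace OID and points to the directory
    where that tablespace's data actually lives.
    """
    tblspc_dir = os.path.join(data_dir, "pg_tblspc")
    links = {}
    for name in sorted(os.listdir(tblspc_dir)):
        path = os.path.join(tblspc_dir, name)
        if os.path.islink(path):
            links[name] = os.readlink(path)  # the location PostgreSQL will follow
    return links
```

Running something like `list_tablespace_links("/var/lib/pgsql/data")` (path assumed) on a master and on a slave makes differing disk layouts visible at a glance: the same OID may point to fast storage on one machine and to a plain disk on the other.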

pg_twophase – information about prepared transactions

PostgreSQL has to store information about two-phase commit. While two-phase
commit can be an important feature, the directory itself will be of little importance
to the average system administrator.

pg_xlog – the PostgreSQL transaction log (WAL)
The PostgreSQL transaction log is the essential directory we have to discuss in
this chapter. pg_xlog contains all files related to the so-called XLOG. If you have
used PostgreSQL in the past, you might be familiar with the term WAL
(Write-Ahead Log). XLOG and WAL are two names for the very same thing, and the
same applies to the term transaction log. All three terms are in wide use, and it is
important to know that they mean the same thing.
The pg_xlog directory will typically look like this:
[hs@paula pg_xlog]$ ls -l
total 81924
-rw------- 1 hs staff 16777216 Feb 12 16:29
-rw------- 1 hs staff 16777216 Feb 12 16:29
-rw------- 1 hs staff 16777216 Feb 12 16:29
-rw------- 1 hs staff 16777216 Feb 12 16:29
-rw------- 1 hs staff 16777216 Feb 12 16:29
drwx------ 2 hs staff Feb 11 18:14 archive_status

What you see is a set of files, each exactly 16 MB in size (the default setting). The
name of an XLOG file is 24 characters long and consists of three groups of eight
hexadecimal digits. Because the numbering is hexadecimal, the system counts
"… 9, A, B, C, D, E, F, 10" and so on.
One important point is that the size of the pg_xlog directory will not vary wildly over
time, and it is totally independent of the type of transactions you are running on your
system. The size of the XLOG is determined by postgresql.conf parameters, which will
be discussed later in this chapter. In short: no matter whether you are running small or
large transactions, the size of the XLOG will be the same. You can easily run a
transaction as big as 1 TB with just a handful of XLOG files. This might not be very
efficient performance-wise, but it is technically perfectly feasible.
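Both points, the hexadecimal file naming and the fixed number of files, can be sketched in a few lines. The naming function assumes the 16 MB default segment size and the scheme used from PostgreSQL 9.3 on (timeline, high 32 bits of the WAL position, and segment number within that 4 GB range, each as eight uppercase hex digits; older releases skipped the last segment of every 4 GB range). The recycling part is a deliberately simplified toy model, not PostgreSQL's checkpoint logic: `keep_segments` is a hypothetical stand-in for the postgresql.conf parameters mentioned above, and the oldest file is simply assumed to be no longer needed once the limit is reached.

```python
from collections import deque

SEG_SIZE = 16 * 1024 * 1024  # default XLOG segment size: 16 MB


def wal_file_name(timeline, lsn, seg_size=SEG_SIZE):
    """Name of the XLOG segment file containing WAL byte position `lsn`.

    Three 8-digit uppercase hex fields: timeline, the high 32 bits of the
    position (the "log" number), and the segment number within that 4 GB log.
    """
    segno = lsn // seg_size
    segs_per_log = 0x1_0000_0000 // seg_size  # 256 segments of 16 MB per 4 GB
    return "%08X%08X%08X" % (timeline, segno // segs_per_log, segno % segs_per_log)


def simulate_wal_files(total_bytes, keep_segments=5, seg_size=SEG_SIZE):
    """Toy model of segment recycling.

    Returns (segments written in total, peak number of files on disk).
    """
    files = deque()
    segments_written = 0
    peak = 0
    while segments_written * seg_size < total_bytes:
        if len(files) == keep_segments:
            files.popleft()  # oldest segment assumed obsolete; its file is reused
        files.append(wal_file_name(1, segments_written * seg_size, seg_size))
        segments_written += 1
        peak = max(peak, len(files))
    return segments_written, peak
```

For example, `wal_file_name(1, 10 * SEG_SIZE)` yields `00000001000000000000000A`, the file after `...09`. In the toy model, `simulate_wal_files(2**40)` writes 1 TB of WAL as 65,536 segments, yet never has more than five files on disk at the same time.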

postgresql.conf – the central PostgreSQL configuration file

Finally, there is the main PostgreSQL configuration file. All configuration parameters
can be changed in postgresql.conf and we will use this file extensively to set
up replication and to tune our database instances to make sure that our replicated
setups provide us with superior performance.
If you happen to use prebuilt binaries, you might not find postgresql.conf
directly inside your data directory. It is more likely to be located in
some subdirectory of /etc/ (on Linux/Unix) or in a location of your choice
on Windows. The precise location is highly dependent on the operating
system you are using. A typical location for the data directory is
/var/lib/pgsql/data, but postgresql.conf is often located
under /etc/postgresql/9.X/main/postgresql.conf
(as on Ubuntu and similar systems) or directly under /etc.
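When you are unsure where a packaged installation put the file, a quick search of the usual candidates can help. The sketch below is hypothetical: the candidate list merely mirrors the locations named above, and reading "directly under /etc" as /etc/postgresql.conf is an assumption. On a running server, the SQL command `SHOW config_file;` reports the path that is actually in use and is the authoritative answer.

```python
import glob
import os

# Candidate locations taken from the text above; the last one is an assumption.
CANDIDATE_PATTERNS = [
    "var/lib/pgsql/data/postgresql.conf",
    "etc/postgresql/*/main/postgresql.conf",  # Ubuntu/Debian-style layout
    "etc/postgresql.conf",
]


def find_postgresql_conf(root="/"):
    """Return existing postgresql.conf candidates below `root`, in search order."""
    found = []
    for pattern in CANDIDATE_PATTERNS:
        found.extend(sorted(glob.glob(os.path.join(root, pattern))))
    return found
```

The `root` parameter exists only so that the search can be pointed at a mounted image or a test directory; on a live system the default of `/` is what you want.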


Writing one row of data

Now that we have gone through the disk layout, we will dive further into
PostgreSQL and see what happens when PostgreSQL writes one row of data. Once you
have mastered this chapter, you will fully understand the concept behind the XLOG.
Note that, in this section about writing a row of data, we have simplified the process
a little to make sure that we can stress the main point and the ideas behind the
PostgreSQL XLOG.

A simple INSERT statement

Let us assume that we are running a simple INSERT statement like the following one:
INSERT INTO foo VALUES ('abcd');

As one might imagine, the goal of an INSERT operation is to somehow add a row
to an existing table. We have seen in the previous section about the disk layout of
PostgreSQL that each table will be associated with a file on disk.
Let us perform a thought experiment and assume that the table we are dealing with
here is 10 TB in size. PostgreSQL will see the INSERT operation and look for some spare
space inside this table (either using an existing block or adding a new one). For the
purpose of this example, we simply put the data into the second block of the table.
Everything will be fine as long as the server actually survives the transaction. But what
happens if somebody pulls the plug after only abc has been written instead of the
entire value? When the server comes back up after the reboot, we will find ourselves
in a situation where we have a block with an incomplete record, and to make it even
funnier, we might not even have the slightest idea where this block containing the
broken record might be.
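The failure described above can be simulated in a few lines. This is a toy sketch under stated assumptions: the record is copied into an in-memory 8 kB block byte by byte, the record starts at offset 0, and `PowerFailure`, `write_record`, and `insert_with_power_failure` are hypothetical names. Real operating systems and disks write in pages or sectors rather than single bytes, but a partially written block can result all the same.

```python
BLOCK_SIZE = 8192  # PostgreSQL's default block size in bytes


class PowerFailure(Exception):
    """Raised to simulate somebody pulling the plug mid-write."""


def write_record(block, offset, record, crash_after=None):
    """Copy `record` into `block` byte by byte; 'crash' after `crash_after` bytes."""
    for i, byte in enumerate(record):
        if crash_after is not None and i == crash_after:
            raise PowerFailure(f"power lost after {i} of {len(record)} bytes")
        block[offset + i] = byte


def insert_with_power_failure(record=b"abcd", crash_after=3):
    """Write `record` into an empty block and 'crash' part-way through."""
    block = bytearray(BLOCK_SIZE)  # stands in for the second block of table foo
    try:
        write_record(block, 0, record, crash_after)
    except PowerFailure:
        pass  # after the reboot, nothing in the block says the record is incomplete
    return block
```

`bytes(insert_with_power_failure()[:4])` evaluates to `b'abc\x00'`: the block now holds a truncated record, and nothing in the block itself reveals that the write was interrupted.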
In general, tables containing incomplete rows in unknown places have to be considered
corrupted. Of course, systematic table corruption is nothing the
PostgreSQL community would ever tolerate, especially not if problems like that are
caused by clear design failures.
We have to ensure that PostgreSQL will survive interruptions at any
given point in time without losing or corrupting data. Protecting
your data is not a nice-to-have but an absolute must.
