Lead Image © Nuno Andre, 123RF.com

Lead Image © Nuno Andre, 123RF.com

New in PostgreSQL 9.3

Database Tuning

Article from ADMIN 18/2013
By
The new PostgreSQL 9.3 release introduces several speed and usability improvements, as well as SQL standards compliance.

In version 9.3, the PostgreSQL community once again introduces several important new features in a major release. The work of many developers worldwide, this release went through a rigorous review process (Commitfest) that ensures the quality of the source code. Typically, Commitfests see up to 100 patches; because of the sheer volume, the feature freeze deadline was regularly postponed. To compensate, the list of new features is impressive [1]. In this article, I present the most important ones.

Parallel Dump

A feature long requested by users and developers is the ability to perform a simultaneous database dump using multiple threads or processes. PostgreSQL uses a multiprocess architecture, which means it does not use threads (not even on platforms such as Windows, which actually prefers them). To dump multiple objects in parallel with pg_dump, the program must therefore open multiple database connections. However, because pg_dump guarantees the consistency of a dump, even for concurrent transactions, the database connections that belong to pg_dump now need to be synchronized. This was what made developing this feature tricky.

The infrastructure implemented to handle this process is called "snapshot cloning," which makes it possible for mutual synchronization of several transactions to the same state, although they use separate database connections. If you were to back up tables in parallel, and if these pg_dump transactions were not synchronized, data consistency between the tables could not be guaranteed. In this case, it would be possible for the concurrent processes to store new data in one of the tables during backup. Enter snapshot synchronization.

Basically, a transaction must first be started and can then be synchronized. All other transactions then import the snapshot generated in the process. The transaction then synchronizes this with its "parent transaction." Both have the same view of the database state. Figure 1 illustrates the sequence. First, a transaction is started. This must have a REPEATABLE READ or SERIALIZABLE degree of isolation; READ COMMITTED is not possible, because the snapshot changes after each command at this isolation level and is thus not permanent for the transaction.

Figure 1: The second transaction (right) references a previously created snapshot; it thus sees the same data.

With the use of the pg_export_snapshot() function, a snapshot can be exported in any transaction. The return value is a datum of the TEXT type with the snapshot identifier. This identifier can be imported into another transaction with the SET TRANSACTION SNAPSHOT command.

It should be noted that this must be the first command in the other transaction or must precede each SELECT, INSERT, UPDATE, or DELETE command. Furthermore, a transaction that you want to import with a SERIALIZABLE isolation level cannot import a transaction from a lesser isolation level. Once the snapshot has been imported, as in Figure 1, both transactions see the same database state. The records added to the bar table up front are no longer visible because the imported snapshot had not seen them when it was created.

The pg_dump utility implements precisely this infrastructure to synchronize multiple dump processes. Parallel dump only works for the new directory output format. The pg_dump call for several synchronized dump processes uses the new parallel dump command-line parameter, -j:

pg_dump -j 4 -Fd -f /srv/backups/db/

The important thing to note is the output format specified by the -Fd command-line parameter (d = directory). The -f option specifies the output directory; it must not exist.

The pg_dump command in major new versions of PostgreSQL has always been backward-compatible with older versions to allow migrations. Parallel dump (-j) is no exception and can be used, for example, in version 9.2 of PostgreSQL. In this case, pg_dump cannot provide synchronized snapshots; thus, no changes should be permitted to the databases for the duration of the dump to avoid inconsistencies. Alternatively, synchronized snapshots can be disabled explicitly using a command-line parameter (--no-synchronized-snapshots) so that dumps can be created with multiple pg_dump processes in older versions.

Writable FDWs

PostgreSQL 9.1 already had a partial implementation of SQL/MED standards in place, which enabled the integration of external data sources in the form of a foreign table. This was previously only possible for read access, but in PostgreSQL 9.3, the API has been extended to include writes to these foreign tables.

External data sources are defined in such a way that they appear to the user to be local PostgreSQL tables. When queried, the records are converted at run time by means of a Foreign Data Wrapper (FDW) and displayed as a database row in the result set, in line with the table definition. A large number of FDWs for other database systems are already set up in PostgreSQL, including:

  • Oracle
  • MySQL
  • Informix
  • ODBC
  • JDBC
  • NoSQL sources: CouchDB, Redis, and MongoDB.

The FDW for PostgreSQL itself was introduced to the contrib branch of the database server in version 9.3; thus, tables from external PostgreSQL databases can be integrated. Furthermore, a PostgreSQL FDW already supports Data Modifying Language (DML) write queries with INSERT, DELETE, and UPDATE.

The various distributions usually provide these additional modules in the form of a postgresql-contrib package. If everything is installed properly, the PostgreSQL FDW can simply be enabled as an extension (Figure 2). The \dx command in psql returns all the currently installed extensions for a database.

Figure 2: Adding a PostgreSQL FDW as a database extension.

You don't have to install an FDW on the remote database server, but you must ensure that the local database is granted access to the database you want, using pg_hba.conf. You can also stipulate popular PostgreSQL keywords for the database connection (e.g., host, dbname, port, etc.). However, user, password, failback_application_name, and client_encoding are prohibited; the last two are set automatically by the FDW.

To use a PostgreSQL FDW to access a remote database server, you need to configure a matching data source with the CREATE SERVER command. The example I use here accesses a remote PostgreSQL server, archives.mynet.internal. First, however, the FDW must be loaded into the local database using CREATE EXTENSION:

CREATE EXTENSION postgres_fdw;

The data source for the remote database server can then be defined as in:

CREATE SERVER pg_archive_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(dbname 'archive', host 'archives.mynet.internal', port '5432');

To create a foreign table, you need to tell the data source which user can log in to the remote PostgreSQL instance and with what combination of role name and password. To do so, you need a user mapping. The CREATE USER MAPPING command takes care of this. The access credentials used here differ between individual FDWs; for PostgreSQL, the username is mandatory and the password can be optionally provided. The latter is imperative for a mapping for a user without superuser privileges. The CURRENT_USER keyword is automatically replaced by the role currently being used in the local database session.

CREATE USER MAPPING FOR CURRENT_USER
SERVER pg_archive_server
OPTIONS(user 'bernd', password 'bernd');

After creating a data source and a mapping for the access credentials for each data source, you can then create a foreign table. The local definition should correspond to the schema of the remote data source to the extent possible. In Listing 1, the local database is given a foreign table that stores email in an archive.

Listing 1

Foreign Table

01 CREATE FOREIGN TABLE mail(
02    id bigint not null,
03    label_id bigint,
04    mail_from text not null,
05    mail_to text not null,
06    msg text not null,
07    subject text not null,
08    msg_id text not null,
09    date timestamp)
10 SERVER pg_archive_server

This corresponds exactly to the definition on the archive server, but naturally without the keywords SERVER and OPTIONS, which are not required there. Now, the local database can easily use the archive table in local queries:

SELECT COUNT(*) FROM mail WHERE mail_from LIKE 'Tom Lane%';
  count
 -------
   6238
 (1 row)

When first run, the FDW is responsible for building an appropriate mapping for the user and setting up the target server database connection. This database connection is also cached for reuse for each user mapping in the local database session.

The PostgreSQL FDW not only supports read operations but also writes DML. If transactions or save points are used in the local database, the PostgreSQL FDW also links these with transactions or savepoints in the remote database. This means that a ROLLBACK of the local databases also rolls back any changes to a remote PostgreSQL data source.

New Features with Streaming Replication

Introduced in version 9.0, built-in replication using the streaming replication method has seen steady improvements from one major version to the next. PostgreSQL 9.3 also remains true to this tradition and improves the handling of streaming replication in the event of a primary server failover or recovery.

If the primary server is restored by means of an online backup or failed over to another standby streaming replication, the affected PostgreSQL instance changes the timeline, which the database server's transaction log adheres to.

Besides the possibility of promoting a PostgreSQL Hot Standby server to a full PostgreSQL instance using the trigger_file parameter, PostgreSQL also provides the option of doing this by running the pg_ctl promote command.

In version 9.3, you also have the option of making the standby a full, writable instance without waiting for a checkpoint. You need the -m fast command-line option for this, which can save you a longer wait before the instance is available for writing.

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • PostgreSQL 9.3

    The new PostgreSQL 9.3 release introduces several speed and usability improvements, as well as SQL standards compliance.

  • What's new in PostgreSQL 9.4
    The PostgreSQL Global Development Group recently introduced the new major version 9.4 of the popular free database, which includes innovative functions as well as a whole range of changes regarding speed and functionality.
  • PostgreSQL 9.5 – What's new; what's better?
    As expected, a further release of the PostgreSQL open source database system was launched last fall, offering a number of impressive new features.
  • PostgreSQL Replication Update
    High availability, replication, and scaling are everyday necessities in the database world. What features does PostgreSQL offer in this context, and how good are they?
  • Elephant Shed PostgreSQL appliance
    The Elephant Shed PostgreSQL appliance bundles and integrates the components needed to manage a PostgreSQL server.
comments powered by Disqus