Lead Image © Sergey Nivens, 123RF.com

Lead Image © Sergey Nivens, 123RF.com

PostgreSQL 9.5 – What's new; what's better?

An Eye for Details

Article from ADMIN 31/2016
By
As expected, a further release of the PostgreSQL open source database system was launched last fall, offering a number of impressive new features.

PostgreSQL 9.5 contains more than 100 changes, including dozens of performance improvements. The first highlight in PostgreSQL 9.5 is the extension of the INSERT command to include the ON CONFLICT DO UPDATE clause, which lets you stipulate that an UPDATE is performed instead of an INSERT if the line to be inserted already exists. This behavior is also known as "UPSERT."

UPSERT

In practical applications, the UPSERT case occurs quite often, but users of PostgreSQL previously had to implement the solution manually as a stored procedure or rely on the logic in the client application. The difficulty originates with the defaults in the SQL standard and other database systems, which cause considerable technical problems unacceptable to the PostgreSQL developers [1]. Although the logic of an UPSERT is quite simple, it becomes complicated if this logic needs to deliver high performance in parallel operations without generating deadlocks or sporadic, unique constraint errors, which is very much the case with other implementations.

For these reasons, the PostgreSQL project decided specifically to design a separate syntax with its own semantics. In this example, a table stores subscriptions from a service with a user ID, the subscription type, and the start and end dates:

CREATE TABLE subscriptions (
  user_id int NOT NULL,
  subscription_type text NOT NULL,
  started date NOT NULL,
  expires date NOT NULL,
  PRIMARY KEY (user_id, subscription_type)
);

An UPSERT operation is now required to either insert a new entry or update an existing entry. Previously, developers needed to rely on a loop that performed INSERT or UPDATE operations until one of them worked without error. You could do this, for example, with a PL/pgSQL function of the type shown in Listing 1. The loop is necessary in this example because another connection could add a conflicting line between the UPDATE and the INSERT.

Listing 1

UPSERT Replacement

01 CREATE FUNCTION set_subscription(in_user_id int, in_subscription_type text, in_started date, in_expires date) RETURNS void
02 LANGUAGE plpgsql
03 STRICT
04 AS $$
05 BEGIN
06 <<upsert>>
07 LOOP
08   UPDATE subscriptions
09     SET (started, expires) = (in_started, in_expires)
10     WHERE (user_id, subscription_type) = (in_user_id, in_subscription_type);
11   EXIT upsert WHEN FOUND;
12   BEGIN
13     INSERT INTO subscriptions (user_id, subscription_type, started, expires)
14       VALUES (in_user_id, in_subscription_type, in_started, in_expires);
15     EXIT upsert;
16   EXCEPTION WHEN unique_violation THEN
17   END
18 END LOOP;
19 $$;

The logic of Listing 1 can be shortened to a single command with an UPSERT, as shown in Listing 2. The command initially comprises a normal INSERT, followed by an ON CONFLICT clause describing the conflict and the action to be performed. In this typical case, the conflict relates to the potential ambiguity of the primary key as soon as an attempt is made to insert the same user with the same subscription once again. The action is an UPDATE command.

Listing 2

With UPSERT Command

01 INSERT INTO subscriptions (user_id, subscription_type, started, expires)
02   VALUES (in_user_id, in_subscription_type, in_started, in_expires)
03   ON CONFLICT (user_id, subscription_id)
04     DO UPDATE SET (started, expires) = (in_started, in_expires);

As you can see, the new command saves a great deal of code and offers better performance by avoiding the problem of DIY implementations, which need to be anticipated for cases in which the action fails in other connections.

If you want to avoid repeating the input values from the VALUES clause in the UPDATE command, you can also use the EXCLUDED alias to point to the value of the line whose insert failed (Listing 3). On the other hand, you can also write completely different values in the UPDATE than in the INSERT if you somehow want to log the fact that a conflict occurred.

Listing 3

With EXCLUDED

01 INSERT INTO subscriptions (user_id, subscription_type, started, expires)
02   VALUES (in_user_id, in_subscription_type, in_started, in_expires)
03   ON CONFLICT (user_id, subscription_id)
04     DO UPDATE SET (started, expires) = (EXCLUDED.started, EXCLUDED.expires);

Although UPSERT is more likely to be useful to developers of web applications and other transactional applications, PostgreSQL 9.5 also shows innovations in the field of analytics.

Group Theory

Listing 4 shows some sample data and a number of applications of the grouping function. PostgreSQL supports grouping operations known from other SQL systems, as well, such as CUBE and ROLLUP, which extend the well-known GROUP BY clause and support the execution of multiple grouping variants at the same time.

Listing 4

New Grouping Options

01 CREATE TABLE HR (
02   country text,
03   city text,
04   department text,
05   name text
06 );
07
08 INSERT INTO HR VALUES
09   ('DE', 'Berlin', 'Sales', 'Christian'),
10   ('DE', 'Berlin', 'Marketing', 'Sandra'),
11   ('DE', 'Frankfurt', 'Sales', 'Stefan'),
12   ('AT', 'Innsbruck', 'HR', 'Katrin'),
13   ('CH', 'Zurich', 'Sales', 'Alexander');
14
15 SELECT country, count(*) FROM HR GROUP BY country;
16 SELECT country, city, count(*) FROM HR GROUP BY country, city;
17 SELECT department, count(*) FROM HR GROUP BY department;
18 SELECT country, department, count(*) FROM HR GROUP BY country, department;

CUBE or ROLLUP let you combine the scan queries. Listing 5 shows an example using ROLLUP. The option groups on the specified columns – as a normal GROUP BY would – and by all prefixes of the column list, including the empty list. This is particularly useful for hierarchical structures. In the example, you can see figures at the local departmental level and at the superordinate organizational levels, as well as the total number for each query. Without ROLLUP, you need to do this with separate queries (or possibly glue them together with a UNION), which would be troublesome and slower.

Listing 5

An Example with ROLLUP

> SELECT country, city, department, count(*) FROM HR GROUP BY ROLLUP (country, city, department) ORDER BY 1, 2, 3;
+---------+-----------+------------+-------+
| country |   city    | department | count |
+---------+-----------+------------+-------+
| AT      | Innsbruck | HR         |     1 |
| AT      | Innsbruck |            |     1 |
| AT      |           |            |     1 |
| CH      | Zurich    | Sales      |     1 |
| CH      | Zurich    |            |     1 |
| CH      |           |            |     1 |
| DE      | Berlin    | Marketing  |     1 |
| DE      | Berlin    | Sales      |     1 |
| DE      | Berlin    |            |     2 |
| DE      | Frankfurt | Sales      |     1 |
| DE      | Frankfurt |            |     1 |
| DE      |           |            |     3 |

The CUBE option groups by other possible combinations of the grouping columns. In the results for the example in Listing 6, you can see the global totals for the various departments. However, not all of the lines from the CUBE results make sense in this example. The NULL, Berlin, NULL line, for example, groups all cities by the name of Berlin in all countries. This may have some attraction as a curiosity, but very little practical use.

Listing 6

An Example with CUBE

> SELECT country, city, department, count(*) FROM HR GROUP BY CUBE (country, city, department) ORDER BY 1, 2, 3;
+---------+-----------+------------+-------+
| country |   city    | department | count |
+---------+-----------+------------+-------+
| AT      | Innsbruck | HR         |     1 |
| AT      | Innsbruck |            |     1 |
| AT      |           | HR         |     1 |
| AT      |           |            |     1 |
| CH      | Zürich    | Sales      |     1 |
| CH      | Zürich    |            |     1 |
| CH      |           | Sales      |     1 |
| CH      |           |            |     1 |
| DE      | Berlin    | Marketing  |     1 |
| DE      | Berlin    | Sales      |     1 |
| DE      | Berlin    |            |     2 |
| DE      | Frankfurt | Sales      |     1 |
| DE      | Frankfurt |            |     1 |
| DE      |           | Marketing  |     1 |
| DE      |           | Sales      |     2 |
| DE      |           |            |     3 |
|         | Berlin    | Marketing  |     1 |
|         | Berlin    | Sales      |     1 |
|         | Berlin    |            |     2 |
|         | Frankfurt | Sales      |     1 |
|         | Frankfurt |            |     1 |
|         | Innsbruck | HR         |     1 |
|         | Innsbruck |            |     1 |
|         | Zürich    | Sales      |     1 |
|         | Zürich    |            |     1 |
|         |           | Marketing  |     1 |
|         |           | HR         |     1 |
|         |           | Sales      |     3 |
|         |           |            |     5 |

If you want even more control over the output, you can use a GROUPING SETS clause to specify the combinations to be grouped. After all, CUBE and ROLLUP themselves are simply abbreviations for specific GROUPING SETS. The example in Listing 7 shows a potential use case.

Listing 7

GROUPING SETS

> SELECT country, city, department, count(*) FROM HR GROUP BY GROUPING SETS
  ((country, city, department), (country, city), (country), (), (department),
  (country, department)) ORDER BY 1, 2, 3;
+---------+-----------+------------+-------+
| country |   city    | department | count |
+---------+-----------+------------+-------+
| AT      | Innsbruck | HR         |     1 |
| AT      | Innsbruck |            |     1 |
| AT      |           | HR         |     1 |
| AT      |           |            |     1 |
| CH      | Zürich    | Sales      |     1 |
| CH      | Zürich    |            |     1 |
| CH      |           | Sales      |     1 |
| CH      |           |            |     1 |
| DE      | Berlin    | Marketing  |     1 |
| DE      | Berlin    | Sales      |     1 |
| DE      | Berlin    |            |     2 |
| DE      | Frankfurt | Sales      |     1 |
| DE      | Frankfurt |            |     1 |
| DE      |           | Marketing  |     1 |
| DE      |           | Sales      |     2 |
| DE      |           |            |     3 |
|         |           | Marketing  |     1 |
|         |           | HR         |     1 |
|         |           | Sales      |     3 |
|         |           |            |     5 |

A Little More Partitioning

PostgreSQL has no built-in solution to partitioning, and this does not change in PostgreSQL 9.5. The typical workaround is to construct your own partitioning manually. An inheritance hierarchy is a useful solution (Listing 8).

Listing 8

DIY Partitioning

> CREATE TABLE logins (tx timestamp, ip_adress, ...);
> CREATE TABLE logins_201509 (CHECK (tx >= '2015-09-01' AND tx < '2015-10-01')) INHERITS (logins);
> CREATE TABLE logins_201510 (CHECK (tx >= '2015-10-01' AND tx < '2015-11-01')) INHERITS (logins);

Using a combination of triggers and constraints, you can divide the data up into various physical tables, and the user only needs to cite the underlying parent table when reading. The database automatically accesses the matching child tables. Some PostgreSQL extension modules can help establish and manage these structures with less manual overhead – pg_partman being one example.

One innovation in PostgreSQL 9.5 is that the subordinate child tables can also be foreign tables that reside on other computers. Foreign tables are tables the database accesses via plugins known as foreign data wrappers (FDWs), which can be data from other database systems, from files, and even from Internet services. Another PostgreSQL database is also an option.

Foreign tables can now have CHECK constraints. In partitioned tables, the planner looks at the constraints in the database to skip the partitions whose CHECK constraints rule out the query criteria. The combination of these two innovations thus makes it possible to use the same approach as previously for partitioning but also extend partitioning beyond computer boundaries.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy ADMIN Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus
Subscribe to our ADMIN Newsletters
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs



Support Our Work

ADMIN content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.

Learn More”>
	</a>

<hr>		    
			</div>
		    		</div>

		<div class=