Time and relative dimension in space

PostgreSQL has offered a transactional model since its early versions. The implementation follows the guidelines of the SQL standard, with some notable exceptions.

When designing an application it's important to understand how concurrent access to data happens, in order to avoid unexpected results or even errors.

The MVCC

PostgreSQL implements atomicity, consistency and isolation with MVCC. The acronym stands for Multi Version Concurrency Control, a design that allows highly efficient concurrent access with minimal locking requirements.

Depending on the isolation level, when a query or a transaction accesses the database for the first time it acquires a snapshot, which includes the transactions that have been committed at that moment.

When a transaction starts it gets an id called XID, a 32 bit integer. The XID is used to determine the visibility of the other transactions, using its relative position on an arbitrary timeline within the acquired snapshot.

All the committed data within the snapshot with an XID smaller than the current XID is considered in the past and therefore visible. All the transactions within the snapshot with an XID bigger than the current XID, or not yet committed, are considered in the future and therefore invisible.

The XID comparison is made at tuple level using two system fields called xmin and xmax, which are stored in the tuple's header.

When a new tuple is created the field xmin is set to the XID that created the tuple. This field is also known as the insert's transaction id.

When a tuple is deleted the field xmax is set to the XID that deleted the tuple. The xmax field is also known as the delete's transaction id.
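The xmin and xmax system columns are present in every table and can be selected explicitly when named in the target list. A quick way to inspect them, using the t_test table introduced later in the examples (the XID values returned depend on the cluster's history):

```sql
SELECT xmin, xmax, id FROM t_test WHERE id=1;
```

A live tuple shows xmax set to 0; a tuple deleted by an open or committed transaction shows the deleting XID instead.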

The deleted tuple is not physically removed but kept in place and marked as invisible to future transactions. This approach ensures that transactions with a smaller XID are still able to see the tuple within their snapshot.

The tuples that are visible to the current XID are called live tuples. Tuples that are no longer visible to the current XID are called dead tuples.

PostgreSQL's MVCC implementation doesn't have a field dedicated to the update.

In PostgreSQL the update is performed as an insert/delete, where the new row version is inserted and the previous row version is deleted within the same transaction.

The transaction XID of the update command is used both to set the new tuple version's xmin and the old tuple version's xmax.
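A minimal way to observe this behaviour is to compare the updating transaction's XID with the new row version's xmin, for example with txid_current(), which returns the current XID extended to 64 bit (modulo the wraparound, the low 32 bits match xmin):

```sql
BEGIN;
SELECT txid_current();
UPDATE t_test SET value='foobar' WHERE id=1;
SELECT xmin, xmax, id, value FROM t_test WHERE id=1;
ROLLBACK;
```

Within the transaction the new row version's xmin matches the value returned by txid_current(), while xmax is 0.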

Transaction isolation

How things behave with concurrent updates depends on the transaction isolation level.

The SQL standard defines four levels of transaction isolation. Each level allows or prevents specific phenomena that may happen during concurrent access.

  • dirty read, when a transaction can access data written by a concurrent, not yet committed transaction.
  • nonrepeatable read, when a transaction repeats a previous read and finds the data changed by another transaction which has committed since the initial read.
  • phantom read, when a transaction re-executes a previous query and finds a different set of rows satisfying the same search condition, because the results were changed by another committed transaction.
  • serialisation anomaly, when the result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.

PostgreSQL supports four isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ and SERIALIZABLE. Each isolation level prevents the phenomena with an increasing level of strictness.

Isolation Level  | Dirty Read             | Nonrepeatable Read | Phantom Read           | Serialisation Anomaly
-----------------+------------------------+--------------------+------------------------+----------------------
Read uncommitted | Allowed, but not in PG | Possible           | Possible               | Possible
Read committed   | Not possible           | Possible           | Possible               | Possible
Repeatable read  | Not possible           | Not possible       | Allowed, but not in PG | Possible
Serializable     | Not possible           | Not possible       | Not possible           | Not possible

The isolation level can be set for any connecting client via the GUC default_transaction_isolation; however it is not recommended to change it from its default value of READ COMMITTED. The isolation level can also be set per transaction with the command BEGIN TRANSACTION ISOLATION LEVEL.

BEGIN TRANSACTION ISOLATION LEVEL
  { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
;

Starting a transaction at the READ UNCOMMITTED level will not generate an error; PostgreSQL will silently use READ COMMITTED instead.
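The session's default can be changed with a plain SET and verified with SHOW, for example:

```sql
SET default_transaction_isolation TO 'repeatable read';
SHOW default_transaction_isolation;
```

The change affects only the transactions started afterwards in the current session.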

Understanding how the isolation level affects the transactions may help to understand why a query may fail.

Let's consider a table t_test with two fields: a primary key id of serial data type and a field value of type text.

The table is populated with some data, with id ranging from 1 to 20. The field value is set to foo for all the rows.
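A possible setup for the examples that follow, not necessarily the original one, is:

```sql
CREATE TABLE t_test
(
        id      serial PRIMARY KEY,
        value   text
);

INSERT INTO t_test (value)
SELECT 'foo' FROM generate_series(1,20);
```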

Two sessions are connected to the database, both with the default transaction isolation level READ COMMITTED.

Level READ COMMITTED

In session #1 we start a transaction with isolation READ COMMITTED and then we update the field value for the row id=1.

--SESSION #1
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN
UPDATE t_test SET value='foobar' WHERE id=1;
UPDATE 1

In session #2 we start a transaction with isolation READ COMMITTED, then we select the row with id=1. As we haven't yet committed the change in session #1, we get the old value for the field value.

--SESSION #2
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM t_test WHERE id=1;
 id | value
----+-------
  1 | foo
(1 row)

In session #1 we commit the change and then we repeat the query in session #2.

--SESSION #2
SELECT * FROM t_test WHERE id=1;
 id | value
----+--------
  1 | foobar
(1 row)

The open transaction in session #2 sees the change made by transaction #1, causing the nonrepeatable read phenomenon.

With the READ COMMITTED isolation level we can perform the update on the same row in session #2 successfully.

--SESSION #2
UPDATE t_test SET value='foo' WHERE id=1;
UPDATE 1
COMMIT;
COMMIT

Level REPEATABLE READ

With the isolation level REPEATABLE READ things work differently. This time we start both sessions with isolation REPEATABLE READ. In session #1 we update the row with id=1.

--SESSION #1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
UPDATE t_test SET value='foobar' WHERE id=1;
UPDATE 1

In session #2 we select the row with id=1.

--SESSION #2
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM t_test WHERE id=1;
 id | value
----+-------
  1 | foo
(1 row)

After we commit the change in session #1 the row doesn’t change in session #2.

--SESSION #2
SELECT * FROM t_test WHERE id=1;
 id | value
----+-------
  1 | foo
(1 row)

However, if we try to update the row with id=1 in session #2, the update fails with a serialisation failure error.

--SESSION #2
UPDATE t_test SET value='foo' WHERE id=1;
ERROR:  could not serialize access due to concurrent update

The SQL standard allows phantom reads in REPEATABLE READ. However, PostgreSQL's implementation of REPEATABLE READ does not allow phantom reads.

A transaction with REPEATABLE READ isolation sees a snapshot which is acquired at the start of the first non-transaction-control statement. That means the snapshot is not acquired until the first query is issued after the transaction's start. Therefore any transaction committed between the transaction's start and its first query will be visible to the REPEATABLE READ transaction.

If we commit the update in session #1 before running the first select in session #2, we get the updated row version, and the update in session #2 will succeed as there is no serialisation failure.
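The different timing can be sketched with the following sequence, where session #1 commits before session #2 runs its first query:

```sql
--SESSION #2
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

--SESSION #1, update and commit before session #2 issues any query
BEGIN;
UPDATE t_test SET value='foobar' WHERE id=1;
COMMIT;

--SESSION #2, the snapshot is acquired here and includes session #1's commit
SELECT * FROM t_test WHERE id=1;
```

The select in session #2 returns the foobar value, and a subsequent update of the same row in session #2 succeeds.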

Level SERIALIZABLE

The SERIALIZABLE transaction isolation level makes things more strict. Like REPEATABLE READ, the snapshot is acquired at the first query; in addition the transactions are monitored for conditions where the combined result would be inconsistent with any serial ordering. A concurrent update on the same row is put in wait and, as with REPEATABLE READ, fails with a serialisation failure when the other transaction commits.

--SESSION #1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

--SESSION #2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t_test SET value='foobar' WHERE id=1;
UPDATE 1

--SESSION #1
UPDATE t_test SET value='foo' WHERE id=1;
--THE SESSION WAITS FOR SESSION #2 TO COMMIT OR ROLLBACK THE TRANSACTION

The PostgreSQL log with log_lock_waits enabled shows something like this:

LOG:  process 5320 still waiting for ShareLock on transaction 550 after 1000.086 ms
DETAIL:  Process holding the lock: 5351. Wait queue: 5320.
CONTEXT:  while updating tuple (0,55) in relation "t_test"
STATEMENT:  UPDATE t_test SET value='foo' WHERE id=1;

As soon as session #2 closes the transaction, the update in session #1 fails with a serialisation failure error.

--SESSION #1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE t_test SET value='foo' WHERE id=1;
--THE UPDATE FAILS AS SOON AS SESSION #2 CLOSES THE TRANSACTION
ERROR:  could not serialize access due to concurrent update

Aborted transactions

PostgreSQL is very strict about transactions. The slightest error within a transaction block puts the transaction in the aborted state. All subsequent commands are ignored until the end of the transaction block, which can be closed with either a ROLLBACK or a COMMIT.

However, if the transaction is aborted, PostgreSQL will always perform a ROLLBACK even if we send a COMMIT command.

BEGIN;
BEGIN
SELECT * FROM t_tesd;
ERROR:  relation "t_tesd" does not exist
LINE 1: SELECT * FROM t_tesd;
SELECT * FROM t_test;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
COMMIT;
ROLLBACK
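A SAVEPOINT inside the transaction block makes it possible to recover from an error without losing the entire transaction; rolling back to the savepoint clears the aborted state and the remaining work can still be committed:

```sql
BEGIN;
SAVEPOINT before_select;
SELECT * FROM t_tesd;
--ERROR:  relation "t_tesd" does not exist
ROLLBACK TO SAVEPOINT before_select;
SELECT count(*) FROM t_test;
COMMIT;
```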

Wrap up

In PostgreSQL there is no update in place. Bearing this simple concept in mind during the design of a new application avoids the risk of data bloat appearing for seemingly unknown reasons. Frequent updates may result in tuples moving across the available free space in the table, with the consequent need of updating the indices.

Frequent VACUUM runs can keep table bloat under control. However, the structured nature of the indices makes it more difficult for VACUUM to reclaim space in the index.

PostgreSQL 12 adds REINDEX CONCURRENTLY, which helps the DBA rebuild the indices without blocking database operations. Older versions don't have this functionality and may require different strategies to keep the indices in good shape.
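On PostgreSQL 12 and later, either a single index or all the indices of a table can be rebuilt without taking locks that block writes, for example:

```sql
REINDEX INDEX CONCURRENTLY t_test_pkey;
REINDEX TABLE CONCURRENTLY t_test;
```

t_test_pkey is the name automatically given to the index backing the primary key of t_test.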

The rule of thumb in this case is to design the data model correctly in order to limit the bloat, instead of wasting time on workarounds.

Finally, choosing the correct transaction isolation level is a difficult task, as each one has its pros and cons.

A safe approach is to keep the default level of READ COMMITTED unless a stricter isolation level is required (e.g. parallel read). Should REPEATABLE READ or SERIALIZABLE be needed, it's vital to manage the concurrent updates correctly, otherwise the queries will fail.

picture: Time matter, copyright Federico Campoli