The SQL standard defines four levels of transaction isolation in terms of three phenomena that must be prevented between concurrent transactions. These undesirable phenomena are:
A transaction reads data written by a concurrent uncommitted transaction.
A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
The four transaction isolation levels and the corresponding behaviors are described in Table 12.1, “SQL Transaction Isolation Levels”.
Table 12.1. SQL Transaction Isolation Levels
Isolation Level | Dirty Read | Nonrepeatable Read | Phantom Read |
---|---|---|---|
Read uncommitted | Possible | Possible | Possible |
Read committed | Not possible | Possible | Possible |
Repeatable read | Not possible | Not possible | Possible |
Serializable | Not possible | Not possible | Not possible |
In PostgreSQL, you can request any of the four standard transaction isolation levels. But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed and Serializable. When you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual isolation level may be stricter than what you select. This is permitted by the SQL standard: the four isolation levels only define which phenomena must not happen, they do not define which phenomena must happen. The reason that PostgreSQL only provides two isolation levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture. The behavior of the available isolation levels is detailed in the following subsections.
To set the transaction isolation level of a transaction, use the command SET TRANSACTION.
Read Committed
is the default isolation level in PostgreSQL.
When a transaction runs on this isolation level,
a SELECT
query sees only data committed before the
query began; it never sees either uncommitted data or changes committed
during query execution by concurrent transactions. (However, the
SELECT
does see the effects of previous updates
executed within its own transaction, even though they are not yet
committed.) In effect, a SELECT
query
sees a snapshot of the database as of the instant that that query
begins to run. Notice that two successive SELECT
commands can
see different data, even though they are within a single transaction, if
other transactions
commit changes during execution of the first SELECT
.
UPDATE
, DELETE
, SELECT
FOR UPDATE
, and SELECT FOR SHARE
commands
behave the same as SELECT
in terms of searching for target rows: they will only find target rows
that were committed as of the command start time. However, such a target
row may have already been updated (or deleted or locked) by
another concurrent transaction by the time it is found. In this case, the
would-be updater will wait for the first updating transaction to commit or
roll back (if it is still in progress). If the first updater rolls back,
then its effects are negated and the second updater can proceed with
updating the originally found row. If the first updater commits, the
second updater will ignore the row if the first updater deleted it,
otherwise it will attempt to apply its operation to the updated version of
the row. The search condition of the command (the WHERE
clause) is
re-evaluated to see if the updated version of the row still matches the
search condition. If so, the second updater proceeds with its operation,
starting from the updated version of the row. (In the case of
SELECT FOR UPDATE
and SELECT FOR
SHARE
, that means it is the updated version of the row that is
locked and returned to the client.)
Because of the above rule, it is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands that affected the same rows it is trying to update, but it does not see effects of those commands on other rows in the database. This behavior makes Read Committed mode unsuitable for commands that involve complex search conditions. However, it is just right for simpler cases. For example, consider updating bank balances with transactions like
BEGIN; UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345; UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534; COMMIT;
If two such transactions concurrently try to change the balance of account 12345, we clearly want the second transaction to start from the updated version of the account's row. Because each command is affecting only a predetermined row, letting it see the updated version of the row does not create any troublesome inconsistency.
Since in Read Committed mode each new command starts with a new snapshot that includes all transactions committed up to that instant, subsequent commands in the same transaction will see the effects of the committed concurrent transaction in any case. The point at issue here is whether or not within a single command we see an absolutely consistent view of the database.
The partial transaction isolation provided by Read Committed mode is adequate for many applications, and this mode is fast and simple to use. However, for applications that do complex queries and updates, it may be necessary to guarantee a more rigorously consistent view of the database than the Read Committed mode provides.
The level Serializable provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, applications using this level must be prepared to retry transactions due to serialization failures.
When a transaction is on the serializable level,
a SELECT
query sees only data committed before the
transaction began; it never sees either uncommitted data or changes
committed
during transaction execution by concurrent transactions. (However, the
SELECT
does see the effects of previous updates
executed within its own transaction, even though they are not yet
committed.) This is different from Read Committed in that the
SELECT
sees a snapshot as of the start of the transaction, not as of the start
of the current query within the transaction. Thus, successive
SELECT
commands within a single transaction always see the same
data.
UPDATE
, DELETE
, SELECT
FOR UPDATE
, and SELECT FOR SHARE
commands
behave the same as SELECT
in terms of searching for target rows: they will only find target rows
that were committed as of the transaction start time. However, such a
target
row may have already been updated (or deleted or locked) by
another concurrent transaction by the time it is found. In this case, the
serializable transaction will wait for the first updating transaction to commit or
roll back (if it is still in progress). If the first updater rolls back,
then its effects are negated and the serializable transaction can proceed
with updating the originally found row. But if the first updater commits
(and actually updated or deleted the row, not just locked it)
then the serializable transaction will be rolled back with the message
ERROR: could not serialize access due to concurrent update
because a serializable transaction cannot modify or lock rows changed by other transactions after the serializable transaction began.
When the application receives this error message, it should abort the current transaction and then retry the whole transaction from the beginning. The second time through, the transaction sees the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.
Note that only updating transactions may need to be retried; read-only transactions will never have serialization conflicts.
The Serializable mode provides a rigorous guarantee that each transaction sees a wholly consistent view of the database. However, the application has to be prepared to retry transactions when concurrent updates make it impossible to sustain the illusion of serial execution. Since the cost of redoing complex transactions may be significant, this mode is recommended only when updating transactions contain logic sufficiently complex that they may give wrong answers in Read Committed mode. Most commonly, Serializable mode is necessary when a transaction executes several successive commands that must see identical views of the database.
The intuitive meaning (and mathematical definition) of
“serializable” execution is that any two successfully committed
concurrent transactions will appear to have executed strictly serially,
one after the other — although which one appeared to occur first may
not be predictable in advance. It is important to realize that forbidding
the undesirable behaviors listed in Table 12.1, “SQL Transaction Isolation Levels”
is not sufficient to guarantee true serializability, and in fact
PostgreSQL's Serializable mode does
not guarantee serializable execution in this sense. As an example,
consider a table mytab
, initially containing
class | value -------+------- 1 | 10 1 | 20 2 | 100 2 | 200
Suppose that serializable transaction A computes
SELECT SUM(value) FROM mytab WHERE class = 1;
and then inserts the result (30) as the value
in a
new row with class
= 2. Concurrently, serializable
transaction B computes
SELECT SUM(value) FROM mytab WHERE class = 2;
and obtains the result 300, which it inserts in a new row with
class
= 1. Then both transactions commit. None of
the listed undesirable behaviors have occurred, yet we have a result
that could not have occurred in either order serially. If A had
executed before B, B would have computed the sum 330, not 300, and
similarly the other order would have resulted in a different sum
computed by A.
To guarantee true mathematical serializability, it is necessary for
a database system to enforce predicate locking, which
means that a transaction cannot insert or modify a row that would
have matched the WHERE
condition of a query in another concurrent
transaction. For example, once transaction A has executed the query
SELECT ... WHERE class = 1
, a predicate-locking system
would forbid transaction B from inserting any new row with class 1
until A has committed.
[6]
Such a locking system is complex to
implement and extremely expensive in execution, since every session must
be aware of the details of every query executed by every concurrent
transaction. And this large expense is mostly wasted, since in
practice most applications do not do the sorts of things that could
result in problems. (Certainly the example above is rather contrived
and unlikely to represent real software.) For these reasons,
PostgreSQL does not implement predicate
locking.
In those cases where the possibility of nonserializable execution is a real hazard, problems can be prevented by appropriate use of explicit locking. Further discussion appears in the following sections.
[6] Essentially, a predicate-locking system prevents phantom reads by restricting what is written, whereas MVCC prevents them by restricting what is read.