The cluster in action - part 2
2.4 The background writer
Before spread checkpoints were introduced, the only way to ease the IO spike caused by a checkpoint was to tweak the background writer. This process was introduced with the revolutionary PostgreSQL 8.0. The writer, as the name suggests, works in the background, searching for dirty buffers to write to the data files. The writer works in rounds. When the process wakes up it scans the shared buffer for dirty buffers. When the number of buffers cleaned reaches the value set in bgwriter_lru_maxpages, the process sleeps for the time set in bgwriter_delay.
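The round-based behaviour can be illustrated with a small Python sketch. It is a toy model, not the server's algorithm: the list of dictionaries standing in for the shared buffer and the function names bgwriter_round and max_write_rate_bytes are assumptions made for the example. The second function estimates the upper bound on the writer's throughput, assuming the default 8 kB block size and ignoring the time spent scanning and writing.

```python
def bgwriter_round(shared_buffers, lru_maxpages):
    """Clean dirty buffers until lru_maxpages is reached; return the count."""
    cleaned = 0
    for buf in shared_buffers:
        if cleaned >= lru_maxpages:
            break  # round is over, the real process would now sleep bgwriter_delay
        if buf["dirty"]:
            buf["dirty"] = False  # stands in for writing the page to the data file
            cleaned += 1
    return cleaned


def max_write_rate_bytes(lru_maxpages, delay_ms, block_size=8192):
    """Upper bound in bytes per second: pages per round times rounds per second."""
    rounds_per_second = 1000 / delay_ms
    return lru_maxpages * block_size * rounds_per_second


if __name__ == "__main__":
    # Hypothetical shared buffer with 250 dirty pages.
    buffers = [{"dirty": True} for _ in range(250)]
    print(bgwriter_round(buffers, lru_maxpages=100))
    print(max_write_rate_bytes(lru_maxpages=100, delay_ms=200))
```

With bgwriter_lru_maxpages set to 100 and bgwriter_delay set to 200 milliseconds the estimate is 4096000 bytes per second, roughly 3.9 MB/s. A checkpoint can dirty pages much faster than that, which is why the two parameters need tuning on a busy system.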
2.5 The autovacuum
Routine vacuuming is an important task to prevent table bloat and the dreaded XID wraparound failure. If enabled, the autovacuum launcher starts worker processes to vacuum each relation with enough dead tuples to meet the conditions set in autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor. An autovacuum worker is a normal backend and appears in the view pg_stat_activity. Because the XID wraparound failure is a really serious problem, the autovacuum to prevent wraparound starts even if autovacuum is turned off.
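The way the two parameters combine can be sketched in Python. The function names vacuum_limit and needs_autovacuum are made up for the example; the formula is the one described in the PostgreSQL documentation, where a table is vacuumed when its dead tuples exceed the threshold plus the scale factor multiplied by the number of tuples in the table. The default values used here, 50 and 0.2, are the stock settings.

```python
def vacuum_limit(reltuples, threshold=50, scale_factor=0.2):
    """Dead tuples a table must exceed before autovacuum processes it."""
    return threshold + scale_factor * reltuples


def needs_autovacuum(dead_tuples, reltuples, threshold=50, scale_factor=0.2):
    """True when the dead tuples exceed the computed limit."""
    return dead_tuples > vacuum_limit(reltuples, threshold, scale_factor)


if __name__ == "__main__":
    # Hypothetical table with 10000 rows: the limit is 50 + 0.2 * 10000.
    print(vacuum_limit(10000))
    print(needs_autovacuum(3000, 10000))
    print(needs_autovacuum(1000, 10000))
```

Because the scale factor multiplies the table size, a very large table accumulates a lot of dead tuples before it qualifies. Lowering autovacuum_vacuum_scale_factor for such tables is a common adjustment.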
2.6 The backends
The PostgreSQL backend architecture is the brilliant solution to a nasty
problem: how to guarantee that a buffer is read by only one session at a
time while avoiding the bottleneck of a long waiting queue. When a
backend needs to access a particular tuple, either for read or write,
the relation's pages are accessed to find the tuple matching the search
criteria. When a buffer is accessed the backend sets a pin on the
buffer, which keeps the page in memory without making the other backends
that require the same page wait. As soon as the tuple is found and
processed the pin is removed. If the tuple is modified, MVCC enforces
the tuple's visibility to the other backends. The process is fine
grained and very efficient. Even with a high concurrency rate on the
same buffers it is very difficult to have the backends entangled.
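The pin can be pictured as a reference counter on the buffer. The following Python sketch is a deliberately simplified model and not PostgreSQL's buffer manager: the class Buffer and its methods are invented for the example. It relies on two facts about the real implementation, that several backends can pin the same buffer at once and that a pinned buffer cannot be replaced with another page.

```python
from contextlib import contextmanager


class Buffer:
    """Toy shared buffer slot with a pin counter (hypothetical model)."""

    def __init__(self, page_id):
        self.page_id = page_id
        self.pin_count = 0

    @contextmanager
    def pinned(self):
        # A backend pins the buffer while it looks for the tuple...
        self.pin_count += 1
        try:
            yield self
        finally:
            # ...and removes the pin as soon as the tuple is processed.
            self.pin_count -= 1

    def can_evict(self):
        # The page can be replaced only when no backend holds a pin.
        return self.pin_count == 0


if __name__ == "__main__":
    buf = Buffer(page_id=42)
    with buf.pinned():          # first backend
        with buf.pinned():      # second backend, no waiting involved
            print(buf.pin_count, buf.can_evict())
    print(buf.pin_count, buf.can_evict())
```

The counter is released in the finally clause so that a failure while processing the tuple does not leave the buffer pinned forever.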
A backend process is a fork of the main postgres process. It's very
important to understand that the backend is not the connection but a
server process which interacts with the connection. Usually the backend
terminates when the connection disconnects. However, if a client
disconnects ungracefully while a query is running, without signalling
the backend, the query will continue only to find there's nothing
listening on the other side. This is bad for many reasons. First,
because it is consuming a connection slot for nothing. Also, the cluster
is doing something useless, consuming CPU cycles and memory.
Like everything in PostgreSQL, the backend architecture is oriented to protect the data and in particular the volatile shared buffer. If for some reason one of the backend processes crashes, the postgres process terminates all the backends in order to prevent potential shared buffer corruption. The clients should be able to manage this exception by resetting the connection.
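On the client side, resetting the connection usually means discarding the broken connection object and opening a new one before retrying. A minimal Python sketch of the idea follows. It is independent of any database driver: run_with_reconnect, connect, work and conn_error are hypothetical names supplied by the caller. With a driver such as psycopg2, conn_error would typically be the driver's OperationalError.

```python
def run_with_reconnect(connect, work, conn_error, max_attempts=3):
    """Run work(conn); on conn_error open a fresh connection and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        conn = connect()  # a new connection for every attempt
        try:
            return work(conn)
        except conn_error as exc:
            last_error = exc  # the backend went away, try again
        finally:
            try:
                conn.close()
            except Exception:
                pass  # closing an already broken connection may fail
    raise last_error
```

The sketch retries the whole unit of work, so work should contain a complete transaction. Anything not committed before the crash is lost and has to be executed again.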
2.7 Wrap up
The cluster’s background activity goes unnoticed most of the time. The users and developers can mostly ignore this aspect of the PostgreSQL architecture, leaving the difficult business of understanding the database heartbeat to the DBA, who should have the final word on any potential mistake in the design specs. The next chapters will explore PostgreSQL’s architecture in detail, starting with the memory.