A mechanical elephant
In the previous post we modified the apt role to control the setup in a declarative way. Then we added an ssh role for configuring the three Devuan servers; the role configures the server's postgres process owner for passwordless SSH connections.
In this tutorial we’ll complete the setup for the postgres user and then we’ll configure the database clusters with a new role.
Before starting this tutorial please check out the branch 03_setup_pgsql from the repository https://github.com/the4thdoctor/dynamic_duo.
Completing the OS user setup
The apt role defined in the branch 02_ssh_setup adds two variables, pg_osuser and pg_osgroup, for configuring the operating system user and group that will own the postgresql process. However this is not sufficient for a complete working installation. PostgreSQL can be configured to work with SSL, which requires the process owner to be able to access the SSL certificates. We therefore need to give the process owner membership of the group that can read the certificate files.
Again we are updating the apt role. We first add a variable to the file all in group_vars. The variable pg_additional_groups is a list defining all the additional groups the postgres process owner should belong to. In our example we add just ssl-cert, which has read permission on the SSL certificates.
pg_additional_groups:
  - "ssl-cert"
As explained before, all the variables defined in the all file can be overridden using a specific group_vars or per-host configuration file.
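For example, a hypothetical per-host file such as host_vars/<HOSTNAME>.yml (the file name and the extra adm group below are just an illustration, not part of the repository) could extend the list:

pg_additional_groups:
  - "ssl-cert"
  - "adm"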
We also need to change the main.yml file of the apt role. The operating system user creation now has another parameter where the additional groups are listed.
- name: create the {{ pg_osuser }} user and the home directory
  user:
    name: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    groups: "{{ pg_additional_groups }}"
    create_home: yes
    move_home: yes
    home: "{{ pg_home_dir }}"
    shell: "{{ pg_shell }}"
    system: yes
    state: present
The pgsql role
The role pgsql is used to configure the PostgreSQL clusters present on our servers. All the default settings are configured in the wildcard group_vars file all, leaving the reader the possibility to customise the setup via group and host variables.
The inventory
Our inventory requires some new variables.
The variables pg_super_user and pg_super_password are used to define the database default super user and its password. As the password is sensitive, we save the value as a vault-encrypted string, which is reasonably safe to store in shared repositories.
Declaring the pg_super_user makes it simpler to comply with stringent rules like PCI DSS, where using the vendor's default super user is forbidden.
The password encrypted in the all file is flynn_lives and the passphrase for the encrypted string is postgres. It's strongly suggested to add at least a group_vars file for the group pgsql to replace the example password and passphrase with a pair of stronger values.
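As a sketch, a replacement string can be generated with ansible-vault's encrypt_string subcommand (the plaintext below is only a placeholder); the command prompts for the new vault passphrase and prints a block ready to be pasted into the group_vars file:

ansible-vault encrypt_string --vault-id default@prompt 'a_stronger_password' --name 'pg_super_password'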
pg_super_user: "postgres"
# The pg_super_password is "flynn_lives". The vault password is "postgres".
# You may want to change this with a pair of stronger database and vault passwords
# Check https://docs.ansible.com/ansible/2.5/user_guide/vault.html for vault's usage
pg_super_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  30313235613232653662306165306430383737646461333032303063373238646334363265326162
  6136636134383332336562386464363735323233656337620a663961343239303131316266346431
  39633032333035613331313136313962626634656637656431346436626532303266653333313435
  3134633031373431620a623937386564346538663062643936353432643835383965616539623731
  3639
The variables logrotate_file and log_rotate define the location of the logrotate configuration file and the default log rotation strategy.
logrotate_file: "/etc/logrotate.d/pgsql_custom"
log_rotate:
  mode: "weekly"
  keep: "10"
The dictionary pg_dirs stores the default top-level locations for the data_area, log_directory and wal_area. The variable default_locale defines the default locale assigned to the cluster when initialised, if not specified otherwise.
pg_dirs:
  data_area: "/var/lib/postgresql"
  log_directory: "/var/log/postgresql"
  wal_area: "/var/lib/postgresql/pg_wal"
default_locale: "en_GB.UTF-8"
All the variables described can be overridden in the cluster configuration. The variable pg_clusters defines the PostgreSQL clusters as a dictionary. Each dictionary key is the cluster name used with pg_createcluster, along with the version. The variable params lists the Grand Unified Configuration (GUC) parameters that we want to apply to the cluster. The variable pg_hba defines the pg_hba.conf entries for the specific cluster.
In our example we have three clusters defined: tron, clu and sark.

- tron runs PostgreSQL 10 and listens on port 5432 on all available addresses, with the data_area and log_directory stored in custom locations. The super user is alan1 and the cluster is configured to use the locale en_US.UTF-8. The logs are rotated daily and pg_hba is configured to accept md5 authentication on local and localhost connections, IPv4 and IPv6.
- clu runs PostgreSQL 9.6 and listens on port 5433 on all available addresses, with the wal level set to logical. The super user is flynn.
- sark runs PostgreSQL 9.6 and listens on port 5434 on all available addresses, with the wal level set to replica. The super user is mcp.
pg_clusters:
  tron:
    pg_dirs:
      data_area: "/pg_data"
      log_directory: "/pg_log"
    version: "10"
    pg_super_user: "alan1"
    locale: "en_US.UTF-8"
    params:
      port: "5432"
      listen_addresses: "*"
    log_rotate:
      mode: "daily"
      keep: "31"
    pg_hba:
      - type: "local"
        database: "all"
        user: "all"
        address: ""
        method: "md5"
      - type: "host"
        database: "all"
        user: "all"
        address: "127.0.0.1/32"
        method: "md5"
      - type: "host"
        database: "all"
        user: "all"
        address: "::1/128"
        method: "md5"
  clu:
    version: "9.6"
    pg_super_user: "flynn"
    params:
      port: "5433"
      listen_addresses: "*"
      wal_level: "logical"
  sark:
    version: "9.6"
    pg_super_user: "mcp"
    params:
      port: "5434"
      listen_addresses: "*"
      wal_level: "replica"
Pre-flight check
The pg_clusters dictionary comes with a possible pitfall: what happens if there is a duplicate port within the dictionary? As we are creating the clusters from scratch, two clusters would share the same port. The first to start would be fine, but the second cluster's start-up would fail because the port is already in use. This is not a big problem, as the port can be fixed afterwards, but it is not ideal. Instead, we can make the playbook fail before executing any task within the role, with the help of set_fact and assert.
We iterate over the clusters defined in pg_clusters and, using the module set_fact, we build a list of ports named pg_ports. On line 3 we assign to pg_ports the value of the previous pg_ports, appending the port of the cluster being iterated. If pg_ports is not yet defined we default the assignment to an empty list. On lines 7 to 11 we use assert to check that the count of pg_ports is the same as the count of unique pg_ports. If the check fails the run is interrupted and the message on line 12 is emitted.
1- name: Build the list of the PostgreSQL defined ports
2 set_fact:
3 pg_ports: "{{ pg_ports |default([]) + [item.value.params.port] }}"
4 with_dict: "{{ pg_clusters }}"
5
6- name: Check the port is unique within the clusters
7 assert:
8 that: >
9 pg_ports | count
10 ==
11 pg_ports | unique | count
12 msg: "Duplicated port detected in the cluster definition. Please check the variable pg_clusters in group_vars or in hosts_vars for any duplicated port."
Cluster preparation
Before initialising the clusters we need to ensure that the correct locales and the cluster data directories are created. The locale setup requires two tasks.
We first execute a set_fact looped over pg_clusters to build a new list variable db_locale. For each cluster we append either the cluster's optional locale variable or the default_locale value. As we start with an empty variable (db_locale is not defined elsewhere), we default its value to an empty list. The list is then reduced to distinct values using the unique filter.
- name: build a list of encoding to be installed on the server
  set_fact:
    db_locale: "{{ (db_locale|default([]) + [item.value.locale|default(default_locale)])|unique }}"
  with_dict: "{{ pg_clusters }}"
The second task loops over the elements of db_locale and, using the module locale_gen, configures the required locales.
- name: Ensure the locale required by the clusters exists
  locale_gen:
    name: "{{ item }}"
    state: present
  with_items: "{{ db_locale }}"
Creating the database directories is a little tricky. The variable pg_dirs is a dictionary and should be combined with any pg_dirs configured in each cluster defined in pg_clusters. However, if the key pg_dirs is missing within the cluster, then the content of the default pg_dirs should be used.
We would need a nested loop to achieve this, but with_nested doesn't work with dictionaries. A workaround to this limitation is to loop over an include and put the creation loop inside the included file. As we need to access the cluster's configuration in the included file, we define the db_cluster variable with the loop_control clause and the loop_var assignment.
- name: create the database directories
  include: create_db_dir.yml
  with_dict: "{{ pg_clusters }}"
  loop_control:
    loop_var: db_cluster
In the included file create_db_dir.yml we define a task which loops the file module over a dictionary built by combining the default pg_dirs with the pg_dirs configured in pg_clusters, accessed via the loop_var db_cluster. If the cluster's value is missing, the combine falls back to the default pg_dirs. The path, following the Debian packaging convention, is built using the data directory value, the version's subdirectory and the cluster's name.
- name: creating the database directories for the cluster {{ db_cluster.key }}
  file:
    path: "{{ item.value }}/{{ db_cluster.value.version }}/{{ db_cluster.key }}/"
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    state: directory
    mode: 0700
  with_dict: "{{ pg_dirs | combine(db_cluster.value.pg_dirs|default(pg_dirs)) }}"
Cluster initialisation
When creating a new PostgreSQL cluster with initdb some defaults are assumed if not specified otherwise. In particular:

- Omitting --encoding= and --locale= lets initdb use the default values derived from the environment initdb runs in
- Omitting --username= lets initdb use the operating system username for the default superuser name
- Not providing a password for the superuser, with either --pwfile= or --pwprompt, disables the password authentication method for the default superuser
- Omitting --waldir= stores the WAL directory within the data area (PostgreSQL before version 10 uses the switch --xlogdir=)
The Debian packaging comes with a handy set of wrappers for the common database operations. The script pg_createcluster helps to create and manage the clusters inventory effectively. The wrapper comes with its own specific options, which don't cover all the possible switches accepted by initdb. However, it is possible to append -- to the command line in order to pass any extra switch required by initdb and have full control over the cluster's configuration.
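As a minimal sketch of this mechanism run from a shell, an initdb-only option such as --data-checksums (used here purely as an illustration) is passed through the wrapper after the marker:

pg_createcluster 10 main -- --data-checksums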
Before running the pg_createcluster command we generate a file containing the super user password, taken from the cluster's pg_super_password or the default pg_super_password. The file is created within the postgres process owner's home directory with a minimal access level. Each cluster has its own dedicated super user password file.
- name: Create the files with the postgresql super user password
  lineinfile:
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    mode: 0600
    dest: "{{ pg_home_dir }}/.pg_super_pwd_{{ item.key }}"
    line: "{{ item.value.pg_super_password|default(pg_super_password) }}"
    create: yes
  with_dict: "{{ pg_clusters }}"
For initialising the clusters we use the ansible command module, which sends an arbitrary shell command to the target machine. In our case we call pg_createcluster, looped over the dictionary pg_clusters, with the required parameters. In order to avoid a long and difficult to read line, we use the block operator to split the command over multiple lines.
- In line 4 we define the locale, which can be the one defined in the pg_clusters dictionary.
- In line 5, with the switch -u, we specify the process owner with the variable pg_osuser.
- In line 6 we specify where the log should be put, using the convention /<LOG_DIRECTORY>/<VERSION>/<CLUSTER>/postgresql-<VERSION>-<CLUSTER>.log
- In line 7 we specify the cluster's port.
- In line 8 we specify the data area using the convention /<DATA_AREA>/<VERSION>/<CLUSTER>/
- In line 9 we specify the cluster version and the cluster name to create.
- In line 10 we add the special marker -- required to specify the initdb options not covered by pg_createcluster.
- In line 11 we specify the wal_area using the switch -X, which is valid for any PostgreSQL version.
- In line 12 we specify the pg_super_user.
- And in line 13 we specify the file where the pg_super_password is stored.
The command module is not idempotent by default. However we can use the argument creates to tell the module which file is created by a previous run, so the task is skipped if the file is already present. In line 15 we check for the file PG_VERSION, created by initdb.
1- name: Initialise the postgresql clusters
2 command: |
3 pg_createcluster
4 --locale {{ item.value.locale|default(default_locale) }}
5 -u {{ pg_osuser }}
6 -l {{ item.value.log_directory|default(log_directory) }}/{{item.value.version}}/{{item.key}}/postgresql-{{item.value.version}}-{{item.key}}.log
7 -p {{ item.value.params.port}}
8 -d {{ item.value.data_area|default(data_area) }}/{{item.value.version}}/{{item.key}}
9 {{item.value.version}} {{item.key}}
10 --
11 -X {{ item.value.wal_area|default(wal_area) }}/{{item.value.version}}/{{item.key}}
12 -U {{ item.value.pg_super_user|default(pg_super_user) }}
13 --pwfile={{ pg_home_dir }}/.pg_super_pwd_{{item.key}}
14 args:
15 creates: "{{ item.value.data_area|default(data_area) }}/{{item.value.version}}/{{item.key}}/PG_VERSION"
16 with_dict: "{{ pg_clusters }}"
After the clusters are created we don't need the password files anymore, so we'll remove them.
- name: Remove the files with the postgresql super user password
  file:
    path: "{{ pg_home_dir }}/.pg_super_pwd_{{ item.key }}"
    state: absent
  with_dict: "{{ pg_clusters }}"
Configuring the postgresql.conf and pg_hba.conf
Since version 9.4 PostgreSQL supports the ALTER SYSTEM statement, which allows changing the configuration using a normal database connection. However in our example we'll use the PostgreSQL configuration file's include_dir directive.
To do so we need three tasks. First, with the file module, we create the directory conf.d in the PostgreSQL configuration folder, which is located in /etc/postgresql/<VERSION>/<CLUSTER>.
- name: Ensure the conf.d directory is present in the configuration folder
  file:
    path: "/etc/postgresql/{{ item.value.version }}/{{ item.key }}/conf.d"
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    mode: 0744
    state: directory  # without an explicit state the file module does not create the directory
  with_dict: "{{ pg_clusters }}"
Then with the lineinfile module we ensure the conf.d path is set as an include_dir in postgresql.conf.
- name: Ensure the conf.d path is set as include in postgresql.conf
  lineinfile:
    path: "/etc/postgresql/{{ item.value.version }}/{{ item.key }}/postgresql.conf"
    regexp: "^include_dir = 'conf.d'"
    line: "include_dir = 'conf.d'"
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    mode: 0644
  with_dict: "{{ pg_clusters }}"
Finally we ship the custom postgresql.conf to conf.d using the template postgresql.conf.j2 stored in the role's templates folder. We prefix the destination file with 01 in order to determine the file's load order.
- name: Ship the custom postgresql.conf in conf.d
  template:
    src: postgresql.conf.j2
    dest: "/etc/postgresql/{{ item.value.version }}/{{ item.key }}/conf.d/01postgresql.conf"
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    mode: 0644
  with_dict: "{{ pg_clusters }}"
The template combines the dictionary of default parameters stored in the pgsql role's defaults/main.yml with the params dictionary provided by the cluster being processed. The result is assigned to a new dictionary variable cluster_params. Then for each parameter in cluster_params the template emits the parameter in the form required by the postgresql.conf file.
{% set cluster_params = params|combine(item.value.params) %}
{% for parameter in cluster_params %}
{{ parameter }}='{{ cluster_params[parameter] }}'
{% endfor %}
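For the tron cluster, for example, the two GUC parameters set in params render as the following lines (plus whatever defaults come from the role's defaults/main.yml, not listed here):

port='5432'
listen_addresses='*'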
Shipping the pg_hba.conf follows the same method as the postgresql configuration. We use a template whose output is put in the location /etc/postgresql/<VERSION>/<CLUSTER>.
- name: Ship the pg_hba.conf in cluster's config directory
  template:
    src: pg_hba.conf.j2
    dest: "/etc/postgresql/{{ item.value.version }}/{{ item.key }}/pg_hba.conf"
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    mode: 0644
  with_dict: "{{ pg_clusters }}"
This time the template pg_hba.conf.j2 stores in the variable cluster_hba the dictionary defined in the cluster configuration, without combining the dictionaries. If there is a pg_hba defined for the cluster it will be used; otherwise the template falls back to the variable pg_hba defined in the role's defaults/main.yml, which reproduces the standard pg_hba.conf created by pg_createcluster.
For each hba defined in cluster_hba a new line is added to the pg_hba.conf file.
{% set cluster_hba = item.value.pg_hba|default(pg_hba) %}
{% for hba in cluster_hba %}
{{ hba.type }} {{ hba.database }} {{ hba.user }} {{ hba.address }} {{ hba.method }}
{% endfor %}
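With the pg_hba list defined for tron in the inventory, the rendered pg_hba.conf contains these entries (the local entry has an empty address field):

local all all  md5
host all all 127.0.0.1/32 md5
host all all ::1/128 md5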
Logrotate Configuration
Having the logs in a custom location requires a proper logrotate setup, otherwise the filesystem will eventually fill up. We just need to ship a logrotate configuration to the location pointed to by the variable logrotate_file.
- name: Ship the logrotate configuration to the servers
  template:
    src: logrotate_pgsql.j2
    dest: "{{ logrotate_file }}"
    owner: "{{ pg_osuser }}"
    group: "{{ pg_osgroup }}"
    mode: 0644
The template contains a for loop over all the clusters configured in pg_clusters. Inside the loop the default logrotate configuration, stored in the variable log_rotate, is combined with the cluster's optional log_rotate and assigned to a new variable rotate_conf. The variable cluster_dirs is assigned in a similar way, using the default variable pg_dirs and the cluster's specific configuration. The logrotate entry is then built using the cluster_dirs.log_directory location, the cluster's version and the cluster's name. The rotate_conf mode and keep values are used to set the logrotate strategy and the number of logs to keep after rotation.
{% for cluster in pg_clusters %}
{% set rotate_conf = log_rotate|combine(pg_clusters[cluster].log_rotate|default(log_rotate)) %}
{% set cluster_dirs = pg_dirs|combine(pg_clusters[cluster].pg_dirs|default(pg_dirs)) %}
# {{ cluster }}
{{ cluster_dirs.log_directory }}/{{ pg_clusters[cluster].version }}/{{ cluster }}/*.log {
    {{ rotate_conf.mode }}
    rotate {{ rotate_conf.keep }}
    copytruncate
    delaycompress
    compress
    notifempty
    missingok
    su root root
}
{% endfor %}
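For the tron cluster, which overrides both log_rotate and pg_dirs, the rendered entry looks like this:

# tron
/pg_log/10/tron/*.log {
    daily
    rotate 31
    copytruncate
    delaycompress
    compress
    notifempty
    missingok
    su root root
}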
Local authentication
The final step before starting the postgres service is to configure a local connection using the pgpass and pg_service files. With this configuration, if allowed by pg_hba.conf, it will be possible to log into the database clusters by just specifying the service name. For security reasons this configuration applies only to the pg_osuser.
We are again using the module blockinfile. We first populate the file {{ pg_home_dir }}/.pgpass, accessible only by the PostgreSQL process owner. The file is built looping over the dictionary pg_clusters, creating a block for each configured cluster. Each block consists of two lines: one for localhost and another for any other host or local socket.
- name: populate the pgpass file with the postgres super users connection data
  blockinfile:
    dest: "{{ pg_home_dir }}/.pgpass"
    create: yes
    state: present
    owner: "{{ pg_osuser }}"
    mode: 0600
    block: |
      localhost:{{ item.value.params.port }}:*:{{ item.value.pg_super_user|default(pg_super_user) }}:{{ item.value.pg_super_password|default(pg_super_password) }}
      *:{{ item.value.params.port }}:*:{{ item.value.pg_super_user|default(pg_super_user) }}:{{ item.value.pg_super_password|default(pg_super_password) }}
    marker: "# {mark} ANSIBLE MANAGED BLOCK SUPER USER FOR cluster {{ item.key }}:{{ item.value.params.port }}"
  with_dict: "{{ pg_clusters }}"
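With the example inventory, the block generated for tron (which inherits the default vault password flynn_lives) contains:

localhost:5432:*:alan1:flynn_lives
*:5432:*:alan1:flynn_lives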
In a similar way we populate the service file {{ pg_home_dir }}/.pg_service.conf, looping over pg_clusters.
- name: populate the pg_service file with the postgres super users connection data
  blockinfile:
    dest: "{{ pg_home_dir }}/.pg_service.conf"
    create: yes
    state: present
    owner: "{{ pg_osuser }}"
    mode: 0600
    block: |
      [{{ item.key }}]
      host=localhost
      port={{ item.value.params.port }}
      user={{ item.value.pg_super_user|default(pg_super_user) }}
      dbname=template1
    marker: "# {mark} ANSIBLE MANAGED BLOCK SUPER USER FOR cluster {{ item.key }}:{{ item.value.params.port }}"
  with_dict: "{{ pg_clusters }}"
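Once both files are in place, the process owner can connect by specifying just the service name; libpq reads the connection data from .pg_service.conf and the password from .pgpass:

psql "service=tron"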
Starting the postgres service
The final step of the role is to start the postgresql service with the service module.
- name: Start the postgresql service
  service:
    name: postgresql
    state: started
The rollback role
The rollback role adds a new include file rollback_pgsql.yml, which is activated when the variable rbk_pgsql or rbk_all is set to True.
The include file drops the configured clusters after stopping them, then removes the database directories with a looped include of destroy_db_dir.yml, mirroring the way the directories were created. Finally the logrotate configuration is removed.
---
- name: Drop the postgresql clusters
  command: "pg_dropcluster {{ item.value.version }} {{ item.key }} --stop"
  with_dict: "{{ pg_clusters }}"
  ignore_errors: Yes

- name: remove the database directories
  include: destroy_db_dir.yml
  with_dict: "{{ pg_clusters }}"
  loop_control:
    loop_var: db_cluster

- name: remove the logrotate files
  file:
    path: "{{ logrotate_file }}"
    state: absent
Contents of destroy_db_dir.yml:
- name: removing the database directories for the cluster {{ db_cluster.key }}
  file:
    path: "{{ item.value }}/{{ db_cluster.value.version }}/"
    state: absent
    force: yes
  with_dict: "{{ pg_dirs | combine(db_cluster.value.pg_dirs|default(pg_dirs)) }}"
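A sketch of a rollback run, assuming the rbk_pgsql flag is simply passed as an extra variable on the command line:

ansible-playbook --vault-id default@prompt setup.yml --extra-vars="rbk_pgsql=True"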
The playbook in action
This asciinema recording shows the playbook in action. We are using an extra parameter to tell ansible-playbook to ask for the vault password, which is necessary to decrypt the super user password.
ansible-playbook --vault-id default@prompt setup.yml --extra-vars="no_dns=True"
The final part of the recording shows how each cluster has a different super user and how the cluster tron is using the en_US.UTF-8 locale.
Wrap up
Configuring PostgreSQL requires a lot of steps and great attention. The complexity of the pgsql role shows how using a tool like ansible is a far better approach than doing everything by hand.
In the next post we'll set up pgbackrest with ansible for taking periodic backups of our newly configured clusters.
Thanks for reading!