The maze of the elephant

In the previous post we introduced PostgreSQL’s dependency system.

At first sight the implementation can look like a maze, where the chains of relationships between objects are not clear.

This post gives a practical example of how pg_depend can act as an Ariadne’s thread for resolving the dependencies.

The scenario presented is very simple but can be used as a starting point for more complex requirements.

Speeding up your data load

The general audience perceives indices as a good thing: they improve read performance by limiting or removing the need for expensive table scans. But this is a double-edged sword. Unlike tables, indices are ordered structures, and any update on an index is usually more expensive than the corresponding change on the table.

The bottleneck becomes clear in one specific case: bulk data loads. Tables without indices or keys perform much better during the load.
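
As a minimal sketch of the pattern, with a made-up table and index name, a plain index can simply be dropped before the load and rebuilt afterwards:

-- the table big_table and the index idx_big_table_payload are hypothetical
DROP INDEX IF EXISTS idx_big_table_payload;
-- ... bulk load into big_table happens here ...
CREATE INDEX idx_big_table_payload ON big_table (payload);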

However, because of PostgreSQL’s dependency system, removing an index or a constraint is not always a simple operation. In particular, if we need to remove a primary or unique key which is referenced by a foreign key, the drop will fail.

Of course we can still use the CASCADE clause, but this costs us the key’s definition and carries the risk of removing objects we don’t want to remove.

Let’s use a practical example where we have three tables with some relationships established.

db_test=# CREATE TABLE people (id_people serial, forename text, surname text, CONSTRAINT pk_people PRIMARY KEY(id_people));
CREATE TABLE

db_test=# CREATE TABLE address (id_address serial, address text, CONSTRAINT pk_address PRIMARY KEY(id_address));
CREATE TABLE


db_test=# CREATE TABLE people_address (id_people integer NOT NULL, id_address integer NOT NULL, CONSTRAINT pk_people_address PRIMARY KEY (id_people,id_address));
CREATE TABLE

db_test=# ALTER TABLE people_address ADD CONSTRAINT fk_people FOREIGN KEY (id_people) REFERENCES people(id_people) ON DELETE RESTRICT ON UPDATE RESTRICT;
ALTER TABLE

db_test=# ALTER TABLE people_address ADD CONSTRAINT fk_address FOREIGN KEY (id_address) REFERENCES address(id_address) ON DELETE RESTRICT ON UPDATE RESTRICT;
ALTER TABLE

In this simple schema we have two tables used for storing people’s names and addresses. A third relation, people_address, has two foreign keys enforcing a many-to-many relationship between the tables people and address.

All three tables have a PRIMARY KEY, each with an implicit unique index associated with it.
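
The implicit indices are easy to verify with the catalog view pg_indexes; for the table people the query below should list pk_people as a unique index.

SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname='public'
	AND tablename='people';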

When loading bulk data, the presence of the foreign keys can also cause speed issues, because every incoming row must be validated against the referenced tables; worse, a single foreign key violation will abort the entire load.

Dropping the constraints before the data load and recreating them afterwards ensures the data is loaded rapidly and without the risk of an abort. As we are removing the constraints which preserve data consistency, we must ensure that the loaded data is consistent at the source.

But if we try to drop one of the primary keys, the dependent FOREIGN KEY blocks the operation.

db_test=# ALTER TABLE people DROP CONSTRAINT pk_people;
ERROR:  cannot drop constraint pk_people on table people because other objects depend on it
DETAIL:  constraint fk_people on table people_address depends on index pk_people
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
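
To see what following the HINT would cost us, we can try the CASCADE inside a transaction and roll it back. PostgreSQL reports each cascaded drop as a NOTICE; in this case the foreign key fk_people is silently dropped along with the primary key.

BEGIN;
-- the CASCADE drops the dependent foreign key fk_people as well
ALTER TABLE people DROP CONSTRAINT pk_people CASCADE;
-- undo the drop, this was only a demonstration
ROLLBACK;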

A quick and dirty approach could be to write a procedure with the names of the keys hard coded, as sketched below. This is reasonable if the procedure is a one-off. However, if we need to load the data on a regular basis, hard-coded names give us no flexibility when, for example, the key names are changed.
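
For our example schema the hard-coded drop script would be nothing more than the following; the matching create script would bake in the key definitions in the same way, and both break as soon as a constraint is renamed.

-- hard-coded drop script: the foreign keys must go first
ALTER TABLE people_address DROP CONSTRAINT fk_people;
ALTER TABLE people_address DROP CONSTRAINT fk_address;
ALTER TABLE people_address DROP CONSTRAINT pk_people_address;
ALTER TABLE people DROP CONSTRAINT pk_people;
ALTER TABLE address DROP CONSTRAINT pk_address;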

The Ariadne’s thread

Our loading procedure should consist of three steps.

  1. Drop the foreign and primary keys, without using the CASCADE clause
  2. Load the data as fast as possible
  3. Recreate the primary and foreign keys

For point 2 using the COPY command is a very good idea: the data is loaded efficiently and in a single transaction.
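
For example, assuming the data is available in CSV files (the file names below are made up for the example), the load phase could be as simple as this:

-- \copy runs client side, so the files only need to be readable by psql
\copy people (id_people, forename, surname) FROM 'people.csv' WITH (FORMAT csv)
\copy address (id_address, address) FROM 'address.csv' WITH (FORMAT csv)
\copy people_address FROM 'people_address.csv' WITH (FORMAT csv)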

For points 1 and 3 all we need is a view with some Swiss army knife SQL inside.

CREATE OR REPLACE VIEW create_drop_cons
AS
	SELECT DISTINCT
		format('ALTER TABLE %I.%I ADD CONSTRAINT %I %s ;', sch.nspname, tab.relname, conname, pg_get_constraintdef(con.oid)) AS concreate,
		format('ALTER TABLE %I.%I DROP CONSTRAINT %I ;', sch.nspname, tab.relname, conname) AS condrop,
		CASE
			WHEN contype='p'
			THEN 0
			WHEN contype='f'
			THEN 1
		END AS create_order,
		CASE
			WHEN contype='p'
			THEN 1
			WHEN contype='f'
			THEN 0
		END AS drop_order,
		tab.relname as tablename,
		sch.nspname as schemaname

	FROM
		pg_class tab
		INNER JOIN pg_namespace sch
			ON sch.oid=tab.relnamespace
		INNER JOIN pg_depend dep
			ON tab.oid=dep.refobjid
		INNER JOIN pg_constraint con
			ON
				con.oid=dep.objid
			AND	con.connamespace=tab.relnamespace
			AND	con.conrelid=tab.oid
	WHERE
			dep.classid::regclass='pg_constraint'::regclass
		AND 	con.contype in ('p','f')
		AND	dep.deptype IN ('n','a')
;

Let’s analyse the view definition in detail.

  • The FROM clause joins the relations pg_class, pg_namespace, pg_depend and pg_constraint.

  • The WHERE condition filters pg_depend’s records, keeping only the class pg_constraint and only the normal and automatic dependencies. Another filter on pg_constraint ensures we are searching only for primary and foreign keys.

  • The SELECT clause, using the format function, builds the constraint drop and create statements. In the create statements the constraint definition is generated by the function pg_get_constraintdef.

  • Two CASE constructs are used to build the order for the drop and the create statements.

  • The fields relname and nspname are exposed by the view in order to allow us to filter by table and schema.

  • The SELECT clause requires a DISTINCT because a multi-column constraint has one row in pg_depend per column, and the join would otherwise emit the same statement more than once, as shown below.
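
The duplication is easy to verify. The query below, which is purely illustrative, shows that the two-column key pk_people_address owns one pg_depend row per column, the rows differing only in refobjsubid.

SELECT objid, refobjid, refobjsubid, deptype
FROM pg_depend
WHERE classid='pg_constraint'::regclass
	AND objid=(SELECT oid FROM pg_constraint WHERE conname='pk_people_address');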

Once we have created the view, generating the drop and create statements requires just a couple of queries.

db_test=# SELECT
        condrop
FROM
        create_drop_cons
WHERE
                tablename IN ('people','address','people_address')
        AND     schemaname='public'
ORDER BY drop_order
;
                                condrop                                
-----------------------------------------------------------------------
 ALTER TABLE public.people_address DROP CONSTRAINT fk_address ;
 ALTER TABLE public.people_address DROP CONSTRAINT fk_people ;
 ALTER TABLE public.address DROP CONSTRAINT pk_address ;
 ALTER TABLE public.people DROP CONSTRAINT pk_people ;
 ALTER TABLE public.people_address DROP CONSTRAINT pk_people_address ;
(5 rows)


db_test=# SELECT
        concreate
FROM
        create_drop_cons
WHERE
                tablename IN ('people','address','people_address')
        AND     schemaname='public'
ORDER BY create_order
;
                                                                          concreate                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 ALTER TABLE public.address ADD CONSTRAINT pk_address PRIMARY KEY (id_address) ;
 ALTER TABLE public.people ADD CONSTRAINT pk_people PRIMARY KEY (id_people) ;
 ALTER TABLE public.people_address ADD CONSTRAINT pk_people_address PRIMARY KEY (id_people, id_address) ;
 ALTER TABLE public.people_address ADD CONSTRAINT fk_address FOREIGN KEY (id_address) REFERENCES address(id_address) ON UPDATE RESTRICT ON DELETE RESTRICT ;
 ALTER TABLE public.people_address ADD CONSTRAINT fk_people FOREIGN KEY (id_people) REFERENCES people(id_people) ON UPDATE RESTRICT ON DELETE RESTRICT ;
(5 rows)

It’s important to understand that we need to generate both the create and the drop scripts before dropping the constraints, otherwise we’ll lose the definitions after the drop.

A simple way to automate everything is to use the psql facilities: \a switches to unaligned output, \t shows the tuples only, without headers and row counts, and \o redirects the query output to a file.

\a
\t
\o drop_constraints.sql
SELECT
        condrop
FROM
        create_drop_cons
WHERE
                tablename IN ('people','address','people_address')
        AND     schemaname='public'
ORDER BY drop_order
;
\o create_constraints.sql
SELECT
        concreate
FROM
        create_drop_cons
WHERE
                tablename IN ('people','address','people_address')
        AND     schemaname='public'
ORDER BY create_order
;
\o

Executing the script with the psql client will generate the two SQL files automatically.
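
Assuming the meta-commands and the two queries above are saved in a file called generate_scripts.sql (the file names here are just an example), a whole reload cycle can then be driven from within psql:

-- build drop_constraints.sql and create_constraints.sql
\i generate_scripts.sql
-- step 1: drop the keys
\i drop_constraints.sql
-- step 2: bulk load the data, for example with the \copy commands sketched earlier
-- step 3: recreate the keys
\i create_constraints.sql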

Wrap up

This simple scenario shows how to take advantage of PostgreSQL’s dependency system. The view can be improved in many ways.

For example, it could also return the CREATE/DROP statements for any indices and views depending on the affected tables. Writing a recursive query could also make the view capable of following multiple levels of dependencies, as sketched below.
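
As a starting point only, a recursive walk of pg_depend could look like the sketch below; the level guard is a crude protection against the loops that pg_depend can contain, and the query reports just the raw dependency rows rather than full DDL.

WITH RECURSIVE dep_tree AS (
	-- direct dependencies on the table people
	SELECT dep.classid, dep.objid, dep.deptype, 1 AS level
	FROM pg_depend dep
	WHERE dep.refobjid='public.people'::regclass
	UNION ALL
	-- objects depending on the objects found at the previous level
	SELECT d.classid, d.objid, d.deptype, t.level+1
	FROM pg_depend d
	INNER JOIN dep_tree t
		ON d.refobjid=t.objid
	WHERE t.level<5 -- crude guard against circular dependencies
)
SELECT DISTINCT classid::regclass AS object_class, objid, deptype, level
FROM dep_tree
ORDER BY level;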

Finally, as a source of inspiration, I warmly suggest having a look at the source code of pg_dump, which uses the dependency system to determine the correct order when generating the schema dumps.

Thanks for reading.

Tree of light, copyright Federico Campoli