This is the archive of the old blog hosted on Blogger.
The old blog is still available at https://4thdoctordba.blogspot.com.
This is almost the entire chapter 11. I'm still writing the final section, which I'd like to put into a separate post though. I've also almost finished the part on restore performance. After this the book is complete. I will then start a review to make the writing decent before publishing on lulu.com and Amazon Kindle.
I'm not sure Amazon permits selling books for free; I'll find a solution anyway. A couple of things to know before you start coding… This chapter is completely different from the rest of the book.
Foreign keys
A foreign key is a constraint enforced using the values of another table's field. The classic example is a pair of tables storing addresses and cities. We could store each address with the city name inline.
Because the city name is duplicated over many addresses, this bloats the table: the same long strings are stored many times over, alongside the address. Defining a separate table for the cities and referencing the city id in the addresses table results in a smaller row size.
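As a minimal sketch of this design (the table and column names here are invented for the example), the two tables could be created from Python with psycopg2:

```python
import psycopg2

# Connection parameters are placeholders; adjust for your environment.
conn = psycopg2.connect("dbname=db_test user=postgres")
cur = conn.cursor()

# The cities are stored once, each with a surrogate key.
cur.execute("""
    CREATE TABLE t_cities
    (
        i_id_city serial PRIMARY KEY,
        v_city    varchar(255) NOT NULL
    )
""")

# The addresses reference the city by its id instead of duplicating
# the city name on every row; the foreign key enforces that every
# referenced city actually exists in t_cities.
cur.execute("""
    CREATE TABLE t_addresses
    (
        i_id_address serial PRIMARY KEY,
        v_address    varchar(255) NOT NULL,
        i_id_city    integer NOT NULL
                     REFERENCES t_cities (i_id_city)
    )
""")

conn.commit()
cur.close()
conn.close()
```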
I've started the sixth chapter, the one on data integrity that I had forgotten. The first two parts are there, alongside the introduction. I've also updated the book on Slideshare with the new cover and the last, incomplete chapter for the developers. The beautiful cover was made by Chiaretta & Bon. Kudos and many thanks. I've also uploaded the LaTeX sources to GitHub for anybody to fork and review my crappy English.
The three binary formats supported by pg_dump are the custom, the directory and the tar format. The first two can be accessed randomly by the restore program and support the parallel restore, making them the best choice for a flexible and reliable restore. Before 9.3 the only format supporting the parallel restore was the custom one. With this version the directory format also accepts the -j switch. This feature, combined with the parallel dump introduced in 9.3, makes the directory format a very flexible choice.
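As a minimal sketch (the database names and the dump path are invented for illustration, and PostgreSQL 9.3 or later is assumed), a parallel dump and restore with the directory format could be driven like this:

```python
import subprocess

# Parallel dump: the directory format (-Fd) is required for the
# -j switch, which runs the dump with four parallel jobs.
subprocess.run(
    ["pg_dump", "-Fd", "-j", "4", "-f", "/tmp/db_test_dump", "db_test"],
    check=True,
)

# Parallel restore: supported by the custom and directory formats.
# The target database must already exist.
subprocess.run(
    ["pg_restore", "-j", "4", "-d", "db_test_restored", "/tmp/db_test_dump"],
    check=True,
)
```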
Finally I've found some time to complete a working prototype of the new library pg_chameleon.
The GitHub repo is here: https://github.com/the4thdoctor/pg_chameleon. Please fork it if you want to debug it or give me some feedback.
The library exports the metadata from MySQL using SQLAlchemy. This information is used by the PostgreSQL library to rebuild the schema in a PostgreSQL database. Finally the data is dumped to multiple files in CSV format and reloaded into PostgreSQL using the copy_expert command.
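This is not pg_chameleon's actual code, just a sketch of the same workflow under a few assumptions: the connection strings are placeholders, and the target tables are assumed to already exist on the PostgreSQL side (the schema rebuild step is omitted):

```python
import csv
import os
import tempfile

import psycopg2
from sqlalchemy import create_engine, MetaData

# Reflect the MySQL metadata with SQLAlchemy.
mysql_engine = create_engine("mysql+pymysql://user:pass@localhost/db_test")
metadata = MetaData()
metadata.reflect(bind=mysql_engine)

pg_conn = psycopg2.connect("dbname=db_test user=postgres")
pg_cur = pg_conn.cursor()

dump_dir = tempfile.mkdtemp()

for table in metadata.sorted_tables:
    # Dump the table's rows to a CSV file, one file per table.
    csv_path = os.path.join(dump_dir, table.name + ".csv")
    with mysql_engine.connect() as src, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in src.execute(table.select()):
            writer.writerow(row)

    # Reload the CSV file into PostgreSQL with copy_expert.
    with open(csv_path) as csv_file:
        pg_cur.copy_expert('COPY "%s" FROM STDIN WITH CSV' % table.name,
                           csv_file)

pg_conn.commit()
```

Using COPY via copy_expert avoids the per-row overhead of INSERT statements, which is why a CSV dump and reload is usually much faster for bulk transfers.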