Adventures in Scaling, Part 3: PostgreSQL Streaming Replication

In my last post in this series, I covered how to get PostgreSQL set up, running and properly tuned on your system. That works well for standing up a primary database, but depending on your needs, you may find yourself in a position where you would like to start taking advantage of the streaming replication and high availability built into PostgreSQL 9.

When running a database such as PostgreSQL, a single primary database that performs all reads and writes for your applications will likely not be able to sustain your traffic forever. At a certain point your application will generate too many concurrent reads and writes, you’ll want to be able to recover from database failure, or your database will otherwise need help scaling.

Hot Standby? High Availability?

On a high level, the challenge with scaling out a database is a problem of synchronization. With a single database, there is a single source of truth for your application: all records are in one place, which acts as the authority for your data. Once you begin scaling out, however, you have multiple servers that are all responsible for your data. As data is created, updated and destroyed, these changes must be propagated to all the other servers as quickly as possible. If servers fall out of sync, data can be misreported or outright lost, so each database must be kept consistent with the others with little margin for error. Since no single approach solves this problem perfectly for all needs, there are many strategies for scaling a database.

The approach you should take as load increases has a lot to do with the specific characteristics of your application. Is your application read-heavy or write-heavy? If your application has many more reads by volume, then the most common strategy is to set up secondary databases that synchronize data with the primary database. This strategy of setting up read-only databases that track changes to the primary is only part of the scaling story. There will also be cases where you have too many writes for a single box (multi-master) or you need to distribute data across many boxes (sharding), but this post will focus on solving the problem of simply scaling reads. Look for a future post on more advanced scaling and replication strategies.

Fortunately, the strategy for scaling reads is fairly straightforward. The primary database that accepts writes is called the “master” and all other databases that can only help with reads are called “slaves”. Typically, you can set up as many slaves as you’d like to help distribute the querying load on the database. In some setups, these slaves can also help in cases of database/server failure: in the event that the master fails or becomes unavailable, a slave in a good position can be elected to take over as master so that your application can still accept and retrieve data without downtime. Slaves that can be elected master are called hot standbys, and this concept is one strategy contributing toward achieving high availability, which means that your database is robust against failure.

Understanding Replication

To understand how to achieve high availability and how multiple PostgreSQL databases keep synchronized, one must understand the concepts of “continuous archiving” and the “write-ahead log”. The key idea behind high availability is that the database is resilient to server failure: in the event that one database server goes down, another should be ready to step up and assume its responsibilities. This secondary server that must be ready to step up at any time is called a “standby”.

In order to achieve this synchronization, the primary and standby servers work together. The primary server is continuously archiving the data coming in or being modified, while each standby server operates in continuous recovery mode, reading the archives from the primary. This “continuous archive” is a comprehensive log of all activity coming into the database (every table created, every row inserted, et al). This comprehensive log is called the “write-ahead log” or WAL and is essential to the replication process.

WAL is described well in the PostgreSQL user’s guide:

Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. WAL’s central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records.

As logs are written by the master, these log (WAL) records are transferred to the other database servers acting as standbys. Generally the best way for this to happen is to stream WAL records to the standby as they’re generated, without waiting for the WAL file to be completely filled. This process of moving WAL records (in small chunked segments) is known as “streaming replication”. One important aspect to note is that the transfer happens asynchronously: WAL records are shipped after a transaction has been committed, so there is a small delay between committing a transaction on the primary and the changes becoming visible on the standby. With streaming replication, however, that delay is generally under one second, unlike with older transfer strategies.
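To get a feel for that delay, 9.1 lets you compare WAL positions: pg_current_xlog_location() on the master and pg_last_xlog_receive_location() on the standby both return positions like 0/2000060. Here is a rough sketch of turning two such positions into a byte lag; the sample positions are made up, and the 0xFF000000 multiplier is the pre-9.3 size of one logical xlog file (255 segments of 16MB each):

```shell
# Sketch: estimate replication lag in bytes from two xlog positions, e.g. the
# output of pg_current_xlog_location() on the master and
# pg_last_xlog_receive_location() on the standby.
xlog_to_bytes() {
  local hi=$((16#${1%%/*}))   # part before the slash, in hex
  local lo=$((16#${1##*/}))   # part after the slash, in hex
  echo $(( hi * 16#FF000000 + lo ))
}

master_pos="0/2000060"    # made-up sample values; substitute real query output
standby_pos="0/2000000"
lag=$(( $(xlog_to_bytes "$master_pos") - $(xlog_to_bytes "$standby_pos") ))
echo "standby is $lag bytes behind"   # prints: standby is 96 bytes behind
```

In practice you would capture the two positions with psql on each box; the arithmetic above just makes the hex positions comparable.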

Setting This Up

Great, now that we have covered the high-level concepts, let’s jump into how to set up hot standby streaming replication step-by-step. This guide assumes you have a UNIX server provisioned (Debian-based) and have already followed our guide to set up PostgreSQL on that machine.

The first thing to do when setting up a standby is to provision another server (preferably with the same specs as the primary) and follow the steps to set up the basic PostgreSQL server again, as similarly to the primary database as possible. Ideally the standby is essentially a clone of your first server, so if you can duplicate the server, feel free to do that as a time saver. These database servers should ideally be part of the same local network and communicate through their private IPs.

Master Setup

Next, we need to configure the master so that it accepts replication connections from your standby machines in /etc/postgresql/9.1/main/pg_hba.conf:

# /etc/postgresql/9.1/main/pg_hba.conf
host   replication      all             SLAVEIP/32              trust

Be sure to replace SLAVEIP with the correct IP address for the standby. Next, it is a good practice to create the empty WAL directories on both the master and the slaves:

master$ sudo mkdir /var/lib/postgresql/9.1/wals/
master$ sudo chown postgres /var/lib/postgresql/9.1/wals/
slave$ sudo mkdir /var/lib/postgresql/9.1/wals/
slave$ sudo chown postgres /var/lib/postgresql/9.1/wals/

You should also be sure that the master has easy keyless SSH access to the slave:

su - postgres
ssh-keygen -b 2048
ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@SLAVEIP

Now that your master can easily access your slave, you will need to enable streaming replication and WAL logging for your database in /etc/postgresql/9.1/main/postgresql.conf:

# /etc/postgresql/9.1/main/postgresql.conf
# ...
# Listen allows slaves to connect
listen_addresses = 'MASTERIP, localhost'
wal_level = hot_standby
wal_keep_segments = 32 # number of WAL segments to retain for standbys
max_wal_senders = 2    # max number of slaves that can connect
archive_mode    = on
archive_command = 'rsync -a %p postgres@SLAVEIP:/var/lib/postgresql/9.1/wals/%f </dev/null'

Now create a user that the slave can use to connect to the master (PostgreSQL 9.1 adds a dedicated REPLICATION privilege, so the user does not need to be a full superuser), then force a WAL segment switch so the change ships promptly:

> CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secretpassword1234';
> SELECT pg_switch_xlog();

At this point you should probably restart PostgreSQL:

sudo /etc/init.d/postgresql restart

In order for the standby to start replicating, the entire database needs to be archived and then reloaded into the standby. This process is called a “base backup”, and can be performed on the master and then transferred to the slave. Let’s create a snapshot to transfer to the slave by capturing a binary backup of the entire PostgreSQL data directory.

su - postgres
psql -c "SELECT pg_start_backup('backup', true)"
rsync -av --exclude postmaster.pid --exclude pg_xlog /var/lib/postgresql/9.1/main/ postgres@SLAVEIP:/var/lib/postgresql/9.1/main/
psql -c "SELECT pg_stop_backup()"

This will transfer all the binary data from the PostgreSQL master to the slave. Next, time to setup the standby machine to begin replication.
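As an aside, PostgreSQL 9.1 also ships pg_basebackup, which wraps the start-backup/copy/stop-backup dance into a single command run from the standby. A hedged sketch follows; MASTERIP is the same placeholder used elsewhere in this guide, and the command is echoed rather than executed so the sketch is safe to run anywhere. To use it for real, drop the echo and run it as postgres on the standby with an empty data directory:

```shell
# pg_basebackup (new in 9.1) pulls a base backup over a replication
# connection, using the replicator user created earlier. Echoed as a dry
# run; remove the echo to actually perform the backup on the standby.
MASTERIP=${MASTERIP:-10.0.0.1}   # placeholder, substitute your master's IP
cmd="pg_basebackup -h $MASTERIP -U replicator -D /var/lib/postgresql/9.1/main -P"
echo "$cmd"
```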

Standby Setup

First, let’s shutdown the database on the standby:

sudo /etc/init.d/postgresql stop

Let’s enable the slave as a hot standby:

# /etc/postgresql/9.1/main/postgresql.conf
# ...
wal_level = hot_standby
hot_standby = on

and modify the recovery settings:

# /var/lib/postgresql/9.1/main/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=MASTERIP port=5432 user=replicator password=secretpassword1234'
trigger_file = '/tmp/pgsql.trigger'
restore_command = 'cp -f /var/lib/postgresql/9.1/wals/%f %p </dev/null'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/9.1/wals/ %r'

One interesting option in the file above that is worth noting is the trigger_file parameter, which tells the standby when to become the master: if that file ever appears, the standby will promote itself and take over as master. In the event that this standby is one day elected master, you need to ensure certain files are in place. Let’s clone the connection entries listed on the master:

# /etc/postgresql/9.1/main/pg_hba.conf
host   replication      all             MASTERIP/32              trust
host   replication      all             SLAVEIP/32              trust
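Promotion itself is then just a matter of creating the trigger file on the standby. Here is a minimal sketch; TRIGGER_FILE defaults to a throwaway temp path so this is safe to run anywhere, but in production you would point it at the trigger_file from recovery.conf. Note that /tmp is world-writable, so any local user could force a promotion there; a directory writable only by postgres is safer:

```shell
# Create the trigger file named by trigger_file in recovery.conf; the
# standby notices it and promotes itself to master.
TRIGGER_FILE=${TRIGGER_FILE:-$(mktemp -d)/pgsql.trigger}
touch "$TRIGGER_FILE"
echo "standby will promote: $TRIGGER_FILE created"
```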

At this point, the slave should be ready to start accepting WAL records and replicating all data from the primary. Let’s start the database back up:

sudo /etc/init.d/postgresql start


Check the logs to make sure that the connection was properly established on the slave (and/or master):

# /var/log/postgresql/postgresql-9.1-main.log
LOG:  entering standby mode
LOG:  streaming replication successfully connected to primary

If you see those two lines, then this means replication has started successfully. If not, follow the error messages and fix the configuration until you can restart and see these lines.
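The same check can be scripted. A small sketch follows, demonstrated against a fabricated sample log so it runs anywhere; point check_replication at your real log file in practice (on Debian/Ubuntu, /var/log/postgresql/postgresql-9.1-main.log):

```shell
# Return success only if both "healthy standby" lines appear in the log.
check_replication() {
  grep -q "entering standby mode" "$1" &&
  grep -q "streaming replication successfully connected to primary" "$1"
}

# Demo against a fabricated sample log; use your real log file in practice.
sample=$(mktemp)
printf '%s\n' "LOG:  entering standby mode" \
              "LOG:  streaming replication successfully connected to primary" > "$sample"
if check_replication "$sample"; then
  echo "replication established"
fi
```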

You can also check certain processes on the machines to ensure replication is established:

master$ ps -ef | grep sender
slave$ ps -ef | grep receiver

If both machines have the expected process, then that means that replication is running along happily.
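Beyond grepping for processes, 9.1 also exposes a pg_stat_replication view on the master, with one row per connected standby. A sketch of the query follows; it is echoed rather than run here, since it needs a live master. Feed it to psql as postgres to use it for real:

```shell
# pg_stat_replication (new in 9.1) shows each standby's connection and how
# far WAL has been sent. Echoed as a dry run.
query="SELECT client_addr, state, sent_location FROM pg_stat_replication;"
echo "psql -c \"$query\""
```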

Wrapping Up

At this point, replication should be established on the standby machine and you can repeat this process for any number of additional standbys. Once a standby is established, the WAL records should be continuously streamed to the machine keeping the data synchronized with the primary. In the event of a disconnection, the standby is generally resilient enough to backfill data and catch back up to the primary.

Once the standbys are in place, since they have been configured as ‘hot’, they can also easily serve as read slaves. Simply set up your application to read from these slaves but write only to the master. If you are using Rails, one good gem that supports this concept of standbys is called Octopus.
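For reference, Octopus reads replica definitions from config/shards.yml. Below is a rough, illustrative sketch of what that might look like; the names are placeholders and the exact format has changed across versions, so check the Octopus README for yours:

```yaml
# config/shards.yml -- illustrative only; verify against the Octopus README.
octopus:
  replicated: true          # send reads to slaves, writes to master
  environments:
    - production
  production:
    slave1:
      adapter: postgresql
      host: SLAVEIP         # same placeholder as in this guide
      database: myapp_production
      username: myapp
      password: secret
```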

Hopefully you have found this post useful as a straightforward guide to understanding and setting up hot standby streaming replication. In a future post, we hope to cover more advanced replication strategies as well as a detailed guide on how to do a large data migration from MySQL to PostgreSQL. If you run into any problems or spot errors in this guide, please let us know so we can update the steps.

11 thoughts on “Adventures in Scaling, Part 3: PostgreSQL Streaming Replication”

  1. Great Article!

    Two questions… in the master configuration file do you really put SLAVEIP on the listen_address as you have above?
    The server isn’t using that IP (it’s on the slave server for obvious reasons..)

    Is there a way to make the failover from master to slave without needing to change the IP address on the clients connecting to Postgres?


  2. @AM

    I think that SLAVEIP is probably a typo as you said.

    As for failing over automatically, you could have your clients connect to a floating IP that you move to the slave when the master dies (see keepalived or heartbeat or similar). I think HAProxy and pgpool-II may also be options.

  3. Great article! I’m starting implementing now streaming replication in our system, and there is just a pair of things that I’d like to understand better: in case I have more than one slave, all set in hot standby, how postgres decides which will replace the master? And in case one master will not be enough, does streaming replication support master-master (each one with attached multiple slaves)?
    One last thing: is there any good tool to manage the machines?
    Thanks a lot!

  4. Everything worked like a charm, very good step-by-step tutorial. With one exception, however. On Ubuntu 12.04 server, I had to put the replication.conf in /var/lib/postgresql/9.1/main and not in /etc as I expected. I think this is Ubuntu-specific, but can be annoying if you don’t know about it.

  5. Thank you for the tutorial, had only a couple of small problems, root user should be replaced with postgres (for which I first needed to set up a password) and when doing rsync, I had to remove “–exclude pg_xlog” part from rsync command, otherwise the standby server would complain about it missing.

  6. Using /tmp/pgsql.trigger allows any user on the system to trigger a switch to master mode. I would be a little uncomfortable with that on my systems.

    Perhaps the trigger would be better placed in a location without global write permission.
