Cary's Blog

Replication-Failover-with-pg_rewind-in-PG12

1. Overview

In the previous blog, we have discussed how to correctly set up streaming replication clusters between one master and one slave in Postgres version 12. In this blog, we will simulate a failover scenario on the master database, which causes the replica (or slave) database cluster to be promoted as new master and continue the operation. We will also simulate a failback scenario to reuse the old master cluster after the failover scenario with the help of pg_rewind.

Normally it is quite easy to do a failback to the old master after slave gets promoted to master but if there is data written to the old master after slave promotion, we will have an out-of-sync case between them both and we will have to use the pg_rewind tool to synchronize the two data directories in order to bring the old master to match the state of the new master. Please note that the pg_rewind tool will remove transactions from the old master in order to match up with the new, so certain pre-caution is needed to use this tool.

Here’s a brief overview of list of actions we are going to perform:

simulate failover by promoting slave cluster, so it becomes a new master
simulate data insertion to master cluster, also referred as old master after promotion
shutdown the old master cluster and set it up as a standby server
run pg_rewind on old master to synchronize transaction states with new master
bring up old master as a standby server to synchronize with the new master

This blog assumes you already have streaming replication setup between one master and one slave from previous blog. If you have not checked out the previous blog titled “Streaming Replication Setup in PG12 - How to Do it Right”, it is recommended to give that a read first.

The procedures illustrated in this blog is based on Postgres version 12 built from source running on Ubuntu 18.04