1. Overview
In the previous blog, we have discussed how to correctly set up streaming replication clusters between one master and one slave in Postgres version 12. In this blog, we will simulate a failover scenario on the master database, which causes the replica (or slave) database cluster to be promoted as new master and continue the operation. We will also simulate a failback scenario to reuse the old master cluster after the failover scenario with the help of pg_rewind.
Normally it is quite easy to do a failback to the old master after slave gets promoted to master but if there is data written to the old master after slave promotion, we will have an out-of-sync case between them both and we will have to use the pg_rewind tool to synchronize the two data directories in order to bring the old master to match the state of the new master. Please note that the pg_rewind tool will remove transactions from the old master in order to match up with the new, so certain pre-caution is needed to use this tool.
Here’s a brief overview of list of actions we are going to perform:
- simulate failover by promoting
slave
cluster, so it becomes anew master
- simulate data insertion to
master
cluster, also referred asold master
after promotion - shutdown the
old master
cluster and set it up as a standby server - run pg_rewind on
old master
to synchronize transaction states withnew master
- bring up
old master
as a standby server to synchronize with thenew master
This blog assumes you already have streaming replication setup between one master and one slave from previous blog. If you have not checked out the previous blog titled “Streaming Replication Setup in PG12 - How to Do it Right”, it is recommended to give that a read first.
The procedures illustrated in this blog is based on Postgres version 12 built from source running on Ubuntu 18.04
2. Simulate a Failover Case
We will simply promote the slave database cluster to simulate a failover.
$ pg_ctl promote -D db-slave |
As seen above, After slave
gets promoted, it switches to a new timeline for future data operations. At this point the master
and slave
are no longer streaming WAL files from each other and we essentialyl have two independent database clusters running. We will call them old master
and new master
in the following sections instead so it is clear.
3. Insert Some Data to the Old Master
We would like to create a data out-of-sync case by inserting some more data to the old master
cluster.
$ psql -d clusterdb -U cary -c "INSERT INTO test_table(x) SELECT y FROM generate_series(1, 100) a(y);" -p 5432 |
Check that both the old master
and new master
are clearly out of sync:
## new master ## |
4. Configure the Old Master as Standby Server to Sync with New Master
Now we would like to attempt a failover to make the old master
as a standy server to syncrhonize with the new master
.
Let’s shutdown the old master
cluster.
$ pg_ctl -D db-master stop |
Let’s update postgresql.conf in the old master
:
recovery_target_timeline = 'latest' |
the primary_conninfo
tells the old master
to stream WAL files from the new master
located at 127.0.0.1:5433.
Also, do not forget to touch the standby.signal
file to tell the cluster to run in standby mode:
touch db-master/standby.signal |
We specified in the old master
to connect to primary_slot_name
= main
. Let’s create the matching replication slot on the new master
.
$ psql -d clusterdb -U cary -c "select * from pg_create_physical_replication_slot('main');" -p 5433 |
Now the new master
has a matching replication slot called main
and it is not active at the moment. Now we are ready to start the old master
as standby server
5. Start the Old Master as Standby Server
Now, we are ready to start the old master
as a standby:
$ pg_ctl -D db-master start |
As you can see above, the old master
refuses to start because there is a timeline difference between the old master
and the new master
. This is caused by the additional data insertions that happens to the old master
after the promotion event in step number 3.
This is where pg_rewind comes handy in situation like this, to synchronize the two clusters.
6. Use pg_rewind to Synchronize the two Clusters
Now, let’s synchronize the two database clusters with pg_rewind.
$ pg_rewind --target-pgdata=db-master --source-server="port=5433 user=cary dbname=clusterdb" --progress |
After pg_rewind is finished, we will have to edit once more the configuration of the old master
because the tool copies most of the configuration settings from the new master
to the old master
as a synchronization process. Let’s examine both db-master/postgresql.auto.conf
and db-master/postgresql.conf
and make sure of the followings again.
############# db-master/postgresql.conf ############# |
and also, don’t forget about this:
touch db-master/standby.signal |
Now, we should be ready to start the old master
again.
7. Start the Old Master Agian as Standby Server
$ pg_ctl -D db-master start |
The old master
can now start as a streaming replication to the new master
and we can observe that after using pg_rewind the additional data that was inserted to old master
in step number 3 is now removed, as it has been rewound
from 300 entries to 200 entries to match up with the new master
.
## new master ## |
8. Summary
In this blog, we have simulated a failover case and observe the effect of promoting a standby slave
server while more data insertions happening to the original master
server. We have demonstrated how to use pg_rewind
tool to synchronize both master
and slave
after the slave
promotion. Though it results some data deletion at the original master
, in the end, we are able to resolve the timeline conflict with pg_rewind
and complete the database failover scenario.