Cary's Blog

Logical Replication Between PostgreSQL and MongoDB

1. Introduction

PostgreSQL and MongoDB are two popular open source relational (SQL) and non-relational (NoSQL) databases available today. Both are maintained by groups of very experienced development teams globally and are widely used in many popular industries for adminitration and analytical purposes. MongoDB is a NoSQL Document-oriented Database which stores the data in form of key-value pairs expressed in JSON or BSON; it provides high performance and scalability along with data modelling and data management of huge sets of data in an enterprise application. PostgreSQL is a SQL database designed to handle a range of workloads in many applications supporting many concurrent users; it is a feature-rich database with high extensibility, which allows users to create custom plugins, extensions, data types, common table expressions to expand existing features

I have recently been involved in the development of a MongoDB Decoder Plugin for PostgreSQL, which can be paired with a logical replication slot to publish WAL changes to a subscriber in a format that MongoDB can understand. Basically, we would like to enable logical replication between MongoDB (as subscriber) and PostgreSQL (as publisher) in an automatic fashion. Since both databases are very different in nature, physical replication of WAL files is not applicable in this case. The logical replication supported by PostgreSQL is a method of replicating data objects changes based on replication identity (usually a primary key) and it would be the ideal choice for this purpose as it is designed to allow sharing the object changes between PostgreSQL and multiple other databases. The MongoDB Decoder Plugin will play a very important role as it is directly responsible for producing a series of WAL changes in a format that MongoDB can understand (ie. Javascript and JSON).

In this blog, I would like to share some of my initial research and design approach towards the development of MongoDB Decoder Plugin.

Steaming-Replication-Setup-in-PG12-How-to-do-it-right

1. Overview

PostgreSQL 12 has been considered as a major update consisting of major performance boost with partitioning enhancements, indexing improvements, optimized planner logics and several others. One of the major changes is noticeably the removal of recovery.conf in a standby cluster. For this reason, the procedure to set up a streaming replication clusters has changed, and in this blog, I will demonstrate how to properly setup a streaming replication setup in PG12.

Streaming replication setup requires a master cluster and one or more slave clusters that will replicate the data inserted to the master by streaming the archived WAL files generated by master. The master and slaves can reside on different machines connected via network but in this blog, we will use one master and one slave setup and both will be run on the same machine with different port number.

The procedures illustrated in this blog is based on Postgres version 12 built from source running on Ubuntu 18.04

Replication-Failover-with-pg_rewind-in-PG12

1. Overview

In the previous blog, we have discussed how to correctly set up streaming replication clusters between one master and one slave in Postgres version 12. In this blog, we will simulate a failover scenario on the master database, which causes the replica (or slave) database cluster to be promoted as new master and continue the operation. We will also simulate a failback scenario to reuse the old master cluster after the failover scenario with the help of pg_rewind.

Normally it is quite easy to do a failback to the old master after slave gets promoted to master but if there is data written to the old master after slave promotion, we will have an out-of-sync case between them both and we will have to use the pg_rewind tool to synchronize the two data directories in order to bring the old master to match the state of the new master. Please note that the pg_rewind tool will remove transactions from the old master in order to match up with the new, so certain pre-caution is needed to use this tool.

Here’s a brief overview of list of actions we are going to perform:

simulate failover by promoting slave cluster, so it becomes a new master
simulate data insertion to master cluster, also referred as old master after promotion
shutdown the old master cluster and set it up as a standby server
run pg_rewind on old master to synchronize transaction states with new master
bring up old master as a standby server to synchronize with the new master

This blog assumes you already have streaming replication setup between one master and one slave from previous blog. If you have not checked out the previous blog titled “Streaming Replication Setup in PG12 - How to Do it Right”, it is recommended to give that a read first.

The procedures illustrated in this blog is based on Postgres version 12 built from source running on Ubuntu 18.04