The significance of LSN in PostgreSQL System

1. Introduction

Last year I wrote a blog about PostgreSQL’s timeline concept, which is essential for executing Point In Time Recovery (PITR) back to a particular timeline and particular Log Sequence Number (LSN). But we have not talked about the idea of LSN in which everything else is built upon. Today in this blog, I will describe the significance of LSN

2. The LSN and REDO

We hear the term REDO a lot and it could mean a couple things depending on the context it is used. If used as a REDO point, it indicates the location of the current WAL record since the current or last checkpoint is started. This is the point where the database should commence the recovery.

When REDO is used by itself as a verb, it refers to the action of a standby server, which will read the WAL contents sent from the primary via streaming replication and actually replay the actions. Depending on the database action, different ways to REDO needs to be defined. This is why in heapam.c source code, functions like heap_redo and heap2_redo exist.

LSN stands for Log Sequence Number and it is basically a pointer to a location in the WAL. It is represented as unsigned 64-bit value internally in PG; external to the PG, it is represented as 2 hexadecimal number separated by a slash (/), First number being the segment ID of a particular WAL file and second number being the offset within that segment file.

For example:

1/CF54A048 could also be represented as a single value as 0x01CF54A048, which equals to 7,773,397,064 and this is the integer number used internally in PG to compute or comparae different LSN values.

Based on this value, you could tell the volume of a database, the bigger the LSN, the larger the database.

I have been researching about the feasibility of having multiple PostgreSQL servers to sharer a single set of data storage. One server being able to read and write, the rest of servers are read-only. On the read-only servers, I introduced a temporary storage space in their respective cluster folder to temporary hold any data blocks that they want to eject from the buffer manager in case the primary server has not yet flushed data to the shared storage space.

During my testing with this approach, I came across an issue where the standbys may face a situation where the same data page but different version could exist in both the shared storage and also their own temporary space. Loading the wrong version of the page may result in incorrect output from user queries.

So, how can we solve this issue?

The pd_lsn parameter inside each buffer page header indicates the “last WAL LSN that has modified this page”. So by simply comparing the pd_lsn values from 2 version of the data page, I could ensure the standby always read the correct one.

The LSN is significant in a sense that almost every WAL recovery functions of PG work around LSN and in my own feasibility studies, LSN also plays an important role in making shared standby’s more intelligent in loading data blocks from multiple sources.