1. Introduction
In this blog post, I will talk about how to enable the generation of a crash dump file (also known as a core dump) and cover some common GDB commands that help a developer troubleshoot crash-related issues in PostgreSQL as well as other applications. Proper analysis of such an issue normally takes time and a certain degree of knowledge about the application's source code. From experience, it is sometimes better to look at the bigger environment instead of only at the point of crash.
2. What is a Crash Dump File?
A crash dump file is a file that records the state of an application's working memory at the moment it crashes. This state is represented by stacks of memory addresses and CPU register values, and it is normally extremely difficult to debug with these alone, because they tell you nothing about the application logic. Consider the core dump contents below, which show a back trace of memory addresses leading to the point of crash.
#1 0x00687a3d in ?? ()
Not very useful, is it? A crash dump file that looks like this means the application was not built with debugging symbols, which makes the dump essentially useless. If this is the case, you will need to install the debug version of the application or rebuild it with debugging enabled.
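A quick way to check whether a binary already has debugging symbols is the file utility (which we will use again later on the core file itself); its output ends with not stripped when the symbol table is present, and newer versions of file also report with debug_info. The path below is just a placeholder:
file /usr/local/pgsql/bin/postgres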
3. How to Generate a Useful Crash Dump File
Before generating a crash dump file, we need to ensure the application is built with debugging symbols. For PostgreSQL, this can be done by running the ./configure script like this:
./configure --enable-debug
This adds the -g flag to CFLAGS in src/Makefile.global, while leaving the optimization level at 2 (-O2). My preference is to also change the optimization level to 0 (-O0), so that when we navigate the stack in GDB the navigation makes much more sense instead of jumping around, and so that we can print the values of most variables in memory instead of getting an optimized out error in GDB. The resulting CFLAGS line looks like this:
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -g -O0
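As an alternative to editing src/Makefile.global by hand, the optimization level can be overridden when running configure. A minimal sketch, assuming a clean source tree:
./configure --enable-debug CFLAGS=-O0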
Now, we can enable crash dump generation. This is done with the ulimit command:
ulimit -c unlimited
To disable it again:
ulimit -c 0
Make sure there is enough disk space, because a crash dump file can be very large: it records the entire memory state of the process at the time of the crash. Also make sure ulimit is set in the same shell before starting PostgreSQL, since the limit is inherited by the server processes. When PostgreSQL crashes, a core dump file named core will be generated in $PGDATA.
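Putting the pieces together, a minimal sketch of starting PostgreSQL with core dumps enabled (the data directory path is a placeholder):
ulimit -c unlimited        # enable core dumps in this shell
pg_ctl -D $PGDATA start    # the postgres processes inherit this limit
Note that on Linux, the kernel.core_pattern setting can redirect core files to a different location or to a helper program such as systemd-coredump, in which case the file may not appear in $PGDATA at all.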
4. Analyzing the Dump File using GDB
GDB (GNU Debugger) is a portable debugger that runs on many Unix-like systems, works with many programming languages, and is my favorite tool for analyzing a crash dump file. To demonstrate it, I will intentionally add a line to the PostgreSQL source code that causes a segmentation fault when a CREATE TABLE command is run.
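For illustration, a hypothetical one-line change like the following, dropped into a function on the CREATE TABLE code path, is the kind of edit that forces such a crash (this is not an actual PostgreSQL patch):
*(int *) NULL = 0;    /* dereferencing a NULL pointer triggers a segmentation fault */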
Assume PostgreSQL has already crashed and generated a core dump file core at ~/highgo/git/postgres/postgresdb/core. I would first use the file utility to learn more about the core file, such as the kernel information and the program that generated it:
caryh@HGPC01:~$ file /home/caryh/highgo/git/postgres/postgresdb/core
The file utility tells me that the core file was generated by this application, /home/caryh/highgo/git/postgres/highgo/bin/postgres, so I would execute gdb like this:
gdb /home/caryh/highgo/git/postgres/highgo/bin/postgres -c /home/caryh/highgo/git/postgres/postgresdb/core
Immediately after running gdb on the core file, it shows the location of the crash at heapam.c:1840, which is exactly the line I intentionally added to cause the crash.
5. Useful GDB Commands
With gdb, it is very easy to identify the location of a crash, because it is shown immediately after running gdb on the core file. Unfortunately, in my experience, most of the time the location of the crash is not the real cause of the problem. This is why I mentioned earlier that it is sometimes better to look at the bigger picture instead of only at the point of crash. The crash is most likely caused by a mistake in the application logic somewhere earlier in the application, before execution reaches the point of crash. Even if you fix the crash, the mistake in the application logic still exists, and most likely the application will crash somewhere else later or yield unsatisfactory results. Therefore, it is worthwhile to learn some of the powerful GDB commands that can help us understand the call stack and identify the real root cause.
5.1 The bt (Back Trace) command
The bt command shows the series of call frames leading from the start of the application all the way to the point of crash. With full debugging enabled, you will also be able to see the function arguments and the values passed into each function call, as well as the source files and line numbers where the calls were made. This allows a developer to travel backwards through the call stack and check for application logic mistakes in the earlier processing.
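The full output is omitted here, but with debugging symbols a back trace from this crash would look roughly like the sketch below. Only frame 0 at heapam.c:1840 and frame 3 at pg_type.c:484 are taken from this session; the addresses, arguments, and intermediate frames are illustrative:
(gdb) bt
#0  heap_insert (relation=..., tup=..., ...) at heapam.c:1840
#1  0x... in simple_heap_insert (...) at heapam.c:...
#2  0x... in CatalogTupleInsert (...) at indexing.c:...
#3  0x... in TypeCreate (...) at pg_type.c:484
(further frames continue up through the CREATE TABLE code path to main)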
5.2 The f (Frame) command
The f command followed by a frame number tells gdb to jump to a particular call frame listed by the bt command, where you can then print the other variables in that frame. For example:
(gdb) f 3
This moves gdb to frame number 3, which is at pg_type.c:484. Here, you can examine all of the other variables in this frame (inside the function TypeCreate).
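Once in a frame, the standard gdb commands info args and info locals list all of the function arguments and local variables of that frame, which saves typing a p command for each one:
(gdb) f 3
(gdb) info args
(gdb) info locals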
5.3 The p (Print) command
This is the most popular command in gdb; it can be used to print a variable's address and value:
(gdb) p tup
With or without an asterisk, you can tell the p command either to print the address stored in a pointer or to print the values it points to.
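For example, with the HeapTuple pointer tup from this session:
(gdb) p tup
(gdb) p *tup
The first command prints the address held in tup, while the second dereferences the pointer and prints the fields of the structure it points to.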
5.4 The x (Examine) command
The x command is used to examine the contents of a memory block with a specified size and format. The following example examines the t_data values inside a HeapTuple structure. Note that we first print the *tup pointer to learn that the size of t_data is 176 bytes, and then use the x command to examine the first 176 bytes pointed to by t_data:
(gdb) p *tup
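Continuing the example, the x command below dumps those 176 bytes as hexadecimal. In gdb's x/NFU syntax, 176 is the repeat count, b selects byte-sized units, and x selects hexadecimal format:
(gdb) x/176bx tup->t_data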
6. Conclusion
In this blog, we discussed how to generate a useful crash dump file with sufficient debugging symbols to help developers troubleshoot a crash in PostgreSQL as well as in other applications. We also introduced the very powerful and useful debugger gdb and shared some of the most common commands that can be used to troubleshoot a crash from a core file. I hope the information here helps some developers out there troubleshoot issues better.