Discussion: debug a lockup
PostgreSQL appears locked up. pgbench run that should have completed in a few seconds has been running 14 hours. psql invocation locks up. No CPU usage showing in top.
I personally suspect infra issues. (k8s pod, Pure block storage) But I'm getting pushback pointing the finger at PG. It's 18.1, and pgbench is the only client FWIW.
Any way to introspect the current non-debug build to get a clue what's going on in there?
--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/
Scott Ribe <scott_ribe@elevated-dev.com> writes:
> Any way to introspect the current non-debug build to get a clue what's going on in there?
Backend stack traces taken with gdb should yield at least some clue
even if you don't have debug symbols.
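For example, a small sketch of how that might be scripted (bt_all is a made-up helper name; it assumes gdb and pgrep are installed in the pod and that you have permission to ptrace the backends):

```shell
# Print a batch-mode gdb backtrace command for each PID given; review
# the output, then pipe it to sh to actually attach and dump stacks.
bt_all() {
    for pid in "$@"; do
        printf 'gdb -p %s --batch -ex bt\n' "$pid"
    done
}

# On the stuck server you would run:  bt_all $(pgrep -x postgres) | sh
bt_all 101 102
```

Even without symbols, frames in libc (e.g. a backend parked in a write or read syscall) narrow things down considerably.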
regards, tom lane
Hello,
Does it repeat on every run?
If possible, try to gracefully stop PostgreSQL.
Not working? Try an immediate stop, and as a last resort an abort.
If the postgres service does not stop, try to kill the pgbench process.
First try kill -15 <pgbench PID>, and if that does not work, kill -9.
If nothing works, reboot the VM.
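A sketch of that escalation (the pg_ctl lines are shown as comments since they need a live cluster and $PGDATA; the kill -15 / kill -9 step is demonstrated against a stand-in sleep process rather than a real pgbench):

```shell
# Server side (commented out -- needs a real cluster and $PGDATA):
#   pg_ctl stop -D "$PGDATA" -m fast       # cancel sessions, clean shutdown
#   pg_ctl stop -D "$PGDATA" -m immediate  # abort; crash recovery on next start
# Client side: SIGTERM first, SIGKILL only if the process survives it.
sleep 300 &                    # stand-in for a hung pgbench
victim=$!
kill -15 "$victim"             # polite request to terminate
sleep 0.2
if kill -0 "$victim" 2>/dev/null; then
    kill -9 "$victim"          # it ignored SIGTERM; force it
fi
wait "$victim" 2>/dev/null || true
```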
Open two terminals and start the pgbench process in one. In the other: ps -ef | grep pgbench
Find the parent process ID and run strace -f -p <PID> (your strace may have slightly different syntax, but the idea is to trace a process and its forks).
It shows which system call is waiting. You will know, because normally the output scrolls too fast to read, but when it stops, the process is waiting for something.
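A sketch of that, assuming a Linux /proc filesystem and using a sleep process as a stand-in for the hung client (the strace line itself is commented because it needs strace installed plus ptrace permission):

```shell
# Attach to the suspect process and its children (run manually):
#   strace -f -p "$(pgrep -x pgbench)"
# Without strace, /proc already tells you whether it is blocked, and where:
sleep 300 &                    # stand-in for the stuck process
pid=$!
sleep 0.2
state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
echo "state=$state"            # S = sleeping (blocked), not burning CPU
cat "/proc/$pid/wchan"; echo   # kernel wait channel, if the kernel exposes it
kill "$pid"
```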
Hope it helps.
ALW
From: Scott Ribe <scott_ribe@elevated-dev.com>
Sent: Tuesday, February 10, 2026 11:55 AM
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Subject: debug a lockup
PostgreSQL appears locked up. pgbench run that should have completed in a few seconds has been running 14 hours. psql invocation locks up. No CPU usage showing in top.
I personally suspect infra issues. (k8s pod, Pure block storage) But I'm getting pushback pointing the finger at PG. It's 18.1, and pgbench is the only client FWIW.
Any way to introspect the current non-debug build to get a clue what's going on in there?
--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/
OK, we figured it out--I think.
pgbench was stuck in restart_syscall(<... resuming interrupted read ...
It was set to open 100 connections; there were ~20 pg sessions in idle, and the last one (highest pid) was in auth. That one was in a write to fd 2.
So... This is running in kubernetes. I was doing some load testing against a storage service (thus 100 connections). PG was launched manually in a bash session connected to the pod, in k9s. There were ~20 total bash sessions open in k9s across 15 nodes.
Theory: k9s glitched and stopped reading the piped file descriptor, the buffer filled, and PG blocked on the write. (I have seen prior evidence of less-than-perfect handling of output by k9s.) In particular, I had logging of connections on, so at auth it would have been writing to stderr.
This happened in one of probably over 100 runs of the same test, so it's not readily reproducible, and I wanted to autopsy it before killing off the hung processes. Unless someone pokes a hole in my theory, at this point I think it is neither pgbench nor PG nor Pure/Portworx at fault.
--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/
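[Editor's note: the blocked-write theory above is easy to demonstrate in miniature. A sketch, assuming Linux: a writer into a pipe whose reader never drains it fills the pipe buffer (~64 KiB by default) and then blocks in write(2) with zero CPU -- the same signature as a backend stuck writing a connection-log line to an unread stderr pipe.]

```shell
fifo=$(mktemp -u)              # temp path for the demo pipe
mkfifo "$fifo"
sleep 10 < "$fifo" &           # "reader" that opens the pipe but never reads
reader=$!
yes > "$fifo" &                # writer: fills the pipe buffer, then blocks
writer=$!
sleep 0.5
state=$(awk '/^State:/ {print $2}' "/proc/$writer/status")
echo "writer state=$state"     # S: blocked in write(), no CPU in top
ls -l "/proc/$writer/fd/1"     # fd 1 points at the pipe, like PG's fd 2
kill "$writer" "$reader" 2>/dev/null
rm -f "$fifo"
```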