Discussion: debug a lockup

debug a lockup

From: Scott Ribe
Date: 10 February, 19:55:55

PostgreSQL appears locked up. A pgbench run that should have completed in a few seconds has been running for 14 hours. A psql invocation locks up. No CPU usage showing in top.

I personally suspect infra issues (k8s pod, Pure block storage), but I'm getting pushback pointing the finger at PG. It's 18.1, and pgbench is the only client, FWIW.

Any way to introspect the current non-debug build to get a clue what's going on in there?

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/

Re: debug a lockup

From: Tom Lane
Date: 10 February, 20:37:08

Scott Ribe <scott_ribe@elevated-dev.com> writes:
> Any way to introspect the current non-debug build to get a clue what's going on in there?

Backend stack traces taken with gdb should yield at least some clue
even if you don't have debug symbols.

            regards, tom lane
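Tom's suggestion can be scripted so that every process is captured in one pass. Below is a minimal Python sketch. It assumes gdb is installed in the same container as PostgreSQL and that you are allowed to attach to the target processes: this normally means the same OS user or root, and inside Kubernetes the pod may also need the SYS_PTRACE capability. The file name pg_backtraces.py and the helper names are hypothetical. The script runs gdb non-interactively (-batch) with a single bt command against each PID given on the command line.

```python
from __future__ import annotations

import subprocess
import sys


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """Build a non-interactive gdb command line: attach to `pid`, print a
    backtrace, then exit (gdb detaches and the process resumes)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def backtrace(pid: int, timeout: float = 60.0) -> str:
    """Attach to one process and return gdb's combined stdout and stderr.

    Without debug symbols the frames still name exported functions of the
    postgres binary and libc (for example a blocking write or epoll_wait).
    """
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical usage: python3 pg_backtraces.py PID [PID ...]
    # e.g. the postmaster and each backend PID as listed by ps -ef.
    for arg in sys.argv[1:]:
        print(f"===== PID {arg} =====")
        print(backtrace(int(arg)))
```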

Re: debug a lockup

From: Aislan Luiz Wendling
Date: 11 February, 03:00:43

Hello,

Does it repeat on every run?

If possible, try to stop PostgreSQL gracefully.

Not working? Try an immediate stop as the last resort.

If the postgres service does not stop, try to kill the pgbench process.

First try kill -15 <pgbench PID>; if that does not work, use kill -9.

If nothing works, reboot the VM.

Open two terminals and start the pgbench process in one. In the other, run ps -ef | grep pgbench.

Find the parent process ID and run strace -f -p <PID> (your system may use a different syntax, but the idea is to trace a process and its children).

It shows which system call the process is waiting in. Normally the output scrolls too fast to read; when it stops, the process is waiting for something.

Hope it helps.

ALW
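On Linux, the ps/grep step can also be done by reading /proc directly. This is useful when a minimal container image ships without ps. Below is a minimal Python sketch, assuming Linux with procfs mounted; read_stat and find_processes are hypothetical helper names, not part of any tool. The script lists each process whose command name contains a given string, together with its parent PID. It then prints the strace command from the message above for each match.

```python
from __future__ import annotations

import os


def read_stat(pid: int) -> tuple[str, int]:
    """Return (comm, ppid) parsed from /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may contain spaces or ')',
    so everything after the *last* ')' is split to get the later fields.
    """
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    comm = data[data.index("(") + 1 : data.rindex(")")]
    fields = data[data.rindex(")") + 2 :].split()
    # fields[0] is the process state, fields[1] is the parent PID.
    return comm, int(fields[1])


def find_processes(name: str) -> list[tuple[int, int]]:
    """Rough equivalent of `ps -ef | grep name`: sorted (pid, ppid) pairs
    for every process whose comm contains `name`."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            comm, ppid = read_stat(int(entry))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited while scanning, or not visible to us
        if name in comm:
            matches.append((int(entry), ppid))
    return sorted(matches)


if __name__ == "__main__":
    for pid, ppid in find_processes("pgbench"):
        print(f"pid={pid} ppid={ppid}  ->  strace -f -p {pid}")
```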

From: Scott Ribe <scott_ribe@elevated-dev.com>
Sent: Tuesday, February 10, 2026 11:55 AM
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Subject: debug a lockup

Re: debug a lockup

From: Scott Ribe
Date: 11 February, 03:12:19

OK, we figured it out--I think.

pgbench was stuck in restart_syscall(<...resuming interrupted read...

It was set to open 100 connections.

There were ~20 PG sessions idle, and the last one (highest PID) was in auth.

That one was stuck in a write to fd 2 (stderr).
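A backend stuck in a write to fd 2 can be confirmed without a debugger or strace. Linux exposes the syscall a blocked task is sitting in through /proc/<pid>/syscall, and the kernel function it is sleeping in through /proc/<pid>/wchan. Below is a minimal Python sketch, assuming Linux on x86_64. Syscall numbers differ on other architectures, so the small name table is an x86_64-only assumption. Reading /proc/<pid>/syscall typically requires the same privilege as attaching a debugger. The names parse_syscall and blocked_in are hypothetical.

```python
from __future__ import annotations

import sys

# Assumption: x86_64 syscall numbers. Other architectures differ
# (on aarch64, for example, read is 63 and write is 64).
X86_64_SYSCALLS = {
    0: "read",
    1: "write",
    7: "poll",
    219: "restart_syscall",
    232: "epoll_wait",
}


def parse_syscall(text: str) -> tuple[str | None, list[int] | None]:
    """Parse the contents of /proc/<pid>/syscall.

    Returns (syscall name, first six raw arguments) for a task blocked in a
    syscall, ("running", None) for a task currently on a CPU, and
    ("blocked outside a syscall", None) when the kernel reports -1.
    """
    fields = text.split()
    if not fields:
        return None, None
    if fields[0] == "running":
        return "running", None
    nr = int(fields[0])
    if nr == -1:
        return "blocked outside a syscall", None
    name = X86_64_SYSCALLS.get(nr, f"syscall {nr}")
    # Fields 1..6 are the arguments in hex; for read()/write() the first
    # argument is the file descriptor (2 is stderr).
    return name, [int(a, 16) for a in fields[1:7]]


def blocked_in(pid: int) -> dict:
    """Collect the kernel wait channel and current syscall for one PID."""
    info = {"pid": pid, "wchan": None, "syscall": None, "args": None}
    try:
        with open(f"/proc/{pid}/wchan") as f:
            info["wchan"] = f.read().strip() or None
    except OSError:
        pass  # process gone, or wchan not readable
    try:
        with open(f"/proc/{pid}/syscall") as f:
            info["syscall"], info["args"] = parse_syscall(f.read())
    except OSError:
        pass  # usually needs ptrace-level access to the target
    return info


if __name__ == "__main__":
    # Hypothetical usage: python3 blocked_in.py PID [PID ...]
    for arg in sys.argv[1:]:
        print(blocked_in(int(arg)))
```

A backend in the state described above would be expected to report write with args[0] == 2. A pgbench thread sitting in restart_syscall would be expected to report syscall 219 on x86_64.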

So... This is running in Kubernetes. I was doing some load testing against a storage service (thus 100 connections). PG was launched manually in a bash session connected to the pod, in k9s. There were ~20 total bash sessions open in k9s across 15 nodes.

Theory: k9s glitched and stopped reading the piped file descriptor, the buffer filled, and PG blocked on the write. (I have seen prior evidence of less-than-perfect handling of output by k9s.) In particular, I had connection logging on, so during authentication it would have been writing to stderr.
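The theory rests on a basic pipe property. If nothing reads the other end, the kernel buffers only a limited amount of data. After that, a blocking write() sleeps indefinitely and uses no CPU, which matches the symptoms in the first message. Below is a minimal Python sketch that measures that limit on a fresh pipe; the helper name pipe_capacity is hypothetical. The write end is switched to non-blocking mode only so the demo gets an error (EAGAIN) back instead of hanging the way the postgres process did.

```python
import os


def pipe_capacity(chunk_size: int = 4096) -> int:
    """Return how many bytes a fresh pipe accepts with no reader draining it."""
    r, w = os.pipe()
    try:
        # A process's stderr is normally blocking: once the buffer is full,
        # write() just sleeps until someone reads. Non-blocking mode turns
        # that same condition into BlockingIOError (EAGAIN) instead.
        os.set_blocking(w, False)
        chunk = b"x" * chunk_size
        written = 0
        while True:
            try:
                written += os.write(w, chunk)
            except BlockingIOError:
                return written
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(f"pipe capacity: {pipe_capacity()} bytes")
```

According to the Linux pipe(7) man page, the default capacity since Linux 2.6.11 is 16 pages, or 65,536 bytes with 4 KiB pages. At roughly 100 to 200 bytes per line, a few hundred unread log lines would be enough to fill such a pipe.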

This happened in one of probably over 100 runs of the same test, so it is not readily reproducible, and I wanted to autopsy it before killing off the hung processes. Unless someone pokes a hole in my theory, at this point I think neither pgbench nor PG nor Pure/Portworx is at fault.

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/
