Обсуждение: PostgreSQL 15.5 stops processing user queries

Поиск
Список
Период
Сортировка

PostgreSQL 15.5 stops processing user queries

От
Andrey Zhidenkov
Дата:
Hello all,

We have encountered an issue with our PostgreSQL 15.5 installation. The problem is that
PostgreSQL server periodically falls into a state when it accepts new connections but
doesn't execute any queries. The session which runs a query hangs and cannot be terminated
via SIGINT (even if it's just "SELECT 1") - a corresponding linux process has state "S" and it's
not terminated even if the PostgreSQL master process is stopped. No matter how we connect to
the database - both TCP and unix socket sessions hang but it seems that existing sessions can
still execute queries (once we succeeded to connect using pgBouncer which seemed to use an
existing connection to the database).

Here is a stack trace from gdb for one of the frozen sessions:

(gdb) bt 10
#0  0x00007f6d31dbd378 in poll () from /lib64/libc.so.6
#1  0x00007f6d3286aee1 in pqSocketCheck.part.2 () from /usr/pgsql-15/lib/libpq.so.5
#2  0x00007f6d3286b054 in pqWaitTimed () from /usr/pgsql-15/lib/libpq.so.5
#3  0x00007f6d32867848 in PQgetResult () from /usr/pgsql-15/lib/libpq.so.5
#4  0x0000000000411320 in ExecQueryAndProcessResults (query=query@entry=0x23a68b0 "select 1;", elapsed_msec=elapsed_msec@entry=0x7ffc2b5840a8, svpt_gone_p=svpt_gone_p@entry=0x7ffc2b5840a7,
    is_watch=is_watch@entry=false, opt=opt@entry=0x0, printQueryFout=printQueryFout@entry=0x0) at common.c:1426
#5  0x000000000040feb9 in SendQuery (query=0x23a68b0 "select 1;") at common.c:1117
#6  0x000000000040627b in main (argc=<optimized out>, argv=<optimized out>) at startup.c:384

We're using glibc-2.28-236.0.1.el8.7.x86_64 on this machine and PostgreSQL 15.5:

postgres=# select version();
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit
(1 row)

We've tried to recreate a cluster from scratch using a logical dump on new hardware but it didn't
help though. And unfortunately we could not reproduce the issue, it looks like it occurs randomly and
when it happens only PostgreSQL restart helps. Also we have number of machines that run the
same version of PostgreSQL but we have the problem only with one cluster so maybe it somehow
related to queries that are specific to this cluster.

We also run patroni 2.1.4 on this cluster, for reference (I'm not sure if it can be related). We checked PostgreSQL logs,
of course - there are no any messages that could be related to the issue as well.

We will really appreciate any help, thanks!

--
With best regards, Andrei Zhidenkov.

Re: PostgreSQL 15.5 stops processing user queries

От
Tom Lane
Дата:
Andrey Zhidenkov <pensnarik@gmail.com> writes:
> We have encountered an issue with our PostgreSQL 15.5 installation. The
> problem is that
> PostgreSQL server periodically falls into a state when it accepts new
> connections but
> doesn't execute any queries. The session which runs a query hangs and
> cannot be terminated
> via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
> has state "S" and it's
> not terminated even if the PostgreSQL master process is stopped. No matter
> how we connect to
> the database - both TCP and unix socket sessions hang but it seems that
> existing sessions can
> still execute queries (once we succeeded to connect using pgBouncer which
> seemed to use an
> existing connection to the database).

Try using an existing session to capture pg_stat_activity and pg_locks
information about the stuck session(s).

> Here is a stack trace from gdb for one of the frozen sessions:

This appears to be a trace of a psql process waiting for a query
result.  It won't teach you much about what the server is doing.

            regards, tom lane



Re: PostgreSQL 15.5 stops processing user queries

От
Andrey Zhidenkov
Дата:
Sorry, wrong stack trace. Here is a correct one:

(gdb) bt
#0  0x00007f9acca1bdf6 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f9acca1bee8 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00000000007815a2 in PGSemaphoreLock ()
#3  0x00000000007fca6c in LWLockAcquire ()
#4  0x00007f9ac5c385e6 in pgsm_store () from /usr/pgsql-15/lib/pg_stat_monitor.so
#5  0x00007f9ac5c39a6b in pgsm_ExecutorEnd () from /usr/pgsql-15/lib/pg_stat_monitor.so
#6  0x0000000000658d91 in PortalCleanup ()
#7  0x000000000095b264 in PortalDrop ()
#8  0x000000000080c6df in exec_simple_query ()
#9  0x000000000080de22 in PostgresMain ()
#10 0x000000000078db80 in ServerLoop ()
#11 0x000000000078eb34 in PostmasterMain ()
#12 0x000000000050474d in main () 

It’s seems to be a problem in pg_stat_monitor extension. The similar issue is described here (in Chinese): https://blog.csdn.net/qq_43687755/article/details/117592635

On Thu, 21 Dec 2023, 18:32 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrey Zhidenkov <pensnarik@gmail.com> writes:
> We have encountered an issue with our PostgreSQL 15.5 installation. The
> problem is that
> PostgreSQL server periodically falls into a state when it accepts new
> connections but
> doesn't execute any queries. The session which runs a query hangs and
> cannot be terminated
> via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
> has state "S" and it's
> not terminated even if the PostgreSQL master process is stopped. No matter
> how we connect to
> the database - both TCP and unix socket sessions hang but it seems that
> existing sessions can
> still execute queries (once we succeeded to connect using pgBouncer which
> seemed to use an
> existing connection to the database).

Try using an existing session to capture pg_stat_activity and pg_locks
information about the stuck session(s).

> Here is a stack trace from gdb for one of the frozen sessions:

This appears to be a trace of a psql process waiting for a query
result.  It won't teach you much about what the server is doing.

                        regards, tom lane