Discussion: BUG #19373: One backend hanging in AioIoUringExecution blocking other backends

BUG #19373: One backend hanging in AioIoUringExecution blocking other backends

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      19373
Logged by:          Michael Kröll
Email address:      michael.kroell@gmail.com
PostgreSQL version: 18.1
Operating system:   Linux 6.1.0-41-amd64 #1 SMP PREEMPT_DYNAMIC Debian
Description:

We've upgraded to Pg18 with ``io_method=io_uring`` early last December and
things were running smoothly until early last Sunday, when one of the simple
SELECT queries, which is triggered a couple of thousand times a day and usually
only runs for milliseconds, got stuck. It was hanging for almost 24 hours without
visible activity until I manually killed the backend (with kill -9).

The query looked like this in the backend:

| pid     | leader_pid | state_change                  | wait_event_type | wait_event          | state  |
|---------|------------|-------------------------------|-----------------|---------------------|--------|
| 2034811 |            | 2026-01-04 07:18:27.158077+01 | IO              | AioIoUringExecution | active |
| 3497711 | 2034811    | 2026-01-04 07:18:27.182794+01 | IPC             | MessageQueueSend    | active |
| 3497712 | 2034811    | 2026-01-04 07:18:27.184025+01 | IPC             | MessageQueueSend    | active |
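
A listing like the above can be pulled from pg_stat_activity with something along these lines (just a sketch; the PID is the one from this incident):

```sql
-- Sketch: show the stuck leader and its parallel workers from pg_stat_activity
SELECT pid, leader_pid, state_change, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE pid = 2034811
   OR leader_pid = 2034811;
```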

and the leader PID looked like it was waiting:

```bash
~ # strace -p 2034811
strace: Process 2034811 attached
io_uring_enter(20, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8

[root@host] 2026-01-05 07:58:11
~ # ltrace -p 2034811
io_uring_wait_cqes(0x7f3af3ea9e10, 0x7fff2bb25e00, 1, 0
```

Even though there was a *global* statement_timeout=61s configured, backends
accessing the same table were hanging with ``LWLock AioUringCompletion``
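
The blocked backends could be listed with something like this (a sketch; the wait_event name is the one shown by pg_stat_activity):

```sql
-- Sketch: backends stuck waiting on the AioUringCompletion LWLock
SELECT pid, state, query_start, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event = 'AioUringCompletion';
```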

Restarting the cluster did not go through until the hanging leader PID was
``SIGKILL``ed

Nothing in the journal, the Pg log, or the kernel ring buffer hinted at anything
around the time frame of the problematic backend's query_start.

Did anyone experience similar issues?

Is that a kernel/io_uring issue or something which Pg should/could handle?

```
Pg 18.1 (Debian 18.1-1.pgdg12+2)
Linux 6.1.0-41-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.158-1 (2025-11-09)
x86_64 GNU/Linux
```





Re: BUG #19373: One backend hanging in AioIoUringExecution blocking other backends

From:
surya poondla
Date:
Hi Michael,

Thank you for the detailed report.

> Even though there was a *global* statement_timeout=61s configured, backends
> accessing the same table were hanging with ``LWLock AioUringCompletion``

The statement_timeout not interrupting the query and not erroring out looks weird; this part could be a Postgres bug in itself.
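
Just to rule out the obvious, it might be worth double-checking that statement_timeout was not overridden at the database or role level; a rough sketch:

```sql
-- Sketch: current value plus any per-database/per-role overrides
SHOW statement_timeout;

SELECT d.datname, r.rolname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles r ON r.oid = s.setrole
WHERE s.setconfig::text LIKE '%statement_timeout%';
```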
 
> Restarting the cluster did not go through until the hanging leader PID was
> ``SIGKILL``ed

Am I understanding this correctly as "a normal shutdown (SIGTERM or pg_ctl stop) did not complete, and the postmaster remained waiting until the backend in AioIoUringExecution was force-killed"?

I’m interested in digging into this and am wondering about the below (a short SQL sketch for gathering some of this follows the list):
1. What filesystem and storage was this instance running on?
2. Was this a parallel sequential scan, or was any index access involved?
3. By any chance do you have a reproducible test case?
4. Can you share what shared_preload_libraries you are using?
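
For 4, and to double-check the AIO configuration, something like this would do (just a sketch, run on the affected instance):

```sql
-- Sketch: show the AIO method, preloaded libraries, and exact server version
SHOW io_method;
SHOW shared_preload_libraries;
SELECT version();
```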

Regards,
Surya Poondla

Re: BUG #19373: One backend hanging in AioIoUringExecution blocking other backends

From:
Michael Kröll
Date:
On 1/17/26 01:09, surya poondla wrote:
>     Restarting the cluster did not go through until the hanging leader
>     PID was
>     ``SIGKILL``ed
> 
> Am I understanding this correctly as "a normal shutdown (SIGTERM or pg_ctl
> stop) did not complete, and the postmaster remained waiting until the backend
> in AioIoUringExecution was force-killed"?

Remained waiting until the PG backend process in wait_event 
AioIoUringExecution was SIGKILLed.

> I’m interested in digging into this and am wondering about the below
> 1. What filesystem and storage was this instance running on?

Two of these:

  Disk model: INTEL SSDPE2KX010T7
  Units: sectors of 1 * 4096 = 4096 bytes
  Sector size (logical/physical): 4096 bytes / 4096 bytes
  I/O size (minimum/optimal): 4096 bytes / 4096 bytes

in a software RAID 1 on ext4.

> 2. Was this a parallel sequential scan, or was any index access involved?

I don't have the exact query plan for this one, as it depends on the bind 
parameters passed, and there are three ARRAY-type bind parameters for the 
query in question.

Typically the query plan looks like this:

  Gather  (cost=1825.51..30731.85 rows=6391 width=678)
    Workers Planned: 2
    ->  Nested Loop  (cost=825.51..29092.75 rows=2663 width=678)
          ->  Parallel Bitmap Heap Scan on offer o  (cost=825.09..22161.55 rows=2676 width=542)
                Recheck Cond: (id = ANY ('{3048845,2121345,2840302,2807790,3273743,2798121,2017850,3226237,1501236,2449122,2891576,2927727,3526960,3467910,2929690,3299523,3458918,2840304,1707208,2101471,245>
                Filter: (vfb_in_de AND ((loc)::text = ANY ('{de,at}'::text[])))
                ->  Bitmap Index Scan on offer_id_npr_lzf_idx  (cost=0.00..822.62 rows=14268 width=0)
                      Index Cond: (id = ANY ('{3048845,2121345,2840302,2807790,3273743,2798121,2017850,3226237,1501236,2449122,2891576,2927727,3526960,3467910,2929690,3299523,3458918,2840304,1707208,2101471>
          ->  Index Scan using gh_haendler_pkey on gh_haendler  (cost=0.42..2.58 rows=1 width=8)
                Index Cond: (h_id = o.h_id)
                Filter: (COALESCE(multimerchants_template_id, o.h_id) <> ALL ('{4957}'::integer[]))

In the case of the problematic query/params/backend there were *two* 
workers with the associated leader PID found in pg_stat_activity.
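
If it helps, the plan actually chosen for the problematic bind parameters could probably be captured with auto_explain; a sketch (the threshold is arbitrary):

```sql
-- Sketch: log the real plans, including the one picked for specific bind parameters
-- (auto_explain can also be preloaded via shared_preload_libraries instead)
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '100ms';
SET auto_explain.log_format = 'text';
```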

> 3. By any chance do you have a reproducible test case?

Unfortunately not: we had this running happily in production for multiple 
weeks without issues on three identical machines, and it happened once, on 
one of them. We could not reproduce it on our development/testing machines.

At the moment we have switched to io_method=worker, which at least for 
the index-driven use cases on those machines won't make a big difference.

We could configure one of those three boxes with io_uring again, but we 
won't know in advance whether we'll ever trigger this issue again, and we 
would need to have solid, specific monitoring in place beforehand.
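
Such monitoring could be as simple as periodically running something like this (a sketch; the wait_event names are the ones seen in this incident and the 5-minute threshold is arbitrary):

```sql
-- Sketch: flag backends that have been stuck in AIO-related waits for a long time
SELECT pid, leader_pid, state, wait_event_type, wait_event,
       now() - state_change AS stuck_for
FROM pg_stat_activity
WHERE wait_event IN ('AioIoUringExecution', 'AioUringCompletion')
  AND now() - state_change > interval '5 minutes';
```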

At the same time, we have since upgraded those boxes to kernel 6.12, and if 
the issue was related to an interaction that depends on the io_uring version, 
that might be another reason we will likely not see the same issue again.

> 4. Can you share what shared_preload_libraries you are using?

pg_stat_statements is the only one used there.

Thank you for having a look. Sorry for not being able to provide more 
specifics.

BR,
Michael


> Regards,
> Surya Poondla