Re: Non-reproducible AIO failure

From: Alexander Lakhin
Subject: Re: Non-reproducible AIO failure
Msg-id: f50fe8a7-4e6a-4a72-9347-36c1960b91b2@gmail.com
In reply to: Re: Non-reproducible AIO failure (Alexander Lakhin <exclusion@gmail.com>)
Replies: Re: Non-reproducible AIO failure
List: pgsql-hackers

Hello,

05.06.2025 22:00, Alexander Lakhin wrote:
> Thank you for your attention to this and for the tip! Today I tried the
> following:
> --- a/src/include/storage/aio.h
> +++ b/src/include/storage/aio.h
> @@ -89,8 +89,8 @@ typedef enum PgAioOp
>         /* intentionally the zero value, to help catch zeroed memory etc */
>         PGAIO_OP_INVALID = 0,
>
> -       PGAIO_OP_READV,
> -       PGAIO_OP_WRITEV,
> +       PGAIO_OP_READV = 0xaa,
> +       PGAIO_OP_WRITEV = 0xbb,
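
The point of the quoted change is that the valid PgAioOp members get values unlikely to show up by accident, while zero stays INVALID so zeroed memory is still caught. With this numbering a logged `ioh->op: 170` reads unambiguously as READV (0xaa) and 187 as WRITEV (0xbb), instead of the small integers 1 and 2, which are easy to confuse with other fields or stray writes. The sketch below is a standalone illustration of that idea, not PostgreSQL's code: the `DemoAioOp` enum and the `demo_op_name()` decoder are hypothetical.

```c
#include <assert.h>
#include <string.h>   /* strcmp(), for checking decoded names */

/* Hypothetical stand-in for PgAioOp using the same value scheme. */
typedef enum DemoAioOp
{
    DEMO_OP_INVALID = 0,    /* zeroed memory decodes as INVALID */
    DEMO_OP_READV = 0xaa,   /* printed with %d as 170 */
    DEMO_OP_WRITEV = 0xbb,  /* printed with %d as 187 */
} DemoAioOp;

/* Map a raw value copied from a log line back to an operation name. */
static const char *
demo_op_name(int op)
{
    switch (op)
    {
        case DEMO_OP_INVALID:
            return "INVALID";
        case DEMO_OP_READV:
            return "READV";
        case DEMO_OP_WRITEV:
            return "WRITEV";
        default:
            return "unknown";   /* anything else points at corruption */
    }
}
```

For example, `demo_op_name(170)` yields "READV", while a value such as 1 falls through to "unknown".
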

And one more bit of information from me, if you're not tired of it yet:
I've added:
@@ -2238,6 +2240,7 @@ FileStartReadV(PgAioHandle *ioh, File file,

         vfdP = &VfdCache[file];

+if (ioh->op != PGAIO_OP_INVALID) elog(LOG, "!!!FileStartReadV| ioh: %p, ioh->op: %d, ioh->state: %d, ioh->result: %d, ioh->num_callbacks: %d, ioh->generation: %llu", ioh, ((PgAioHandle *)ioh)->op, ((PgAioHandle *)ioh)->state, ((PgAioHandle *)ioh)->result, ((PgAioHandle *)ioh)->num_callbacks, ((PgAioHandle *)ioh)->generation);
         pgaio_io_start_readv(ioh, vfdP->fd, iovcnt, offset);

and got:
...
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29200
!!!AsyncReadBuffers [91056] (1)| blocknum: 70, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29200
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29200
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29200
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29201
!!!AsyncReadBuffers [91056] (1)| blocknum: 71, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29201
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29201
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29201
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29202
!!!AsyncReadBuffers [91056] (1)| blocknum: 72, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29202
!!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29202
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29202
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29203
!!!AsyncReadBuffers [91056] (1)| blocknum: 73, ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:14] pg_regress/brin LOG: !!!FileStartReadV| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:15] pg_regress/brin CONTEXT:  SQL statement "SELECT array_agg(ctid) FROM brintest WHERE cidrcol = '10.2.14/24'::inet "
         PL/pgSQL function inline_code_block line 52 at EXECUTE
2025-06-05 19:52:44.174 EDT [91056:16] pg_regress/brin STATEMENT: DO $x$
...
2025-06-05 19:52:44.174 EDT [91056:17] pg_regress/brin LOG: !!!pgaio_io_before_start| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203
2025-06-05 19:52:44.174 EDT [91056:18] pg_regress/brin CONTEXT:  SQL statement "SELECT array_agg(ctid) FROM brintest WHERE cidrcol = '10.2.14/24'::inet "
         PL/pgSQL function inline_code_block line 52 at EXECUTE
2025-06-05 19:52:44.174 EDT [91056:19] pg_regress/brin STATEMENT: DO $x$
         DECLARE
                 r record;
                 r2 record;
                 cond text;
                 idx_ctids tid[];
                 ss_ctids tid[];
                 count int;
                 plan_ok bool;
                 plan_line text;
         BEGIN
                 FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op) WITH ORDINALITY AS oper LOOP
...
TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 91056
0   postgres                            0x0000000104ff771c ExceptionalCondition + 236
1   postgres                            0x0000000104cfe32c pgaio_io_before_start + 516
2   postgres                            0x0000000104cfe080 pgaio_io_start_readv + 36
3   postgres                            0x0000000104d1d310 FileStartReadV + 508
4   postgres                            0x0000000104d705f0 mdstartreadv + 668
5   postgres                            0x0000000104d73bd8 smgrstartreadv + 116
6   postgres                            0x0000000104d06484 AsyncReadBuffers + 2028
7   postgres                            0x0000000104d05420 StartReadBuffersImpl + 1196
8   postgres                            0x0000000104d04f64 StartReadBuffers + 64
9   postgres                            0x0000000104d0256c read_stream_start_pending_read + 1204
10  postgres                            0x0000000104d01c50 read_stream_look_ahead + 812
11  postgres                            0x0000000104d017e8 read_stream_next_buffer + 2356
12  postgres                            0x00000001046f661c heap_fetch_next_buffer + 284
13  postgres                            0x00000001046e73b8 heapgettup_pagemode + 192
14  postgres                            0x00000001046e7988 heap_getnextslot + 84

So this time ioh->op had already switched to PGAIO_OP_READV (170 = 0xaa)
even earlier, before the logging in AsyncReadBuffers(). By the way, this is
not the first time this "DO ... FOR ... FROM brinopers, unnest(op) ..." block
has triggered the assertion for me (I've counted three such failures in my notes).
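
The handle lifecycle visible in the log can be written down as a toy model, to make the window for the unexpected write explicit. This is a hypothetical simplification, not PostgreSQL's code: `DemoHandle`, `demo_reclaim()`, `demo_acquire()` and `demo_start_readv()` are invented names, the state values 0 and 1 only mirror the numbers seen in the log, and the precondition returns false instead of failing an Assert. Reclaim resets `op` and bumps `generation` (the `ioh->op: 0, ioh->generation: N+1` lines); starting a read requires `op` to still be invalid. In the failing run, whatever set `op` to 170 for generation 29203 must have done so after the reclaim that logged `ioh->op: 0` and before AsyncReadBuffers() logged `ioh->op: 170`. The model says nothing about who performs that write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; READV uses the 0xaa from the patch above. */
enum { DEMO_OP_INVALID = 0, DEMO_OP_READV = 0xaa };
enum { DEMO_STATE_IDLE = 0, DEMO_STATE_HANDED_OUT = 1 };

typedef struct DemoHandle
{
    int op;                         /* operation defined on the handle */
    int state;                      /* lifecycle state */
    unsigned long long generation;  /* bumped on every reuse */
} DemoHandle;

/* Reclaim: forget the finished I/O and start a new generation. */
static void
demo_reclaim(DemoHandle *h)
{
    h->op = DEMO_OP_INVALID;
    h->state = DEMO_STATE_IDLE;
    h->generation++;
}

/* Acquire: hand the idle handle to a caller. */
static void
demo_acquire(DemoHandle *h)
{
    h->state = DEMO_STATE_HANDED_OUT;
}

/* Start a read; mirrors the "op == INVALID" precondition that trapped. */
static bool
demo_start_readv(DemoHandle *h)
{
    if (h->op != DEMO_OP_INVALID)
        return false;               /* op was written by someone earlier */
    h->op = DEMO_OP_READV;
    return true;
}
```

In normal use, reclaim, acquire and start succeed in order; setting `op` anywhere between reclaim and start makes `demo_start_readv()` return false, the model's analogue of the TRAP above.
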

Best regards,
Alexander Lakhin


