Re: Non-reproducible AIO failure
От | Alexander Lakhin |
---|---|
Тема | Re: Non-reproducible AIO failure |
Дата | |
Msg-id | f50fe8a7-4e6a-4a72-9347-36c1960b91b2@gmail.com обсуждение исходный текст |
Ответ на | Re: Non-reproducible AIO failure (Alexander Lakhin <exclusion@gmail.com>) |
Ответы |
Re: Non-reproducible AIO failure
|
Список | pgsql-hackers |
Hello, 05.06.2025 22:00, Alexander Lakhin wrote: > Thank you for your attention to this and for the tip! Today I tried the > following: > --- a/src/include/storage/aio.h > +++ b/src/include/storage/aio.h > @@ -89,8 +89,8 @@ typedef enum PgAioOp > /* intentionally the zero value, to help catch zeroed memory etc */ > PGAIO_OP_INVALID = 0, > > - PGAIO_OP_READV, > - PGAIO_OP_WRITEV, > + PGAIO_OP_READV = 0xaa, > + PGAIO_OP_WRITEV = 0xbb, And one more bit of information from me, if you're not tired of it yet: I've added: @@ -2238,6 +2240,7 @@ FileStartReadV(PgAioHandle *ioh, File file, vfdP = &VfdCache[file]; +if (ioh->op != PGAIO_OP_INVALID) elog(LOG, "!!!FileStartReadV| ioh: %p, ioh->op: %d, ioh->state: %d, ioh->result: %d, ioh->num_callbacks: %d, ioh->generation: %llu", ioh, ((PgAioHandle *)ioh)->op, ((PgAioHandle *)ioh)->state, ((PgAioHandle *)ioh)->result, ((PgAioHandle *)ioh)->num_callbacks, ((PgAioHandle *)ioh)->generation); pgaio_io_start_readv(ioh, vfdP->fd, iovcnt, offset); and got: ... !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29200 !!!AsyncReadBuffers [91056] (1)| blocknum: 70, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29200 !!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29200 !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29200 !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29201 !!!AsyncReadBuffers [91056] (1)| blocknum: 71, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29201 !!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29201 !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29201 !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29202 !!!AsyncReadBuffers [91056] (1)| blocknum: 72, ioh: 0x1066d15d0, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29202 !!!pgaio_io_start_readv [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->generation: 29202 !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 29202 !!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29203 !!!AsyncReadBuffers [91056] (1)| blocknum: 73, ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 29203 2025-06-05 19:52:44.174 EDT [91056:14] pg_regress/brin LOG: !!!FileStartReadV| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203 2025-06-05 19:52:44.174 EDT [91056:15] pg_regress/brin CONTEXT: SQL statement "SELECT array_agg(ctid) FROM brintest WHERE cidrcol = '10.2.14/24'::inet " PL/pgSQL function inline_code_block line 52 at EXECUTE 2025-06-05 19:52:44.174 EDT [91056:16] pg_regress/brin STATEMENT: DO $x$ ... 2025-06-05 19:52:44.174 EDT [91056:17] pg_regress/brin LOG: !!!pgaio_io_before_start| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 29203 2025-06-05 19:52:44.174 EDT [91056:18] pg_regress/brin CONTEXT: SQL statement "SELECT array_agg(ctid) FROM brintest WHERE cidrcol = '10.2.14/24'::inet " PL/pgSQL function inline_code_block line 52 at EXECUTE 2025-06-05 19:52:44.174 EDT [91056:19] pg_regress/brin STATEMENT: DO $x$ DECLARE r record; r2 record; cond text; idx_ctids tid[]; ss_ctids tid[]; count int; plan_ok bool; plan_line text; BEGIN FOR r IN SELECT colname, oper, typ, value[ordinality], matches[ordinality] FROM brinopers, unnest(op) WITH ORDINALITY AS oper LOOP ... TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 91056 0 postgres 0x0000000104ff771c ExceptionalCondition + 236 1 postgres 0x0000000104cfe32c pgaio_io_before_start + 516 2 postgres 0x0000000104cfe080 pgaio_io_start_readv + 36 3 postgres 0x0000000104d1d310 FileStartReadV + 508 4 postgres 0x0000000104d705f0 mdstartreadv + 668 5 postgres 0x0000000104d73bd8 smgrstartreadv + 116 6 postgres 0x0000000104d06484 AsyncReadBuffers + 2028 7 postgres 0x0000000104d05420 StartReadBuffersImpl + 1196 8 postgres 0x0000000104d04f64 StartReadBuffers + 64 9 postgres 0x0000000104d0256c read_stream_start_pending_read + 1204 10 postgres 0x0000000104d01c50 read_stream_look_ahead + 812 11 postgres 0x0000000104d017e8 read_stream_next_buffer + 2356 12 postgres 0x00000001046f661c heap_fetch_next_buffer + 284 13 postgres 0x00000001046e73b8 heapgettup_pagemode + 192 14 postgres 0x00000001046e7988 heap_getnextslot + 84 Thus, this time ioh->op switched to PGAIO_OP_READV earlier, even before the logging in AsyncReadBuffers(). By the way, this "DO .. FOR ... FROM brinopers, unnest(op) ... " triggers the assertion not for the first time for me (I've counted 3 such failures in my notes). Best regards, Alexander Lakhin
В списке pgsql-hackers по дате отправления: