Re: Non-reproducible AIO failure
От | Alexander Lakhin |
---|---|
Тема | Re: Non-reproducible AIO failure |
Дата | |
Msg-id | 26606306-ab38-495d-bfc9-15c840d93d4d@gmail.com обсуждение исходный текст |
Ответ на | Re: Non-reproducible AIO failure (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: Non-reproducible AIO failure
Re: Non-reproducible AIO failure |
Список | pgsql-hackers |
Hello Thomas and Andres, 04.06.2025 23:32, Thomas Munro wrote: > On Thu, Jun 5, 2025 at 8:02 AM Andres Freund <andres@anarazel.de> wrote: >> On 2025-06-03 08:00:01 +0300, Alexander Lakhin wrote: >>> 2025-06-03 00:19:09.282 EDT [25175:1] LOG: !!!pgaio_io_before_start| ioh: >>> 0x104c3e1a0, ioh->op: 1, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: >>> 2, ioh->generation: 21694 >> But here it somehow turned to 1 (PGAIO_OP_READV), despite there not being any >> "normal" reason for that. We know that the IO wasn't actually started as >> otherwise pgaio_io_start_readv() would have logged that fact. > A cheap/easy thing to try, maybe: give that enumeration more > distinctive values like 0xaaaa, 0xbbbb, or whatever, to see if we get > a wild 1 here? Thank you for your attention to this and for the tip! Today I tried the following: --- a/src/include/storage/aio.h +++ b/src/include/storage/aio.h @@ -89,8 +89,8 @@ typedef enum PgAioOp /* intentionally the zero value, to help catch zeroed memory etc */ PGAIO_OP_INVALID = 0, - PGAIO_OP_READV, - PGAIO_OP_WRITEV, + PGAIO_OP_READV = 0xaa, + PGAIO_OP_WRITEV = 0xbb, And got:: !!!AsyncReadBuffers [68292] (1)| blocknum: 9, ioh: 0x102b26230, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 13038 !!!pgaio_io_start_readv [68292]| ioh: 0x102b26230, ioh->op: 170, ioh->generation: 13038 !!!pgaio_io_reclaim [67729]| ioh: 0x102b25540, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 47468 !!!pgaio_io_reclaim [67729]| ioh: 0x102b25540, ioh->op: 0, ioh->generation: 47469 !!!pgaio_io_reclaim [68292]| ioh: 0x102b26230, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 13038 !!!pgaio_io_reclaim [68292]| ioh: 0x102b26230, ioh->op: 0, ioh->generation: 13039 !!!AsyncReadBuffers [68292] (1)| blocknum: 10, ioh: 0x102b26230, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 13039 !!!pgaio_io_start_readv [68292]| ioh: 0x102b26230, ioh->op: 170, ioh->generation: 13039 !!!pgaio_io_reclaim [67718]| ioh: 0x102b25390, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 35821 !!!pgaio_io_reclaim [67718]| ioh: 0x102b25390, ioh->op: 0, ioh->generation: 35822 !!!pgaio_io_reclaim [68292]| ioh: 0x102b26230, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2, ioh->generation: 13039 !!!pgaio_io_reclaim [68292]| ioh: 0x102b26230, ioh->op: 0, ioh->generation: 13040 !!!AsyncReadBuffers [68292] (1)| blocknum: 11, ioh: 0x102b26230, ioh->op: 0, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 13040 2025-06-05 04:15:38.847 EDT [68292:1] LOG: !!!pgaio_io_before_start| ioh: 0x102b26230, ioh->op: 170, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 13040 ### I reproduced the failure a few times and observed "ioh->op: 170" each time. ### 2025-06-05 04:15:38.847 EDT [68292:2] STATEMENT: select count(*) from simple r join extremely_skewed s using (id); TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 68292 0 postgres 0x000000010144b550 ExceptionalCondition + 236 1 postgres 0x0000000101152260 pgaio_io_before_start + 516 2 postgres 0x0000000101151fb4 pgaio_io_start_readv + 36 3 postgres 0x0000000101171144 FileStartReadV + 252 4 postgres 0x00000001011c4424 mdstartreadv + 668 5 postgres 0x00000001011c7a0c smgrstartreadv + 116 6 postgres 0x000000010115a3b8 AsyncReadBuffers + 2028 7 postgres 0x0000000101159354 StartReadBuffersImpl + 1196 8 postgres 0x0000000101158e98 StartReadBuffers + 64 9 postgres 0x00000001011564a0 read_stream_start_pending_read + 1204 10 postgres 0x0000000101155b84 read_stream_look_ahead + 812 11 postgres 0x000000010115571c read_stream_next_buffer + 2356 12 postgres 0x0000000100b4a04c heap_fetch_next_buffer + 284 13 postgres 0x0000000100b3ade8 heapgettup_pagemode + 192 14 postgres 0x0000000100b3b3b8 heap_getnextslot + 84 15 postgres 0x0000000100ebd818 table_scan_getnextslot + 340 16 postgres 0x0000000100ebd624 SeqNext + 148 17 postgres 0x0000000100ebdcf8 ExecScanFetch + 772 18 postgres 0x0000000100ebd8cc ExecScanExtended + 164 19 postgres 0x0000000100ebd004 ExecSeqScanWithProject + 244 20 postgres 0x0000000100e970f4 ExecProcNode + 68 I also tried reproducing this with -DFDDEBUG, no luck so far, probably due to significant change of the test duration. Best regards, Alexander Lakhin
В списке pgsql-hackers по дате отправления: