Обсуждение: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
Good day, Andres and hackers. UnlockBufHdrExt does: buf_state |= set_bits; buf_state &= ~unset_bits; buf_state &= ~BM_LOCKED; TerminateBufferIO unconditionally does: unset_flag_bits |= BM_IO_ERROR; Due to this, AbortBufferIO and buffer_readv_complete_one are failed to set BM_IO_ERROR with call to TerminateBufferIO. It was found with proprietary code that was triggered on PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence. -- regards Yura Sokolov aka funny-falcon
Hi, On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote: > UnlockBufHdrExt does: > > buf_state |= set_bits; > buf_state &= ~unset_bits; > buf_state &= ~BM_LOCKED; > > TerminateBufferIO unconditionally does: > > unset_flag_bits |= BM_IO_ERROR; > > Due to this, AbortBufferIO and buffer_readv_complete_one are failed > to set BM_IO_ERROR with call to TerminateBufferIO. > > It was found with proprietary code that was triggered on > PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence. That's clearly not right. Care to write a patch? I think we should add a test for this in src/test/modules/test_aio too. As we don't rely on things like BM_IO_ERROR this is quite easy to not notice. Greetings, Andres Freund
26.03.2026 23:29, Andres Freund wrote:
> Hi,
>
> On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote:
>> UnlockBufHdrExt does:
>>
>> buf_state |= set_bits;
>> buf_state &= ~unset_bits;
>> buf_state &= ~BM_LOCKED;
>>
>> TerminateBufferIO unconditionally does:
>>
>> unset_flag_bits |= BM_IO_ERROR;
>>
>> Due to this, AbortBufferIO and buffer_readv_complete_one are failed
>> to set BM_IO_ERROR with call to TerminateBufferIO.
>>
>> It was found with proprietary code that was triggered on
>> PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence.
>
> That's clearly not right. Care to write a patch? I think we should add a
> test for this in src/test/modules/test_aio too. As we don't rely on things
> like BM_IO_ERROR this is quite easy to not notice.
I thought, fix is too small to go the way of patches.
I don't mind if you just push it.
But if I can save your time on writing test, I'll try.
...
I believe, proper way is to add assertion in UnlockBufHdrExt:
Assert(!(set_bits & unset_bits));
And make TerminateBufferIO to exclude BM_IO_ERROR if it is among
set_flag_bits.
Is it ok?
--
regards
Yura Sokolov aka funny-falcon
Good day, 27.03.2026 12:04, Yura Sokolov wrote: > 26.03.2026 23:29, Andres Freund wrote: >> Hi, >> >> On 2026-03-25 17:51:30 +0300, Yura Sokolov wrote: >>> UnlockBufHdrExt does: >>> >>> buf_state |= set_bits; >>> buf_state &= ~unset_bits; >>> buf_state &= ~BM_LOCKED; >>> >>> TerminateBufferIO unconditionally does: >>> >>> unset_flag_bits |= BM_IO_ERROR; >>> >>> Due to this, AbortBufferIO and buffer_readv_complete_one are failed >>> to set BM_IO_ERROR with call to TerminateBufferIO. >>> >>> It was found with proprietary code that was triggered on >>> PGAIO_RS_ERROR and made assertion on BM_IO_ERROR presence. >> >> That's clearly not right. Care to write a patch? I think we should add a >> test for this in src/test/modules/test_aio too. As we don't rely on things >> like BM_IO_ERROR this is quite easy to not notice. > I thought, fix is too small to go the way of patches. > I don't mind if you just push it. > > But if I can save your time on writing test, I'll try. > > ... > > I believe, proper way is to add assertion in UnlockBufHdrExt: > > Assert(!(set_bits & unset_bits)); > > And make TerminateBufferIO to exclude BM_IO_ERROR if it is among > set_flag_bits. > > Is it ok? Here patchset is. I've tried to modify 001_aio.pl to test presence BM_IO_ERROR after read failure. I've also added FlushBuffer failure test in second patch (v1-002) using injection point. I don't know if 001-aio.pl is a good place for since write is not async yet. v1-003 just mades DebugPrintBufferRefcount prettier. -- regards Yura Sokolov aka funny-falcon