Re: Non-reproducible AIO failure

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: Non-reproducible AIO failure
Дата
Msg-id f9e85423-4d1e-4106-a468-d348dee4610d@garret.ru
обсуждение исходный текст
Ответ на Re: Non-reproducible AIO failure  (Andres Freund <andres@anarazel.de>)
Ответы Re: Non-reproducible AIO failure
Список pgsql-hackers


On 26/08/2025 3:37 AM, Andres Freund wrote:
Hi,

I'm a bit confused by this focus on bitfields - both Alexander and Konstantin
stated they could reproduce the issue without the bitfields.


Sorry if I am not correct, but it seems that the problem was never reproduced with replaced bitfields.
I have once again looked through all this thread (it is really long;) and find just one place when you wrote, that you was able to reproduce the problem without bitfields. But as far as understand it was the different problem which was fixed by your patch v1-0001-aio-Add-missing-memory-barrier-when-waiting-for-I.patch

Still it is not quite clear to me how bitfields can cause this issue.
Yes, fitfields are updated all together - when you update one field, all other bitfields are also rewritten.
But actually compiler (at least some versions) does quite straightforward thing (especially without optimization):
1. Loads all bitfields.
2. Update required bitfield.
3. Stores all bitfields.

It is very unlikely that some stale version is stored in registers - without -O optimizer is using registers only locally and doesn't try to keep some variables in registers. Also not clear how we can read some stale value from memory if we perform write barrier before updating status (in pgaio_io_update_state).
It it impossible that `op` and `state` fields belongs to different cache lines because PgAioHandle is aligned on 8. So doesn;t matter whether there are bitfields or bytes, this fields should be  propagated between cores all together.
Certainly, extraction of bitfields is not an atomic operations (as we see in assembler, it is done using two loads).
So we can observe inconsistent state of this two fields.

Let's consider fields `op` and `state`. Let's their initial state is `op=1`m `state=1`.
The we do update `op=2` and call pgaio_io_update_state(2). Some other task can see (op=1, state=1)  or (op=2, state=1) or (op=2, state=2) but not (op=1, state=2). 

The problem can happen only if some other task tries to update `op` without prior checking for `state`.
In this case we can get (op=3,state=1) as well as (op=2,state=2). 
But `op` field is updated in just two places: pgaio_io_reclaim and pgaio_io_stage. And in both cases we first check 'state' value.

What makes me concern is what happen in case of branch misprediction.
Assume that we have code:

if (state == 1) {
    op = 2;
} else { 
    op = 3;
}

It seems to be absolutely legal, but what will happen in case if with speculative execution processor start executing first branch and does assignment op=2, but then state is loaded and used to be not equal to 1, so processor has to undone this execution. But for some moment we we can have state (state=0 and op=2) which can be observed by other task and cause assertion failure.
Is it really possible?

But we have observed the generated code being pretty grotty and it's caused
more than enough confusion - so let's just replace them with plain uint8's and
cast in switches.

+1

May be I am wrong, but it seems to me that after add-moissing-memory-barrier patch was applied nobody reproduced assertion failure with replaced bitfields.

В списке pgsql-hackers по дате отправления: