Re: Non-reproducible AIO failure
From | Konstantin Knizhnik |
---|---|
Subject | Re: Non-reproducible AIO failure |
Date | |
Msg-id | 7235a473-e949-404e-a85c-ccefd81c2efa@garret.ru |
In reply to | Re: Non-reproducible AIO failure (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 06/06/2025 2:31 am, Tom Lane wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
>> I have a very wild guess that's probably wrong in a weird way, but
>> here goes anyway:
>> Did anyone test if interleaving the enum-typed bitfield fields of
>> PgAioHandle with the uint8 fields might solve the issue?
> Ugh. I think you probably nailed it.
>
> IMO all those struct fields better be declared uint8.
>
> regards, tom lane

I also think it may be a compiler problem. Bitfields of different enum types look really exotic, so it would be no surprise if the optimizer does something strange here. I failed to reproduce the problem with an old version of clang (15.0). Also, as far as I understand, nobody was able to reproduce the problem with optimizations disabled (-O0). Of course, that does not prove there is a bug in the optimizer: the timing may simply change.

Still, it is not quite clear to me how `PGAIO_OP_READV` managed to get written. There is just one place in the code where it is assigned:

```
pgaio_io_start_readv(PgAioHandle *ioh, int fd, int iovcnt, uint64 offset)
{
    ...
    pgaio_io_stage(ioh, PGAIO_OP_READV);
}
```

and `pgaio_io_stage` should update both `state` and `op`:

```
    ioh->op = op;
    ioh->result = 0;

    pgaio_io_update_state(ioh, PGAIO_HS_DEFINED);
```

But as we see from the trace, the state is still PGAIO_HS_HANDED_OUT, so it was not updated. Even if there is some optimizer bug that constructs an incorrect mask for the bitfield assignment, it is still not clear where it got this PGAIO_OP_READV from. And we can be sure that it really is PGAIO_OP_READV and not just arbitrary garbage, because Alexander replaced its value with 0xaa and we see in the logs that this value is really stored.

If there were a race condition in `pgaio_io_update_state` (which enforces a memory barrier before updating the state), then, for example, inserting some sleep between the assignment of `op` and the state update should increase the probability of the error. But that doesn't happen.
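To make the ordering argument concrete, here is a minimal sketch of what `pgaio_io_stage` is described as doing, using plain `uint8` fields as Tom suggests. All names (`SketchHandle`, `sketch_stage`, the enum constants) are hypothetical stand-ins, not the real PostgreSQL definitions, and `atomic_thread_fence(memory_order_release)` is only assumed to be roughly analogous to the barrier that `pgaio_io_update_state` issues:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for PgAioOp / PgAioHandleState; values are illustrative. */
enum sketch_op { SKETCH_OP_INVALID = 0, SKETCH_OP_READV = 1 };
enum sketch_state { SKETCH_HS_IDLE = 0, SKETCH_HS_HANDED_OUT = 1, SKETCH_HS_DEFINED = 2 };

/*
 * The suspected layout uses enum-typed bitfields (roughly "PgAioOp op:8"),
 * which ISO C only allows as an implementation-defined extension.  Here every
 * field is a plain uint8_t, so each one is a separately addressable byte and
 * a store to one field cannot rewrite its neighbours.
 */
typedef struct SketchHandle
{
    uint8_t state;
    uint8_t target;
    uint8_t op;
    uint8_t flags;
    int32_t result;
} SketchHandle;

/* Publish the new state only after a barrier (assumed stand-in for pg_write_barrier()). */
static void
sketch_update_state(SketchHandle *h, enum sketch_state new_state)
{
    atomic_thread_fence(memory_order_release);
    h->state = (uint8_t) new_state;
}

/* Same sequence as the quoted pgaio_io_stage() body: op, result, then state. */
static void
sketch_stage(SketchHandle *h, enum sketch_op op)
{
    h->op = (uint8_t) op;
    h->result = 0;
    /* A timing experiment would put a delay here, between op and state. */
    sketch_update_state(h, SKETCH_HS_DEFINED);
}
```

With this ordering, in a single backend, observing `op == READV` while `state` is still `HANDED_OUT` should be impossible unless `sketch_stage` was interrupted between the two stores, which is exactly why the trace is puzzling.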
Also, as far as I understand, `op` is updated and read by the same backend, so it should not be a synchronization issue. So most likely it is a bug in the optimizer that generates incorrect code. Can Alexander, or somebody else who was able to reproduce the problem, share the assembler code of the `pgaio_io_reclaim` function? I am not sure that the bug is in this function, but it is the prime suspect. Only `pgaio_io_start_readv` can set PGAIO_OP_READV, and we are almost sure that it was not called. So it looks like `op` was not cleared, despite what we see in the logs. But if `pgaio_io_reclaim` contained incorrect code, it should always misbehave and fail to clear `op`, yet in most cases it works...
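The invariant the trace violates can be written down as a check: a handle that has been handed out but not yet staged must still have `op == INVALID`. Below is a hypothetical sketch; `sketch_reclaim` and `sketch_hand_out` are simplified stand-ins, not the real `pgaio_io_reclaim` (which does more). Following Alexander's trick, the READV constant uses the distinctive value 0xAA so a stale value would stand out in a dump:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; 0xAA mirrors the marker value Alexander used. */
enum sketch_op { SKETCH_OP_INVALID = 0, SKETCH_OP_READV = 0xAA };
enum sketch_state { SKETCH_HS_IDLE = 0, SKETCH_HS_HANDED_OUT = 1,
                    SKETCH_HS_DEFINED = 2, SKETCH_HS_COMPLETED = 3 };

typedef struct SketchHandle
{
    uint8_t state;
    uint8_t op;
    int32_t result;
} SketchHandle;

/* Simplified reclaim: return the handle to the idle pool and clear op. */
static void
sketch_reclaim(SketchHandle *h)
{
    h->op = SKETCH_OP_INVALID;
    h->result = 0;
    h->state = SKETCH_HS_IDLE;
}

/* Simplified hand-out: only an idle handle may be handed out. */
static void
sketch_hand_out(SketchHandle *h)
{
    assert(h->state == SKETCH_HS_IDLE);
    h->state = SKETCH_HS_HANDED_OUT;
}

/* The invariant: HANDED_OUT implies op is still INVALID. */
static bool
sketch_handed_out_is_consistent(const SketchHandle *h)
{
    return h->state != SKETCH_HS_HANDED_OUT || h->op == SKETCH_OP_INVALID;
}
```

An assertion of this kind placed right after handing out a handle would fire at the first inconsistent handle, rather than later when the stale `op` is finally used, which may help narrow down whether the reclaim path is the culprit.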