Re: Non-reproducible AIO failure

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: Non-reproducible AIO failure
Дата
Msg-id 1f64e33a-fc2f-421b-9d81-c4417db71697@garret.ru
обсуждение исходный текст
Ответ на Re: Non-reproducible AIO failure  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers


On 06/06/2025 9:47 pm, Andres Freund wrote:
Hi,

On 2025-06-06 14:03:12 +0300, Konstantin Knizhnik wrote:
There is really essential difference in code generated by clang 15 (working)
and 16 (not working).
There also are code gen differences between upstream clang 17 and apple's
clang, which is based on llvm 17 as well (I've updated the toolchain, it
repros with that as well).

The code generated for the bitfield access is so attrocious that it's worth
changing to plain uint8 for that alone. It's annoying to have to add casts to
all the switches, but clearly there isn't a better way for now.

+1

What specific compilers / versions were you comparing? I don't quite see
similar assembly...


My MacBook (bug is not reproduced):

Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin22.6.0

Alexander's MacBook (bug is reproduced)

Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin23.5.0


As you can see pg16 just updates third byte of the structure (`op` field).
It is not affecting 'state' field at all.
It's really weird code. I think the state store would be in the call to
pgaio_io_update_state().

Yes, and that seems to be the problem,

While clang16 is also updating `state` field!
I assume this was intending to refer to clang15?  FWIW, I don't think it does
update the state field in the assembly you shared.

clang 15 (old) doesn't update `state` field. And this code works normally.
clang 16 - does.

This is very odd code, but I don't immediately see a bug in it :(

I also not an expert in ARM assembler. My analyze of the code is mostly the same as your.
Looks like the code generated by clang 16 does update all three bitfields in `pgaio_io_stage`.
As far as I understand assignment is correct: so it uses old (extracted) value of state and new (passed to the function as parameter) value of op).
But I am not 100% sure here.

It is not quite clear to me why it happen, because `pgaio_io_update_state` is not actually inlined.
And there is at least one problem that assignment of state moved through memory barrier  (in `pgaio_io_update_state`).

В списке pgsql-hackers по дате отправления: