Обсуждение: IO in wrong state on riscv64

Поиск

Список

Период

Сортировка

IO in wrong state on riscv64

От

Alexander Lakhin

Дата:

11 октября, 16:00:00

Hello Andres,

I've spotted one more interesting AIO-related failure on a riscv64 machine
[1]:
=== dumping /home/gburd/build/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/regression.diffs ===
diff -U3 /home/gburd/build/HEAD/pgsql/src/test/regress/expected/tsearch.out 
/home/gburd/build/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/tsearch.out
--- /home/gburd/build/HEAD/pgsql/src/test/regress/expected/tsearch.out 2025-10-09 00:06:10.612959611 +0000
+++ /home/gburd/build/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/tsearch.out 2025-10-09 
00:27:46.697588198 +0000
@@ -163,11 +163,7 @@
  (1 row)

  SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
- count
--------
-     6
-(1 row)
-
+ERROR:  IO in wrong state: 0
  SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
   count
  -------

I've managed to reproduce it using qemu-system-riscv64 with Debian trixie
(6.11.10-riscv64, clang 19.1.4):
$ CC=clang-19 ./configure --enable-debug --enable-cassert --enable-tap-tests && time make -j8 -s
...
configure: using CFLAGS=-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla 
-Werror=unguarded-availability-new -Wendif-labels -Wmissing-format-attribute -Wcast-function-type -Wformat-security 
-Wmissing-variable-declarations -fno-strict-aliasing -fwrapv -fexcess-precision=standard 
-Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -Wno-format-truncation 
-Wno-cast-function-type-strict -g -O2

when running `make check` in a loop (it fails within 10 iterations for me)
with the reduced parallel_schedule:
test: test_setup

test: boolean char name varchar text int2 int4 int8 oid float8
x10

and extra.config:
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
restart_after_crash = off

debug_parallel_query = regress

shared_buffers =  512kB

backtrace_functions = 'pgaio_io_wait'
# mostly reflects greenfly's config, plus shared_buffers decreased
# below 027_stream_regress's 1MB.

With the following debug logging added:
--- a/src/backend/storage/aio/aio.c
+++ b/src/backend/storage/aio/aio.c
@@ -578,6 +578,20 @@ pgaio_io_wait(PgAioHandle *ioh, uint64 ref_generation)
         PgAioHandleState state;
         bool            am_owner;

+int inistate = ioh->state;
+if (inistate == PGAIO_HS_IDLE)
+{
+  fprintf(stderr, "!!!pgaio_io_wait[%d]| initial ioh->state: %d\n", getpid(), inistate);
+}
+for (int i = 0; i < 10; i++)
+{
+  int curstate = ioh->state;
+  if (curstate != inistate)
+  {
+    elog(PANIC, "!!!pgaio_io_wait| ioh->state changed from %d to %d at iteration %d", inistate, curstate, i);
+    break;
+  }
+}
         am_owner = ioh->owner_procno == MyProcNumber;

         if (pgaio_io_was_recycled(ioh, ref_generation, &state))
@@ -599,6 +613,10 @@ pgaio_io_wait(PgAioHandle *ioh, uint64 ref_generation)
         {
                 if (pgaio_io_was_recycled(ioh, ref_generation, &state))
                         return;
+if (ioh->state == PGAIO_HS_IDLE)
+{
+  fprintf(stderr, "!!!pgaio_io_wait[%d]| ioh->state: %d (initial state: %d)\n", getpid(), ioh->state, inistate);
+}

                 switch ((PgAioHandleState) state)
                 {

I'm observing several cases:
1)
2025-10-11 11:45:17.947 UTC [1191355:258] pg_regress/float8 LOG: statement: SELECT x,
                sind(x),
                sind(x) IN (-1,-0.5,0,0.5,1) AS sind_exact
         FROM (VALUES (0), (30), (90), (150), (180),
               (210), (270), (330), (360)) AS t(x);
2025-10-11 11:45:17.947 UTC [1191359:206] pg_regress/int8 LOG: statement: SELECT * FROM 
generate_series('+4567890123456789'::int8, '+4567890123456799'::int8);
!!!pgaio_io_wait[1191744]| ioh->state: 0 (initial state: 5)
2025-10-11 11:45:17.993 GMT [1191744:1] ERROR:  IO in wrong state: 0
2025-10-11 11:45:17.993 GMT [1191744:2] BACKTRACE:
         postgres: parallel worker for PID 1191355 (+0x4372a0) [0x555561b8a2a0]
         postgres: parallel worker for PID 1191355 (+0x442c66) [0x555561b95c66]
         postgres: parallel worker for PID 1191355 (StartBufferIO+0x66) [0x555561b95ac4]
         postgres: parallel worker for PID 1191355 (+0x43e014) [0x555561b91014]
         postgres: parallel worker for PID 1191355 (StartReadBuffer+0x308) [0x555561b90958]
...
         postgres: parallel worker for PID 1191355 (+0x3194b2) [0x555561a6c4b2]
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fffb9ef791c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fffb9ef79c4]
         postgres: parallel worker for PID 1191355 (_start+0x20) [0x5555618382c8]
...
2025-10-11 11:45:17.999 UTC [1191355:259] pg_regress/float8 ERROR: IO in wrong state: 0
2025-10-11 11:45:17.999 UTC [1191355:260] pg_regress/float8 BACKTRACE:
         postgres: debian regression [local] SELECT(ProcessParallelMessages+0x114) [0x5555618dca6c]
         postgres: debian regression [local] SELECT(ProcessInterrupts+0x340) [0x555561bce864]
         postgres: debian regression [local] SELECT(+0x2e0e9c) [0x555561a33e9c]
         postgres: debian regression [local] SELECT(standard_ExecutorRun+0x10c) [0x555561a19c20]
...
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fffb9ef791c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fffb9ef79c4]
         postgres: debian regression [local] SELECT(_start+0x20) [0x5555618382c8]
2025-10-11 11:45:17.999 UTC [1191355:261] pg_regress/float8 STATEMENT:  SELECT x,
                sind(x),
                sind(x) IN (-1,-0.5,0,0.5,1) AS sind_exact
         FROM (VALUES (0), (30), (90), (150), (180),
               (210), (270), (330), (360)) AS t(x);
2025-10-11 11:45:17.999 UTC [1189893:98] LOG:  background worker "parallel worker" (PID 1191744) exited with exit code
1

2)
!!!pgaio_io_wait[1353521]| ioh->state: 0 (initial state: 4)
...
2025-10-11 12:48:00.503 UTC [1353521:1] ERROR:  IO in wrong state: 0
2025-10-11 12:48:00.503 UTC [1353521:2] BACKTRACE:
         postgres: parallel worker for PID 1353449 (+0x4372a0) [0x5555905562a0]
         postgres: parallel worker for PID 1353449 (+0x442c66) [0x555590561c66]
         postgres: parallel worker for PID 1353449 (StartBufferIO+0x66) [0x555590561ac4]
         postgres: parallel worker for PID 1353449 (+0x43e014) [0x55559055d014]
         postgres: parallel worker for PID 1353449 (StartReadBuffer+0x308) [0x55559055c958]
...
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fffaacf791c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fffaacf79c4]
         postgres: parallel worker for PID 1353449 (_start+0x20) [0x5555902042c8]
2025-10-11 12:48:00.503 UTC [1353521:3] STATEMENT:  SELECT c.f1 FROM NAME_TBL c WHERE c.f1 ~ '[0-9]';
...
2025-10-11 12:48:00.512 UTC [1352470:51] LOG:  background worker "parallel worker" (PID 1353521) exited with exit code
1
2025-10-11 12:48:00.513 UTC [1353449:24] pg_regress/name ERROR:  IO in wrong state: 0
2025-10-11 12:48:00.513 UTC [1353449:25] pg_regress/name BACKTRACE:
         postgres: debian regression [local] SELECT(ProcessParallelMessages+0x114) [0x5555902a8a6c]
         postgres: debian regression [local] SELECT(ProcessInterrupts+0x340) [0x55559059a864]
         postgres: debian regression [local] SELECT(+0x2e0e9c) [0x5555903ffe9c]
         postgres: debian regression [local] SELECT(standard_ExecutorRun+0x10c) [0x5555903e5c20]
         postgres: debian regression [local] SELECT(+0x481a30) [0x5555905a0a30]
         postgres: debian regression [local] SELECT(PortalRun+0x180) [0x5555905a06e6]
         postgres: debian regression [local] SELECT(+0x480880) [0x55559059f880]
         postgres: debian regression [local] SELECT(PostgresMain+0xdfc) [0x55559059d7d8]
         postgres: debian regression [local] SELECT(+0x4793f0) [0x5555905983f0]
         postgres: debian regression [local] SELECT(postmaster_child_launch+0x132) [0x5555904f787c]
         postgres: debian regression [local] SELECT(+0x3dcd52) [0x5555904fbd52]
         postgres: debian regression [local] SELECT(InitProcessGlobals+0) [0x5555904f9c96]
         postgres: debian regression [local] SELECT(+0x3194b2) [0x5555904384b2]
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fffaacf791c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fffaacf79c4]
         postgres: debian regression [local] SELECT(_start+0x20) [0x5555902042c8]
2025-10-11 12:48:00.513 UTC [1353449:26] pg_regress/name STATEMENT: SELECT c.f1 FROM NAME_TBL c WHERE c.f1 ~ '[0-9]';

I also observed authentication failed due to this error.

3)
!!!pgaio_io_wait[1169773]| initial ioh->state: 0

2025-10-11 11:34:46.793 GMT [1169773:1] PANIC:  !!!pgaio_io_wait| ioh->state changed from 0 to 1 at iteration 0
# no other iteration number observed
2025-10-11 11:34:46.793 GMT [1169773:2] BACKTRACE:
         postgres: parallel worker for PID 1169684 (+0x4372a0) [0x55555c3492a0]
         postgres: parallel worker for PID 1169684 (+0x442c66) [0x55555c354c66]
         postgres: parallel worker for PID 1169684 (StartBufferIO+0x66) [0x55555c354ac4]
         postgres: parallel worker for PID 1169684 (+0x43e014) [0x55555c350014]
         postgres: parallel worker for PID 1169684 (StartReadBuffer+0x308) [0x55555c34f958]
...
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fff8faf791c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fff8faf79c4]
         postgres: parallel worker for PID 1169684 (_start+0x20) [0x55555bff72c8]

2025-10-11 11:34:46.804 UTC [1169684:93] pg_regress/text ERROR: !!!pgaio_io_wait| ioh->state changed from 0 to 1 at 
iteration 0
2025-10-11 11:34:46.804 UTC [1169684:94] pg_regress/text BACKTRACE:
         postgres: debian regression [local] SELECT(ProcessParallelMessages+0x114) [0x55555c09ba6c]
         postgres: debian regression [local] SELECT(ProcessInterrupts+0x340) [0x55555c38d864]
         postgres: debian regression [local] SELECT(+0x2e0e9c) [0x55555c1f2e9c]
         postgres: debian regression [local] SELECT(standard_ExecutorRun+0x10c) [0x55555c1d8c20]
...
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fff8faf791c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fff8faf79c4]
         postgres: debian regression [local] SELECT(_start+0x20) [0x55555bff72c8]
With CFLAGS="-O0", it fails much sooner...

Probably the latter case is OK, I just wanted to check if the field
is apparently changed on the fly...


[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=greenfly&dt=2025-10-09%2000%3A06%3A03

Best regards,
Alexander

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

12 октября, 06:35:02

On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> 2025-10-11 11:34:46.793 GMT [1169773:1] PANIC:  !!!pgaio_io_wait| ioh->state changed from 0 to 1 at iteration 0
> # no other iteration number observed

Can you please disassemble pgaio_io_update_state() and
pgaio_io_was_recycled()?  I wonder if the memory barriers are not
being generated correctly, causing the state and generation to be
loaded out of order, or something like that...

The previous failure on greenfly was a TIMEOUT in the same test, as if
a query was hanging.

I've also been trying to understand two failures on CI on x86 with the
same symptom, but that might be something else, hard to say yet...

https://cirrus-ci.com/task/5551651043409920
https://www.postgresql.org/message-id/CA%2BhUKG%2BHh7u3YpPLKQS%2BB7KJu0Bw99EjhuPuvBRVccDfgod95g%40mail.gmail.com

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

12 октября, 06:58:52

On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> I've managed to reproduce it using qemu-system-riscv64 with Debian trixie

Huh, that's interesting.  What is the host architecture?  When I saw
that error myself and wondered about memory order, I dismissed the
idea of trying with qemu, figuring that my x86 host's TSO would affect
the coherency, but thinking again about that... I guess the compiler
might still reorder during riscv codegen if there is something wrong
with the barrier support, and even if it doesn't, the binary
translation to x86 might also feel free to reorder stuff if there are
no barrier instructions to prevent it?  Or maybe that doesn't happen
but your host is ARM?

Re: IO in wrong state on riscv64

От

Alexander Lakhin

Дата:

12 октября, 08:00:00

Hello Thomas,

12.10.2025 06:35, Thomas Munro wrote:
> On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> 2025-10-11 11:34:46.793 GMT [1169773:1] PANIC:  !!!pgaio_io_wait| ioh->state changed from 0 to 1 at iteration 0
>> # no other iteration number observed
> Can you please disassemble pgaio_io_update_state() and
> pgaio_io_was_recycled()?  I wonder if the memory barriers are not
> being generated correctly, causing the state and generation to be
> loaded out of order, or something like that...

Please find those attached (gdb "disass/m pgaio_io_update_state" misses
the start of the function (but it's still disassembled below), so I
decided to share the whole output).
This is from clean master, without any modifications, but with the issue
confirmed for this build:
2025-10-11 20:29:11.724 UTC [1679534:1] [unknown] LOG:  connection received: host=[local]
2025-10-11 20:29:11.725 UTC [1679536:1] [unknown] LOG:  connection received: host=[local]
2025-10-11 20:29:11.726 UTC [1679538:1] [unknown] LOG:  connection received: host=[local]
2025-10-11 20:29:11.724 UTC [1679533:1] [unknown] LOG:  connection received: host=[local]
2025-10-11 20:29:11.724 UTC [1679537:1] [unknown] LOG:  connection received: host=[local]
2025-10-11 20:29:11.724 UTC [1679535:1] [unknown] LOG:  connection received: host=[local]
2025-10-11 20:29:11.729 UTC [1679539:1] [unknown] LOG:  connection received: host=[local
...
2025-10-11 20:29:11.778 UTC [1679537:3] [unknown] LOG:  connection authorized: user=debian database=regression 
application_name=pg_regress/create_schema
2025-10-11 20:29:11.778 UTC [1679533:3] [unknown] LOG:  connection authorized: user=debian database=regression 
application_name=pg_regress/create_type
2025-10-11 20:29:11.778 UTC [1679539:3] [unknown] LOG:  connection authorized: user=debian database=regression 
application_name=pg_regress/create_procedure
2025-10-11 20:29:11.778 UTC [1679536:3] [unknown] LOG:  connection authorized: user=debian database=regression 
application_name=pg_regress/create_misc
2025-10-11 20:29:11.778 UTC [1679538:3] [unknown] LOG:  connection authorized: user=debian database=regression 
application_name=pg_regress/create_table
2025-10-11 20:29:11.778 UTC [1679535:3] [unknown] LOG:  connection authorized: user=debian database=regression 
application_name=pg_regress/create_function_c
2025-10-11 20:29:11.790 UTC [1679534:2] [unknown] FATAL:  IO in wrong state: 0
2025-10-11 20:29:11.790 UTC [1679534:3] [unknown] BACKTRACE:
         postgres: debian regression [local] authentication(+0x4371ec) [0x555565cd61ec]
         postgres: debian regression [local] authentication(+0x442bb2) [0x555565ce1bb2]
         postgres: debian regression [local] authentication(StartBufferIO+0x66) [0x555565ce1a10]
         postgres: debian regression [local] authentication(+0x43df60) [0x555565cdcf60]
         postgres: debian regression [local] authentication(StartReadBuffer+0x308) [0x555565cdc8a4]
         postgres: debian regression [local] authentication(ReadBufferExtended+0x64) [0x555565cdaace]
...
         postgres: debian regression [local] authentication(postmaster_child_launch+0x132) [0x555565c7787c]
         postgres: debian regression [local] authentication(+0x3dcd52) [0x555565c7bd52]
         postgres: debian regression [local] authentication(InitProcessGlobals+0) [0x555565c79c96]
         postgres: debian regression [local] authentication(+0x3194b2) [0x555565bb84b2]
         /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fffa690891c]
         /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74) [0x7fffa69089c4]
         postgres: debian regression [local] authentication(_start+0x20) [0x5555659842c8]

> The previous failure on greenfly was a TIMEOUT in the same test, as if
> a query was hanging.

Yeah, I'll try to reproduce it too...

> On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> I've managed to reproduce it using qemu-system-riscv64 with Debian trixie
> Huh, that's interesting.  What is the host architecture?  When I saw
> that error myself and wondered about memory order, I dismissed the
> idea of trying with qemu, figuring that my x86 host's TSO would affect
> the coherency, but thinking again about that... I guess the compiler
> might still reorder during riscv codegen if there is something wrong
> with the barrier support, and even if it doesn't, the binary
> translation to x86 might also feel free to reorder stuff if there are
> no barrier instructions to prevent it?  Or maybe that doesn't happen
> but your host is ARM?

I use AMD Ryzen 9 7900X, Ubuntu 24.04 and run the risc machine with:
qemu-system-riscv64 -machine virt -m 1G -smp 8 -cpu rv64 -device virtio-blk-device,drive=hd \
-drive file=.../dqib_riscv64-virt/image.qcow2,if=none,id=hd -device virtio-net-device,netdev=net \
-netdev user,id=net,hostfwd=tcp::22222-:22 -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf \
-kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object rng-random,filename=/dev/urandom,id=rng \
-device virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs console=ttyS0"

I don't know what hardware greenfly uses (CC'ing Greg in case he'd like to
share some info on this), but timings are similar to what I'm seeing:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=greenfly&dt=2025-10-09%2000%3A06%3A03&stg=check
ok 160       - select_parallel                          5019 ms

`make check` on mine, with select_parallel repeated:
ok 160       - select_parallel                          6042 ms
ok 161       - select_parallel                          6089 ms
ok 162       - select_parallel                          6116 ms

copperhead and boomslang, which are using real RISC hardware [1], show
better timings:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=copperhead&dt=2025-10-11%2019%3A36%3A33&stg=check
ok 160       - select_parallel                          2677 ms

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=boomslang&dt=2025-10-12%2002%3A17%3A53&stg=check
ok 160       - select_parallel                          3651 ms

Thank you for looking into this!

[1] https://www.postgresql.org/message-id/3db97903-884f-4b0c-b1cd-d7442e71ea75%40app.fastmail.com

Best regards,
Alexander

Вложения

Re: IO in wrong state on riscv64

От

Greg Burd

Дата:

12 октября, 16:10:27

On Sun, Oct 12, 2025, at 1:00 AM, Alexander Lakhin wrote:
> Hello Thomas,
>
> 12.10.2025 06:35, Thomas Munro wrote:
>> On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin
>> <exclusion@gmail.com> wrote:
>>> 2025-10-11 11:34:46.793 GMT [1169773:1] PANIC:  !!!pgaio_io_wait|
>>> ioh->state changed from 0 to 1 at iteration 0
>>> # no other iteration number observed
>> Can you please disassemble pgaio_io_update_state() and
>> pgaio_io_was_recycled()?  I wonder if the memory barriers are not
>> being generated correctly, causing the state and generation to be
>> loaded out of order, or something like that...
>
> Please find those attached (gdb "disass/m pgaio_io_update_state" misses
> the start of the function (but it's still disassembled below), so I
> decided to share the whole output).
> This is from clean master, without any modifications, but with the issue
> confirmed for this build:
> 2025-10-11 20:29:11.724 UTC [1679534:1] [unknown] LOG:  connection
> received: host=[local]
> 2025-10-11 20:29:11.725 UTC [1679536:1] [unknown] LOG:  connection
> received: host=[local]
> 2025-10-11 20:29:11.726 UTC [1679538:1] [unknown] LOG:  connection
> received: host=[local]
> 2025-10-11 20:29:11.724 UTC [1679533:1] [unknown] LOG:  connection
> received: host=[local]
> 2025-10-11 20:29:11.724 UTC [1679537:1] [unknown] LOG:  connection
> received: host=[local]
> 2025-10-11 20:29:11.724 UTC [1679535:1] [unknown] LOG:  connection
> received: host=[local]
> 2025-10-11 20:29:11.729 UTC [1679539:1] [unknown] LOG:  connection
> received: host=[local
> ...
> 2025-10-11 20:29:11.778 UTC [1679537:3] [unknown] LOG:  connection
> authorized: user=debian database=regression
> application_name=pg_regress/create_schema
> 2025-10-11 20:29:11.778 UTC [1679533:3] [unknown] LOG:  connection
> authorized: user=debian database=regression
> application_name=pg_regress/create_type
> 2025-10-11 20:29:11.778 UTC [1679539:3] [unknown] LOG:  connection
> authorized: user=debian database=regression
> application_name=pg_regress/create_procedure
> 2025-10-11 20:29:11.778 UTC [1679536:3] [unknown] LOG:  connection
> authorized: user=debian database=regression
> application_name=pg_regress/create_misc
> 2025-10-11 20:29:11.778 UTC [1679538:3] [unknown] LOG:  connection
> authorized: user=debian database=regression
> application_name=pg_regress/create_table
> 2025-10-11 20:29:11.778 UTC [1679535:3] [unknown] LOG:  connection
> authorized: user=debian database=regression
> application_name=pg_regress/create_function_c
> 2025-10-11 20:29:11.790 UTC [1679534:2] [unknown] FATAL:  IO in wrong
> state: 0
> 2025-10-11 20:29:11.790 UTC [1679534:3] [unknown] BACKTRACE:
>          postgres: debian regression [local] authentication(+0x4371ec)
> [0x555565cd61ec]
>          postgres: debian regression [local] authentication(+0x442bb2)
> [0x555565ce1bb2]
>          postgres: debian regression [local]
> authentication(StartBufferIO+0x66) [0x555565ce1a10]
>          postgres: debian regression [local] authentication(+0x43df60)
> [0x555565cdcf60]
>          postgres: debian regression [local]
> authentication(StartReadBuffer+0x308) [0x555565cdc8a4]
>          postgres: debian regression [local]
> authentication(ReadBufferExtended+0x64) [0x555565cdaace]
> ...
>          postgres: debian regression [local]
> authentication(postmaster_child_launch+0x132) [0x555565c7787c]
>          postgres: debian regression [local] authentication(+0x3dcd52)
> [0x555565c7bd52]
>          postgres: debian regression [local]
> authentication(InitProcessGlobals+0) [0x555565c79c96]
>          postgres: debian regression [local] authentication(+0x3194b2)
> [0x555565bb84b2]
>          /lib/riscv64-linux-gnu/libc.so.6(+0x2791c) [0x7fffa690891c]
>          /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x74)
> [0x7fffa69089c4]
>          postgres: debian regression [local]
> authentication(_start+0x20) [0x5555659842c8]
>
>> The previous failure on greenfly was a TIMEOUT in the same test, as if
>> a query was hanging.
>
> Yeah, I'll try to reproduce it too...
>
>> On Sun, Oct 12, 2025 at 2:00 AM Alexander Lakhin
>> <exclusion@gmail.com> wrote:
>>> I've managed to reproduce it using qemu-system-riscv64 with Debian trixie
>> Huh, that's interesting.  What is the host architecture?  When I saw
>> that error myself and wondered about memory order, I dismissed the
>> idea of trying with qemu, figuring that my x86 host's TSO would affect
>> the coherency, but thinking again about that... I guess the compiler
>> might still reorder during riscv codegen if there is something wrong
>> with the barrier support, and even if it doesn't, the binary
>> translation to x86 might also feel free to reorder stuff if there are
>> no barrier instructions to prevent it?  Or maybe that doesn't happen
>> but your host is ARM?
>
> I use AMD Ryzen 9 7900X, Ubuntu 24.04 and run the risc machine with:
> qemu-system-riscv64 -machine virt -m 1G -smp 8 -cpu rv64 -device
> virtio-blk-device,drive=hd \
> -drive file=.../dqib_riscv64-virt/image.qcow2,if=none,id=hd -device
> virtio-net-device,netdev=net \
> -netdev user,id=net,hostfwd=tcp::22222-:22 -bios
> /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf \
> -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object
> rng-random,filename=/dev/urandom,id=rng \
> -device virtio-rng-device,rng=rng -nographic -append
> "root=LABEL=rootfs
> console=ttyS0"
>
> I don't know what hardware greenfly uses (CC'ing Greg in case he'd
> like to share some info on this),

Hello Alexander, thanks for digging into this issue.

The animal "greenfly" is an OrangePi RV2 [1] with 8GB of RAM.  It has
the optional 256GB eMMC module and uses that for the root filesystem.
It's running Ubuntu 24.04.3 and I'm compiling with clang 18-1.3.  I
think the rest of the details are in the animal registration or in the
logs, but let me know if you have questions or if there are things that
you'd like me to test.

> but timings are similar to what
> I'm seeing:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=greenfly&dt=2025-10-09%2000%3A06%3A03&stg=check
> ok 160       - select_parallel                          5019 ms
>
> `make check` on mine, with select_parallel repeated:
> ok 160       - select_parallel                          6042 ms
> ok 161       - select_parallel                          6089 ms
> ok 162       - select_parallel                          6116 ms
>
> copperhead and boomslang, which are using real RISC hardware [1], show
> better timings:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=copperhead&dt=2025-10-11%2019%3A36%3A33&stg=check
> ok 160       - select_parallel                          2677 ms
>
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=boomslang&dt=2025-10-12%2002%3A17%3A53&stg=check
> ok 160       - select_parallel                          3651 ms
>
> Thank you for looking into this!
>
> [1]
> https://www.postgresql.org/message-id/3db97903-884f-4b0c-b1cd-d7442e71ea75%40app.fastmail.com
>
> Best regards,
> Alexander
> Attachments:
> * pgaio_io_update_state.asm
> * pgaio_io_was_recycled.asm

best.

-greg

[1]
http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-RV2.html

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

13 октября, 01:44:57

On Sun, Oct 12, 2025 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Please find those attached (gdb "disass/m pgaio_io_update_state" misses
> the start of the function (but it's still disassembled below), so I
> decided to share the whole output).

Could you please also disassemble pgaio_io_reclaim()?

Re: IO in wrong state on riscv64

От

Alexander Lakhin

Дата:

13 октября, 07:00:00

13.10.2025 01:44, Thomas Munro wrote:
> On Sun, Oct 12, 2025 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>> Please find those attached (gdb "disass/m pgaio_io_update_state" misses
>> the start of the function (but it's still disassembled below), so I
>> decided to share the whole output).
> Could you please also disassemble pgaio_io_reclaim()?

Sure, the output of disass/m pgaio_io_reclaim is attached.

A side note: I could not reproduce the hang so far — 8 out of 30
iterations of 027_stream_regress failed, but all completed within
700-800 seconds.

Best regards,
Alexander

Вложения

pgaio_io_reclaim.asm

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

13 октября, 08:40:52

On Mon, Oct 13, 2025 at 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> 13.10.2025 01:44, Thomas Munro wrote:
> > On Sun, Oct 12, 2025 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> >> Please find those attached (gdb "disass/m pgaio_io_update_state" misses
> >> the start of the function (but it's still disassembled below), so I
> >> decided to share the whole output).
> > Could you please also disassemble pgaio_io_reclaim()?
>
> Sure, the output of disass/m pgaio_io_reclaim is attached.

Thanks.  All seems to have something plausible in the right places,
but I know nothing about RISC-V... hmm, what happens if you replace
pg_{read,write}_barrier() with pg_memory_barrier(), in those three
functions?  And if it happens to help, perhaps you could try to figure
out which one(s) help?  Not that it should be necessary but as a clue
or to rule out this line of enquiry... I guess that should generate
FENCE RW,RW, meaning wait for all preceding reads and writes to
complete and don't let any following reads or writes begin, but that's
just from googling...

Re: IO in wrong state on riscv64

От

Alexander Lakhin

Дата:

14 октября, 07:00:00

Hello Thomas,

13.10.2025 08:40, Thomas Munro wrote:
> Thanks.  All seems to have something plausible in the right places,
> but I know nothing about RISC-V... hmm, what happens if you replace
> pg_{read,write}_barrier() with pg_memory_barrier(), in those three
> functions?  And if it happens to help, perhaps you could try to figure
> out which one(s) help?  Not that it should be necessary but as a clue
> or to rule out this line of enquiry... I guess that should generate
> FENCE RW,RW, meaning wait for all preceding reads and writes to
> complete and don't let any following reads or writes begin, but that's
> just from googling...

The replacements doesn't work for me, unfortunately: I have 3 out of 30
027_stream_regress test runs failed:
2025-10-13 21:27:26.161 UTC [4181290:5] pg_regress/brin ERROR:  IO in wrong state: 0
2025-10-13 21:27:26.161 UTC [4181290:6] pg_regress/brin STATEMENT: CREATE TABLE brintest (byteacol bytea,

Disassembly of those three functions is attached.

I'll try to find some simple C programs to test memory barriers outside of
postgres...

Best regards,
Alexander

Вложения

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

15 октября, 07:00:07

On Tue, Oct 14, 2025 at 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> The replacements doesn't work for me, unfortunately: I have 3 out of 30
> 027_stream_regress test runs failed:
> 2025-10-13 21:27:26.161 UTC [4181290:5] pg_regress/brin ERROR:  IO in wrong state: 0
> 2025-10-13 21:27:26.161 UTC [4181290:6] pg_regress/brin STATEMENT: CREATE TABLE brintest (byteacol bytea,

I wonder if the problem could be with buf->io_wref.  In WaitIO() we do:

        iow = buf->io_wref;
        UnlockBufHdr(buf, buf_state);
        ...
            pgaio_wref_wait(&iow);

... but UnlockBufHdr() is only concerned with write ordering:

static inline void
UnlockBufHdr(BufferDesc *desc, uint32 buf_state)
{
    pg_write_barrier();
    pg_atomic_write_u32(&desc->state, buf_state & (~BM_LOCKED));
}

What happens if you insert pg_memory_barrier() after the line that
reads buf->io_wref?

Does this reproduce on REL_18_STABLE?

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

20 октября, 05:05:53

Updating the list with some progress:  Alexander narrowed it down to
pgaio_io_wait() being inlined into pgaio_io_was_recycled().  It seems
to have some instructions in the wrong order, and that can be
reproduced standalone, so I've asked about it here:

https://github.com/llvm/llvm-project/issues/164194

I'm not seeing how that could be our fault, but let's see...

Re: IO in wrong state on riscv64

От

Andres Freund

Дата:

22 октября, 13:51:45

Hi,

On 2025-10-20 13:05:53 +1100, Thomas Munro wrote:
> Updating the list with some progress:  Alexander narrowed it down to
> pgaio_io_wait() being inlined into pgaio_io_was_recycled().  It seems
> to have some instructions in the wrong order, and that can be
> reproduced standalone, so I've asked about it here:
> 
> https://github.com/llvm/llvm-project/issues/164194
> 
> I'm not seeing how that could be our fault, but let's see...

This is some awesome work Alexander and Thomas. Thanks for doing all this
investigation.

Have you figured out if the LLVM bug is specific to riscv or is more general?
Afaict most of the MachineSink stuff mentioned in the llvm bug is generic, not
arch specific. But I'd be surprised if such a problem in a more widely used
arch hadn't been hit long ago...

It's also somewhat surprising that this hasn't caused issues more widely in PG
on riscv...

Greetings,

Andres Freund

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

23 октября, 00:51:26

On Wed, Oct 22, 2025 at 11:51 PM Andres Freund <andres@anarazel.de> wrote:
> Have you figured out if the LLVM bug is specific to riscv or is more general?
> Afaict most of the MachineSink stuff mentioned in the llvm bug is generic, not
> arch specific. But I'd be surprised if such a problem in a more widely used
> arch hadn't been hit long ago...

It also happens on MIPS, Loongarch and x86 (I think?), but not POWER
with that godbolt link I posted.

I wonder if the direction this will go is that it's actually UB: the
C11 memory model only defines fences for atomic operations in section
7.17.4.  I think that's basically what this says, and perhaps it's
also showing what that mention of mayStore is about:

https://llvm.org/docs/Atomics.html#optimization-outside-atomic

Surprisingly, even __sync_synchronize() doesn't seem to prevent Clang
from sinking that load instruction.  That led me to an interesting old
discussion:

https://discourse.llvm.org/t/intended-semantics-for-fence-seq-cst/28591/4

"... it is not an explicit goal of LLVM to
support or continue supporting legacy code that did what it had to to
express functional concurrent code. It may happen to work now, but
unless enough LLVM users express interest this may break one day, and
__sync_synchronize may not order anything (it may just emit a fence
without forcing any ordering)."

"... trying to make user code that doesn't use
volatile or atomic, but does use __sync_synchronize, work as intended
isn't one of LLVM's goals."

Hngghhghh...

On a more practical note, Alexander, does this work?

 #if !defined(pg_read_barrier_impl) && defined(HAVE_GCC__ATOMIC_INT32_CAS)
 /* acquire semantics include read barrier semantics */
-#              define pg_read_barrier_impl()
__atomic_thread_fence(__ATOMIC_ACQUIRE)
+#              define pg_read_barrier_impl()           do {
__atomic_signal_fence(__ATOMIC_ACQUIRE);
__atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0)
 #endif

Someone might want to write an arch-riscv.h that spits out the asm
instead (should be just FENCE R, R and FENCE W, W as far as I can see,
which might be slightly better than what acquire/release generate, on
paper at least), but apparently this problem is wider than RISCV so we
need a general solution anyway.

Re: IO in wrong state on riscv64

От

Alexander Lakhin

Дата:

23 октября, 10:00:00

Hello Thomas,

23.10.2025 00:51, Thomas Munro wrote:

On a more practical note, Alexander, does this work?
 #if !defined(pg_read_barrier_impl) && defined(HAVE_GCC__ATOMIC_INT32_CAS) /* acquire semantics include read barrier semantics */
-#              define pg_read_barrier_impl()
__atomic_thread_fence(__ATOMIC_ACQUIRE)
+#              define pg_read_barrier_impl()           do {
__atomic_signal_fence(__ATOMIC_ACQUIRE);
__atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0) #endif

Someone might want to write an arch-riscv.h that spits out the asm
instead (should be just FENCE R, R and FENCE W, W as far as I can see,
which might be slightly better than what acquire/release generate, on
paper at least), but apparently this problem is wider than RISCV so we
need a general solution anyway.

Thank you for working on this!

Unfortunately, this change doesn't lead to change in disassembly of
pgaio_io_wait(), produced with clang-19 -O1. That is, I'm getting the same
disassembly as the one I sent you before (error/pgaio_io_wait.asm).
I tried:
#if !defined(pg_read_barrier_impl) && defined(HAVE_GCC__ATOMIC_INT32_CAS)
/* acquire semantics include read barrier semantics */
-# define pg_read_barrier_impl() __atomic_thread_fence(__ATOMIC_ACQUIRE)
+# define pg_read_barrier_impl() do { } while (0)
#endif
to make sure that the change of pg_read_barrier_impl() affects
pgaio_io_wait() and really saw a difference in the disassembly.

I haven't run tests with the modified version yet, going to test later
just to be sure, but I suppose we want to see the binary code changed.

Best regards,
Alexander

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

23 октября, 10:52:29

On Thu, Oct 23, 2025 at 8:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Unfortunately, this change doesn't lead to change in disassembly of
> pgaio_io_wait(), produced with clang-19 -O1. That is, I'm getting the same
> disassembly as the one I sent you before (error/pgaio_io_wait.asm).

Oh, I think this should work better:

#define pg_read_barrier_impl() \
do { pg_compiler_barrier_impl();
__atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0)

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

08 ноября, 02:47:56

On Thu, Oct 23, 2025 at 8:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Oh, I think this should work better:
>
> #define pg_read_barrier_impl() \
> do { pg_compiler_barrier_impl();
> __atomic_thread_fence(__ATOMIC_ACQUIRE); } while (0)

Alexander did some extensive testing and we stared at the codegen on a
lot of architectures to confirm that this prevents the reordering.
Pushed and back-patched like that.

Someone should probably write an arch-riscv.h file that generates
FENCE R, R and FENCE W, W (these are a bit weaker than what the
compiler is generating but strong enough for our "traditional
pre-C11-style" barrier semantics) and it's probably safe to declare
that RISC-V has atomic single copy for int64_t (hard to imagine this
isn't try, but we would need to cite the chapter and verse of the
technical manual).  But RISC-V was really just the messenger here, so
we had to address the generic version anyway.

Re: IO in wrong state on riscv64

От

Tom Lane

Дата:

08 ноября, 03:03:46

Thomas Munro <thomas.munro@gmail.com> writes:
> Alexander did some extensive testing and we stared at the codegen on a
> lot of architectures to confirm that this prevents the reordering.
> Pushed and back-patched like that.

You're a brave man to be pushing this into the last-ever release
of v13 with all of 12 hours remaining till code freeze.  I don't
mind so much for the newer branches, but I'm feeling nervous
about the risk/reward ratio for v13.

            regards, tom lane

Re: IO in wrong state on riscv64

От

Andres Freund

Дата:

08 ноября, 03:10:58

Hi,

On 2025-11-07 19:03:46 -0500, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Alexander did some extensive testing and we stared at the codegen on a
> > lot of architectures to confirm that this prevents the reordering.
> > Pushed and back-patched like that.
> 
> You're a brave man to be pushing this into the last-ever release
> of v13 with all of 12 hours remaining till code freeze.  I don't
> mind so much for the newer branches, but I'm feeling nervous
> about the risk/reward ratio for v13.

The patch doesn't do anything complicated, it just adds a compiler barrier to
the read/write barriers. As both the compiler barrier and read/write barrier
code is old I don't see a whole lot of potential for problems here. If this
were changing inline assembly or such, it'd be a different story...

Greetings,

Andres Freund

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

08 ноября, 03:13:32

On Sat, Nov 8, 2025 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You're a brave man to be pushing this into the last-ever release
> of v13 with all of 12 hours remaining till code freeze.  I don't
> mind so much for the newer branches, but I'm feeling nervous
> about the risk/reward ratio for v13.

I figured "adding a no-op compiler barrier where we already believed
there to be a compiler barrier and rely on it crucially" was pretty
safe as these things go, but that's a good point about v13 and I can
revert it in that one branch in a couple of hours if that's the
consensus.

Re: IO in wrong state on riscv64

От

Tom Lane

Дата:

08 ноября, 03:24:24

Thomas Munro <thomas.munro@gmail.com> writes:
> On Sat, Nov 8, 2025 at 1:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're a brave man to be pushing this into the last-ever release
>> of v13 with all of 12 hours remaining till code freeze.  I don't
>> mind so much for the newer branches, but I'm feeling nervous
>> about the risk/reward ratio for v13.

> I figured "adding a no-op compiler barrier where we already believed
> there to be a compiler barrier and rely on it crucially" was pretty
> safe as these things go, but that's a good point about v13 and I can
> revert it in that one branch in a couple of hours if that's the
> consensus.

It looks fairly safe to me too, or I'd be pushing back harder.
Still, we routinely get burnt by things we push into a branch's
final release, so I think it's worth having a discussion about
whether this has any possible downsides.

            regards, tom lane

Re: IO in wrong state on riscv64

От

Tom Lane

Дата:

08 ноября, 22:26:14

... BTW, I wonder why you did not add pg_compiler_barrier_impl()
to our other use of __atomic_thread_fence:

#if !defined(pg_memory_barrier_impl)
#    if defined(HAVE_GCC__ATOMIC_INT32_CAS)
#        define pg_memory_barrier_impl()        __atomic_thread_fence(__ATOMIC_SEQ_CST)
#    elif defined(__GNUC__)
#        define pg_memory_barrier_impl()        __sync_synchronize()
#    endif
#endif /* !defined(pg_memory_barrier_impl) */

If the problem is that Clang doesn't treat __atomic_thread_fence
as a compiler barrier, why is this usage safer than the other two?

            regards, tom lane

Re: IO in wrong state on riscv64

От

Thomas Munro

Дата:

09 ноября, 01:50:39

On Sun, Nov 9, 2025 at 8:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... BTW, I wonder why you did not add pg_compiler_barrier_impl()
> to our other use of __atomic_thread_fence:
>
> #if !defined(pg_memory_barrier_impl)
> #    if defined(HAVE_GCC__ATOMIC_INT32_CAS)
> #        define pg_memory_barrier_impl()        __atomic_thread_fence(__ATOMIC_SEQ_CST)
> #    elif defined(__GNUC__)
> #        define pg_memory_barrier_impl()        __sync_synchronize()
> #    endif
> #endif /* !defined(pg_memory_barrier_impl) */
>
> If the problem is that Clang doesn't treat __atomic_thread_fence
> as a compiler barrier, why is this usage safer than the other two?

__sync_synchronize() actually has the same problem, more surprisingly
if you ask me, but I suppose it doesn't seem very likely that they
ever shipped a compiler with the C11-reinterpretation retrofitted
underneath the traditional GCC __sync workalikes that didn't qualify
for the top branch instead.  You're quite right about that one though,
and I'm annoyed at myself for missing it.  I suspect it might be less
prone to actual problems due to details of the contexts it's used in
(ie luck), a bit like pg_write_barrier() as far I know so far.  I'm
looking into the codegen across our branches in practice, but I
imagine that in any case it should be OK as a follow-up fix for the
next release, it's just an additional problem and hopefully only a
hypothetical one...  I'm looking into the codegen in practice across
branches, compilers and architectures to see if I can confirm that...

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: IO in wrong state on riscv64

Вложения

Вложения

Вложения