Re: GNU/Hurd portability patches

From Alexander Lakhin
Subject Re: GNU/Hurd portability patches
Date
Msg-id fa85e679-9d13-43ae-8882-3f50c709f446@gmail.com
In reply to Re: GNU/Hurd portability patches  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: GNU/Hurd portability patches
Re: GNU/Hurd portability patches
List pgsql-hackers
Hello Thomas and Michael!

Sorry for the delay. I've finally completed a new round of experiments and
discovered the following:

12.10.2025 03:42, Thomas Munro wrote:
Hmm.  We only install the handler for real signal numbers, and it
clearly managed to find the handler, so then how did it corrupt signo
before calling the function?  I wonder if there could be concurrency bugs
reached by our perhaps unusually large amount of signaling (we have
found bugs in the signal implementations of several other OSes...).
This might be the code:

https://github.com/bminor/glibc/blob/master/hurd/hurdsig.c#L639

It appears to suspend the thread selected to handle the signal, mess
with its stack/context and then resume it, just like traditional
monokernels, it's just done in user space by code running in a helper
thread that communicates over Mach ports.  So it looks like I
misunderstood that comment in the docs, it's not the handler itself
that runs in a different thread, unless I'm looking at the wrong code
(?).

Some random thoughts after skim-reading that and
glibc/sysdeps/mach/hurd/x86/trampoline.c:
* I wonder if setting up sigaltstack() and then using SA_ONSTACK in
pqsignal() would behave differently (see the sketch just after this
list), though SysV AMD64 calling conventions (used by Hurd, IIUC) have
the first argument in %rdi, not the stack, so I don't really expect
that to be relevant...
* I wonder about the special code paths for handlers that were already
running and happened to be in sigreturn(), or something like that,
which I didn't study at all, but it occurred to me that our pqsignal
will only block the signal itself while running a handler (since it
doesn't specify SA_NODEFER)... so what happens if you block all
signals while running each handler by changing
sigemptyset(&act.sa_mask) to sigfillset(&act.sa_mask)?
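
(For reference, the sigaltstack()/SA_ONSTACK setup from the first point
would look roughly like this; a minimal sketch, not an actual patch, with
illustrative function name and placement:)

#include <signal.h>
#include <stdlib.h>

static void
install_handler_on_altstack(int signo, void (*func) (int))
{
    stack_t     ss;
    struct sigaction act;

    /* Set up an alternate signal stack for this process. */
    ss.ss_sp = malloc(SIGSTKSZ);
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) < 0)
        abort();

    /* Ask for delivery on the alternate stack via SA_ONSTACK. */
    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART | SA_ONSTACK;
    if (sigaction(signo, &act, NULL) < 0)
        abort();
}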

Thank you for the suggestion!

With this modification:
@@ -137,7 +140,7 @@ pqsignal(int signo, pqsigfunc func)
 
 #if !(defined(WIN32) && defined(FRONTEND))
        act.sa_handler = func;
-       sigemptyset(&act.sa_mask);
+       sigfillset(&act.sa_mask);
        act.sa_flags = SA_RESTART;

I got 100 iterations passed (12 of them hung) without that Assert being
triggered.

* I see special code paths for SIGIO and SIGURG that I didn't try to
understand, but I wonder what would happen if we s/SIGURG/SIGXCPU/

With sed 's/SIGURG/SIGXCPU/' -i src/backend/storage/ipc/waiteventset.c, I
still got:
!!!wrapper_handler[8401]| postgres_signal_arg: 28565808, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 8401
...
2025-11-09 12:51:24.095 GMT postmaster[7282] LOG:  client backend (PID 8401) was terminated by signal 6: Aborted
2025-11-09 12:51:24.095 GMT postmaster[7282] DETAIL:  Failed process was running: UPDATE PKTABLE set ptest2=5 where ptest2=2;
---

!!!wrapper_handler[21000]| postgres_signal_arg: 28545040, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 21000
...
2025-11-09 13:06:59.458 GMT postmaster[20669] LOG:  client backend (PID 21000) was terminated by signal 6: Aborted
2025-11-09 13:06:59.458 GMT postmaster[20669] DETAIL:  Failed process was running: UPDATE pvactst SET i = i WHERE i < 1000;
---
!!!wrapper_handler[21973]| postgres_signal_arg: 28562608, PG_NSIG: 33
TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 94, PID: 21973

2025-11-09 14:56:23.955 GMT postmaster[20665] LOG:  client backend (PID 21973) was terminated by signal 6: Aborted
2025-11-09 14:56:23.955 GMT postmaster[20665] DETAIL:  Failed process was running: INSERT INTO pagg_tab_m SELECT i % 30, i % 40, i % 50 FROM generate_series(0, 2999) i;

The failure rate is approximately 1 per 30 runs.
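
(The "!!!wrapper_handler" lines above come from debug output I added in
wrapper_handler(), src/port/pqsignal.c, along these lines; a sketch, with
the function body abbreviated:)

static void
wrapper_handler(SIGNAL_ARGS)
{
    /* Debug output (sketch): dump the received signal number and PG_NSIG. */
    fprintf(stderr, "!!!wrapper_handler[%d]| postgres_signal_arg: %d, PG_NSIG: %d\n",
            (int) getpid(), postgres_signal_arg, (int) PG_NSIG);

    /* The Assert that fires when the signal number arrives corrupted. */
    Assert(postgres_signal_arg < PG_NSIG);

    (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
}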

Besides that Assert and the hangs, I also observed:
--- /home/demo/postgresql/src/test/regress/expected/xml.out     2025-10-11 10:04:43.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/xml.out      2025-11-10 07:20:56.000000000 +0000
@@ -1788,10 +1788,14 @@
                                          proargtypes text))
    SELECT * FROM z
    EXCEPT SELECT * FROM x;
- proname | proowner | procost | pronargs | proargnames | proargtypes
----------+----------+---------+----------+-------------+-------------
-(0 rows)
-
+ERROR:  could not parse XML document
+DETAIL:  line 1: Input is not proper UTF-8, indicate encoding !
+Bytes: 0x92 0x11 0x69 0x3C
+<data>X~R^Qi<proc><proname>pg_get_replication_slots</proname><proowner>10</proowne
+       ^
+line 1: PCDATA invalid Char value 17
+<data>X~R^Qi<proc><proname>pg_get_replication_slots</proname><proowner>10</proowne



TRAP: failed Assert("AllocBlockIsValid(block)"), File: "aset.c", Line: 1536, PID: 16354
...
2025-11-09 10:21:16.249 GMT postmaster[15242] LOG:  client backend (PID 16354) was terminated by signal 6: Aborted
2025-11-09 10:21:16.249 GMT postmaster[15242] DETAIL:  Failed process was running: CREATE INDEX i_bmtest_a ON bmscantest(a);
2025-11-09 10:21:16.249 GMT postmaster[15242] LOG:  terminating any other active server processes


TRAP: failed Assert("npages == tbm->npages"), File: "tidbitmap.c", Line: 825, PID: 4641
...
2025-10-14 12:09:00.555 BST postmaster[3818] LOG:  client backend (PID 4641) was terminated by signal 6: Aborted
2025-10-14 12:09:00.555 BST postmaster[3818] DETAIL:  Failed process was running: select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;


--- /home/demo/postgresql/src/test/regress/expected/join_hash.out       2025-10-11 10:04:34.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/join_hash.out        2025-10-14 11:30:16.000000000 +0100
@@ -485,20 +485,12 @@
 (8 rows)

 select count(*) from simple r join extremely_skewed s using (id);
- count
--------
- 20000
-(1 row)
-
+ERROR:  could not read from temporary file: read only 411688 of 47854847 bytes

--- /home/demo/postgresql/src/test/regress/expected/bitmapops.out       2025-10-11 10:04:29.000000000 +0100
+++ /home/demo/postgresql/src/test/regress/results/bitmapops.out        2025-10-14 11:08:58.000000000 +0100
@@ -13,6 +13,10 @@
   SELECT (r%53), (r%59), 'foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'
   FROM generate_series(1,70000) r;
 CREATE INDEX i_bmtest_a ON bmscantest(a);
+ERROR:  index row size 6736 exceeds btree version 4 maximum 2704 for index "i_bmtest_a"


30.10.2025 17:30, Michael Banck wrote:
I checked this; if I just run the following excerpt of
entry_timestamp.sql in a tight loop, I get a few (<10) occurrences out
of 10000 iterations where min/max plan time is 0 (or rather,
minmax_plan_zero is non-zero):

SELECT pg_stat_statements_reset();
SET pg_stat_statements.track_planning = TRUE;
SELECT 1 AS "STMTTS1";
SELECT count(*) AS total,
       count(*) FILTER (WHERE min_plan_time + max_plan_time = 0) AS minmax_plan_zero
FROM pg_stat_statements
WHERE query LIKE '%STMTTS%';

On the assumption that this isn't a general bug, but just a timing issue
(planning 'SELECT 1' isn't complicated), I see two possibilities:

1. Ignore the plan times, and replace SELECT 1 with SELECT
pg_sleep(1e-6), similar to e849bd551. I guess this would reduce test
coverage, so it would likely not be great?

2. Make the query a bit more complicated so that the plan time is likely
to be non-negligible. I actually had to go quite a way to make it pretty
failsafe; the attached made it fail fewer than 5 times out of 50000
iterations. Not sure whether that is acceptable or still considered
flaky?

What concerns me is that subscription.sql, and maybe other tests, also
expect timer resolution of at least 1000 ns (far from infinite). It would
probably make sense to define which timer resolution we consider
acceptable for tests and then check whether Hurd can provide it.
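
(For what it's worth, a quick standalone check along these lines could
show what resolution the clocks advertise on Hurd; a sketch, assuming
clock_getres() reflects the effective granularity:)

#include <stdio.h>
#include <time.h>

int
main(void)
{
    struct timespec res;

    /* Print the advertised resolution of the clocks timing code may use. */
    if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
        printf("CLOCK_MONOTONIC resolution: %lld ns\n",
               (long long) res.tv_sec * 1000000000LL + res.tv_nsec);
    if (clock_getres(CLOCK_REALTIME, &res) == 0)
        printf("CLOCK_REALTIME resolution: %lld ns\n",
               (long long) res.tv_sec * 1000000000LL + res.tv_nsec);
    return 0;
}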

Best regards,
Alexander
