Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Дата
Msg-id bdb91b16-b6da-32a8-15fd-c024e11b453a@gmail.com
обсуждение исходный текст
Ответ на Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED  (Andres Freund <andres@anarazel.de>)
Ответы Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Список pgsql-hackers
Hello,
18.02.2023 04:06, Andres Freund wrote:
> Hi,
>
> On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
>> I still have no theory for how this condition was reached despite a
>> lot of time thinking about it and searching for more clues.  As far as
>> I can tell, the recent improvements to postmaster's signal and event
>> handling shouldn't be related: the state management and logic was
>> unchanged.
> Yea, it's all very odd.
>
> If you look at the log:
>
> 2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL:  No valid identifier after ".".
> 2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT:  SELECT parse_ident('xxx.1020');
> ...
> TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File:
"../src/backend/storage/ipc/pmsignal.c",Line: 329, PID: 5948
 
> abort() has been called
> ...
> 2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  server process (PID 5948) was terminated by exception 0xC0000354
> 2023-02-08 00:53:27.420 GMT postmaster[872] HINT:  See C include file "ntstatus.h" for a description of the
hexadecimalvalue.
 
> 2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  terminating any other active server processes
> 2023-02-08 00:53:27.434 GMT postmaster[872] LOG:  all server processes terminated; reinitializing
>
>
> and that it's indeed the money test that failed:
>       money                        ... FAILED (test process exited with exit code 2)     7337 ms
>
> it's very hard to understand how this stack can come to be:
>
> 00000085`f03ffa40 00007ff6`fd89faa8     ucrtbased!abort(void)+0x5a [minkernel\crts\ucrt\src\appcrt\startup\abort.cpp
@77]
 
> 00000085`f03ffa80 00007ff6`fd6474dc     postgres!ExceptionalCondition(
>             char * conditionName = 0x00007ff6`fdd03ca8 "PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED",
>             char * fileName = 0x00007ff6`fdd03c80 "../src/backend/storage/ipc/pmsignal.c",
>             int lineNumber = 0n329)+0x78 [c:\cirrus\src\backend\utils\error\assert.c @ 67]
> 00000085`f03ffac0 00007ff6`fd676eff     postgres!MarkPostmasterChildActive(void)+0x7c
[c:\cirrus\src\backend\storage\ipc\pmsignal.c@ 329]
 
> 00000085`f03ffb00 00007ff6`fd59aa3a     postgres!InitProcess(void)+0x2ef [c:\cirrus\src\backend\storage\lmgr\proc.c @
375]
> 00000085`f03ffb60 00007ff6`fd467689     postgres!SubPostmasterMain(
>             int argc = 0n3,
>             char ** argv = 0x000001c6`f3814e80)+0x33a [c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
> 00000085`f03ffd90 00007ff6`fda0e1c9     postgres!main(
>             int argc = 0n3,
>             char ** argv = 0x000001c6`f3814e80)+0x2f9 [c:\cirrus\src\backend\main\main.c @ 192]
>
> How can a process that we did notify crashing, that has already executed SQL
> statements, end up in MarkPostmasterChildActive()?
Maybe it's just the backend started for the money test has got
the same PID (5948) that the backend for the name test had?
A simple script that I've found [1] shows that the pids reused rather often
(for me, approximately each 300 process starts in Windows 10 H2), buy maybe
under some circumstances (many concurrent processes?) PIDs can coincide even
so often to trigger that behavior.

[1] https://superuser.com/questions/636497/does-windows-7-reuse-process-ids



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: PATCH: Using BRIN indexes for sorted output