Thread: BUG #18996: Assertion fails in waiteventset.c when dropping database in single mode in PG18
BUG #18996: Assertion fails in waiteventset.c when dropping database in single mode in PG18
From
PG Bug reporting form
Date:
The following bug has been logged on the website:
Bug reference: 18996
Logged by: Patrick Stählin
Email address: me@packi.ch
PostgreSQL version: 18beta2
Operating system: Fedora 40
Description:
Hi!
We're starting to incorporate PG18 (REL_18_BETA2) in our builds/testing. Our
tests currently fail because we drop the postgres database in single mode
before we give our customers access to them, as they won't have superuser
access and we allow them to re-create that database. It also triggers when I
create a database foo and then drop it so it's not related to the postgres
database specifically. The assert doesn't trigger with REL_17_5 built with
the same instructions.
Steps to reproduce:
I've compiled a vanilla REL_18_BETA2 with:
meson setup build -Dcassert=true --buildtype=debug --prefix=/usr/local/pgsql
cd build
ninja
sudo ninja install
And then ran the following:
export PGBIN=/usr/local/pgsql/bin
$PGBIN/initdb testdb6001
echo "DROP DATABASE postgres;" | $PGBIN/postgres --single -D testdb6001 template1
Commandline output:
~/postgres $ $PGBIN/initdb testdb6001
The files belonging to this database system will be owned by user
"patrick.staehlin".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are enabled.
creating directory testdb6001 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Europe/Vaduz
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option
-A, or --auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
/usr/local/pgsql/bin/pg_ctl -D testdb6001 -l logfile start
~/postgres$ echo "DROP DATABASE postgres;" | $PGBIN/postgres --single -D
testdb6001 template1
PostgreSQL stand-alone backend 18beta2
backend> 2025-07-24 10:39:45.890 CEST [2714826] LOG: checkpoint starting:
immediate force wait
2025-07-24 10:39:45.890 CEST [2714826] STATEMENT: DROP DATABASE postgres;
2025-07-24 10:39:45.891 CEST [2714826] LOG: checkpoint complete: wrote 2
buffers (0.2%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0
recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=1 kB, estimate=1 kB;
lsn=0/17BD4B0, redo lsn=0/17BD458
2025-07-24 10:39:45.891 CEST [2714826] STATEMENT: DROP DATABASE postgres;
TRAP: failed Assert("pos < set->nevents"), File:
"../src/backend/storage/ipc/waiteventset.c", Line: 662, PID: 2714826
/usr/local/pgsql/bin/postgres(ExceptionalCondition+0xab)[0xbaef1e]
/usr/local/pgsql/bin/postgres(ModifyWaitEvent+0x40)[0x997163]
/usr/local/pgsql/bin/postgres(WaitLatch+0xa9)[0x9862ae]
/usr/local/pgsql/bin/postgres(ConditionVariableTimedSleep+0xb6)[0x99a049]
/usr/local/pgsql/bin/postgres(WaitForProcSignalBarrier+0xee)[0x98f793]
/usr/local/pgsql/bin/postgres(dropdb+0x60a)[0x64afcc]
/usr/local/pgsql/bin/postgres(DropDatabase+0x145)[0x64bebc]
/usr/local/pgsql/bin/postgres(standard_ProcessUtility+0x75d)[0x9cea9d]
/usr/local/pgsql/bin/postgres(ProcessUtility+0x13c)[0x9ce339]
/usr/local/pgsql/bin/postgres[0x9ccc08]
/usr/local/pgsql/bin/postgres[0x9cce8d]
/usr/local/pgsql/bin/postgres(PortalRun+0x30b)[0x9cc337]
/usr/local/pgsql/bin/postgres[0x9c4b92]
/usr/local/pgsql/bin/postgres(PostgresMain+0xbcb)[0x9ca2ff]
/usr/local/pgsql/bin/postgres(PostgresMain+0x0)[0x9c9734]
/usr/local/pgsql/bin/postgres(main+0x38b)[0x7addf4]
/lib64/libc.so.6(+0x2a088)[0x7f57437be088]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7f57437be14b]
/usr/local/pgsql/bin/postgres(_start+0x25)[0x498aa5]
2714826 Aborted (core dumped) | $PGBIN/postgres
--single -F -B 1024 -j -D testdb$PORT template1
~/.postgres$
Let me know if you need more information.
Thanks,
Patrick
Re: BUG #18996: Assertion fails in waiteventset.c when dropping database in single mode in PG18
From
Patrick Stählin
Date:
Hi!
On 7/24/25 11:49 AM, PG Bug reporting form wrote:
>
> We're starting to incorporate PG18 (REL_18_BETA2) in our builds/testing. Our
> tests currently fail because we drop the postgres database in single mode
> before we give our customers access to them, as they won't have superuser
> access and we allow them to re-create that database. It also triggers when I
> create a database foo and then drop it so it's not related to the postgres
> database specifically. The assert doesn't trigger with REL_17_5 built with
> the same instructions.
We, or rather git bisect, traced it down to commit
84e5b2f07a5e8ba983ff0f6e71b063b27f45f346 that added a new wait event in
InitializeLatchWaitSet if we're running under postmaster but then didn't add
the same check in WaitLatch and always referenced it. This probably caused
the assert later on, when we were waiting on the ProcBarrier.
I've attached a patch based on REL_18_STABLE that seems to fix the issue for
us and passes the selftests.
Thanks,
Patrick
Attachments
Re: BUG #18996: Assertion fails in waiteventset.c when dropping database in single mode in PG18
From
Michael Paquier
Date:
On Thu, Jul 24, 2025 at 03:46:12PM +0200, Patrick Stählin wrote:
> We, or rather git bisect, traced it down to commit
> 84e5b2f07a5e8ba983ff0f6e71b063b27f45f346 that added a new wait event in
> InitializeLatchWaitSet if we're running under postmaster but then didn't add
> the same check in WaitLatch and always referenced it. This probably caused
> the assert later on, when we were waiting on the ProcBarrier.
>
> I've attached a patch based on REL_18_STABLE that seems to fix the issue for
> us and passes the selftests.
That's the same kind of single-user-mode shortcut we have used in the past
for similar issues, like 0ce5cf2ef24f for example.
@@ -187,9 +187,10 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
     if (!(wakeEvents & WL_LATCH_SET))
         latch = NULL;
     ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET, latch);
-    ModifyWaitEvent(LatchWaitSet, LatchWaitSetPostmasterDeathPos,
-                    (wakeEvents & (WL_EXIT_ON_PM_DEATH | WL_POSTMASTER_DEATH)),
-                    NULL);
+    if (IsUnderPostmaster)
+        ModifyWaitEvent(LatchWaitSet, LatchWaitSetPostmasterDeathPos,
+                        (wakeEvents & (WL_EXIT_ON_PM_DEATH | WL_POSTMASTER_DEATH)),
+                        NULL);
Yeah, that looks good to me. It does not make sense to rely on that
for !IsUnderPostmaster, which is something that we obviously support
in WaitLatch() based on the assertion a couple of lines above while
the initialization happens. Will fix, thanks for the report!
--
Michael