suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?)
От | Tomas Vondra |
---|---|
Тема | suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?) |
Дата | |
Msg-id | 67f7132d-3923-47a6-9de2-5b7d86ddb73f@vondra.me обсуждение исходный текст |
Ответы |
Re: suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?)
|
Список | pgsql-hackers |
Hi, I have noticed one of my buildfarm machines - widowbird - did not report any results since February 17. And it seems to be stuck somewhere in amcheck: $ ps ax | grep postgres 1180067 ? Ss 0:02 /mnt/data/buildfarm/buildroot/HEAD/inst/bin/postgres -D data-C 1180069 ? Ss 0:00 postgres: checkpointer 1180070 ? Ss 0:00 postgres: background writer 1180072 ? Ss 0:00 postgres: walwriter 1180073 ? Ss 0:01 postgres: autovacuum launcher 1180074 ? Ss 0:00 postgres: logical replication launcher 1180107 ? Ss 0:05 postgres: buildfarm contrib_regression_amcheck [local] INSERT 1180111 ? Ss 0:00 postgres: autovacuum worker 1180134 ? Ss 0:00 postgres: autovacuum worker 1180135 ? Ss 0:00 postgres: autovacuum worker 1374029 pts/0 S+ 0:00 grep --color=auto postgres So there's PID 1180107, executing an insert, but not progressing. The backtrace looks like this (first couple lines, full backtrace attached): #0 0x0000007fa64b8ddc in __GI_epoll_pwait (epfd=5, events=0x55ad6285a8, maxevents=1, timeout=timeout@entry=-1, set=set@entry=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42 #1 0x0000007fa64b8fe8 in epoll_wait (epfd=<optimized out>, events=<optimized out>, maxevents=<optimized out>, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:32 #2 0x000000558f043588 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ff8ed4e18, cur_timeout=-1, set=0x55ad628540) at latch.c:1571 #3 WaitEventSetWait (set=0x55ad628540, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ff8ed4e18, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=134217781) at latch.c:1519 #4 0x000000558f043778 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=-1, wait_event_info=wait_event_info@entry=134217781) at latch.c:538 #5 0x000000558f052274 in ConditionVariableTimedSleep (cv=0x7f9ac9deb0, timeout=timeout@entry=-1, wait_event_info=wait_event_info@entry=134217781) at condition_variable.c:163 #6 0x000000558f05286c in ConditionVariableTimedSleep (wait_event_info=134217781, timeout=-1, cv=<optimized out>) at condition_variable.c:135 #7 0x000000558ed2fc90 in AdvanceXLInsertBuffer (upto=upto@entry=608174080, tli=tli@entry=1, opportunistic=opportunistic@entry=false) at xlog.c:2224 So, it's stuck in AdvanceXLInsertBuffer ... interesting. Another interesting fact is it's testing 75dfde13639, which is just a couple commits after 6a2275b895: commit 6a2275b8953a4462d44daf001bdd60b3d48f0946 Author: Alexander Korotkov <akorotkov@postgresql.org> Date: Mon Feb 17 04:19:01 2025 +0200 Get rid of WALBufMappingLock Allow multiple backends to initialize WAL buffers concurrently. This way `MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without taking a single LWLock in exclusive mode. ... which reworked AdvanceXLInsertBuffer() quite a bit, it seems. OTOH the last (successful) run on widorbird was on eaf502747b, which already includes 6a2275b895, so maybe it's unrelated. Is there something else I could collect from the stuck instance, before I restart it? regards -- Tomas Vondra
Вложения
В списке pgsql-hackers по дате отправления: