conchuela timeouts since 2021-10-09 system upgrade

From Noah Misch
Subject conchuela timeouts since 2021-10-09 system upgrade
Msg-id 20211024161942.GB3945842@rfd.leadboat.com
In reply to Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: conchuela timeouts since 2021-10-09 system upgrade  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-bugs
On Sun, Oct 24, 2021 at 02:45:38PM +0300, Andrey Borodin wrote:
> > On Oct 24, 2021, at 08:00, Noah Misch <noah@leadboat.com> wrote:
> > Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple
> > "IPC::Run: timeout on timer" in the new tests.  No other animal has.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09
> > is an example run.  The pgbench queries finished quickly, but the
> > $pgbench_h->finish() apparently timed out after 180s.  I guess this would be
> > consistent with pgbench blocking in write(), waiting for something to empty a
> > pipe buffer so it can write more.  I thought finish() will drain any incoming
> > I/O, though.  This phenomenon has been appearing regularly via
> > src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to
> > resolve it.  A stack trace of the stuck pgbench should be informative, though.
> 
> Some thoughts:
> 0. I doubt that psql/pgbench is stuck in these failures.

Got it.  If pgbench is a zombie, the fault does lie in IPC::Run or the kernel.

> 1. All observed similar failures seem to be related to the finish() sub of the IPC::Run harness.
> 2. finish() must pump any pending data from the process [0], but it can hang if the process is waiting for something.
> 3. There is a reported bug in finish() [1], but its description is slightly different.

Since that report is about a Perl-process child on Linux, I think we can treat
it as unrelated.

These failures started on 2021-10-09, the day conchuela updated from DragonFly
v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
Since the theorized kernel bug seems not to affect
src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
the newest failure sites is that they don't write anything to the child stdin.
Does writing e.g. a single byte (that the child doesn't use) work around the
problem?  If not, does passing the script via stdin, like "pgbench -f-
<script.sql", work around the problem?
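
To make the two proposed experiments concrete, here is a minimal sketch in Python rather than the Perl/IPC::Run code the tests actually use. The helper name run_child and its parameters are hypothetical, and the sketch only illustrates "give the child a piped stdin, write something to it, then wait with a timeout"; it does not reproduce IPC::Run's finish() logic and is not known to avoid the hang on DragonFly.

```python
import subprocess


def run_child(cmd, stdin_data=b"\n", timeout=180):
    """Run cmd with a piped stdin, feed it stdin_data, and wait for exit.

    Hypothetical helper for illustration only.
    stdin_data may be a single throwaway byte (first experiment) or a
    whole script for a child that reads its script from stdin (second
    experiment, as in "pgbench -f- <script.sql").
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        # communicate() writes stdin_data, closes the child's stdin, and
        # drains stdout/stderr until the child exits or the timeout hits.
        out, err = proc.communicate(input=stdin_data, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Roughly analogous to "timeout on timer": kill and reap the child.
        proc.kill()
        proc.communicate()
        raise
    return proc.returncode, out, err
```

For the first experiment the call would pass a single byte that the child ignores; for the second, the contents of the script file would go in as stdin_data and the child would be started with its script argument pointing at stdin.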

> [0] https://metacpan.org/dist/IPC-Run/source/lib/IPC/Run.pm#L3481
> [1] https://github.com/toddr/IPC-Run/issues/57


