Re: conchuela timeouts since 2021-10-09 system upgrade

From Noah Misch
Subject Re: conchuela timeouts since 2021-10-09 system upgrade
Date
Msg-id 20211026015157.GA113335@rfd.leadboat.com
In response to Re: conchuela timeouts since 2021-10-09 system upgrade  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: conchuela timeouts since 2021-10-09 system upgrade  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > FWIW it's easy to make the issue reproduce faster with following diff
> > -       '--no-vacuum --client=1 --transactions=100',
> > +       '--no-vacuum --client=1 --transactions=1',
> 
> Hmm, didn't help here.  It seems that even though prairiedog managed to
> fail on its first attempt, it's not terribly reproducible there; I've
> seen only one failure in about 30 manual attempts.  In the one failure,
> the non-background pgbench completed fine (as determined by counting
> statements in the postmaster's log); but the background one had only
> finished about 90 transactions before seemingly getting stuck.  No new
> SQL commands had been issued after about 10 seconds.

Interesting.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-10-24%2016%3A05%3A58
also shows a short command count, just 131/200 completed.  However,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-25%2000%3A35%3A27
shows the full 200/200.  I'm starting to think the prairiedog failures have
only superficial similarity to the conchuela failures.

> Nonetheless, I have a theory and a proposal.  This coding pattern
> seems pretty silly:
> 
>     $pgbench_h->pump_nb;
>     $pgbench_h->finish();
> 
> ISTM that if you need to call pump at all, you need a loop not just
> one call.  So I'm guessing that when it fails, it's for lack of
> pumping.

The pump_nb() is just unnecessary.  We've not added anything destined for
stdin, and finish() takes care of pumping outputs.

> The other thing I noticed is that at least on prairiedog's host, the
> number of invocations of the DROP/CREATE/bt_index_check transaction
> is ridiculously out of proportion to the number of invocations of the
> other transactions.  It can only get through seven or eight iterations
> of the index transaction before the other transactions are all done,
> which means the last 190 iterations of that transaction are a complete
> waste of cycles.

That makes sense.

> What I think we should do in these two tests is nuke the use of
> background_pgbench entirely; that looks like a solution in search
> of a problem, and it seems unnecessary here.  Why not run
> the DROP/CREATE/bt_index_check transaction as one of three script
> options in the main pgbench run?

The author tried that and got deadlocks:
https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru


On prairiedog, the proximate trouble is pgbench getting stuck.  IPC::Run is
behaving normally given a stuck pgbench.  When pgbench stops sending queries,
does pg_stat_activity show anything at all running?  If so, are those backends
waiting on locks?  If not, what's the pgbench stack trace at that time?
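
For the first two questions, a query along these lines, run from a separate
psql session while pgbench appears stuck, would probably be enough.  It is only
a sketch, not something taken from the test suite, and it assumes PostgreSQL
9.6 or later for the wait_event columns and pg_blocking_pids().  The column
aliases are illustrative.

```sql
-- Sketch only: list the other backends and what, if anything, each waits on.
-- Assumes PostgreSQL 9.6+ (wait_event columns, pg_blocking_pids()).
SELECT pid,
       state,
       wait_event_type,                      -- 'Lock' means a heavyweight lock wait
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,  -- PIDs holding the conflicting locks
       now() - xact_start    AS xact_age,
       query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()                -- skip the session running this query
ORDER BY xact_start NULLS LAST;
```

Rows with wait_event_type = 'Lock' and a non-empty blocked_by would point at a
lock wait.  If no backend shows any activity at all, the pgbench stack trace is
the next thing to look at.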


