Re: Resetting spilled txn statistics in pg_stat_replication

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Resetting spilled txn statistics in pg_stat_replication
Дата
Msg-id CA+fd4k6Yn9N=+6brYZmUM4Q0Er4ymo9hu8ueyotE_hN4gmHstA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Resetting spilled txn statistics in pg_stat_replication  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Resetting spilled txn statistics in pg_stat_replication  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >
> > > > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > > >> It is possible that MAXALIGN stuff is playing a role here and or the
> > > > >> background transaction stuff. I think if we go with the idea of
> > > > >> testing spill_txns and spill_count being positive then the results
> > > > >> will be stable. I'll write a patch for that.
> > > >
> > > > Here's our first failure on a MAXALIGN-8 machine:
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08
> > > >
> > > > So this is just plain not stable.  It is odd though.  I can
> > > > easily think of mechanisms that would cause the WAL volume
> > > > to occasionally be *more* than the "typical" case.  What
> > > > would cause it to be *less*, if MAXALIGN is ruled out?
> > > >
> > >
> > > The original theory I have given above [1] which is an interleaved
> > > autovacumm transaction. Let me try to explain in a bit more detail.
> > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
> > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
> > > g(i);') a parallel autovacuum transaction occurs. The problem as seen
> > > in buildfarm will happen when autovacuum transaction happens after 80%
> > > or more of the Insert is done.
> > >
> > > In such a situation we will start decoding 'Insert' first and need to
> > > spill multiple times due to the amount of changes (more than threshold
> > > logical_decoding_work_mem) and then before we encounter Commit of
> > > transaction that performed Insert (and probably some more changes from
> > > that transaction) we will encounter a small transaction (autovacuum
> > > transaction).  The decode of that small transaction will send the
> > > stats collected till now which will lead to the problem shown in
> > > buildfarm.
> >
> > That seems a possible scenario.
> >
> > I think probably this also explains the reason why spill_count
> > slightly varied and spill_txns was still 1. The spill_count value
> > depends on how much the process spilled out transactions before
> > encountering the commit of an autovacuum transaction. Since we have
> > the spill statistics per reorder buffer, not per transactions, it's
> > possible.
> >
>
> Okay, here is an updated version (changed some comments) of the patch
> I posted some time back. What do you think? I have tested this on both
> Windows and Linux environments. I think it is a bit tricky to
> reproduce the exact scenario so if you are fine we can push this and
> check or let me know if you any better idea?

I agree to check if the spill_counts and spill_txns are positive. I
thought we can reduce the number of tuples to insert to the half. It
would help to reduce the likelihood of other transactions interfere
and speed up the test (currently, the stats.sql test takes almost 1
sec in my environment). But it might lead to another problem like the
logical decoding doesn't spill out the transaction on some
environment.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Resetting spilled txn statistics in pg_stat_replication
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Resetting spilled txn statistics in pg_stat_replication