Re: Resetting spilled txn statistics in pg_stat_replication

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Resetting spilled txn statistics in pg_stat_replication
Дата
Msg-id CA+fd4k59yKGXVTahoez_G_aARPSNks=uQFK-XtDD_LS9tU22qA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Resetting spilled txn statistics in pg_stat_replication  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Resetting spilled txn statistics in pg_stat_replication  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Tue, 13 Oct 2020 at 16:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > The original theory I have given above [1] which is an interleaved
> > > > > autovacumm transaction. Let me try to explain in a bit more detail.
> > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
> > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
> > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen
> > > > > in buildfarm will happen when autovacuum transaction happens after 80%
> > > > > or more of the Insert is done.
> > > > >
> > > > > In such a situation we will start decoding 'Insert' first and need to
> > > > > spill multiple times due to the amount of changes (more than threshold
> > > > > logical_decoding_work_mem) and then before we encounter Commit of
> > > > > transaction that performed Insert (and probably some more changes from
> > > > > that transaction) we will encounter a small transaction (autovacuum
> > > > > transaction).  The decode of that small transaction will send the
> > > > > stats collected till now which will lead to the problem shown in
> > > > > buildfarm.
> > > >
> > > > That seems a possible scenario.
> > > >
> > > > I think probably this also explains the reason why spill_count
> > > > slightly varied and spill_txns was still 1. The spill_count value
> > > > depends on how much the process spilled out transactions before
> > > > encountering the commit of an autovacuum transaction. Since we have
> > > > the spill statistics per reorder buffer, not per transactions, it's
> > > > possible.
> > > >
> > >
> > > Okay, here is an updated version (changed some comments) of the patch
> > > I posted some time back. What do you think? I have tested this on both
> > > Windows and Linux environments. I think it is a bit tricky to
> > > reproduce the exact scenario so if you are fine we can push this and
> > > check or let me know if you any better idea?
> >
> > I agree to check if the spill_counts and spill_txns are positive.
> >
>
> I am able to reproduce this problem via debugger. Basically, execute
> the Insert mentioned above from one the psql sessions and in
> ExecInsert() stop the execution once 'estate->es_processed > 4000' and
> then from another psql terminal execute some DDL which will be ignored
> but will any try to decode commit. Then perform 'continue' in the
> first session. This will lead to inconsistent stats value depending
> upon at what time DDL is performed.

Thanks!
I'm also able to reproduce this in a similar way and have confirmed
the patch fixes it.

>  I'll push the patch as I am more
> confident now.

+1. Let's check how the tests are going to be.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Resetting spilled txn statistics in pg_stat_replication
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] Runtime Partition Pruning