Re: Resetting spilled txn statistics in pg_stat_replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Resetting spilled txn statistics in pg_stat_replication
Дата
Msg-id CAA4eK1KtR90LZVhASiDpqc9yHp8ALWAXMewkR4r+ce3Zj4fcrw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Resetting spilled txn statistics in pg_stat_replication  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Ответы Re: Resetting spilled txn statistics in pg_stat_replication  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Resetting spilled txn statistics in pg_stat_replication  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Re: Resetting spilled txn statistics in pg_stat_replication  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > The original theory I have given above [1] which is an interleaved
> > > > autovacumm transaction. Let me try to explain in a bit more detail.
> > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test
> > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000)
> > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen
> > > > in buildfarm will happen when autovacuum transaction happens after 80%
> > > > or more of the Insert is done.
> > > >
> > > > In such a situation we will start decoding 'Insert' first and need to
> > > > spill multiple times due to the amount of changes (more than threshold
> > > > logical_decoding_work_mem) and then before we encounter Commit of
> > > > transaction that performed Insert (and probably some more changes from
> > > > that transaction) we will encounter a small transaction (autovacuum
> > > > transaction).  The decode of that small transaction will send the
> > > > stats collected till now which will lead to the problem shown in
> > > > buildfarm.
> > >
> > > That seems a possible scenario.
> > >
> > > I think probably this also explains the reason why spill_count
> > > slightly varied and spill_txns was still 1. The spill_count value
> > > depends on how much the process spilled out transactions before
> > > encountering the commit of an autovacuum transaction. Since we have
> > > the spill statistics per reorder buffer, not per transactions, it's
> > > possible.
> > >
> >
> > Okay, here is an updated version (changed some comments) of the patch
> > I posted some time back. What do you think? I have tested this on both
> > Windows and Linux environments. I think it is a bit tricky to
> > reproduce the exact scenario so if you are fine we can push this and
> > check or let me know if you any better idea?
>
> I agree to check if the spill_counts and spill_txns are positive.
>

I am able to reproduce this problem via debugger. Basically, execute
the Insert mentioned above from one the psql sessions and in
ExecInsert() stop the execution once 'estate->es_processed > 4000' and
then from another psql terminal execute some DDL which will be ignored
but will any try to decode commit. Then perform 'continue' in the
first session. This will lead to inconsistent stats value depending
upon at what time DDL is performed. I'll push the patch as I am more
confident now.

> I
> thought we can reduce the number of tuples to insert to the half. It
> would help to reduce the likelihood of other transactions interfere
> and speed up the test (currently, the stats.sql test takes almost 1
> sec in my environment). But it might lead to another problem like the
> logical decoding doesn't spill out the transaction on some
> environment.
>

Yeah, and in other cases, in spill.sql we are using the same amount of
data to test spilling.

-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Resetting spilled txn statistics in pg_stat_replication
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Resetting spilled txn statistics in pg_stat_replication