Обсуждение: Notify downstream to discard the streamed transaction which was aborted due to crash.

Поиск
Список
Период
Сортировка

Notify downstream to discard the streamed transaction which was aborted due to crash.

От
"houzj.fnst@fujitsu.com"
Дата:
Hi,

When developing another feature, I find an existing bug which was reported from Dilip[1].

Currently, it's possible that we only send a streaming block without sending a
end of stream message(stream abort) when decoding and streaming a transaction
that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT
for such a crashed transaction. This will cause the subscriber(with
streaming=on) create a stream file but won't delete it until the apply
worker restart.

BUG repro(borrowed from Dilip):
---
1. start 2 servers(config: logical_decoding_work_mem=64kB)
./pg_ctl -D data/ -c -l pub_logs start
./pg_ctl -D data1/ -c -l sub_logs start

2. Publisher:
create table t(a int PRIMARY KEY ,b text);
create publication test_pub for table t
with(PUBLISH='insert,delete,update,truncate');
alter table t replica identity FULL ;

3. Subscription Server:
create table t(a int,b text);
create subscription test_sub CONNECTION 'host=localhost port=10000
dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
test_slot_sub1,streaming=on);

4. Publication Server:
begin ;
insert into t values (generate_series(1,50000), 'zzzzzzzzz');  -- (while executing this restart publisher in 2-3 secs)

Restart the publication server, while the transaction is still in an
uncommitted state.
./pg_ctl -D data/ -c -l pub_logs restart -mi
---

After restarting the publisher, we can see the subscriber receive a streaming
block and create a stream file(/base/pgsql_tmp/xxx.fileset).

To fix it, One idea is to send a stream abort message when we are cleaning up
crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
a tiny patch which changes the same. I have confirmed that the bug is fixed and
all regression tests pass. I didn't add a testcase because we need to make sure
the crash happens before all the WAL logged transactions data are decoded which
doesn't seem easy to write a stable test for this.

Thoughts ?

[1] https://www.postgresql.org/message-id/CAFiTN-sTYk%3Dh75Jn1a7ee%2B5hOcdQFjKGDvF_0NWQQXmoBv4A%2BA%40mail.gmail.com

Best regards,
Hou zj

Вложения

Re: Notify downstream to discard the streamed transaction which was aborted due to crash.

От
Amit Kapila
Дата:
On Fri, Jan 6, 2023 at 9:25 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
>
> To fix it, One idea is to send a stream abort message when we are cleaning up
> crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
> a tiny patch which changes the same. I have confirmed that the bug is fixed and
> all regression tests pass. I didn't add a testcase because we need to make sure
> the crash happens before all the WAL logged transactions data are decoded which
> doesn't seem easy to write a stable test for this.
>

Your fix looks good to me. Have you tried this in PG-14 where it was introduced?

-- 
With Regards,
Amit Kapila.



RE: Notify downstream to discard the streamed transaction which was aborted due to crash.

От
"houzj.fnst@fujitsu.com"
Дата:
On Friday, January 6, 2023 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Fri, Jan 6, 2023 at 9:25 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com>
> wrote:
> >
> >
> > To fix it, One idea is to send a stream abort message when we are
> > cleaning up crashed transaction on publisher(e.g. in
> > ReorderBufferAbortOld()). And here is a tiny patch which changes the
> > same. I have confirmed that the bug is fixed and all regression tests
> > pass. I didn't add a testcase because we need to make sure the crash
> > happens before all the WAL logged transactions data are decoded which
> doesn't seem easy to write a stable test for this.
> >
> 
> Your fix looks good to me. Have you tried this in PG-14 where it was
> introduced?

Yes, I have confirmed that PG-14 has the same problem and can be fixed after
applying the patch.

Best regards,
Hou zj

Re: Notify downstream to discard the streamed transaction which was aborted due to crash.

От
Dilip Kumar
Дата:
On Fri, Jan 6, 2023 at 9:25 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> When developing another feature, I find an existing bug which was reported from Dilip[1].
>
> Currently, it's possible that we only send a streaming block without sending a
> end of stream message(stream abort) when decoding and streaming a transaction
> that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT
> for such a crashed transaction. This will cause the subscriber(with
> streaming=on) create a stream file but won't delete it until the apply
> worker restart.
>
> BUG repro(borrowed from Dilip):
> ---
> 1. start 2 servers(config: logical_decoding_work_mem=64kB)
> ./pg_ctl -D data/ -c -l pub_logs start
> ./pg_ctl -D data1/ -c -l sub_logs start
>
> 2. Publisher:
> create table t(a int PRIMARY KEY ,b text);
> create publication test_pub for table t
> with(PUBLISH='insert,delete,update,truncate');
> alter table t replica identity FULL ;
>
> 3. Subscription Server:
> create table t(a int,b text);
> create subscription test_sub CONNECTION 'host=localhost port=10000
> dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
> test_slot_sub1,streaming=on);
>
> 4. Publication Server:
> begin ;
> insert into t values (generate_series(1,50000), 'zzzzzzzzz');  -- (while executing this restart publisher in 2-3
secs)
>
> Restart the publication server, while the transaction is still in an
> uncommitted state.
> ./pg_ctl -D data/ -c -l pub_logs restart -mi
> ---
>
> After restarting the publisher, we can see the subscriber receive a streaming
> block and create a stream file(/base/pgsql_tmp/xxx.fileset).
>
> To fix it, One idea is to send a stream abort message when we are cleaning up
> crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
> a tiny patch which changes the same. I have confirmed that the bug is fixed and
> all regression tests pass. I didn't add a testcase because we need to make sure
> the crash happens before all the WAL logged transactions data are decoded which
> doesn't seem easy to write a stable test for this.
>
> Thoughts ?

Fix looks good to me.  Thanks for working on this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Notify downstream to discard the streamed transaction which was aborted due to crash.

От
Amit Kapila
Дата:
On Fri, Jan 6, 2023 at 11:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> >
> > To fix it, One idea is to send a stream abort message when we are cleaning up
> > crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
> > a tiny patch which changes the same. I have confirmed that the bug is fixed and
> > all regression tests pass. I didn't add a testcase because we need to make sure
> > the crash happens before all the WAL logged transactions data are decoded which
> > doesn't seem easy to write a stable test for this.
> >
> > Thoughts ?
>
> Fix looks good to me.  Thanks for working on this.
>

Pushed!

-- 
With Regards,
Amit Kapila.