Discussion: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16643
Logged by:          Henry Hinze
Email address:      henry.hinze@googlemail.com
PostgreSQL version: 13.0
Operating system:   Ubuntu 20.04
Description:

The following setup of logical replication for one table works nicely with
PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts
over and over again.
All settings are default except for wal_level which was changed to
"logical".

To replicate:
-- Create 2 Databases
psql

CREATE DATABASE d1;
CREATE DATABASE d2;

-- In d1: Create table, fill with data, create publication, create
replication slot
psql d1

CREATE TABLE t (f int);
INSERT INTO t SELECT * FROM generate_series(1, 10000000);
ALTER TABLE t ADD PRIMARY KEY (f);

CREATE PUBLICATION p;
ALTER PUBLICATION p ADD TABLE t;
SELECT pg_create_logical_replication_slot('my_slot','pgoutput');

-- In d2: Create table, create subscription

psql d2

CREATE TABLE t (f int);
ALTER TABLE t ADD PRIMARY KEY (f);
CREATE SUBSCRIPTION s CONNECTION 'user=postgres dbname=d1' PUBLICATION p
WITH (create_slot = false, slot_name='my_slot');


Here are the Logs:
**********************
2020-09-29 21:27:13.267 CEST [17178] postgres@d1 LOG:  logical decoding
found consistent point at 0/33CDDBB8
2020-09-29 21:27:13.267 CEST [17178] postgres@d1 DETAIL:  There are no
running transactions.
2020-09-29 21:27:13.267 CEST [17178] postgres@d1 STATEMENT:  select
pg_create_logical_replication_slot('my_slot','pgoutput');
2020-09-29 21:27:37.543 CEST [17229] LOG:  logical replication apply worker
for subscription "s" has started
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 LOG:  starting logical
decoding for slot "my_slot"
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 DETAIL:  Streaming
transactions committing after 0/33CDDBF0, reading WAL from 0/33CDDBB8.
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 LOG:  logical decoding
found consistent point at 0/33CDDBB8
2020-09-29 21:27:37.548 CEST [17230] postgres@d1 DETAIL:  There are no
running transactions.
2020-09-29 21:27:37.552 CEST [17231] LOG:  logical replication table
synchronization worker for subscription "s", table "t" has started
2020-09-29 21:27:37.704 CEST [17232] postgres@d1 LOG:  logical decoding
found consistent point at 0/33D49208
2020-09-29 21:27:37.704 CEST [17232] postgres@d1 DETAIL:  There are no
running transactions.
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 LOG:  starting logical
decoding for slot "my_slot_16398_sync_16393"
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 DETAIL:  Streaming
transactions committing after 0/33D49240, reading WAL from 0/33D49208.
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 LOG:  logical decoding
found consistent point at 0/33D49208
2020-09-29 21:27:58.888 CEST [17232] postgres@d1 DETAIL:  There are no
running transactions.
2020-09-29 21:28:05.138 CEST [17231] ERROR:  error reading result of
streaming command: 
2020-09-29 21:28:05.143 CEST [17130] LOG:  background worker "logical
replication worker" (PID 17231) exited with exit code 1
2020-09-29 21:28:05.145 CEST [17256] LOG:  logical replication table
synchronization worker for subscription "s", table "t" has started
2020-09-29 21:28:05.222 CEST [17257] postgres@d1 LOG:  logical decoding
found consistent point at 0/62FBD4E8
2020-09-29 21:28:05.222 CEST [17257] postgres@d1 DETAIL:  There are no
running transactions.
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 LOG:  starting logical
decoding for slot "my_slot_16398_sync_16393"
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 DETAIL:  Streaming
transactions committing after 0/62FBD520, reading WAL from 0/62FBD4E8.
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 LOG:  logical decoding
found consistent point at 0/62FBD4E8
2020-09-29 21:28:49.187 CEST [17257] postgres@d1 DETAIL:  There are no
running transactions.
2020-09-29 21:28:55.672 CEST [17256] ERROR:  error reading result of
streaming command: 
2020-09-29 21:28:55.676 CEST [17130] LOG:  background worker "logical
replication worker" (PID 17256) exited with exit code 1
2020-09-29 21:28:55.679 CEST [17308] LOG:  logical replication table
synchronization worker for subscription "s", table "t" has started
2020-09-29 21:28:55.833 CEST [17309] postgres@d1 LOG:  logical decoding
found consistent point at 0/9E9AEA38
2020-09-29 21:28:55.833 CEST [17309] postgres@d1 DETAIL:  There are no
running transactions.


PG Bug reporting form <noreply@postgresql.org> writes:
> The following setup of logical replication for one table works nicely with
> PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts
> over and over again.

It kinda looks like something is timing out before it's managed to
catch up with the (rather large) initial state of table t.  What
non-default replication settings are you using?

            regards, tom lane



Hi Tom,

thanks for your reply!

For my comparison test of PG 12 and 13 I did not change any default setting except wal_level.

But I also did some tests and increased wal_keep_size and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but without success. The setup works in PG 13 only with a small amount of data.

Best,
Henry

Am Di., 29. Sept. 2020 um 22:09 Uhr schrieb Tom Lane <tgl@sss.pgh.pa.us>:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following setup of logical replication for one table works nicely with
> PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts
> over and over again.

It kinda looks like something is timing out before it's managed to
catch up with the (rather large) initial state of table t.  What
non-default replication settings are you using?

                        regards, tom lane


--
Diplominformatiker Henry Hinze
Dietzgenstraße 75
13156 Berlin
Tel:  +49 - 177 - 3160621
USt-ID: DE306639264

I've made an important observation!

Since I had the impression this setup was already working with RC1 of PG 13, I re-installed RC1 and did the same test. And it's working fine!

So it seems that something has changed from RC1 to the final version that breaks logical replication in some cases.

Best,
Henry

Am Mi., 30. Sept. 2020 um 08:34 Uhr schrieb Henry Hinze <henry.hinze@gmail.com>:
Hi Tom,

thanks for your reply!

For my comparison test of PG 12 and 13 I did not change any default setting except wal_level.

But I also did some tests and increased wal_keep_size and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but without success. The setup works in PG 13 only with a small amount of data.

Best,
Henry

Am Di., 29. Sept. 2020 um 22:09 Uhr schrieb Tom Lane <tgl@sss.pgh.pa.us>:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following setup of logical replication for one table works nicely with
> PG 12.4. In PG 13.0 the initial startup never finishes. Instead it starts
> over and over again.

It kinda looks like something is timing out before it's managed to
catch up with the (rather large) initial state of table t.  What
non-default replication settings are you using?

                        regards, tom lane





Henry Hinze <henry.hinze@gmail.com> writes:
> I've made an important observation!
> Since I had the impression this setup was already working with RC1 of PG
> 13, I re-installed RC1 and did the same test. And it's working fine!

Ugh.  So that points the finger at commits 07082b08c/bfb12cd2b,
which are the only nearby change between rc1 and 13.0.  A quick
comparison of before-and-after checkouts confirms it.

After some digging around, I realize that that commit actually
resulted in a protocol break.  libpqwalreceiver is expecting to
get an additional CommandComplete message after COPY OUT finishes,
per libpqrcv_endstreaming(), and it's no longer getting one.

(I have not read the protocol document to see if this is per spec;
but spec or no, that's what libpqwalreceiver is expecting.)

The question that this raises is how the heck did that get past
our test suites?  It seems like the error should have been obvious
to even the most minimal testing.

            regards, tom lane
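For context, a rough sketch of the client-side expectation described above, written against the public libpq API. The function name end_streaming and the error handling here are invented for illustration; this is not the actual libpqwalreceiver.c source:

#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch of how a walreceiver-style client ends a logical START_REPLICATION
 * stream: send CopyDone, drain whatever COPY data the server still has, and
 * then expect a CommandComplete (PGRES_COMMAND_OK) for the replication
 * command itself.
 */
static int
end_streaming(PGconn *conn)
{
	PGresult   *res;

	/* Tell the server we are done sending COPY data. */
	if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
	{
		fprintf(stderr, "could not send end-of-streaming message: %s",
				PQerrorMessage(conn));
		return -1;
	}

	/* The server may still be in COPY mode; drain its remaining data. */
	res = PQgetResult(conn);
	if (res != NULL && PQresultStatus(res) == PGRES_COPY_OUT)
	{
		char	   *buf;

		while (PQgetCopyData(conn, &buf, 0) > 0)
			PQfreemem(buf);
		PQclear(res);
		res = PQgetResult(conn);
	}

	/*
	 * Here the client waits for the CommandComplete of the replication
	 * command itself.  If the server never sends it, the client reports
	 * an error much like the one seen in the logs above.
	 */
	if (res == NULL || PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "error reading result of streaming command: %s",
				PQerrorMessage(conn));
		PQclear(res);
		return -1;
	}
	PQclear(res);
	return 0;
}

The point is only that the client waits for one more CommandComplete after the COPY ends; per the analysis above, the rc1-to-13.0 change stopped the walsender from sending it.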



Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From:
Alvaro Herrera
Date:
On 2020-Sep-30, Tom Lane wrote:

> Henry Hinze <henry.hinze@gmail.com> writes:
> > I've made an important observation!
> > Since I had the impression this setup was already working with RC1 of PG
> > 13, I re-installed RC1 and did the same test. And it's working fine!
> 
> Ugh.  So that points the finger at commits 07082b08c/bfb12cd2b,
> which are the only nearby change between rc1 and 13.0.  A quick
> comparison of before-and-after checkouts confirms it.

Oh dear.

> After some digging around, I realize that that commit actually
> resulted in a protocol break.  libpqwalreceiver is expecting to
> get an additional CommandComplete message after COPY OUT finishes,
> per libpqrcv_endstreaming(), and it's no longer getting one.
> 
> (I have not read the protocol document to see if this is per spec;
> but spec or no, that's what libpqwalreceiver is expecting.)

Yeah, definitely.

The minimal fix seems to be to add an EndReplicationCommand() call in
the T_StartReplicationCmd case.  Testing this now ...

> The question that this raises is how the heck did that get past
> our test suites?  It seems like the error should have been obvious
> to even the most minimal testing.

... yeah, that's indeed an important question.  I'm going to guess that
the TAP suites are too forgiving :-(
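For illustration only, the minimal fix described above would presumably look something like the fragment below in walsender.c's command dispatch, assuming the EndReplicationCommand() helper introduced by the command-tag rework; the committed patch may differ in details:

			case T_StartReplicationCmd:
				{
					StartReplicationCmd *cmd = (StartReplicationCmd *) cmd_node;

					/* ... start physical or logical replication as before ... */

					/*
					 * Also emit a CommandComplete for the replication command
					 * itself, since libpqrcv_endstreaming() on the client side
					 * waits for it after the COPY ends.
					 */
					EndReplicationCommand(cmdtag);
					break;
				}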



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Sep-30, Tom Lane wrote:
>> The question that this raises is how the heck did that get past
>> our test suites?  It seems like the error should have been obvious
>> to even the most minimal testing.

> ... yeah, that's indeed an important question.  I'm going to guess that
> the TAP suites are too forgiving :-(

One thing I noticed while trying to trace this down is that while the
initial table sync is happening, we have *both* a regular
walsender/walreceiver pair and a "sync" pair, eg

postgres  905650  0.0  0.0 186052 11888 ?        Ss   17:12   0:00 postgres: logical replication worker for subscription 16398
postgres  905651 50.1  0.0 173704 13496 ?        Ss   17:12   0:09 postgres: walsender postgres [local] idle
postgres  905652  104  0.4 186832 148608 ?       Rs   17:12   0:19 postgres: logical replication worker for subscription 16398 sync 16393
postgres  905653 12.2  0.0 174380 15524 ?        Ss   17:12   0:02 postgres: walsender postgres [local] COPY

Is it supposed to be like that?  Notice also that the regular walsender
has consumed significant CPU time; it's not pinning a CPU like the sync
walreceiver is, but it's eating maybe 20% of a CPU according to "top".
I wonder whether in cases with only small tables (which is likely all
that our tests test), the regular walreceiver manages to complete the
table sync despite repeated(?) failures of the sync worker.

            regards, tom lane



Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From:
Peter Eisentraut
Date:
On 2020-09-30 23:32, Tom Lane wrote:
> The question that this raises is how the heck did that get past
> our test suites?  It seems like the error should have been obvious
> to even the most minimal testing.

What's the difference between this case and what the test suite is 
testing?  Is it that it replicates between two databases on the same 
instance?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From:
Peter Eisentraut
Date:
On 2020-09-30 23:52, Tom Lane wrote:
> One thing I noticed while trying to trace this down is that while the
> initial table sync is happening, we have *both* a regular
> walsender/walreceiver pair and a "sync" pair, eg
> 
> postgres  905650  0.0  0.0 186052 11888 ?        Ss   17:12   0:00 postgres: logical replication worker for subscription 16398
> postgres  905651 50.1  0.0 173704 13496 ?        Ss   17:12   0:09 postgres: walsender postgres [local] idle
> postgres  905652  104  0.4 186832 148608 ?       Rs   17:12   0:19 postgres: logical replication worker for subscription 16398 sync 16393
> postgres  905653 12.2  0.0 174380 15524 ?        Ss   17:12   0:02 postgres: walsender postgres [local] COPY

That's normal.  You could also have even more if tables are syncing in 
parallel.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2020-Oct-01, Peter Eisentraut wrote:
>> What's the difference between this case and what the test suite is testing?
>> Is it that it replicates between two databases on the same instance?

> I don't know why the tests pass, but the message
>   ERROR:  error reading result of streaming command: 
> does appear in the logs after running src/test/subscription many times
> (I see it in tests 001, 002, 013 and 014, apart from the new one in
> 100).  It's indeed surprising that these tests all pass!

> I turned Henry's reproducer into the attached TAP test, and it does
> reproduce the problem; but if I reduce the number of rows from 5000 to
> 1000, then it no longer does.  I don't quite see why this would be a
> problem with a larger table only.  Do you?

I think we really need to figure that out before concluding that this
problem is solved.  Now that we've seen this, I'm wondering urgently
what other coverage gaps we've got there.

> The fix is the commented-out line in walsender.c; the test reliably
> passes for me if I uncomment that, and the error message disappears from
> the server logs in all the other tests.

I agree that this is what we need to do code-wise; we can't let the
protocol break stand, or we'll break all sorts of cross-version
replication scenarios.  However, we'd better also change the protocol
spec to say that this is what is supposed to happen.

            regards, tom lane



Re: BUG #16643: PG13 - Logical replication - initial startup never finishes and gets stuck in startup loop

From:
Alvaro Herrera
Date:
On 2020-Sep-30, Tom Lane wrote:

> After some digging around, I realize that that commit actually
> resulted in a protocol break.  libpqwalreceiver is expecting to
> get an additional CommandComplete message after COPY OUT finishes,
> per libpqrcv_endstreaming(), and it's no longer getting one.
> 
> (I have not read the protocol document to see if this is per spec;
> but spec or no, that's what libpqwalreceiver is expecting.)
> 
> The question that this raises is how the heck did that get past
> our test suites?  It seems like the error should have been obvious
> to even the most minimal testing.

So I think I can answer this now.  Petr, as intellectual author of this
code I would appreciate your thoughts on this -- you probably understand
this much better.  Same goes for Peter as committer.

I think the logical replication feature is missing a detailed
architectural diagram.

What is happening is that the tablesync code, which is in charge of
initial syncing of tables, has two main code paths: the first one is an
optimization (?) in LogicalRepSyncTableStart() whereby, if the sync takes
little time, the process will exit immediately (via
finish_sync_worker()) without returning control to the main logical
replication apply worker code.  In the TAP tests, all tables are pretty
small, so that optimization always fires.

So in that case we never get control back to walrcv_endstreaming (the
complainant in Henry's test case), which is why replication never fails
in the tests.  You can verify this because the final "return slotname"
line in LogicalRepSyncTableStart has zero coverage.
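For readers without the source at hand, that early-exit path looks roughly like the following (paraphrased from the PG13-era tablesync.c rather than quoted verbatim, so variable names may differ slightly):

	/* End of LogicalRepSyncTableStart(), roughly: */

	/* Initial COPY is done; wait for the apply worker to tell us to catch up. */
	wait_for_worker_state_change(SUBREL_STATE_CATCHUP);

	if (*origin_startpos >= MyLogicalRepWorker->relstate_lsn)
	{
		/*
		 * The copy already reached the point the apply worker asked us to
		 * catch up to, so mark the relation SYNCDONE and exit right here,
		 * never returning to the apply loop and therefore never reaching
		 * walrcv_endstreaming().
		 */
		UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
								   MyLogicalRepWorker->relid,
								   SUBREL_STATE_SYNCDONE,
								   *origin_startpos);
		finish_sync_worker();	/* does not return */
	}

	/*
	 * Otherwise return the slot name and let the caller run the normal
	 * catchup/apply loop, which does reach walrcv_endstreaming().
	 */
	return slotname;

With the tiny tables used in the TAP tests, the early-exit condition is always true, which is why the tests never exercise the endstreaming path.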


The other code path returns the slot name to ApplyWorkerMain, which
does the normal walrcv_startstreaming / LogicalRepApplyLoop /
endstreaming dance.  And whenever that code path is taken, all the
src/test/subscription tests fail with the missing command tag, and
they work correctly when the command tag is restored, which is what
we wanted.

My proposal at this stage is to remove the optimization in
LogicalRepSyncTableStart, per 0001, if only because the code becomes
simpler (0002 reindents code).  With this, we get a coverage increase
for tablesync.c not only in the final "return slotname" but also for 
process_syncing_tables_for_sync() which is currently uncovered:
https://coverage.postgresql.org/src/backend/replication/logical/tablesync.c.gcov.html#268

However, and this is one reason why I'd welcome Petr/Peter thoughts on
this, I don't really understand what happens in LogicalRepApplyLoop
afterwards with a tablesync worker; are we actually doing anything
useful there, considering that the actual data copy seems to have
occurred in the CopyFrom() call in copy_table?  In other words, by the
time we return control to ApplyWorkerMain with a slot name, isn't the
work all done, and the only thing we need is to synchronize protocol and
close the connection?

Or is it?


Another thing I noticed is that if I crank up the number of rows in the
new t/100_bugs.pl test case(*), the two sync workers run in sequence -- not
in parallel.  What's going on with that?  Shouldn't there be two
simultaneous workers?

(*) In the submitted patch, the test uses 10 million rows.  I intend to
reduce that to, say, 5000 rows in the committed version.  Strictly
speaking, there's no need for this test at all since with the
LogicalRepSyncTableStart changes I propose, even t/001_rep_changes.pl
fails.  But since my new test case fails even without that change, I
prefer to keep it anyway.


Attachments
Hi,

On 10/10/2020 21:06, Alvaro Herrera wrote:
> On 2020-Sep-30, Tom Lane wrote:
> 
>> After some digging around, I realize that that commit actually
>> resulted in a protocol break.  libpqwalreceiver is expecting to
>> get an additional CommandComplete message after COPY OUT finishes,
>> per libpqrcv_endstreaming(), and it's no longer getting one.
>>
>> (I have not read the protocol document to see if this is per spec;
>> but spec or no, that's what libpqwalreceiver is expecting.)
>>
>> The question that this raises is how the heck did that get past
>> our test suites?  It seems like the error should have been obvious
>> to even the most minimal testing.
> 
> So I think I can answer this now.  Petr, as intellectual author of this
> code I would appreciate your thoughts on this -- you probably understand
> this much better.  Same goes for Peter as committer.
> 
> I think the logical replication feature is missing an detailed
> architectural diagram.
> 
> What is happening is that the tablesync code, which is in charge of
> initial syncing of tables, has two main code paths: the first one is an
> optimization (?) in LogicalRepSyncTableStart() that if the sync takes
> little time, the process will exit immediately (via
> finish_sync_worker()) without returning control to the main logical
> replication apply worker code.  In the TAP tests, all tables are pretty
> small, so that optimization always fires.

It's not only about the size of the tables, it's mainly that there is no
write activity, so the main apply is not moving past the LSN at which
the table sync started. With a bigger table you get at least something
written (running xacts, autovacuum, or whatever) that moves the LSN
forward eventually.

> 
> So in that case we never get control back to walrcv_endstreaming (the
> complainant in Henry's test case), which is why replication never fails
> in the tests.  You can verify this because the final "return slotname"
> line in LogicalRepSyncTableStart has zero coverage.
> 

Correct.

> 
> The other code path returns the slot name to ApplyWorkerMain, which
> does the normal walrcv_startstreaming / LogicalRepApplyLoop /
> endstreaming dance.  And whenever that code path is taken, all the
> src/test/subscription tests fail with the missing command tag, and
> they work correctly when the command tag is restored, which is what
> we wanted.
> 
> My proposal at this stage is to remove the optimization in
> LogicalRepSyncTableStart, per 0001, if only because the code becomes
> simpler (0002 reindents code).  With this, we get a coverage increase
> for tablesync.c not only in the final "return slotname" but also for
> process_syncing_tables_for_sync() which is currently uncovered:
> https://coverage.postgresql.org/src/backend/replication/logical/tablesync.c.gcov.html#268
> 
> However, and this is one reason why I'd welcome Petr/Peter thoughts on
> this, I don't really understand what happens in LogicalRepApplyLoop
> afterwards with a tablesync worker; are we actually doing anything
> useful there, considering that the actual data copy seems to have
> occurred in the CopyFrom() call in copy_table?  In other words, by the
> time we return control to ApplyWorkerMain with a slot name, isn't the
> work all done, and the only thing we need is to synchronize protocol and
> close the connection?
> 

There are 2 possible states at that point, either tablesync is ahead 
(when main apply lags or nothing is happening on publication side) or 
it's behind the main apply. When tablesync is ahead we are indeed done 
and just need to update the state of the table (which is what the code 
you removed did, but LogicalRepApplyLoop should do it as well, just a 
bit later). When it's behind we need to do catchup for that table only 
which still happens in the tablesync worker. See the explanation at the 
beginning of tablesync.c, it probably needs some small adjustments after 
the changes in your first patch.

> 
> Another thing I noticed is that if I crank up the number of rows in the
> new t/100_bugs.pl test case(*), the two sync workers run in sequence -- not
> in parallel.  What's going on with that?  Shouldn't there be two
> simultaneous workers?
>

That's likely because the second replication slot is waiting for xid 
created by the first one to commit or abort while building consistent 
snapshot.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Hi,

On 14/10/2020 03:12, Alvaro Herrera wrote:
> On 2020-Oct-12, Petr Jelinek wrote:
> 
>>> However, and this is one reason why I'd welcome Petr/Peter thoughts on
>>> this, I don't really understand what happens in LogicalRepApplyLoop
>>> afterwards with a tablesync worker; are we actually doing anything
>>> useful there, considering that the actual data copy seems to have
>>> occurred in the CopyFrom() call in copy_table?  In other words, by the
>>> time we return control to ApplyWorkerMain with a slot name, isn't the
>>> work all done, and the only thing we need is to synchronize protocol and
>>> close the connection?
>>
>> There are 2 possible states at that point, either tablesync is ahead (when
>> main apply lags or nothing is happening on publication side) or it's behind
>> the main apply. When tablesync is ahead we are indeed done and just need to
>> update the state of the table (which is what the code you removed did, but
>> LogicalRepApplyLoop should do it as well, just a bit later). When it's
>> behind we need to do catchup for that table only which still happens in the
>> tablesync worker. See the explanation at the beginning of tablesync.c, it
>> probably needs some small adjustments after the changes in your first patch.
> 
> ... Ooh, things start to make some sense now.  So how about the
> attached?  There are some not really related cleanups.  (Changes to
> protocol.sgml are still pending.)
> 

It would be nice if the new sentences at the beginning of tablesync.c 
started with uppercase, but that's about as nitpicky as I can be :)

> If I understand correctly, the early exit in tablesync.c is not saving *a
> lot* of time (we don't actually skip replaying any WAL), even if it's
> saving execution of a bunch of code.  So I stand by my position that
> removing the code is better because it's clearer about what is actually
> happening.
> 

I don't really have any problems with the simplification you propose. 
The saved time is probably in order of hundreds of ms which for table 
sync is insignificant.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Oct-14, Petr Jelinek wrote:
>
> > It would be nice if the new sentences at the beginning of tablesync.c
> > started with uppercase, but that's about as nitpicky as I can be :)
>
> OK, fixed that :-)  And pushed (to master only).  There's one more
> change I added at the last minute, which is to remove the 'missing_ok'
> parameter of GetSubscriptionRelState.  There are some other cosmetic
> changes, but nothing of substance.
>
> > > If I understand correctly, the early exit in tablesync.c is not saving *a
> > > lot* of time (we don't actually skip replaying any WAL), even if it's
> > > saving execution of a bunch of code.  So I stand by my position that
> > > removing the code is better because it's clearer about what is actually
> > > happening.
> >
> > I don't really have any problems with the simplification you propose. The
> > saved time is probably in order of hundreds of ms which for table sync is
> > insignificant.
>
> Great, thanks.
>
> If you think this is done ... I have taken a few notes in the process:
>
> * STREAM COMMIT bug?
>   In apply_handle_stream_commit, we do CommitTransactionCommand, but
>   apparently in a tablesync worker we shouldn't do it.
>

In the tablesync stage, we don't allow streaming. See pgoutput_startup
where we disable streaming for the init phase. As far as I understand,
for tablesync we create the initial slot during which streaming will
be disabled then we will copy the table (here logical decoding won't
be used) and then allow the apply worker to get any other data which
is inserted in the meantime. Now, I might be missing something here
but if you can explain it a bit more or share some test to show how we
can reach here via tablesync worker then we can discuss the possible
solution.

Sorry for responding late.

-- 
With Regards,
Amit Kapila.



On 2020-Nov-04, Amit Kapila wrote:

> On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > * STREAM COMMIT bug?
> >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> >   apparently in a tablesync worker we shouldn't do it.
> 
> In the tablesync stage, we don't allow streaming. See pgoutput_startup
> where we disable streaming for the init phase. As far as I understand,
> for tablesync we create the initial slot during which streaming will
> be disabled then we will copy the table (here logical decoding won't
> be used) and then allow the apply worker to get any other data which
> is inserted in the meantime. Now, I might be missing something here
> but if you can explain it a bit more or share some test to show how we
> can reach here via tablesync worker then we can discuss the possible
> solution.

Hmm, okay, that sounds like there would be no bug then.  Maybe what we
need is just an assert in apply_handle_stream_commit that
!am_tablesync_worker(), as in the attached patch.  Passes tests.
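For readers without the attachment, the proposed change amounts to a one-line assertion at the top of the handler, roughly like this (a sketch that assumes the existing am_tablesync_worker() helper in worker.c):

static void
apply_handle_stream_commit(StringInfo s)
{
	/* Streaming is not expected to be used by a tablesync worker. */
	Assert(!am_tablesync_worker());

	/* ... existing handler body unchanged ... */
}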

Attachments
On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-04, Amit Kapila wrote:
>
> > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > > * STREAM COMMIT bug?
> > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > >   apparently in a tablesync worker we shouldn't do it.
> >
> > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > where we disable streaming for the init phase. As far as I understand,
> > for tablesync we create the initial slot during which streaming will
> > be disabled then we will copy the table (here logical decoding won't
> > be used) and then allow the apply worker to get any other data which
> > is inserted in the meantime. Now, I might be missing something here
> > but if you can explain it a bit more or share some test to show how we
> > can reach here via tablesync worker then we can discuss the possible
> > solution.
>
> Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> need is just an assert in apply_handle_stream_commit that
> !am_tablesync_worker(), as in the attached patch.  Passes tests.
>

+1. But do we want to have this Assert only in stream_commit API or
all stream APIs as well?

-- 
With Regards,
Amit Kapila.



On Thu, Nov 5, 2020 at 9:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > > * STREAM COMMIT bug?
> > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > >   apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime. Now, I might be missing something here
> > > but if you can explain it a bit more or share some test to show how we
> > > can reach here via tablesync worker then we can discuss the possible
> > > solution.
> >
> > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > need is just an assert in apply_handle_stream_commit that
> > !am_tablesync_worker(), as in the attached patch.  Passes tests.
> >
>
> +1. But do we want to have this Assert only in stream_commit API or
> all stream APIs as well?
>

One more point to look at here is at what point the tablesync worker
gets involved in applying decoded transactions, if any.  Basically, I
would like to ensure that if it uses the slot it initially created
(before the copy) then it is probably fine, because we don't enable
streaming with it during the initial phase; but if it uses the slot to
decode xacts after the copy, then we probably need to check once
whether it is still true that streaming is not enabled at that point.
I am not completely sure if there are existing test cases to cover any
such scenarios, so thinking a bit more about this might be helpful.


-- 
With Regards,
Amit Kapila.



On 2020-Nov-05, Amit Kapila wrote:

> On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > > * STREAM COMMIT bug?
> > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > >   apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime. Now, I might be missing something here
> > > but if you can explain it a bit more or share some test to show how we
> > > can reach here via tablesync worker then we can discuss the possible
> > > solution.
> >
> > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > need is just an assert in apply_handle_stream_commit that
> > !am_tablesync_worker(), as in the attached patch.  Passes tests.
> >
> 
> +1. But do we want to have this Assert only in stream_commit API or
> all stream APIs as well?

Well, the only reason I care about this is that apply_handle_commit
contains a comment that we must not do CommitTransactionCommand in the
syncworker case; so if you look at apply_handle_stream_commit and note
that it doesn't concern itself with that, you become concerned that it
might be broken.  I don't think the other routines handling the "stream"
thing have that issue.



On Sat, Nov 7, 2020 at 5:31 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-05, Amit Kapila wrote:
>
> > On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > On 2020-Nov-04, Amit Kapila wrote:
> > >
> > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > >
> > > > > * STREAM COMMIT bug?
> > > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > > >   apparently in a tablesync worker we shouldn't do it.
> > > >
> > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > > where we disable streaming for the init phase. As far as I understand,
> > > > for tablesync we create the initial slot during which streaming will
> > > > be disabled then we will copy the table (here logical decoding won't
> > > > be used) and then allow the apply worker to get any other data which
> > > > is inserted in the meantime. Now, I might be missing something here
> > > > but if you can explain it a bit more or share some test to show how we
> > > > can reach here via tablesync worker then we can discuss the possible
> > > > solution.
> > >
> > > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > > need is just an assert in apply_handle_stream_commit that
> > > !am_tablesync_worker(), as in the attached patch.  Passes tests.
> > >
> >
> > +1. But do we want to have this Assert only in stream_commit API or
> > all stream APIs as well?
>
> Well, the only reason I care about this is that apply_handle_commit
> contains a comment that we must not do CommitTransactionCommand in the
> syncworker case; so if you look at apply_handle_stream_commit and note
> that it doesn't concern itself with that, you become concerned that it
> might be broken.  I don't think the other routines handling the "stream"
> thing have that issue.
>

Fair enough.  As mentioned in my previous email, I think we need to
confirm once how, after the copy, decoding happens on the upstream for
transactions during the phase where the tablesync worker is moving to
state SUBREL_STATE_SYNCDONE from SUBREL_STATE_CATCHUP.  I'll try to
come up (in the next few days) with some test case to debug and test
this particular scenario and share my findings.

-- 
With Regards,
Amit Kapila.



On Sat, Nov 7, 2020 at 9:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 7, 2020 at 5:31 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-05, Amit Kapila wrote:
> >
> > > On Wed, Nov 4, 2020 at 7:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > >
> > > > On 2020-Nov-04, Amit Kapila wrote:
> > > >
> > > > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > >
> > > > > > * STREAM COMMIT bug?
> > > > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > > > >   apparently in a tablesync worker we shouldn't do it.
> > > > >
> > > > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > > > where we disable streaming for the init phase. As far as I understand,
> > > > > for tablesync we create the initial slot during which streaming will
> > > > > be disabled then we will copy the table (here logical decoding won't
> > > > > be used) and then allow the apply worker to get any other data which
> > > > > is inserted in the meantime. Now, I might be missing something here
> > > > > but if you can explain it a bit more or share some test to show how we
> > > > > can reach here via tablesync worker then we can discuss the possible
> > > > > solution.
> > > >
> > > > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > > > need is just an assert in apply_handle_stream_commit that
> > > > !am_tablesync_worker(), as in the attached patch.  Passes tests.
> > > >
> > >
> > > +1. But do we want to have this Assert only in stream_commit API or
> > > all stream APIs as well?
> >
> > Well, the only reason I care about this is that apply_handle_commit
> > contains a comment that we must not do CommitTransactionCommand in the
> > syncworker case; so if you look at apply_handle_stream_commit and note
> > that it doesn't concern itself with that, you become concerned that it
> > might be broken.  I don't think the other routines handling the "stream"
> > thing have that issue.
> >
>
> Fair enough, as mentioned in my previous email, I think we need to
> confirm once that after copy how the decoding happens on upstream for
> transactions during the phase where tablesync workers is moving to
> state SUBREL_STATE_SYNCDONE from SUBREL_STATE_CATCHUP. I'll try to
> come up (in next few days) with some test case to debug and test this
> particular scenario and share my findings.

IIUC, the table sync worker does the initial copy using the consistent
snapshot.  After that, if the main apply worker is behind us, the table
sync worker waits for the apply worker to reach the table sync worker's
start point, and after that the apply worker can continue applying the
changes.  OTOH, if the apply worker has already moved ahead in
processing the WAL after it launched the table sync worker, that means
the apply worker would have skipped those transactions because the
table was not yet in the SYNCDONE state, so the table sync worker needs
to cover this gap by applying the WAL using the normal apply path, and
it can be moved to the SYNCDONE state once it catches up with the
actual apply worker.  After putting the table sync worker in the
catchup state, the apply worker will wait for the table sync worker to
catch up.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Oct-14, Petr Jelinek wrote:
> >
> > > It would be nice if the new sentences at the beginning of tablesync.c
> > > started with uppercase, but that's about as nitpicky as I can be :)
> >
> > OK, fixed that :-)  And pushed (to master only).  There's one more
> > change I added at the last minute, which is to remove the 'missing_ok'
> > parameter of GetSubscriptionRelState.  There are some other cosmetic
> > changes, but nothing of substance.
> >
> > > > If I understand correctly, the early exit in tablesync.c is not saving *a
> > > > lot* of time (we don't actually skip replaying any WAL), even if it's
> > > > saving execution of a bunch of code.  So I stand by my position that
> > > > removing the code is better because it's clearer about what is actually
> > > > happening.
> > >
> > > I don't really have any problems with the simplification you propose. The
> > > saved time is probably in order of hundreds of ms which for table sync is
> > > insignificant.
> >
> > Great, thanks.
> >
> > If you think this is done ... I have taken a few notes in the process:
> >
> > * STREAM COMMIT bug?
> >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> >   apparently in a tablesync worker we shouldn't do it.
> >
>
> In the tablesync stage, we don't allow streaming. See pgoutput_startup
> where we disable streaming for the init phase. As far as I understand,
> for tablesync we create the initial slot during which streaming will
> be disabled then we will copy the table (here logical decoding won't
> be used) and then allow the apply worker to get any other data which
> is inserted in the meantime.

I think this assumption is not completely correct, because if the
tablesync worker is behind the apply worker then it will start
streaming by itself until it moves from the CATCHUP to the SYNCDONE
state.  So during that time, if streaming is enabled on the
subscription, the tablesync worker will also request streaming.  I
think that for disabling streaming in the tablesync worker we can do
something like this:

diff --git a/src/backend/replication/logical/worker.c
b/src/backend/replication/logical/worker.c
index 3a5b733ee3..21ac29f703 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3084,7 +3084,7 @@ ApplyWorkerMain(Datum main_arg)
                LOGICALREP_PROTO_STREAM_VERSION_NUM :
LOGICALREP_PROTO_VERSION_NUM;
        options.proto.logical.publication_names = MySubscription->publications;
        options.proto.logical.binary = MySubscription->binary;
-       options.proto.logical.streaming = MySubscription->stream;
+       options.proto.logical.streaming = am_tablesync_worker() ?
false : MySubscription->stream;

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > where we disable streaming for the init phase. As far as I understand,
> > for tablesync we create the initial slot during which streaming will
> > be disabled then we will copy the table (here logical decoding won't
> > be used) and then allow the apply worker to get any other data which
> > is inserted in the meantime.
>
> I think this assumption is not completely correct,  because if the
> tablesync worker is behind the apply worker then it will start the
> streaming by itself until it reaches from CATCHUP to SYNC DONE state.
> So during that time if streaming is on then the tablesync worker will
> also send the streaming on.
>

Yeah, this seems to be possible and this is the reason I mentioned
above to dig more into this case. Did you try it via some test case? I
think we can generate a test via debugger where after the tablesync
worker reaches CATCHUP state stop it via debugger, then we can perform
some large transaction on the same table which apply worker will skip
and tablesync worker will try to apply changes and should fail.

>  I think for disabling the streaming in
> the tablesync worker we can do something like this.
>

Sure, but why do we want to prohibit streaming in tablesync worker
unless there is some fundamental reason for the same? If we can write
a test based on what I described above then we can probably know if
there is any real issue with allowing streaming via tablesync worker.

-- 
With Regards,
Amit Kapila.



On Mon, Nov 9, 2020 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 4, 2020 at 10:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime.
> >
> > I think this assumption is not completely correct,  because if the
> > tablesync worker is behind the apply worker then it will start the
> > streaming by itself until it reaches from CATCHUP to SYNC DONE state.
> > So during that time if streaming is on then the tablesync worker will
> > also send the streaming on.
> >
>
> Yeah, this seems to be possible and this is the reason I mentioned
> above to dig more into this case. Did you try it via some test case? I
> think we can generate a test via debugger where after the tablesync
> worker reaches CATCHUP state stop it via debugger, then we can perform
> some large transaction on the same table which apply worker will skip
> and tablesync worker will try to apply changes and should fail.

Yeah, we can test like that.  I haven't tested it yet.

> >  I think for disabling the streaming in
> > the tablesync worker we can do something like this.
> >
>
> Sure, but why do we want to prohibit streaming in tablesync worker
> unless there is some fundamental reason for the same? If we can write
> a test based on what I described above then we can probably know if
> there is any real issue with allowing streaming via tablesync worker.

I think there is no fundamental reason for the same, but I thought
that since this is just the initial catchup state, does it make sense
to stream the transaction?  But if we want to stream, then we need to
put some handling in apply_handle_stream_commit so that it can avoid
committing if this is the tablesync worker.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Mon, Nov 9, 2020 at 1:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 9, 2020 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Nov 7, 2020 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Sure, but why do we want to prohibit streaming in tablesync worker
> > unless there is some fundamental reason for the same? If we can write
> > a test based on what I described above then we can probably know if
> > there is any real issue with allowing streaming via tablesync worker.
>
> I think there is no fundamental reason for the same,  but I thought it
> is just an initial catchup state so does it make sense to stream the
> transaction.  But if we want to stream then we need to put some
> handling in apply_handle_stream_commit so that it can avoid committing
> if this is the table-sync worker.
>

Sure, let's try that then instead of blindly avoiding streaming, if it is possible.

-- 
With Regards,
Amit Kapila.



On Tue, Nov 17, 2020 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Yeah, this seems to be possible and this is the reason I mentioned
> above to dig more into this case. Did you try it via some test case? I
> think we can generate a test via debugger where after the tablesync
> worker reaches CATCHUP state stop it via debugger, then we can perform
> some large transaction on the same table which apply worker will skip
> and tablesync worker will try to apply changes and should fail.

Hello Amit.

FYI - This email is just to confirm that your above idea for debugging
the tablesync worker does work.

I have made a small temporary patch to aid testing this: PSA.

The patch just adds some more LOGs - it helps see what is going on.
The patch also gives a 30 second opportunity to attach to the
tablesync worker process.


----

So the necessary debugging steps are like this:

0. Start debugger
- Later we will attach to the tablesync worker process, so it is
convenient to set all the breakpoints in advance
(gdb) b LogicalRepSyncTableStart
(gdb) b LogicalRepApplyLoop
(gdb) b apply_dispatch
(gdb) b process_syncing_tables_for_sync

1. Pub/Sub nodes: Create/Start Publisher and Subscriber nodes with
same test table.

2. Pub node: Add some initial data to the table (to give tablesync
init something to copy)

3. Pub node: CREATE PUBLICATION

4. Sub node: CREATE SUBSCRIPTION
- at this point the slot gets created and the tablesync worker and
apply worker are launched
- the patch causes the tablesync worker to sleep 30 seconds to give an
opportunity to attach to it in the debugger

5. Debugger/tablesync: Attach to the tablesync worker
- continue
- breaks at LogicalRepSyncTableStart
- step over all the initial COPY code to the end of the function
(where apply worker has told it to CATCHUP)

6. Pub: While the tablesync worker is paused in debugger do some other
operations on the original table
- e.g. psql -d test_pub -c "INSERT INTO test_tab VALUES(1, 'foo');"

7. Debugger/tablesync: Allow the tablesync worker to continue
- tablesync worker will return from LogicalRepSyncTableStart and then
into LogicalRepApplyLoop
- if tablesync worker detects it is "behind" then the
LogicalRepApplyLoop will call apply_dispatch to deal with the message
resulting from the step 6 operation.

8. Debugger/tablesync: Some apply handlers (e.g. apply_handle_commit),
if not the LogicalRepApplyLoop itself, will end up calling
process_syncing_tables.
- which calls process_syncing_tables_for_sync
- which will cause the tablesync worker to set SYNCDONE status just
before killing itself.

9. Sub node: The apply worker wait loop will now proceed again
tick/tick/tick every 1 second to process any incoming replication
messages.

---

Using the above technique, you can cause a tablesync worker to enter
some of the normal "apply" handling which might otherwise be very hard
to reproduce outside of the debugger.

---

Here is an example of some of the logging for the above debugging scenario:

2020-11-17 10:46:11.325 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:12.328 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:12.328 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:13.337 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:13.337 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:14.340 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:14.340 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:15.155 AEDT [9841] LOG:  !!>> tablesync worker: About
to call LogicalRepSyncTableStart to do initial syncing
2020-11-17 10:46:15.343 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:15.343 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:15.343 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:15.343 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:16.349 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:16.349 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:17.351 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:17.351 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:18.353 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:18.353 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:18.650 AEDT [10734] LOG:  logical decoding found
consistent point at 0/1B6D5E0
2020-11-17 10:46:18.650 AEDT [10734] DETAIL:  There are no running transactions.
2020-11-17 10:46:18.650 AEDT [10734] STATEMENT:
CREATE_REPLICATION_SLOT "tap_sub_24599_sync_16385" TEMPORARY LOGICAL
pgoutput USE_SNAPSHOT
2020-11-17 10:46:18.658 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:18.658 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:46:18.744 AEDT [9841] LOG:  !!>> tablesync worker: wait
for CATCHUP state notification
2020-11-17 10:46:18.744 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:46:18.744 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables

psql -d test_pub -c "INSERT INTO test_tab VALUES(1, 'foo');"
INSERT 0 1

[postgres@CentOS7-x64 ~]$ 2020-11-17 10:47:39.269 AEDT [9841] LOG:
!!>> tablesync worker: received CATCHUP state notification
2020-11-17 10:47:44.117 AEDT [9841] LOG:  !!>> tablesync worker:
Returned from LogicalRepSyncTableStart
2020-11-17 10:48:05.678 AEDT [10734] LOG:  starting logical decoding
for slot "tap_sub_24599_sync_16385"
2020-11-17 10:48:05.678 AEDT [10734] DETAIL:  Streaming transactions
committing after 0/1B6D618, reading WAL from 0/1B6D5E0.
2020-11-17 10:48:05.678 AEDT [10734] STATEMENT:  START_REPLICATION
SLOT "tap_sub_24599_sync_16385" LOGICAL 0/1B6D618 (proto_version '2',
publication_names '"tap_pub"')
2020-11-17 10:48:05.678 AEDT [10734] LOG:  logical decoding found
consistent point at 0/1B6D5E0
2020-11-17 10:48:05.678 AEDT [10734] DETAIL:  There are no running transactions.
2020-11-17 10:48:05.678 AEDT [10734] STATEMENT:  START_REPLICATION
SLOT "tap_sub_24599_sync_16385" LOGICAL 0/1B6D618 (proto_version '2',
publication_names '"tap_pub"')
2020-11-17 10:48:09.258 AEDT [9841] LOG:  !!>> tablesync worker:
LogicalRepApplyLoop
2020-11-17 10:49:36.537 AEDT [9841] LOG:  !!>> tablesync worker:
called process_syncing_tables
2020-11-17 10:49:36.538 AEDT [9841] LOG:  logical replication table
synchronization worker for subscription "tap_sub", table "test_tab"
has finished
2020-11-17 10:49:36.588 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:36.589 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:49:36.590 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:49:37.591 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:37.591 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:49:38.592 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:38.592 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:49:39.594 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:39.594 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables
2020-11-17 10:49:40.595 AEDT [9838] LOG:  !!>> apply worker: LogicalRepApplyLoop
2020-11-17 10:49:40.595 AEDT [9838] LOG:  !!>> apply worker: called
process_syncing_tables

---

I have so far only been trying the above with a non-streaming
subscription, and TBH stepping through this startup state "dance" of
the tablesync/apply workers is already causing more questions than
answers for me. Anyway, I will raise any questions as separate emails
to this one.

BTW, do you think these tablesync discussions should be moved to
hackers list? I think they are no longer related to the reported BUG
of this current thread.

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments
On Tue, Nov 17, 2020 at 7:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Nov 17, 2020 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Yeah, this seems to be possible and this is the reason I mentioned
> > above to dig more into this case. Did you try it via some test case? I
> > think we can generate a test via debugger where after the tablesync
> > worker reaches CATCHUP state stop it via debugger, then we can perform
> > some large transaction on the same table which apply worker will skip
> > and tablesync worker will try to apply changes and should fail.
>
> Hello Amit.
>
> FYI - This email is just to confirm that your above idea for debugging
> the tablesync worker does work.
>

Thanks for trying this out.

> ---
>
> I have so far only been trying above with the non-streaming
> subscription, and TBH stepping through this startup state "dance" of
> the tablesync/apply workers is already causing more questions than
> answers for me. Anyway, I will raise any questions as separate emails
> to this one.
>
> BTW, do you think these tablesync discussions should be moved to
> hackers list?
>

Sure. I think it is better to start a new thread for the streaming
issue we have suspected here and possible ways to fix the same. I
guess you have some other observations as well which you might want to
discuss separately.



-- 
With Regards,
Amit Kapila.



On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-04, Amit Kapila wrote:
>
> > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > > * STREAM COMMIT bug?
> > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > >   apparently in a tablesync worker we shouldn't do it.
> >
> > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > where we disable streaming for the init phase. As far as I understand,
> > for tablesync we create the initial slot during which streaming will
> > be disabled then we will copy the table (here logical decoding won't
> > be used) and then allow the apply worker to get any other data which
> > is inserted in the meantime. Now, I might be missing something here
> > but if you can explain it a bit more or share some test to show how we
> > can reach here via tablesync worker then we can discuss the possible
> > solution.
>
> Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> need is just an assert in apply_handle_stream_commit that
> !am_tablesync_worker(), as in the attached patch.  Passes tests.

Hi.

Using the same debugging technique described in a previous mail [1], I
have tested again but this time using a SUBSCRIPTION capable of
streaming.

While paused in the debugger (to force an unusual timing situation) I
can publish INSERTs en masse and cause streaming replication to occur.

To cut a long story short, a tablesync worker CAN in fact end up
processing (e.g. apply_dispatch) streaming messages.
So the tablesync worker CAN get into the apply_handle_stream_commit.
And this scenario, albeit rare, will crash.

For example,
---
Program received signal SIGABRT, Aborted.
0x00007f37570f22c7 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f37570f22c7 in raise () from /lib64/libc.so.6
#1  0x00007f37570f39b8 in abort () from /lib64/libc.so.6
#2  0x0000000000f2bf6a in ExceptionalCondition
(conditionName=0x11330a4 "IsTransactionState()", errorType=0x1133094
"FailedAssertion",
    fileName=0x1133088 "tablesync.c", lineNumber=273) at assert.c:69
#3  0x0000000000bb046e in process_syncing_tables_for_sync
(current_lsn=23984432) at tablesync.c:273
#4  0x0000000000bb1039 in process_syncing_tables
(current_lsn=23984432) at tablesync.c:535
#5  0x0000000000bb5c7a in apply_handle_stream_commit
(s=0x7ffd2d892fd0) at worker.c:1080
#6  0x0000000000bb8edb in apply_dispatch (s=0x7ffd2d892fd0) at worker.c:1966
#7  0x0000000000bb97c9 in LogicalRepApplyLoop (last_received=23984432)
at worker.c:2166
#8  0x0000000000bbca84 in ApplyWorkerMain (main_arg=1) at worker.c:3122
#9  0x0000000000b36f11 in StartBackgroundWorker () at bgworker.c:820
#10 0x0000000000b5cf3a in do_start_bgworker (rw=0x1f3da50) at postmaster.c:5837
#11 0x0000000000b5d661 in maybe_start_bgworkers () at postmaster.c:6062
#12 0x0000000000b5b560 in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5223
#13 <signal handler called>
#14 0x00007f37571b10d3 in __select_nocancel () from /lib64/libc.so.6
#15 0x0000000000b52a40 in ServerLoop () at postmaster.c:1691
#16 0x0000000000b51fa8 in PostmasterMain (argc=3, argv=0x1f164f0) at
postmaster.c:1400
#17 0x00000000009ab55d in main (argc=3, argv=0x1f164f0) at main.c:209
---

Also, PSA v2 of my "helper" patch for debugging the tablesync worker.
(This patch has more logging than v1)

---

[1] - https://www.postgresql.org/message-id/CAHut%2BPsprtsa4o89wtNnKLxxwXeDKAX9nNsdghT1Pv63siz%2BAA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachments
On Wed, Nov 18, 2020 at 8:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 1:29 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2020-Nov-04, Amit Kapila wrote:
> >
> > > On Thu, Oct 15, 2020 at 8:20 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > > > * STREAM COMMIT bug?
> > > >   In apply_handle_stream_commit, we do CommitTransactionCommand, but
> > > >   apparently in a tablesync worker we shouldn't do it.
> > >
> > > In the tablesync stage, we don't allow streaming. See pgoutput_startup
> > > where we disable streaming for the init phase. As far as I understand,
> > > for tablesync we create the initial slot during which streaming will
> > > be disabled then we will copy the table (here logical decoding won't
> > > be used) and then allow the apply worker to get any other data which
> > > is inserted in the meantime. Now, I might be missing something here
> > > but if you can explain it a bit more or share some test to show how we
> > > can reach here via tablesync worker then we can discuss the possible
> > > solution.
> >
> > Hmm, okay, that sounds like there would be no bug then.  Maybe what we
> > need is just an assert in apply_handle_stream_commit that
> > !am_tablesync_worker(), as in the attached patch.  Passes tests.
>
> Hi.
>
> Using the same debugging technique described in a previous mail [1], I
> have tested again but this time using a SUBSCRIPTION capable of
> streaming.
>
> While paused in the debugger (to force an unusual timing situation) I
> can publish INSERTs en masse and cause streaming replication to occur.
>
> To cut a long story short, a tablesync worker CAN in fact end up
> processing (e.g. apply_dispatch) streaming messages.
> So the tablesync worker CAN get into the apply_handle_stream_commit.
> And this scenario, albeit rare, will crash.
>

Thank you for reproducing this issue. Dilip, Peter, is anyone of you
interested in writing a fix for this?

-- 
With Regards,
Amit Kapila.



On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > To cut a long story short, a tablesync worker CAN in fact end up
> > processing (e.g. apply_dispatch) streaming messages.
> > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > And this scenario, albeit rare, will crash.
> >
>
> Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> interested in writing a fix for this?

Hi Amit.

FYI - Sorry, I am away/offline for the next 5 days.

However, if this bug still remains unfixed after next Tuesday then I
can look at it then.

---

IIUC there are 2 options:
(a) Disallow streaming for the tablesync worker.
(b) Make streaming work for the tablesync worker.

I prefer option (a), not only because of the KISS principle, but also
because this is how the tablesync worker was previously thought to
behave anyway. I expect this fix would be like the code that Dilip
already posted [1].
[1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com

OTOH, an option (b) fix may or may not be possible (I don't know), but I
doubt it is worthwhile to make a special fix for a scenario which so far
has never been reproduced outside of the debugger.

--

Kind Regards,
Peter Smith.
Fujitsu Australia



On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > To cut a long story short, a tablesync worker CAN in fact end up
> > > processing (e.g. apply_dispatch) streaming messages.
> > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > And this scenario, albeit rare, will crash.
> > >
> >
> > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > interested in writing a fix for this?
>
> Hi Amit.
>
> FYI - Sorry, I am away/offline for the next 5 days.
>
> However, if this bug still remains unfixed after next Tuesday then I
> can look at it then.
>

Fair enough. Let's see if Dilip or I can get a chance to look into
this before that.

> ---
>
> IIUC there are 2 options:
> 1) Disallow streaming for the tablesync worker.
> 2) Make streaming work for the tablesync worker.
>
> I prefer option (a) not only because of the KISS principle, but also
> because this is how the tablesync worker was previously thought to
> behave anyway. I expect this fix may be like the code that Dilip
> already posted [1]
> [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
>
> OTOH, option (b) fix may or may not be possible (I don't know), but I
> have doubts that it is worthwhile to consider making a special fix for
> a scenario which so far has never been reproduced outside of the
> debugger.
>

I would prefer option (b) unless the fix is not possible due to design
constraints. I don't think it is a good idea to make tablesync workers
behave differently unless we have a reason for doing so.

-- 
With Regards,
Amit Kapila.



On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > To cut a long story short, a tablesync worker CAN in fact end up
> > > > processing (e.g. apply_dispatch) streaming messages.
> > > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > > And this scenario, albeit rare, will crash.
> > > >
> > >
> > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > > interested in writing a fix for this?
> >
> > Hi Amit.
> >
> > FYI - Sorry, I am away/offline for the next 5 days.
> >
> > However, if this bug still remains unfixed after next Tuesday then I
> > can look at it then.
> >
>
> Fair enough. Let's see if Dilip or I can get a chance to look into
> this before that.
>
> > ---
> >
> > IIUC there are 2 options:
> > 1) Disallow streaming for the tablesync worker.
> > 2) Make streaming work for the tablesync worker.
> >
> > I prefer option (a) not only because of the KISS principle, but also
> > because this is how the tablesync worker was previously thought to
> > behave anyway. I expect this fix may be like the code that Dilip
> > already posted [1]
> > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
> >
> > OTOH, option (b) fix may or may not be possible (I don't know), but I
> > have doubts that it is worthwhile to consider making a special fix for
> > a scenario which so far has never been reproduced outside of the
> > debugger.
> >
>
> I would prefer option (b) unless the fix is not possible due to design
> constraints. I don't think it is a good idea to make tablesync workers
> behave differently unless we have a reason for doing so.
>

Okay, I will analyze this and try to submit my finding today.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > To cut a long story short, a tablesync worker CAN in fact end up
> > > > > processing (e.g. apply_dispatch) streaming messages.
> > > > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > > > And this scenario, albeit rare, will crash.
> > > > >
> > > >
> > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > > > interested in writing a fix for this?
> > >
> > > Hi Amit.
> > >
> > > FYI - Sorry, I am away/offline for the next 5 days.
> > >
> > > However, if this bug still remains unfixed after next Tuesday then I
> > > can look at it then.
> > >
> >
> > Fair enough. Let's see if Dilip or I can get a chance to look into
> > this before that.
> >
> > > ---
> > >
> > > IIUC there are 2 options:
> > > 1) Disallow streaming for the tablesync worker.
> > > 2) Make streaming work for the tablesync worker.
> > >
> > > I prefer option (a) not only because of the KISS principle, but also
> > > because this is how the tablesync worker was previously thought to
> > > behave anyway. I expect this fix may be like the code that Dilip
> > > already posted [1]
> > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
> > >
> > > OTOH, option (b) fix may or may not be possible (I don't know), but I
> > > have doubts that it is worthwhile to consider making a special fix for
> > > a scenario which so far has never been reproduced outside of the
> > > debugger.
> > >
> >
> > I would prefer option (b) unless the fix is not possible due to design
> > constraints. I don't think it is a good idea to make tablesync workers
> > behave differently unless we have a reason for doing so.
> >
>
> Okay, I will analyze this and try to submit my finding today.

I have done my analysis. Basically, the tablesync worker applies all
the changes (for multiple transactions from upstream) under a single
transaction (on the downstream). For normal changes we can just avoid
committing in apply_handle_commit and everything is fine. For streaming
changes we also have a transaction at the stream level, which we need
for handling the buffiles that store the streamed changes. So if we
want to do the same for streaming transactions, we need to avoid
committing the transaction in apply_handle_stream_commit as well as in
apply_handle_stream_stop for the tablesync worker.
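
As a rough illustration of that idea (a sketch only, not the final
patch), the commit at the end of a streamed chunk would be guarded in
both apply_handle_stream_stop and apply_handle_stream_commit along
these lines:

	/* The synchronization worker runs in a single transaction. */
	if (!am_tablesync_worker())
		CommitTransactionCommand();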

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Fri, Nov 20, 2020 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > To cut a long story short, a tablesync worker CAN in fact end up
> > > > > > processing (e.g. apply_dispatch) streaming messages.
> > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > > > > And this scenario, albeit rare, will crash.
> > > > > >
> > > > >
> > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > > > > interested in writing a fix for this?
> > > >
> > > > Hi Amit.
> > > >
> > > > FYI - Sorry, I am away/offline for the next 5 days.
> > > >
> > > > However, if this bug still remains unfixed after next Tuesday then I
> > > > can look at it then.
> > > >
> > >
> > > Fair enough. Let's see if Dilip or I can get a chance to look into
> > > this before that.
> > >
> > > > ---
> > > >
> > > > IIUC there are 2 options:
> > > > 1) Disallow streaming for the tablesync worker.
> > > > 2) Make streaming work for the tablesync worker.
> > > >
> > > > I prefer option (a) not only because of the KISS principle, but also
> > > > because this is how the tablesync worker was previously thought to
> > > > behave anyway. I expect this fix may be like the code that Dilip
> > > > already posted [1]
> > > > [1] https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
> > > >
> > > > OTOH, option (b) fix may or may not be possible (I don't know), but I
> > > > have doubts that it is worthwhile to consider making a special fix for
> > > > a scenario which so far has never been reproduced outside of the
> > > > debugger.
> > > >
> > >
> > > I would prefer option (b) unless the fix is not possible due to design
> > > constraints. I don't think it is a good idea to make tablesync workers
> > > behave differently unless we have a reason for doing so.
> > >
> >
> > Okay, I will analyze this and try to submit my finding today.
>
> I have done my analysis, basically, the table sync worker is applying
> all the changes (for multiple transactions from upstream) under the
> single transaction (on downstream).  Now for normal changes, we can
> just avoid committing in apply_handle_commit and everything is fine
> for streaming changes we also have the transaction at the stream level
> which we need to manage the buffiles for storing the streaming
> changes.  So if we want to do that for the streaming transaction then
> we need to avoid commit transactions on apply_handle_stream_commit as
> apply_handle_stream_stop for the table sync worker.
>

And what about apply_handle_stream_abort? And, I guess we need to
avoid other related things like the updates of
replorigin_session_origin_lsn, replorigin_session_origin_timestamp,
etc. in apply_handle_stream_commit(), as we do in apply_handle_commit().

I think it is difficult to have a reliable test case for this, but feel
free to propose one if you have any ideas.

-- 
With Regards,
Amit Kapila.



On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > To cut a long story short, a tablesync worker CAN in fact end up
> > > > > > > processing (e.g. apply_dispatch) streaming messages.
> > > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > > > > > And this scenario, albeit rare, will crash.
> > > > > > >
> > > > > >
> > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > > > > > interested in writing a fix for this?
> > > > >
> > > > > Hi Amit.
> > > > >
> > > > > FYI - Sorry, I am away/offline for the next 5 days.
> > > > >
> > > > > However, if this bug still remains unfixed after next Tuesday then I
> > > > > can look at it then.
> > > > >
> > > >
> > > > Fair enough. Let's see if Dilip or I can get a chance to look into
> > > > this before that.
> > > >
> > > > > ---
> > > > >
> > > > > IIUC there are 2 options:
> > > > > 1) Disallow streaming for the tablesync worker.
> > > > > 2) Make streaming work for the tablesync worker.
> > > > >
> > > > > I prefer option (a) not only because of the KISS principle, but also
> > > > > because this is how the tablesync worker was previously thought to
> > > > > behave anyway. I expect this fix may be like the code that Dilip
> > > > > already posted [1]
> > > > > [1]
https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
> > > > >
> > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I
> > > > > have doubts that it is worthwhile to consider making a special fix for
> > > > > a scenario which so far has never been reproduced outside of the
> > > > > debugger.
> > > > >
> > > >
> > > > I would prefer option (b) unless the fix is not possible due to design
> > > > constraints. I don't think it is a good idea to make tablesync workers
> > > > behave differently unless we have a reason for doing so.
> > > >
> > >
> > > Okay, I will analyze this and try to submit my finding today.
> >
> > I have done my analysis, basically, the table sync worker is applying
> > all the changes (for multiple transactions from upstream) under the
> > single transaction (on downstream).  Now for normal changes, we can
> > just avoid committing in apply_handle_commit and everything is fine
> > for streaming changes we also have the transaction at the stream level
> > which we need to manage the buffiles for storing the streaming
> > changes.  So if we want to do that for the streaming transaction then
> > we need to avoid commit transactions on apply_handle_stream_commit as
> > apply_handle_stream_stop for the table sync worker.
> >
>
> And what about apply_handle_stream_abort? And, I guess we need to
> avoid other related things like update of
> replorigin_session_origin_lsn, replorigin_session_origin_timestamp,
> etc. in  apply_handle_stream_commit() as we are apply_handle_commit().

Yes, we need to change these as well.  I have tested using the POC
patch and it is working fine.  I will send the patch after some more
testing.
>
> I think it is difficult to have a reliable test case for this but feel
> free to propose if you have any ideas on the same.

I am not sure how to write an automated test case for this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Fri, Nov 20, 2020 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Nov 20, 2020 at 10:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 10:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 18, 2020 at 2:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Wed, Nov 18, 2020 at 11:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Nov 18, 2020 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > > To cut a long story short, a tablesync worker CAN in fact end up
> > > > > > > > processing (e.g. apply_dispatch) streaming messages.
> > > > > > > > So the tablesync worker CAN get into the apply_handle_stream_commit.
> > > > > > > > And this scenario, albeit rare, will crash.
> > > > > > > >
> > > > > > >
> > > > > > > Thank you for reproducing this issue. Dilip, Peter, is anyone of you
> > > > > > > interested in writing a fix for this?
> > > > > >
> > > > > > Hi Amit.
> > > > > >
> > > > > > FYI - Sorry, I am away/offline for the next 5 days.
> > > > > >
> > > > > > However, if this bug still remains unfixed after next Tuesday then I
> > > > > > can look at it then.
> > > > > >
> > > > >
> > > > > Fair enough. Let's see if Dilip or I can get a chance to look into
> > > > > this before that.
> > > > >
> > > > > > ---
> > > > > >
> > > > > > IIUC there are 2 options:
> > > > > > 1) Disallow streaming for the tablesync worker.
> > > > > > 2) Make streaming work for the tablesync worker.
> > > > > >
> > > > > > I prefer option (a) not only because of the KISS principle, but also
> > > > > > because this is how the tablesync worker was previously thought to
> > > > > > behave anyway. I expect this fix may be like the code that Dilip
> > > > > > already posted [1]
> > > > > > [1]
https://www.postgresql.org/message-id/CAFiTN-uUgKpfdbwSGnn3db3mMQAeviOhQvGWE_pC9icZF7VDKg%40mail.gmail.com
> > > > > >
> > > > > > OTOH, option (b) fix may or may not be possible (I don't know), but I
> > > > > > have doubts that it is worthwhile to consider making a special fix for
> > > > > > a scenario which so far has never been reproduced outside of the
> > > > > > debugger.
> > > > > >
> > > > >
> > > > > I would prefer option (b) unless the fix is not possible due to design
> > > > > constraints. I don't think it is a good idea to make tablesync workers
> > > > > behave differently unless we have a reason for doing so.
> > > > >
> > > >
> > > > Okay, I will analyze this and try to submit my finding today.
> > >
> > > I have done my analysis, basically, the table sync worker is applying
> > > all the changes (for multiple transactions from upstream) under the
> > > single transaction (on downstream).  Now for normal changes, we can
> > > just avoid committing in apply_handle_commit and everything is fine
> > > for streaming changes we also have the transaction at the stream level
> > > which we need to manage the buffiles for storing the streaming
> > > changes.  So if we want to do that for the streaming transaction then
> > > we need to avoid commit transactions on apply_handle_stream_commit as
> > > apply_handle_stream_stop for the table sync worker.
> > >
> >
> > And what about apply_handle_stream_abort? And, I guess we need to
> > avoid other related things like update of
> > replorigin_session_origin_lsn, replorigin_session_origin_timestamp,
> > etc. in  apply_handle_stream_commit() as we are apply_handle_commit().
>
> Yes, we need to change these as well.  I have tested using the POC
> patch and working fine.  I will send the patch after some more
> testing.
> >
> > I think it is difficult to have a reliable test case for this but feel
> > free to propose if you have any ideas on the same.
>
> I am not sure how to write an automated test case for this.

Here is the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments
On Fri, Nov 20, 2020 at 11:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > And what about apply_handle_stream_abort? And, I guess we need to
> > > avoid other related things like update of
> > > replorigin_session_origin_lsn, replorigin_session_origin_timestamp,
> > > etc. in  apply_handle_stream_commit() as we are apply_handle_commit().
> >
> > Yes, we need to change these as well.  I have tested using the POC
> > patch and working fine.  I will send the patch after some more
> > testing.
> > >
> > > I think it is difficult to have a reliable test case for this but feel
> > > free to propose if you have any ideas on the same.
> >
> > I am not sure how to write an automated test case for this.
>
> Here is the patch.
>

Few comments:
==============
1.
+ /* The synchronization worker runs in single transaction. */
+ if (!am_tablesync_worker())
+ {
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = commit_data.end_lsn;
+ replorigin_session_origin_timestamp = commit_data.committime;
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+ store_flush_position(commit_data.end_lsn);
+ }
+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }

Here, why is the IsTransactionState() check not there, similar to
apply_handle_commit()? Also, this code looks the same as in
apply_handle_commit(); can we move it into a common function, say
apply_handle_commit_internal or something like that? If we decide to
do so then we can move a few more things, as mentioned below, into the
common function:

apply_handle_commit()
{
..
in_remote_transaction = false;

/* Process any tables that are being synchronized in parallel. */
process_syncing_tables(commit_data.end_lsn);
..
}

2.
@@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
  {
  /* Cleanup the subxact info */
  cleanup_subxact_info();
- CommitTransactionCommand();
+
+ if (!am_tablesync_worker())
+ CommitTransactionCommand();

Here, also you can add a comment: "/* The synchronization worker runs
in single transaction. */"

-- 
With Regards,
Amit Kapila.



On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 20, 2020 at 11:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Nov 20, 2020 at 11:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 11:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > And what about apply_handle_stream_abort? And, I guess we need to
> > > > avoid other related things like update of
> > > > replorigin_session_origin_lsn, replorigin_session_origin_timestamp,
> > > > etc. in  apply_handle_stream_commit() as we are apply_handle_commit().
> > >
> > > Yes, we need to change these as well.  I have tested using the POC
> > > patch and working fine.  I will send the patch after some more
> > > testing.
> > > >
> > > > I think it is difficult to have a reliable test case for this but feel
> > > > free to propose if you have any ideas on the same.
> > >
> > > I am not sure how to write an automated test case for this.
> >
> > Here is the patch.
> >
>
> Few comments:
> ==============
> 1.
> + /* The synchronization worker runs in single transaction. */
> + if (!am_tablesync_worker())
> + {
> + /*
> + * Update origin state so we can restart streaming from correct position
> + * in case of crash.
> + */
> + replorigin_session_origin_lsn = commit_data.end_lsn;
> + replorigin_session_origin_timestamp = commit_data.committime;
> + CommitTransactionCommand();
> + pgstat_report_stat(false);
> + store_flush_position(commit_data.end_lsn);
> + }
> + else
> + {
> + /* Process any invalidation messages that might have accumulated. */
> + AcceptInvalidationMessages();
> + maybe_reread_subscription();
> + }
>
> Here, why the check IsTransactionState() is not there similar to
> apply_handle_commit? Also, this code looks the same as in
> apply_handle_commit(), can we move this into a common function say
> apply_handle_commit_internal or something like that? If we decide to
> do so then we can move a few more things as mentioned below in the
> common function:

Moved to the common function as suggested.

> apply_handle_commit()
> {
> ..
> in_remote_transaction = false;
>
> /* Process any tables that are being synchronized in parallel. */
> process_syncing_tables(commit_data.end_lsn);
> ..
> }

This as well.

> 2.
> @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
>   {
>   /* Cleanup the subxact info */
>   cleanup_subxact_info();
> - CommitTransactionCommand();
> +
> + if (!am_tablesync_worker())
> + CommitTransactionCommand();
>
> Here, also you can add a comment: "/* The synchronization worker runs
> in single transaction. */"
>

Done
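
For reference, here is a rough sketch of the shape of that common
function at this point in the thread (based on the hunks quoted above;
the committed version may differ, and the placement of
process_syncing_tables() is revisited later in this thread):

static void
apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
{
	/* The synchronization worker runs in a single transaction. */
	if (IsTransactionState() && !am_tablesync_worker())
	{
		/*
		 * Update origin state so we can restart streaming from the correct
		 * position in case of a crash.
		 */
		replorigin_session_origin_lsn = commit_data->end_lsn;
		replorigin_session_origin_timestamp = commit_data->committime;

		CommitTransactionCommand();
		pgstat_report_stat(false);

		store_flush_position(commit_data->end_lsn);
	}
	else
	{
		/* Process any invalidation messages that might have accumulated. */
		AcceptInvalidationMessages();
		maybe_reread_subscription();
	}

	in_remote_transaction = false;

	/* Process any tables that are being synchronized in parallel. */
	process_syncing_tables(commit_data->end_lsn);
}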

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments
On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 2.
> > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
> >   {
> >   /* Cleanup the subxact info */
> >   cleanup_subxact_info();
> > - CommitTransactionCommand();
> > +
> > + if (!am_tablesync_worker())
> > + CommitTransactionCommand();
> >
> > Here, also you can add a comment: "/* The synchronization worker runs
> > in single transaction. */"
> >
>
> Done
>

Okay, thanks. I have slightly changed the comments and moved the newly
added function in the attached patch. I have tested the reported
scenario and additionally verified that the fix is good even if the
tablesync worker processed a partial transaction due to streaming.
This won't do any harm because the apply worker will later replay the
entire transaction. It could be a problem if the apply worker also
tried to stream the transaction between the SUBREL_STATE_CATCHUP and
SUBREL_STATE_SYNCDONE states, because then the apply worker might have
skipped applying the partial transactions processed by the tablesync
worker. But I have checked that the apply worker waits for the sync
worker to complete its processing between these two states. See
process_syncing_tables_for_apply. Does this make sense?
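
For context, the hand-off being relied on here is roughly the following
(a heavily simplified sketch only; the real logic lives in
process_syncing_tables_for_apply() in tablesync.c and goes through the
workers' shared state rather than a local struct):

	if (rstate->state == SUBREL_STATE_SYNCWAIT)
	{
		/* Ask the tablesync worker to catch up to the apply worker's LSN. */
		rstate->state = SUBREL_STATE_CATCHUP;
		rstate->lsn = current_lsn;

		/*
		 * Block until the tablesync worker reports SYNCDONE, so the apply
		 * worker cannot move past (and skip) transactions the tablesync
		 * worker is still responsible for applying.
		 */
		wait_for_relation_state_change(rstate->relid, SUBREL_STATE_SYNCDONE);
	}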

Peter, can you please also test the attached once and see if it fixes
the problem for you as well?

-- 
With Regards,
Amit Kapila.

Attachments
On Mon, Nov 23, 2020 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 23, 2020 at 10:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Sat, Nov 21, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 2.
> > > @@ -902,7 +906,9 @@ apply_handle_stream_abort(StringInfo s)
> > >   {
> > >   /* Cleanup the subxact info */
> > >   cleanup_subxact_info();
> > > - CommitTransactionCommand();
> > > +
> > > + if (!am_tablesync_worker())
> > > + CommitTransactionCommand();
> > >
> > > Here, also you can add a comment: "/* The synchronization worker runs
> > > in single transaction. */"
> > >
> >
> > Done
> >
>
> Okay, thanks. I have slightly changed the comments and moved the newly
> added function in the attached patch.

Okay, looks good to me.

> I have tested the reported
> scenario and additionally verified that the fix is good even if the
> tablesync worker processed the partial transaction due to streaming.
> This won't do any harm because later apply worker will replay the
> entire transaction. This could be a problem if the apply worker also
> tries to stream the transaction between the SUBREL_STATE_CATCHUP and
> SUBREL_STATE_SYNCDONE state because then apply worker might have
> skipped applying the partial transactions processed by tablesync
> worker. But, I have checked that the apply worker waits for sync
> worker to complete its processing between these two states.

Right

> See
> process_syncing_tables_for_apply. Does this make sense?

Yes, it makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Mon, Nov 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Peter, can you also please once test the attached and see if this
> fixes the problem for you as well?

I have reviewed the v3 patch code, and repeated the tablesync testing
(for a stream-enabled SUBSCRIPTION) using the debugging technique
previously described [1].
[1] https://www.postgresql.org/message-id/CAHut%2BPt4PyKQCwqzQ%3DEFF%3DbpKKJD7XKt_S23F6L20ayQNxg77A%40mail.gmail.com

Now the test was successful. The stream start/stop/commit/abort are
all being handled by the tablesync worker without errors.

===

TESTING

I performed the tablesync stream test by multiple different ways of
inserting the en masse data:

1. Cause streaming (using a PREPARE TRANSACTION)
------------------------------------------------
psql -d test_pub -c "BEGIN;INSERT INTO test_tab SELECT i, md5(i::text)
FROM generate_series(3, 5000) s(i);PREPARE TRANSACTION
'test_prepared_tab';"
psql -d test_pub -c "COMMIT PREPARED 'test_prepared_tab';"

tablesync does 'S' (stream start)
tablesync does 'R'
tablesync does 'I' (insert) x n
tablesync does 'E' (stream end)
tablesync does 'S'
tablesync does 'I' x n
tablesync does 'E'
tablesync does { 'S',  'I' x n, 'E' } repeats 13 more times
tablesync does 'c' (stream commit)
- In apply_handle_stream_commit the tablesync worker processes the spool
file, calling apply_dispatch for all the messages
- 'R'
- { 'I' + should_apply_changes_for_rel = true } x 5000
- then calls process_syncing_tables_for_sync, which sets the state to
SUBREL_STATE_SYNCDONE
- then tablesync worker process exits

Now the "apply" worker catches up.
- it runs all the same messages again but does nothing because
should_apply_changes_for_rel = false


2. Cause streaming (not using a PREPARE TRANSACTION)
----------------------------------------------------
psql -d test_pub -c "INSERT INTO test_tab SELECT i, md5(i::text) FROM
generate_series(3, 5000) s(i);"

gives very similar results to above


3. Causing a stream abort to happen
-----------------------------------
psql -d test_pub -c "BEGIN; INSERT INTO test_tab SELECT i,
md5(i::text) FROM generate_series(3, 5000) s(i); INSERT INTO test_tab
VALUES(4999, 'already exists'); END;"

tablesync does 'S' (stream start)
tablesync does 'E' (stream end)
tablesync does 'A' (stream abort)
tablesync does 'B'
tablesync does 'C'
- calls process_syncing_tables
- tablesync worker process exits
apply worker continues...

===

REVIEW COMMENTS

Despite the tests working OK, I thought there should be a couple of
small changes made to this patch, as follows.

(1) process_syncing_tables should be last

IMO the process_syncing_tables call should always be the *last* thing
any apply handler does, because it seems a bad idea for the worker to
change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
there is still a little bit more processing remaining. This is why I
made the same modification in the other WIP patch v26-0006 [2]
[2] https://www.postgresql.org/message-id/CAHut%2BPuQcLbjrBXM54%2By%2B%3DYsmEDFd3CCtp31K-_jS4Xka2vwbQ%40mail.gmail.com

So, I think the process_syncing_tables() call should be put *after* the
stream_cleanup_files() call in apply_handle_stream_commit() (see the
sketch after comment (2) below). But doing this means
process_syncing_tables() has to be removed from your new
apply_handle_commit_internal function.


(2) misleading comment

/*
 * Start a transaction on stream start, this transaction will be committed
 * on the stream stop. We need the transaction for handling the buffile,
 * used for serializing the streaming data and subxact info.
 */

The above comment (in the apply_handle_stream_start function) may now
be inaccurate/misleading for the tablesync worker case, because I think
for tablesync you are explicitly avoiding the tx commit within
apply_handle_stream_stop().
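
To make (1) concrete, the tail of apply_handle_stream_commit() would
then end up looking roughly like this (a sketch of the suggested
ordering, not the exact committed code):

	apply_handle_commit_internal(s, &commit_data);

	/* Unlink the files with serialized changes and subxact info. */
	stream_cleanup_files(MyLogicalRepWorker->subid, xid);

	/*
	 * Process any tables that are being synchronized in parallel, only
	 * after the stream resources for this transaction have been released.
	 */
	process_syncing_tables(commit_data.end_lsn);

	pgstat_report_activity(STATE_IDLE, NULL);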

---

Kind Regards,
Peter Smith.
Fujitsu Australia



On Wed, Nov 25, 2020 at 11:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Nov 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> REVIEW COMMENTS
>
> Despite the tests working OK
>

Thanks for the tests.

> I thought there should be a couple small
> changes made for this patch, as follows.
>
> (1) process_sync_tables should be last
>
> IMO the process_syncing_tables should be always the *last* thing any
> apply handler does, because it seems a bad idea for the worker to
> change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
> there is still a little bit more processing remaining.
>

Hmm, what makes you think it is a bad idea; is there any comment which
makes you feel so? As far as I understand, the only thing the tablesync
worker needs to ensure before reaching that state is that it has
applied up to the required LSN, which is done by the time the function
is called in the patch. I think doing it sooner is better because it
will let the apply worker start its work. In any case, this is not
related to this bug-fix patch, so if you want to build a case for doing
it later then we can discuss it separately.

>
> (2) misleading comment
>
> /*
>  * Start a transaction on stream start, this transaction will be committed
>  * on the stream stop. We need the transaction for handling the buffile,
>  * used for serializing the streaming data and subxact info.
>  */
>
> That above comment (in the apply_handle_stream_start function) may now
> be inaccurate/misleading for the case of tablesync worker, because I
> think for tablesync you are explicitly avoiding the tx commit within
> the apply_handle_stream_stop().
>

Okay, I think we can slightly adjust the comment as: "Start a
transaction on stream start, this transaction will be committed on the
stream stop unless it is a tablesync worker in which case it will be
committed after processing all the messages. We need the transaction
for handling the buffile, used for serializing the streaming data and
subxact info.".

I can update the patch once we agree on the previous point.

-- 
With Regards,
Amit Kapila.



On Wed, Nov 25, 2020 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I thought there should be a couple small
> > changes made for this patch, as follows.
> >
> > (1) process_sync_tables should be last
> >
> > IMO the process_syncing_tables should be always the *last* thing any
> > apply handler does, because it seems a bad idea for the worker to
> > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
> > there is still a little bit more processing remaining.
> >
>
> Hmm, what makes you think it is a bad idea, is there any comment which
> makes you feel so? As far as I understand, the only thing tablesync
> worker needs to ensure before reaching that state is it should apply
> till the required lsn which is done by the time it is called in the
> patch. I think doing it sooner is better because it will let apply
> worker start its work. In any case, this is not related to this
> bug-fix patch, so, if you want to build a case for doing it later then
> we can discuss it separately.

Yes, this review issue is independent of the initial streaming bug the
patch is solving. But by refactoring this exact line of code into a
new function apply_handle_commit_internal() I felt the patch is kind
of now taking ownership of it - so it is not *really* unrelated to
this patch anymore...

(However if you prefer to treat this review item as a separate issue
it is fine by me).

AFAIK the process_syncing_tables function acts like the main handshake
mechanism between the "tablesync" and "apply" workers.
e.g. From "tablesync" worker POV - am I finished? Should I tell the
apply worker I am DONE? Is it time to exit?
e.g. From "apply" worker POV - have the tablesyncs caught up yet? Can I proceed?

I do think these handshake functions ought to be consistently located
(e.g always kept last in the apply handler functions where possible).

It is true that a slightly different placement (e.g. like the current
code) still passes the tests, but I see more problems than benefits in
keeping it that way. IIUC, for the tablesync worker to be executing
some of these handlers at all is quite a rare occurrence caused by
unusual timing. I think putting the process_syncing_tables() call
earlier, with the objective of allowing the apply worker to proceed a
few ms earlier (for the already rare case), is not worth it.

OTOH, putting the handshake function consistently last does have benefits:
- the consistent code is easier to read
- none of the tablesync timing stuff is very easy to test (certainly
not with automated regression tests). Keeping code consistent means
less risk of introducing some unforeseen/untestable issue
- that stream_cleanup_files(...) called by
apply_handle_stream_commit() is supposed to be cleaning up file
resources. IIUC by allowing the tablesync worker to call
process_syncing_tables() before this cleanup it means the tablesync
worker may decide that it is SUBREL_STATE_SYNCDONE and then just exit
the process! So in this scenario that file resource cleanup will never
get a chance to happen at all. Isn't that just a plain bug?

>
> >
> > (2) misleading comment
> >
> > /*
> >  * Start a transaction on stream start, this transaction will be committed
> >  * on the stream stop. We need the transaction for handling the buffile,
> >  * used for serializing the streaming data and subxact info.
> >  */
> >
> > That above comment (in the apply_handle_stream_start function) may now
> > be inaccurate/misleading for the case of tablesync worker, because I
> > think for tablesync you are explicitly avoiding the tx commit within
> > the apply_handle_stream_stop().
> >
>
> Okay, I think we can slightly adjust the comment as: "Start a
> transaction on stream start, this transaction will be committed on the
> stream stop unless it is a tablesync worker in which case it will be
> committed after processing all the messages. We need the transaction
> for handling the buffile, used for serializing the streaming data and
> subxact info.".

The reworded comment looks ok now.

---

Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Nov 26, 2020 at 6:47 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Nov 25, 2020 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I thought there should be a couple small
> > > changes made for this patch, as follows.
> > >
> > > (1) process_sync_tables should be last
> > >
> > > IMO the process_syncing_tables should be always the *last* thing any
> > > apply handler does, because it seems a bad idea for the worker to
> > > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
> > > there is still a little bit more processing remaining.
> > >
> >
> > Hmm, what makes you think it is a bad idea, is there any comment which
> > makes you feel so? As far as I understand, the only thing tablesync
> > worker needs to ensure before reaching that state is it should apply
> > till the required lsn which is done by the time it is called in the
> > patch. I think doing it sooner is better because it will let apply
> > worker start its work. In any case, this is not related to this
> > bug-fix patch, so, if you want to build a case for doing it later then
> > we can discuss it separately.
>
> Yes, this review issue is independent of the initial streaming bug the
> patch is solving. But by refactoring this exact line of code into a
> new function apply_handle_commit_internal() I felt the patch is kind
> of now taking ownership of it - so it is not *really* unrelated to
> this patch anymore...
>
> (However if you prefer to treat this review item as a separate issue
> it is fine by me).
>

Let's try to see if we can have a common understanding before I push
this. I don't think this is an issue, nor did you present any theory or
test for why you believe it is an issue.

> AFAIK the process_syncing_tables function acts like the main handshake
> mechanism between the "tablesync" and "apply" workers.
> e.g. From "tablesync" worker POV - am I finished? Should I tell the
> apply worker I am DONE? Is it time to exit?
> e.g. From "apply" worker POV - have the tablesyncs caught up yet? Can I proceed?
>

Here, I think we need to have a common understanding of "finished"
and/or "DONE". It is not defined by where that function is called but
by when it is appropriate to call it. I tried to explain this in my
previous response, but I am not sure you have read it carefully.

> I do think these handshake functions ought to be consistently located
> (e.g always kept last in the apply handler functions where possible).
>

There is no harm in doing so but you haven't presented any theory
which makes me feel that is correct or more appropriate.

> It is true, a slightly different placement (e.g. like current code)
> still passes the tests,
>

Hmm, it is not only about passing the tests. It is not always true
that if tests pass the code is correct, especially in cases where
writing a test is not feasible. We need to understand the design
principle behind doing so, which is what I am trying to explain.

> but I see more problems than benefits to keep
> it that way. IIUC for the tablesync worker to be executing some of
> these handlers at all is a quite a rare occurrence caused by unusual
> timing.

That is because we want the tablesync worker to apply the WAL up to
the LSN the apply worker has read by that time, so that the apply
worker can continue from there.

> I think putting the process_syncing_tables() call earlier with
> the objective to allow apply worker to proceed a few ms earlier (for
> the already rare case) is not worth it.
>

I think that is not the primary objective. It is also that having it
in the common function, as the patch does, ensures we don't
accidentally remove it or forget to call it from some other similar
place. OTOH, I don't see any clear advantage of moving it out of the
common function.

> OTOH, putting the handshake function consistently last does have benefits:
> - the consistent code is easier to read
> - none of the tablesync timing stuff is very easy to test (certainly
> not with automated regression tests). Keeping code consistent means
> less risk of introducing some unforeseen/untestable issue
>
> - that stream_cleanup_files(...) called by
> apply_handle_stream_commit() is supposed to be cleaning up file
> resources. IIUC by allowing the tablesync worker to call
> process_syncing_tables() before this cleanup it means the tablesync
> worker may decide that it is SUBREL_STATE_SYNCDONE and then just exit
> the process! So in this scenario that file resource cleanup will never
> get a chance to happen at all. Isn't that just a plain bug?
>

No, we clean up the files on process exit (via
SharedFileSetDeleteOnProcExit). If we didn't have such a mechanism, it
would be a problem in many more cases than the one you describe. See
the comments atop src/backend/replication/logical/worker.c (especially
the STREAMED TRANSACTIONS section).

> >
> > >
> > > (2) misleading comment
> > >
> > > /*
> > >  * Start a transaction on stream start, this transaction will be committed
> > >  * on the stream stop. We need the transaction for handling the buffile,
> > >  * used for serializing the streaming data and subxact info.
> > >  */
> > >
> > > That above comment (in the apply_handle_stream_start function) may now
> > > be inaccurate/misleading for the case of tablesync worker, because I
> > > think for tablesync you are explicitly avoiding the tx commit within
> > > the apply_handle_stream_stop().
> > >
> >
> > Okay, I think we can slightly adjust the comment as: "Start a
> > transaction on stream start, this transaction will be committed on the
> > stream stop unless it is a tablesync worker in which case it will be
> > committed after processing all the messages. We need the transaction
> > for handling the buffile, used for serializing the streaming data and
> > subxact info.".
>
> The reworded comment looks ok now.
>

Okay, will change once we agree on previous point.

-- 
With Regards,
Amit Kapila.



On Thu, Nov 26, 2020 at 1:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 26, 2020 at 6:47 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Nov 25, 2020 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > I thought there should be a couple small
> > > > changes made for this patch, as follows.
> > > >
> > > > (1) process_sync_tables should be last
> > > >
> > > > IMO the process_syncing_tables should be always the *last* thing any
> > > > apply handler does, because it seems a bad idea for the worker to
> > > > change state (e.g. to say it is SUBREL_STATE_SYNCDONE) if in fact
> > > > there is still a little bit more processing remaining.
> > > >
> > >
> > > Hmm, what makes you think it is a bad idea, is there any comment which
> > > makes you feel so? As far as I understand, the only thing tablesync
> > > worker needs to ensure before reaching that state is it should apply
> > > till the required lsn which is done by the time it is called in the
> > > patch. I think doing it sooner is better because it will let apply
> > > worker start its work. In any case, this is not related to this
> > > bug-fix patch, so, if you want to build a case for doing it later then
> > > we can discuss it separately.
> >
> > Yes, this review issue is independent of the initial streaming bug the
> > patch is solving. But by refactoring this exact line of code into a
> > new function apply_handle_commit_internal() I felt the patch is kind
> > of now taking ownership of it - so it is not *really* unrelated to
> > this patch anymore...
> >
> > (However if you prefer to treat this review item as a separate issue
> > it is fine by me).
> >
>
> Let's try to see if we can have a common understanding before I push
> this. I don't think this is an issue neither did you presented any
> theory or test why you believe it is an issue.
>

Right, there is no "issue"; I only felt the current code is more
confusing than it needs to be:

- Since process_syncing_tables is called many times by apply handlers I
preferred a common code pattern (e.g. always last) because I would find
it easier to understand. YMMV.

- And I also thought it would be neater to simply move the
process_syncing_tables by one line so you can do the stream file
resource cleanup explicitly the normal way instead of relying on
implicit cleanup as the tablesync process exits.

But I also understand you may choose to leave everything as-is and
that is still OK too.

> > AFAIK the process_syncing_tables function acts like the main handshake
> > mechanism between the "tablesync" and "apply" workers.
> > e.g. From "tablesync" worker POV - am I finished? Should I tell the
> > apply worker I am DONE? Is it time to exit?
> > e.g. From "apply" worker POV - have the tablesyncs caught up yet? Can I proceed?
> >
>
> Here, I think we need to have a common understanding of finished and
> or DONE. It is not defined by where that function is called but when
> it is appropriate to call it. I have tried to explain in my previous
> response but not sure you have read it carefully.
>
> > I do think these handshake functions ought to be consistently located
> > (e.g always kept last in the apply handler functions where possible).
> >
>
> There is no harm in doing so but you haven't presented any theory
> which makes me feel that is correct or more appropriate.
>
> > It is true, a slightly different placement (e.g. like current code)
> > still passes the tests,
> >
>
> Hmm, it is not only about passing the tests. It is not always true
> that if tests pass, the code is correct especially in such cases where
> writing test is not feasible. We need to understand the design
> principle behind doing so which I am trying to explain.
>
> > but I see more problems than benefits to keep
> > it that way. IIUC for the tablesync worker to be executing some of
> > these handlers at all is a quite a rare occurrence caused by unusual
> > timing.
>
> It is due to the reason that we want tablesync worker to apply the WAL
> up to LSN apply worker has read by that time so that the apply worker
> can continue from there.

I think it is not always/only "up to the LSN the apply worker has
read". IIUC, my tests of deliberately sending messages while the
tablesync worker is paused in the debugger mean the tablesync worker
will get *ahead* of the apply worker.

>
> > I think putting the process_syncing_tables() call earlier with
> > the objective to allow apply worker to proceed a few ms earlier (for
> > the already rare case) is not worth it.
> >
>
> I think that is not the primary objective. It is also that having it
> in common function as the patch is doing allows us to not accidentally
> remove it or forget calling it from some other similar place. OTOH, I
> don't see any clear advantage of moving out of the common function.
>
> > OTOH, putting the handshake function consistently last does have benefits:
> > - the consistent code is easier to read
> > - none of the tablesync timing stuff is very easy to test (certainly
> > not with automated regression tests). Keeping code consistent means
> > less risk of introducing some unforeseen/untestable issue
> >
> > - that stream_cleanup_files(...) called by
> > apply_handle_stream_commit() is supposed to be cleaning up file
> > resources. IIUC by allowing the tablesync worker to call
> > process_syncing_tables() before this cleanup it means the tablesync
> > worker may decide that it is SUBREL_STATE_SYNCDONE and then just exit
> > the process! So in this scenario that file resource cleanup will never
> > get a chance to happen at all. Isn't that just a plain bug?
> >
>
> No, we clean up the files on process exit (via
> SharedFileSetDeleteOnProcExit). If we don't have such a mechism, it
> would be a problem in more number of cases then you describe. See
> comments atop src/backend/replication/logical/worker.c (especially the
> STREAMED TRANSACTIONS section).

Got it. Thanks.

---

Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Nov 26, 2020 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> - And I also thought it would be neater to simply move the
> process_syncing_tables by one line so you can do the stream file
> resource cleanup explicitly the normal way instead of relying on
> implicit cleanup as the tablesync process exits.
>

Okay, I see your point, so changed it accordingly and fixed the code
comments as suggested by you. Does this address all your
concerns/suggestions?

-- 
With Regards,
Amit Kapila.

Attachments
On Thu, Nov 26, 2020 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Okay, I see your point, so changed it accordingly and fixed the code
> comments as suggested by you. Does this address all your
> concerns/suggestions?

Yes. Now the v4 patch addresses everything I previously mentioned.

However, today I did notice one more small thing you may want to change.

===

REVIEW COMMENT:

(1) missing comment?

@@ -928,7 +917,9 @@ apply_handle_stream_abort(StringInfo s)

  /* write the updated subxact list */
  subxact_info_write(MyLogicalRepWorker->subid, xid);
- CommitTransactionCommand();
+
+ if (!am_tablesync_worker())
+ CommitTransactionCommand();
  }
 }

That new condition seems to be missing a comment saying "/* The
synchronization worker runs in a single transaction */". Such a
comment accompanies all other am_tablesync_worker() conditions that
look like this one.

===

I also re-ran the tablesync test using the v4 patch, and stepped
through to see the normal stream file cleanup being executed by the
tablesync worker.

So apart from that trivial missing comment, I think now it is all good.

---

Kind Regards,
Peter Smith.
Fujitsu Australia



On Fri, Nov 27, 2020 at 7:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Nov 26, 2020 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Okay, I see your point, so changed it accordingly and fixed the code
> > comments as suggested by you. Does this address all your
> > concerns/suggestions?
>
> Yes. Now the v4 patch addresses everything I previously mentioned.
>
> However, today I did notice one more small thing you may want to change.
>
> ===
>
> REVIEW COMMENT:
>
> (1) missing comment?
>
> @@ -928,7 +917,9 @@ apply_handle_stream_abort(StringInfo s)
>
>   /* write the updated subxact list */
>   subxact_info_write(MyLogicalRepWorker->subid, xid);
> - CommitTransactionCommand();
> +
> + if (!am_tablesync_worker())
> + CommitTransactionCommand();
>   }
>  }
>
> That new condition seems to be missing a comment saying "/* The
> synchronization worker runs in a single transaction */". Such a
> comment accompanies all other am_tablesync_worker() conditions that
> look like this one.
>

Yeah, I intentionally left it out because, a few lines earlier in the
same function, we have that same condition and comment, so adding it
again didn't appear to be a good idea to me.

> ===
>
> I also re-ran the tablesync test using the v4 patch, and have stepped
> to see the normal stream file normal cleanup being executed by the
> tablesync worker.
>
> So apart from that trivial missing comment, I think now it is all good.
>

Cool, I'm planning to push this in a few minutes time.

-- 
With Regards,
Amit Kapila.



On Fri, Nov 27, 2020 at 7:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Nov 27, 2020 at 7:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
>
> > ===
> >
> > I also re-ran the tablesync test using the v4 patch, and have stepped
> > to see the normal stream file normal cleanup being executed by the
> > tablesync worker.
> >
> > So apart from that trivial missing comment, I think now it is all good.
> >
>
> Cool, I'm planning to push this in a few minutes time.
>

Pushed.

-- 
With Regards,
Amit Kapila.