Обсуждение: pgsql: Allow multiple xacts during table sync in logical replication.

Поиск
Список
Период
Сортировка

pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
Allow multiple xacts during table sync in logical replication.

For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.

There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization-phase (where we can receive WAL from multiple
transactions) will have the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.

This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and after that, we commit each transaction as we
receive. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove slot and
origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or
ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not
finished.

The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots for which we have no provision to rollback.

This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we can't do that because
of the requirement of maintaining a single transaction in tablesync
workers.

Bump catalog version due to change of state in the catalog
(pg_subscription_rel).

Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ce0fdbfe9722867b7fad4d3ede9b6a6bfc51fb4e

Modified Files
--------------
doc/src/sgml/catalogs.sgml                         |   1 +
doc/src/sgml/logical-replication.sgml              |  59 ++-
doc/src/sgml/ref/alter_subscription.sgml           |  18 +
doc/src/sgml/ref/drop_subscription.sgml            |   6 +-
src/backend/access/transam/xact.c                  |  11 -
src/backend/catalog/pg_subscription.c              |  38 ++
src/backend/commands/subscriptioncmds.c            | 467 ++++++++++++++++-----
.../libpqwalreceiver/libpqwalreceiver.c            |   8 +
src/backend/replication/logical/launcher.c         | 147 -------
src/backend/replication/logical/tablesync.c        | 236 +++++++++--
src/backend/replication/logical/worker.c           |  18 +-
src/backend/tcop/utility.c                         |   3 +-
src/include/catalog/catversion.h                   |   2 +-
src/include/catalog/pg_subscription_rel.h          |   2 +
src/include/commands/subscriptioncmds.h            |   2 +-
src/include/replication/logicallauncher.h          |   2 -
src/include/replication/slot.h                     |   3 +
src/include/replication/walreceiver.h              |   1 +
src/include/replication/worker_internal.h          |   3 +-
src/test/regress/expected/subscription.out         |  21 +
src/test/regress/sql/subscription.sql              |  22 +
src/test/subscription/t/004_sync.pl                |  21 +-
src/tools/pgindent/typedefs.list                   |   2 +-
23 files changed, 767 insertions(+), 326 deletions(-)


Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Fri, Feb 12, 2021 at 7:50 AM Amit Kapila <akapila@postgresql.org> wrote:
>
> Allow multiple xacts during table sync in logical replication.
>

I noticed one failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12

Checking the same.

-- 
With Regards,
Amit Kapila.



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> I noticed one failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
> Checking the same.

I think thorntail runs with nondefault prevailing wal_level, might be
related.

            regards, tom lane



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Fri, Feb 12, 2021 at 8:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 7:50 AM Amit Kapila <akapila@postgresql.org> wrote:
> >
> > Allow multiple xacts during table sync in logical replication.
> >
>
> I noticed one failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
>
> Checking the same.
>

I think I know the reason. It is because, in the test added by patch,
we are trying to connect to the publisher and
wal_level/max_wal_senders configuration is not suitable for it. I'll
look into changing the test.

-- 
With Regards,
Amit Kapila.



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Fri, Feb 12, 2021 at 8:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > I noticed one failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
> > Checking the same.
>
> I think thorntail runs with nondefault prevailing wal_level, might be
> related.
>

Right, it is due to max_wal_senders nondefault config parameter. I
think we need to slightly tweak the test so that we don't initially
try to connect, then enable the subscription later with different
command and then perform the rest of the commands.

-- 
With Regards,
Amit Kapila.



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Fri, Feb 12, 2021 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 8:47 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > I noticed one failure:
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-02-12%2002%3A28%3A12
> > > Checking the same.
> >
> > I think thorntail runs with nondefault prevailing wal_level, might be
> > related.
> >
>
> Right, it is due to max_wal_senders nondefault config parameter. I
> think we need to slightly tweak the test so that we don't initially
> try to connect, then enable the subscription later with different
> command and then perform the rest of the commands.
>

I have pushed the fix for this and waiting for new results of
'thorntail'. In the meantime, I see another failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2021-02-12%2003%3A15%3A43

 select count(*) > 0 as ok from pg_ls_waldir();
- ok
-----
- t
-(1 row)
-
+ERROR:  could not stat file "pg_wal/000000010000000000000072":
Permission denied

I see the exact same failure 66 days ago on the same machine:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=walleye&dt=2020-12-07%2016%3A15%3A46

This doesn't look to be related to this commit and I am not sure why
it is failing occasionally, maybe due to some machine issues?

-- 
With Regards,
Amit Kapila.



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Tom Lane
Дата:
Various buildfarm members are complaining about this patch, eg

 caiman        | 2021-02-12 15:00:19 | tablesync.c:885:70: warning: argument 3 of type 'char[64]' with mismatched bound
[-Warray-parameter=]
 caiman        | 2021-02-12 15:00:19 | tablesync.c:904:72: warning: argument 3 of type 'char[64]' with mismatched bound
[-Warray-parameter=]

That's because of the inconsistency between

extern void ReplicationOriginNameForTablesync(Oid suboid, Oid relid, char *originname);

and

void
ReplicationOriginNameForTablesync(Oid suboid, Oid relid,
                                  char originname[NAMEDATALEN])

Don't do that.  Quite aside from the inconsistency, this is pretty
darn unsafe coding technique, because there is exactly nothing
guaranteeing that the caller passes a buffer of the length the
function expects.

I'm not real sure that you could expect a compiler warning for that
even if you'd put the length declaration where callers could see it.
So personally I'd avoid hard-wiring NAMEDATALEN into this API at all,
and have the caller pass sizeof(its buffer) instead.

            regards, tom lane



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Sat, Feb 13, 2021 at 4:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Don't do that.  Quite aside from the inconsistency, this is pretty
> darn unsafe coding technique, because there is exactly nothing
> guaranteeing that the caller passes a buffer of the length the
> function expects.
>
> I'm not real sure that you could expect a compiler warning for that
> even if you'd put the length declaration where callers could see it.
> So personally I'd avoid hard-wiring NAMEDATALEN into this API at all,
> and have the caller pass sizeof(its buffer) instead.
>

Thanks for pointing it out. I'll look into this. BTW, how one can
check such reports, do one need to check all make logs for buildfarm
members?

-- 
With Regards,
Amit Kapila.



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Thanks for pointing it out. I'll look into this. BTW, how one can
> check such reports, do one need to check all make logs for buildfarm
> members?

Yeah, I have a script that scans the most recent buildfarm runs for
compiler warnings.  I run it every few weeks typically (it's not
terribly cheap).  Today I was using it to look for complaints
about the 'no-sanitize' stuff, so I happened to notice these new
warnings from your patch.

            regards, tom lane



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Sat, Feb 13, 2021 at 6:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Feb 13, 2021 at 4:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Don't do that.  Quite aside from the inconsistency, this is pretty
> > darn unsafe coding technique, because there is exactly nothing
> > guaranteeing that the caller passes a buffer of the length the
> > function expects.
> >
> > I'm not real sure that you could expect a compiler warning for that
> > even if you'd put the length declaration where callers could see it.
> > So personally I'd avoid hard-wiring NAMEDATALEN into this API at all,
> > and have the caller pass sizeof(its buffer) instead.
> >
>
> Thanks for pointing it out. I'll look into this.
>

Attached should fix the reported warnings but as I am not getting
those warnings on my system so can't confirm but otherwise, the patch
works as expected. I am planning to push this unless you or Peter have
any comments.

-- 
With Regards,
Amit Kapila.

Вложения

Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Tom Lane
Дата:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Attached should fix the reported warnings but as I am not getting
> those warnings on my system so can't confirm but otherwise, the patch
> works as expected. I am planning to push this unless you or Peter have
> any comments.

Personally I'd get rid of the option for ReplicationSlotNameForTablesync
to allocate the result buffer.  There's only one caller using that,
so it's saving no code to have ReplicationSlotNameForTablesync do the
alloc rather than that one caller; and it seems mighty ugly/nonorthogonal
to have effectively two different APIs in one function.

            regards, tom lane



Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Sat, Feb 13, 2021 at 9:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > Attached should fix the reported warnings but as I am not getting
> > those warnings on my system so can't confirm but otherwise, the patch
> > works as expected. I am planning to push this unless you or Peter have
> > any comments.
>
> Personally I'd get rid of the option for ReplicationSlotNameForTablesync
> to allocate the result buffer.
>

Sounds reasonable and the updated patch attached.

-- 
With Regards,
Amit Kapila.

Вложения

Re: pgsql: Allow multiple xacts during table sync in logical replication.

От
Amit Kapila
Дата:
On Sat, Feb 13, 2021 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Feb 13, 2021 at 9:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> > > Attached should fix the reported warnings but as I am not getting
> > > those warnings on my system so can't confirm but otherwise, the patch
> > > works as expected. I am planning to push this unless you or Peter have
> > > any comments.
> >
> > Personally I'd get rid of the option for ReplicationSlotNameForTablesync
> > to allocate the result buffer.
> >
>
> Sounds reasonable and the updated patch attached.
>

Pushed and I don't see those warnings on caiman.

-- 
With Regards,
Amit Kapila.