Re: Single transaction in the tablesync worker?

From: Amit Kapila
Subject: Re: Single transaction in the tablesync worker?
Date:
Msg-id: CAA4eK1J+qSa9tbevE2YVOyv4X9zAytmqzJUx0Y7h7YYnG6m+bg@mail.gmail.com
In response to: Re: Single transaction in the tablesync worker?  (Craig Ringer <craig.ringer@enterprisedb.com>)
List: pgsql-hackers
On Mon, Dec 7, 2020 at 10:02 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
>
> On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:
>>
>>
>> Basically, I was wondering why can't the "tablesync" worker just
>> gather messages in a similar way to how the current streaming feature
>> gathers messages into a "changes" file, so that they can be replayed
>> later.
>>
>
> See the related thread "Logical archiving"
>
> https://www.postgresql.org/message-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
>
> where I addressed some parts of this topic in detail earlier today.
>
>> A) The "tablesync" worker (after the COPY) does not ever apply any of
>> the incoming messages, but instead it just gobbles them into a
>> "changes" file until it decides it has reached SYNCDONE state and
>> exits.
>
>
> This has a few issues.
>
> Most importantly, the sync worker must cooperate with the main apply worker to achieve a consistent end-of-sync cutover.
>

In this idea, there is no need to change the end-of-sync cutover; it
will work as it does now. I am not sure what makes you think it must change.

> The sync worker must have replayed the pending changes in order to make this cut-over, because the non-sync apply worker will need to start applying changes on top of the resync'd table potentially as soon as the next transaction it starts applying, so it needs to see the rows there.
>

The change here would be that the apply worker will check for the
changes file and, if it exists, apply those changes before it sets the
relstate to SUBREL_STATE_READY in process_syncing_tables_for_apply().
So it will not miss seeing any rows.

> Doing this would also add another round of write multiplication since the data would get spooled then applied to WAL then heap. Write multiplication is already an issue for logical replication so adding to it isn't particularly desirable without a really compelling reason.
>

It will solve our problem of allowing decoding of prepared xacts in
pgoutput. I have explained the problem above [1]. The other idea we
discussed is to allow an additional state in pg_subscription_rel, make
the slot permanent in the tablesync worker, and then process
transaction-by-transaction in the apply worker. Does that approach
sound better? Is there any bigger change involved in this approach
(making the tablesync slot permanent) that I am missing?

> With the write multiplication comes disk space management issues for big transactions as well as the obvious performance/throughput impact.
>
> It adds even more latency between upstream commit and downstream apply, something that is again already an issue for logical replication.
>
> Right now we don't have any concept of a durable and locally flushed spool.
>

I think we have a concept quite close to it for writing the changes of
in-progress xacts, as done in PG-14. It is not durable, but that
shouldn't be a big problem if we allow syncing the changes file.

> It's not impossible to do as you suggest but the cutover requirement makes it far from simple. As discussed in the logical archiving thread I think it'd be good to have something like this, and there are times the write multiplication price would be well worth paying. But it's not easy.
>
>> B) Then, when the "apply" worker proceeds, if it detects the existence
>> of the "changes" file it will replay/apply_dispatch all those gobbled
>> messages before just continuing as normal.
>
>
> That's going to introduce a really big stall in the apply worker's progress in many cases. During that time it won't be receiving from upstream (since we don't spool logical changes to disk at this time) so the upstream lag will grow. That will impact synchronous replication, pg_wal size management, catalog bloat, etc. It'll also leave the upstream logical decoding session idle, so when it resumes it may create a spike of I/O and CPU load as it catches up, as well as a spike of network traffic. And depending on how close the upstream write rate is to the max decode speed, network throughput max, and downstream apply speed max, it may take some time to catch up over the resulting lag.
>

This is just for the initial tablesync phase. I think it is equivalent
to saying that during basebackup, we need to start physical
replication in parallel. I agree that sometimes it can take a lot of
time to copy large tables, but it will be just one time and no worse
than other situations like basebackup.

[1] - https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com

--
With Regards,
Amit Kapila.


