Re: Single transaction in the tablesync worker?

From: Craig Ringer
Subject: Re: Single transaction in the tablesync worker?
Date:
Msg-id: CAGRY4nyjhZgHG+mGEES+QaRQKy7ya8gDZkqoDMC4rHqRsQmneQ@mail.gmail.com
In reply to: Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
Responses: Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:

Basically, I was wondering why can't the "tablesync" worker just
gather messages in a similar way to how the current streaming feature
gathers messages into a "changes" file, so that they can be replayed
later.


See the related thread "Logical archiving", where I addressed some parts of this topic in detail earlier today.

A) The "tablesync" worker (after the COPY) does not ever apply any of
the incoming messages, but instead it just gobbles them into a
"changes" file until it decides it has reached SYNCDONE state and
exits.

This has a few issues.

Most importantly, the sync worker must cooperate with the main apply worker to achieve a consistent end-of-sync cutover. The sync worker must have replayed the pending changes before it can make that cutover, because the main (non-sync) apply worker may need to start applying changes on top of the resync'd table as soon as the very next transaction it processes, so those rows must already be present.
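Roughly, the cutover has the shape sketched below. This is only an illustration of the constraint, not the actual tablesync.c code; get_last_applied_lsn(), replay_next_pending_change(), update_rel_state() and exit_sync_worker() are invented names (SUBREL_STATE_SYNCDONE is the real relation state):

static void
tablesync_catchup_and_cutover(XLogRecPtr target_lsn)
{
    XLogRecPtr  applied_lsn = get_last_applied_lsn();      /* hypothetical */

    /*
     * Keep replaying pending changes until we've reached the point the
     * main apply worker will continue from.  If we stopped short, the
     * apply worker's next transaction could build on rows this worker
     * never applied.
     */
    while (applied_lsn < target_lsn)
        applied_lsn = replay_next_pending_change();         /* hypothetical */

    /* Only now is it safe to hand the table over and exit. */
    update_rel_state(SUBREL_STATE_SYNCDONE, applied_lsn);   /* hypothetical */
    exit_sync_worker();                                      /* hypothetical */
}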

Doing this would also add another round of write multiplication, since the data would be spooled to disk and then written to WAL and to the heap when applied. Write multiplication is already an issue for logical replication, so adding to it isn't particularly desirable without a really compelling reason. With the write multiplication come disk space management issues for big transactions, as well as the obvious performance/throughput impact.
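To put rough, purely illustrative numbers on it: applying a 1GB upstream transaction already costs the downstream on the order of 1GB of heap writes plus about the same again in WAL; spooling the changes first would add roughly another 1GB of spool-file writes (plus the fsyncs needed to make the spool durable), so we'd go from roughly 2x to roughly 3x the transaction size in downstream writes, before even counting index and TOAST amplification.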

It adds even more latency between upstream commit and downstream apply, something that is again already an issue for logical replication.

Right now we don't have any concept of a durable and locally flushed spool.

It's not impossible to do what you suggest, but the cutover requirement makes it far from simple. As discussed in the logical archiving thread, I think it'd be good to have something like this, and there are times when the write multiplication price would be well worth paying. But it's not easy.
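To illustrate, a minimal durable spool would need to do something like the sketch below. This is only an outline with invented names (ChangeSpool, spool_change()), not a proposal for an on-disk format; real code would use BufFile or similar and report errors properly rather than ignoring write()/fsync() results:

typedef struct ChangeSpool
{
    int         fd;             /* spool file opened with O_APPEND */
    XLogRecPtr  flushed_upto;   /* last LSN known to be durable on disk */
} ChangeSpool;

static void
spool_change(ChangeSpool *spool, XLogRecPtr lsn, const char *data, uint32 len)
{
    /* Length-prefixed record so replay can walk the file later. */
    write(spool->fd, &len, sizeof(len));
    write(spool->fd, data, len);

    /*
     * The spool only counts as locally flushed once it has been fsync()ed;
     * until then we can't report this LSN as flushed back to the upstream
     * walsender.
     */
    fsync(spool->fd);
    spool->flushed_upto = lsn;
}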

B) Then, when the "apply" worker proceeds, if it detects the existence
of the "changes" file it will replay/apply_dispatch all those gobbled
messages before just continuing as normal.

That's going to introduce a really big stall in the apply worker's progress in many cases. During that time it won't be receiving from the upstream (since we don't spool logical changes to disk at this time), so upstream lag will grow. That will impact synchronous replication, pg_wal size management, catalog bloat, etc. It'll also leave the upstream logical decoding session idle, so when it resumes it may produce a spike of I/O, CPU load and network traffic as it catches up. And depending on how close the upstream write rate is to the maximum decode speed, the maximum network throughput, and the maximum downstream apply speed, it may take quite some time to work through the resulting lag.

Not a big fan of that approach.
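For what it's worth, the replay step itself is the easy part; it would look something like the sketch below, where SpoolFile, spool_open(), spool_read_message() and spool_close_and_unlink() are invented names and apply_dispatch() is the per-message dispatch the apply worker already uses. The hard parts are everything around it: when it runs, what happens to the upstream connection while it runs, and how the resulting lag is bounded.

static void
replay_spooled_changes(const char *path)
{
    SpoolFile      *spool = spool_open(path);          /* hypothetical */
    StringInfoData  msg;

    initStringInfo(&msg);

    /* Re-dispatch every message gobbled up during the sync phase. */
    while (spool_read_message(spool, &msg))             /* hypothetical */
        apply_dispatch(&msg);

    spool_close_and_unlink(spool);                       /* hypothetical */
}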
