Re: Single transaction in the tablesync worker?
From | Amit Kapila |
---|---|
Subject | Re: Single transaction in the tablesync worker? |
Date | |
Msg-id | CAA4eK1J+qSa9tbevE2YVOyv4X9zAytmqzJUx0Y7h7YYnG6m+bg@mail.gmail.com |
In response to | Re: Single transaction in the tablesync worker? (Craig Ringer <craig.ringer@enterprisedb.com>) |
List | pgsql-hackers |
On Mon, Dec 7, 2020 at 10:02 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>
> On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250@gmail.com> wrote:
>>
>> Basically, I was wondering why can't the "tablesync" worker just
>> gather messages in a similar way to how the current streaming feature
>> gathers messages into a "changes" file, so that they can be replayed
>> later.
>
> See the related thread "Logical archiving"
>
> https://www.postgresql.org/message-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru
>
> where I addressed some parts of this topic in detail earlier today.
>
>> A) The "tablesync" worker (after the COPY) does not ever apply any of
>> the incoming messages, but instead it just gobbles them into a
>> "changes" file until it decides it has reached SYNCDONE state and
>> exits.
>
> This has a few issues.
>
> Most importantly, the sync worker must cooperate with the main apply
> worker to achieve a consistent end-of-sync cutover.

In this idea, there is no need to change the end-of-sync cutover. It
will work as it is now. I am not sure what makes you think so.

> The sync worker must have replayed the pending changes in order to
> make this cut-over, because the non-sync apply worker will need to
> start applying changes on top of the resync'd table potentially as
> soon as the next transaction it starts applying, so it needs to see
> the rows there.

The change here would be that the apply worker will check for the
changes file and, if it exists, apply the changes in it before it sets
the relstate to SUBREL_STATE_READY in
process_syncing_tables_for_apply(). So, it will not miss seeing any
rows.

> Doing this would also add another round of write multiplication, since
> the data would get spooled, then applied to WAL, then to heap. Write
> multiplication is already an issue for logical replication, so adding
> to it isn't particularly desirable without a really compelling reason.

It will solve our problem of allowing decoding of prepared xacts in
pgoutput.
I have explained the problem above [1]. The other idea which we
discussed is to allow an additional state in pg_subscription_rel, make
the slot permanent in the tablesync worker, and then process
transaction-by-transaction in the apply worker. Does that approach
sound better? Is there any bigger change involved in this approach
(making the tablesync slot permanent) which I am missing?

> With the write multiplication come disk space management issues for
> big transactions, as well as the obvious performance/throughput
> impact.
>
> It adds even more latency between upstream commit and downstream
> apply, something that is again already an issue for logical
> replication.
>
> Right now we don't have any concept of a durable and locally flushed
> spool.

I think we have a concept quite close to it for writing changes for
in-progress xacts, as done in PG-14. It is not durable, but that
shouldn't be a big problem if we allow syncing the changes file.

> It's not impossible to do as you suggest, but the cutover requirement
> makes it far from simple. As discussed in the logical archiving
> thread, I think it'd be good to have something like this, and there
> are times the write multiplication price would be well worth paying.
> But it's not easy.
>
>> B) Then, when the "apply" worker proceeds, if it detects the existence
>> of the "changes" file it will replay/apply_dispatch all those gobbled
>> messages before just continuing as normal.
>
> That's going to introduce a really big stall in the apply worker's
> progress in many cases. During that time it won't be receiving from
> upstream (since we don't spool logical changes to disk at this time),
> so the upstream lag will grow. That will impact synchronous
> replication, pg_wal size management, catalog bloat, etc. It'll also
> leave the upstream logical decoding session idle, so when it resumes
> it may create a spike of I/O and CPU load as it catches up, as well
> as a spike of network traffic.
> And depending on how close the upstream write rate is to the max
> decode speed, network throughput max, and downstream apply speed max,
> it may take some time to catch up over the resulting lag.

This is just for the initial tablesync phase. I think it is equivalent
to saying that during a basebackup, we need to start physical
replication in parallel. I agree that sometimes it can take a lot of
time to copy large tables, but it will be just one time and no worse
than other situations like basebackup.

[1] https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com

--
With Regards,
Amit Kapila.