Re: Single transaction in the tablesync worker?

From: Dilip Kumar
Subject: Re: Single transaction in the tablesync worker?
Msg-id: CAFiTN-v=a7RfnuQ+aOqzTREYUL8-JidN+a-w3mQLpyBJvh7EcQ@mail.gmail.com
In reply to: Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
List: pgsql-hackers
On Thu, Dec 10, 2020 at 3:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Dec 8, 2020 at 9:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Dec 8, 2020 at 11:53 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Yes, I observed this same behavior.
> > >
> > > IIUC the only way for the tablesync worker to go from CATCHUP mode to
> > > SYNCDONE is via the call to process_sync_tables.
> > >
> > > But a side-effect of this is, when messages arrive during this CATCHUP
> > > phase one tx will be getting handled by the tablesync worker before
> > > the process_sync_tables() is ever encountered.
> > >
> > > I have created and attached a simple patch which allows the tablesync
> > > to detect if there is anything to do *before* it enters the apply main
> > > loop. Calling process_sync_tables() before the apply main loop  offers
> > > a quick way out so the message handling will not be split
> > > unnecessarily between the workers.
> > >
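The early-exit idea above can be sketched as a toy model (illustrative Python only, not the actual patch; the worker structure and function bodies are invented for illustration):

```python
# Toy sketch: check whether the sync is already complete *before* entering
# the apply loop, so a tablesync worker that reached CATCHUP with nothing
# pending can finish immediately instead of handling one stray transaction
# first and splitting message handling between the workers.

def process_sync_tables(worker):
    # promote CATCHUP -> SYNCDONE when there is nothing left to do
    if worker["state"] == "catchup" and not worker["pending"]:
        worker["state"] = "syncdone"

def tablesync_main(worker, messages):
    process_sync_tables(worker)          # the early check, before the loop
    for msg in messages:
        if worker["state"] == "syncdone":
            break                        # the apply worker handles the rest
        worker["pending"] = False        # ... handle msg ...
        process_sync_tables(worker)      # the existing in-loop check

w = {"state": "catchup", "pending": False}
tablesync_main(w, ["tx1", "tx2"])
print(w["state"])   # 'syncdone' — no message was consumed by tablesync
```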
> >
> > Yeah, this demonstrates the idea can work, but as mentioned in my
> > previous email [1] it needs much more work to make the COPY and the
> > later fetching of changes from the publisher consistent. So, let me
> > summarize the discussion so far. We wanted to enhance the tablesync
> > phase of Logical Replication to enable decoding of prepared
> > transactions [2]. The problem is that when we stream prepared
> > transactions in the tablesync worker, it will simply commit them, due
> > to the requirement of maintaining a single transaction for the entire
> > duration of the copy and the streaming of transactions afterward. We can't
> > simply disable the decoding of prepared xacts for tablesync workers
> > because it can skip some of the prepared xacts forever on subscriber
> > as explained in one of the emails above [3]. Now, while investigating
> > the solutions to enhance tablesync to support decoding at prepare
> > time, I found that due to the current design of tablesync we can see
> > partial data of transactions on subscribers which is also explained in
> > the email above with an example [4]. This visibility problem has
> > existed since Logical Replication was introduced in PostgreSQL, and
> > the only answer I have received so far is that there doesn't seem to
> > be any other alternative. I think that is not true, and I have
> > provided one alternative as well.
> >
> > Next, we have discussed three different solutions all of which will
> > solve the first problem (allow the tablesync worker to decode
> > transactions at prepare time) and one of which solves both the first
> > and second problem (partial transaction data visibility).
> >
> > Solution-1: Allow the table-sync worker to use multiple transactions.
> > The reason for doing it in a single transaction is that if, after the
> > initial COPY, we commit and then crash while streaming changes of
> > other transactions, the state of the table won't be known after the
> > restart: since we are using a temporary slot, we won't know from where
> > to restart syncing the table.
> >
> > IIUC, we primarily need to do two things to achieve multiple
> > transactions. One is to have an additional state in the catalog (say,
> > "catchup") which will say that the initial copy is done. Then we need
> > to have a permanent slot with which we can track the progress of the
> > sync, so that after a restart (due to a crash, connection break, etc.)
> > we can start from the appropriate position. Now, this will allow us to do
> > less work after recovering from a crash because we will know the
> > restart point. As Craig mentioned, it also allows the sync slot to
> > advance, freeing any held upstream resources before the whole sync is
> > done, which is good if the upstream is busy and generating lots of
> > WAL. Finally, committing as we go means we won't exceed the cid
> > increment limit in a single txn.
> >
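The multi-transaction flow of Solution-1 can be modeled as a toy state machine (illustrative Python only, not PostgreSQL code; the state names, the "catchup" state, and the LSN handling are assumptions based on the description above):

```python
# Toy model of Solution-1: tablesync commits after the initial COPY and
# after every applied transaction, persisting its progress so a restart
# can resume from the recorded position instead of redoing the COPY.

INIT, COPYING, CATCHUP, SYNCDONE = "init", "copying", "catchup", "syncdone"

class TableSync:
    def __init__(self):
        self.state = INIT          # would be persisted in the catalog
        self.restart_lsn = 0       # tracked via a *permanent* slot

    def initial_copy(self):
        self.state = COPYING
        # ... COPY the table data ...
        self.state = CATCHUP       # new catalog state: initial copy is done
        self.restart_lsn = 100     # commit here; progress survives a crash

    def apply_change(self, lsn):
        assert self.state == CATCHUP
        # each remote transaction is applied and committed individually,
        # advancing the slot so held upstream resources can be freed
        self.restart_lsn = lsn

    def recover(self):
        # after a crash, resume from restart_lsn; no need to re-COPY
        return self.state, self.restart_lsn

ts = TableSync()
ts.initial_copy()
ts.apply_change(150)
print(ts.recover())   # ('catchup', 150)
```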
> > Solution-2: The next solution we discussed is to make "tablesync"
> > worker just gather messages after COPY in a similar way to how the
> > current streaming of in-progress transaction feature gathers messages
> > into a "changes" file so that they can be replayed later by the apply
> > worker. Since here we don't need to replay the individual
> > transactions in the tablesync worker within a single transaction,
> > this allows prepared transactions to be decoded and sent to the
> > subscriber at prepare time. This has some disadvantages: each
> > transaction processed by the tablesync worker needs to be durably
> > written to a file, and it can also lead to some apply lag later when
> > the apply worker processes the same changes.
> >
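The spool-and-replay idea of Solution-2 can be illustrated with a minimal sketch (Python pseudo-model; the file format and function names are invented, and the real feature spools binary decoded changes, not JSON):

```python
import json, os, tempfile

# Toy model of Solution-2: the tablesync worker durably spools decoded
# changes into a "changes" file instead of applying them itself; the
# apply worker replays the file later. Because nothing is applied inside
# tablesync's own transaction, prepared transactions can be handled at
# prepare time — at the cost of an extra durable write and apply lag.

def tablesync_spool(changes, path):
    with open(path, "w") as f:
        for change in changes:          # one decoded change per line
            f.write(json.dumps(change) + "\n")
        f.flush()
        os.fsync(f.fileno())            # must be durable before confirming

def apply_replay(path):
    applied = []
    with open(path) as f:
        for line in f:                  # replayed later => possible apply lag
            applied.append(json.loads(line))
    return applied

path = os.path.join(tempfile.mkdtemp(), "changes.spool")
tablesync_spool([{"op": "insert", "row": 1}, {"op": "insert", "row": 2}], path)
print(apply_replay(path))   # [{'op': 'insert', 'row': 1}, {'op': 'insert', 'row': 2}]
```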
> > Solution-3: Allow the table-sync workers to just perform initial COPY
> > and then once the COPY is done for all relations the apply worker will
> > stream all the future changes. Granted, if large copies are required
> > for multiple relations, then replay of transactions by the apply
> > worker would be delayed a bit, but it is not clear how much that
> > matters compared to the transaction-visibility issue, and in any case
> > we would still achieve the maximum parallelism by allowing the copy
> > via multiple workers. This would reduce the load (both CPU and I/O)
> > on the publisher side by decoding the data only once, instead of once
> > for each tablesync worker and separately for the apply worker. I
> > think it will use fewer resources to finish the work.
> >
> > Currently, in tablesync worker, we create a slot with CRS_USE_SNAPSHOT
> > option which creates a transaction snapshot on the publisher, and then
> > we use the same snapshot for COPY from the publisher. After this, when
> > we try to receive the data from the publisher using the same slot, it
> > will be in sync with the COPY. I think to keep the same consistency
> > between COPY and the data we receive from the publisher in this
> > approach, we need to export the snapshot while creating a slot in the
> > apply worker by using CRS_EXPORT_SNAPSHOT and then use the same
> > snapshot by all the tablesync workers doing the copy. In tablesync
> > workers, we can use the SET TRANSACTION SNAPSHOT command after "BEGIN
> > READ ONLY ISOLATION LEVEL REPEATABLE READ" to use the exported
> > snapshot. That way the COPY will use the same snapshot as is used for
> > receiving the changes in apply worker and the data will be in sync.
> >
> > Then we also need a way to export snapshot while the apply worker is
> > already receiving the changes because users can use 'ALTER
> > SUBSCRIPTION name REFRESH PUBLICATION' which allows new tables to be
> > synced. I think we need to introduce a new command in
> > exec_replication_command() to export the snapshot from the existing
> > slot, which the new tablesync worker can then use.
> >
> >
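The consistency requirement behind Solution-3 — that the COPY and the subsequent streaming must observe the same cut of the publisher's history — can be modeled abstractly (illustrative Python only; in PostgreSQL this is what the CRS_EXPORT_SNAPSHOT option plus SET TRANSACTION SNAPSHOT provide):

```python
# Toy model: a "snapshot" is the set of transactions committed at slot
# creation time. The COPY sees exactly the snapshot; streaming from the
# slot then delivers exactly the transactions committed after it.
# Together they cover every transaction once: no overlap, no gap.

def export_snapshot(committed_log, slot_point):
    return set(committed_log[:slot_point])      # what the COPY will see

def stream_changes(committed_log, slot_point):
    return committed_log[slot_point:]           # what the slot will send

log = ["t1", "t2", "t3", "t4", "t5"]            # commit order on the publisher
snap = export_snapshot(log, 2)
streamed = stream_changes(log, 2)

assert snap == {"t1", "t2"}
assert streamed == ["t3", "t4", "t5"]
# every transaction is seen exactly once across COPY + streaming
assert snap.isdisjoint(streamed) and snap | set(streamed) == set(log)
```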
> > Among the above three solutions, the first two will solve the first
> > problem (allow the tablesync worker to decode transactions at prepare
> > time) and the third solution will solve both the first and second
> > problem (partial transaction data visibility). The third solution
> > requires quite some redesign of how the Logical Replication work is
> > synchronized between apply and tablesync workers and might turn out to
> > be a bigger implementation effort. I am tentatively thinking of going
> > with the first or second solution at this stage; if later people feel
> > that we need a bigger redesign, then we can go with something along
> > the lines of Solution-3.
> >
> > Thoughts?
> >
> > [1] -
https://www.postgresql.org/message-id/CAA4eK1%2BQC74wRQmbYT%2BMmOs%3DYbdUjuq0_A9CBbVoQMB1Ryi-OA%40mail.gmail.com
> > [2] - https://www.postgresql.org/message-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com
> > [3] - https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com
> > [4] - https://www.postgresql.org/message-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg%40mail.gmail.com
> >
> > --
>
> Hi Amit,
>
> - Solution-3 has become too complicated for me to attempt. Anyway,
> we may be better off just focusing on eliminating the new problems
> exposed by the 2PC work [1], rather than burning too much effort to fix
> some other quirk which has apparently existed for years.
> [1] https://www.postgresql.org/message-id/CAHut%2BPtm7E5Jj92tJWPtnnjbNjJN60_%3DaGGKYW3h23b7J%3DqeDg%40mail.gmail.com
>
> - Solution-2 has some potential lag problems, and maybe file resource
> problems as well. This idea did not get a very favourable response
> when I first proposed it.
>
> - This leaves Solution-1 as the best viable option to fix the current
> known 2PC trouble.
>
> ~~
>
> So I will try to write a patch for the proposed Solution-1.

Yeah, I also think Solution-1 is the best way to solve the 2PC problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


