Re: Single transaction in the tablesync worker?

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAA4eK1K+TuF7u_VQK4rUfz8VaSP+jnxkTqG6qQ0cdJ4=MM8Mww@mail.gmail.com
In response to Re: Single transaction in the tablesync worker?  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Single transaction in the tablesync worker?  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List pgsql-hackers
On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > The tablesync worker in logical replication performs the table data
> > sync in a single transaction which means it will copy the initial data
> > and then catch up with apply worker in the same transaction. There is
> > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > sync in a single transaction.") saying so but I can't find the
> > concrete theory behind the same. Is there any fundamental problem if
> > we commit the transaction after initial copy and slot creation in
> > LogicalRepSyncTableStart and then allow the apply of transactions as
> > it happens in apply worker? I have tried doing so in the attached (a
> > quick prototype to test) and didn't find any problems with regression
> > tests. I have tried a few manual tests as well to see if it works and
> > didn't find any problem. Now, it is quite possible that it is
> > mandatory to do the way we are doing currently, or maybe something
> > else is required to remove this requirement but I think we can do
> > better with respect to comments in this area.
>
> If we commit the initial copy, the data up to the initial copy's
> snapshot will be visible downstream. If we apply the changes by
> committing changes per transaction, the data visible to the other
> transactions will differ as the apply progresses.
>

It is not clear what you mean by the above. The way you have written
it, it appears you are saying that I am proposing to copy the initial
data transaction-by-transaction instead of in one go. But that is not
the case. I am proposing to copy the initial data using the REPEATABLE
READ isolation level as we are doing now, commit it, and then apply
changes transaction-by-transaction till we reach the sync-point (the
point up to which the apply worker has already received the data).
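
As a rough illustration of the flow described above, here is a toy
model in Python. It is not PostgreSQL code: the names Txn, Subscriber,
and tablesync are hypothetical, LSNs are plain integers, and a
"commit" is just a dictionary update. It only shows the ordering: one
commit for the snapshot copy, then one commit per replicated
transaction up to the sync-point.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """A committed publisher transaction (hypothetical toy model)."""
    commit_lsn: int
    rows: dict  # key -> value upserts made by this transaction


@dataclass
class Subscriber:
    """Toy subscriber table that counts local commits."""
    table: dict = field(default_factory=dict)
    commits: int = 0

    def commit(self, changes: dict) -> None:
        # Changes become visible to other sessions only at commit.
        self.table.update(changes)
        self.commits += 1


def tablesync(publisher_txns, snapshot_lsn, sync_lsn, sub):
    # Step 1: initial copy from one consistent snapshot (the role
    # REPEATABLE READ plays on the publisher), committed on its own.
    snapshot = {}
    for txn in publisher_txns:
        if txn.commit_lsn <= snapshot_lsn:
            snapshot.update(txn.rows)
    sub.commit(snapshot)

    # Step 2: catch up transaction-by-transaction, respecting the
    # publisher's transaction boundaries, until the sync-point.
    for txn in publisher_txns:
        if snapshot_lsn < txn.commit_lsn <= sync_lsn:
            sub.commit(txn.rows)
    return sub


if __name__ == "__main__":
    txns = [
        Txn(10, {1: "a"}),
        Txn(20, {2: "b"}),
        Txn(30, {1: "c"}),
        Txn(40, {3: "d"}),  # beyond the sync-point, left to the apply worker
    ]
    sub = tablesync(txns, snapshot_lsn=10, sync_lsn=30, sub=Subscriber())
    print(sub.table, sub.commits)
```

In this model a reader on the subscriber can observe the table after
the initial copy and after each applied transaction, and each of those
states corresponds to a state that was committed on the publisher.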

> You haven't
> clarified whether we will respect the transaction boundaries in the
> apply log or not. I assume we will.
>

It will be transaction-by-transaction.

> Whereas if we apply all the
> changes in one go, other transactions either see the data before
> resync or after it without any intermediate states.
>

What is the problem even if the user is able to see the data after the
initial copy?

> That will not
> violate consistency, I think.
>

I am not sure how consistency will be broken.

> That's all I can think of as the reason behind doing a whole resync as
> a single transaction.
>

Thanks for sharing your thoughts.

-- 
With Regards,
Amit Kapila.


