Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION

From Amit Kapila
Subject Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION
Msg-id CAA4eK1Lr5bYT=JPKzsfxM0O0VdkO4cr-4jjY1SNZEuYMDZozcw@mail.gmail.com
In reply to Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION  (Michail Nikolaev <michail.nikolaev@gmail.com>)
Responses Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION  (Michail Nikolaev <michail.nikolaev@gmail.com>)
List pgsql-hackers
On Tue, Jan 3, 2023 at 2:14 PM Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> > The point which is not completely clear from your description is the
> > timing of missing records. In one of your previous emails, you seem to
> > have indicated that the data missed from Table B is from the time when
> > the initial sync for Table B was in-progress, right? Also, from your
> > description, it seems there is no error or restart that happened
> > during the time of initial sync for Table B. Is that understanding
> > correct?
>
> Yes and yes.
> * B sync started - 08:08:34
> * lost records are created - 09:49:xx
> * B initial sync finished - 10:19:08
> * I/O error with WAL - 10:19:22
> * SIGTERM - 10:35:20
>
> "Finished" here is `logical replication table synchronization worker
> for subscription "cloud_production_main_sub_v4", table "B" has
> finished`.
> As far as I know, it is about COPY command.
>
> > I am not able to see how these steps can lead to the problem.
>
> One idea I have here - it is something related to the patch about
> forbidding of canceling queries while waiting for synchronous
> replication acknowledgement [1].
> It is applied to Postgres in the cloud we were using [2]. We started
> to see such errors in 10:24:18:
>
>       `The COMMIT record has already flushed to WAL locally and might
> not have been replicated to the standby. We must wait here.`
>

Does that by any chance mean you are using a non-community version of
Postgres which has some other changes?

> I wonder could it be some tricky race because of downtime of
> synchronous replica and queries stuck waiting for ACK forever?
>

It is possible, but ideally the client should retry such a transaction
in that case.

-- 
With Regards,
Amit Kapila.


