Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id CAA4eK1L8+UsK6SkqRAL+n7Z=kx=mmLCuJ2ndwmbKaS=q=2iuPA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Список pgsql-hackers
On Tue, Aug 1, 2023 at 9:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> FYI, here is some more information about ERRORs seen.
>
> The patches were re-tested -- applied in stages (and also against the
> different scripts) to identify where the problem was introduced. Below
> are the observations:
>
> ~~~
>
> Using original test scripts
>
> 1. Using only patch v21-0001
> - no errors
>
> 2. Using only patch v21-0001+0002
> - no errors
>
> 3. Using patch v21-0001+0002+0003
> - no errors
>
> ~~~
>
> Using the "busy loop" test scripts for long transactions
>
> 1. Using only patch v21-0001
> - no errors
>
> 2. Using only patch v21-0001+0002
> - gives errors for "no copy in progress issue"
> e.g. ERROR:  could not send data to WAL stream: no COPY in progress
>
> 3. Using patch v21-0001+0002+0003
> - gives the same "no copy in progress issue" errors as above
> e.g. ERROR:  could not send data to WAL stream: no COPY in progress
> - and also gives slot consistency point errors
> e.g. ERROR:  could not create replication slot
> "pg_16700_sync_16514_7261998170966054867": ERROR:  could not find
> logical decoding starting point
> e.g. LOG:  could not drop replication slot
> "pg_16700_sync_16454_7261998170966054867" on publisher: ERROR:
> replication slot "pg_16700_sync_16454_7261998170966054867" does not
> exist
>

I think we are getting the error (ERROR:  could not find logical
decoding starting point) because we wouldn't have waited for WAL to
become available before reading it. It could happen due to the
following code:
WalSndWaitForWal()
{
...
if (streamingDoneReceiving && streamingDoneSending &&
!pq_is_send_pending())
break;
..
}

Now, it seems that in 0003 patch, instead of resetting flags
streamingDoneSending, and streamingDoneReceiving before start
replication, we should reset before create logical slots because we
need to read the WAL during that time as well to find the consistent
point.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: add timing information to pg_upgrade
Следующее
От: 正华吕
Дата:
Сообщение: Re: PG_CATCH used without PG_RETHROW