Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

Поиск
Список
Период
Сортировка
От Melih Mutlu
Тема Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Дата
Msg-id CAGPVpCTuwTwAh8V8EcaKyea+RTk32CWUVX5Der13jrgk8wB5_Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Список pgsql-hackers
Hi,

Amit Kapila <amit.kapila16@gmail.com>, 2 Ağu 2023 Çar, 12:01 tarihinde şunu yazdı:
I think we are getting the error (ERROR:  could not find logical
decoding starting point) because we wouldn't have waited for WAL to
become available before reading it. It could happen due to the
following code:
WalSndWaitForWal()
{
...
if (streamingDoneReceiving && streamingDoneSending &&
!pq_is_send_pending())
break;
..
}

Now, it seems that in 0003 patch, instead of resetting flags
streamingDoneSending, and streamingDoneReceiving before start
replication, we should reset before create logical slots because we
need to read the WAL during that time as well to find the consistent
point.

Thanks for the suggestion Amit. I've been looking into this recently and couldn't figure out the cause until now.
I quickly made the fix in 0003. Seems like it resolved the "could not find logical decoding starting point" errors.

vignesh C <vignesh21@gmail.com>, 1 Ağu 2023 Sal, 09:32 tarihinde şunu yazdı:
I agree that  "no copy in progress issue" issue has nothing to do with
0001 patch. This issue is present with the 0002 patch.
In the case when the tablesync worker has to apply the transactions
after the table is synced, the tablesync worker sends the feedback of
writepos, applypos and flushpos which results in "No copy in progress"
error as the stream has ended already. Fixed it by exiting the
streaming loop if the tablesync worker is done with the
synchronization. The attached 0004 patch has the changes for the same.
The rest of v22 patches are the same patch that were posted by Melih
in the earlier mail.

Thanks for the fix. I placed it into 0002 with a slight change as follows: 

- send_feedback(last_received, false, false);
+ if (!MyLogicalRepWorker->relsync_completed)
+ send_feedback(last_received, false, false);
 
IMHO relsync_completed means simply the same with streaming_done, that's why I wanted to check that flag instead of an additional goto statement. Does it make sense to you as well?

Thanks,
--
Melih Mutlu
Microsoft
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiro Ikeda
Дата:
Сообщение: Re: Support to define custom wait events for extensions
Следующее
От: Andrey Lepikhov
Дата:
Сообщение: Re: [PoC] Reducing planning time when tables have many partitions