Re: Stopping logical replication protocol

From Vladimir Gordiychuk
Subject Re: Stopping logical replication protocol
Date
Msg-id CAFgjRd1LgVbtH=9O9_xvKQjvUP7aRF-edxqwKfaNs9hMFW_4gw@mail.gmail.com
In response to Re: Stopping logical replication protocol  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Stopping logical replication protocol  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
What's your PostgreSQL community username?
gordiychuk

It seems like you're also trying to allow interruption deeper than that, when we're in the middle of processing a reorder buffer commit record and streaming that to the client. You're introducing an is_active member (actually a callback, though the name suggests it's a flag) in struct ReorderBuffer to check whether a CopyDone is received, and you're skipping ReorderBuffer commit processing when the callback returns false. The callback returns "!streamingDoneReceiving && !streamingDoneSending", i.e. it's false if either end has sent CopyDone. streamingDoneReceiving and streamingDoneSending are only set in ProcessRepliesIfAny, called by WalSndLoop and WalSndWaitForWal. So the idea is, presumably, that if we're waiting for WAL from XLogSendLogical we skip processing of any commit records and exit.
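
To make the mechanism being discussed concrete, here is a minimal standalone sketch, not the patch itself. SketchReorderBuffer, sketch_is_streaming_active and sketch_replay_commit are hypothetical names, and the two flags are plain stand-ins for the walsender's variables; only the callback expression is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the walsender flags named in the thread. */
static bool streamingDoneReceiving = false;
static bool streamingDoneSending = false;

/* Hypothetical, heavily simplified ReorderBuffer: only the callback matters here. */
typedef struct SketchReorderBuffer
{
    bool (*is_active) (void);
} SketchReorderBuffer;

/* The callback: false as soon as either end has sent CopyDone. */
static bool
sketch_is_streaming_active(void)
{
    return !streamingDoneReceiving && !streamingDoneSending;
}

/*
 * Replay up to nchanges changes of one commit and return how many were
 * processed. The loop stops early once the callback reports that streaming
 * is no longer active. apply_change may be NULL in this sketch.
 */
static int
sketch_replay_commit(SketchReorderBuffer *rb, int nchanges,
                     void (*apply_change) (int))
{
    int processed = 0;

    assert(rb != NULL && rb->is_active != NULL);

    for (int i = 0; i < nchanges; i++)
    {
        if (!rb->is_active())
            break;              /* CopyDone seen: skip the rest of the commit */
        if (apply_change != NULL)
            apply_change(i);
        processed++;
    }
    return processed;
}
```

With both flags clear the whole commit is replayed; once either flag is set, the remaining changes are skipped.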

That seems overcomplicated.

When WalSndWaitForWal is called by logical_read_xlog_page, logical_read_xlog_page can just test streamingDoneReceiving and streamingDoneSending. If they're set it can skip the page read and return -1, which will cause the xlogreader to return a null record to XLogSendLogical. That'll skip the decoding calls and return to WalSndLoop, where we'll notice it's time to exit.
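
The simpler alternative suggested above could look roughly like the following standalone sketch. All sketch_* functions are hypothetical stand-ins, not the real walsender or xlogreader code, and I am assuming that either flag being set is enough to skip the read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the walsender flags named in the thread. */
static bool streamingDoneReceiving = false;
static bool streamingDoneSending = false;

/* Hypothetical stand-in for the real page read: fills buf, returns bytes read. */
static int
sketch_read_page_from_wal(char *buf, int len)
{
    memset(buf, 0, (size_t) len);
    return len;
}

/*
 * Simplified page-read callback: if CopyDone was seen on either side
 * (assumption: either flag suffices), skip the read and return -1.
 */
static int
sketch_logical_read_xlog_page(char *buf, int len)
{
    assert(buf != NULL && len > 0);

    if (streamingDoneReceiving || streamingDoneSending)
        return -1;
    return sketch_read_page_from_wal(buf, len);
}

/*
 * Simplified sender step: returns true if a record would be decoded,
 * false if the reader produced no record and the caller should go back
 * to its main loop and notice that it is time to exit.
 */
static bool
sketch_send_logical(void)
{
    char page[8192];

    if (sketch_logical_read_xlog_page(page, (int) sizeof(page)) < 0)
        return false;           /* null record: decoding calls are skipped */
    return true;
}
```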

ProcessRepliesIfAny is now also called from WalSndWriteData, because while sending data we should also check for messages from the client (the client can send CopyDone, KeepAlive, or Terminate).

@@ -1086,14 +1089,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
  memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
    tmpbuf.data, sizeof(int64));
 
- /* fast path */
- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- if (!pq_is_send_pending())
- return;
-


The main idea is that we can receive CopyDone from the client in any of the following functions: WalSndLoop, WalSndWaitForWal, and WalSndWriteData. Each of these can run for a long time. WalSndWaitForWal may wait for a new transaction, which can take a long while on an inactive database. WalSndWriteData may be sending a big transaction, so messages from the client are also ignored for a long time (in my example above with 1 million changed objects, the walsender ignored messages for 13 seconds and did not allow the connection to be reused). When the client sends CopyDone, it no longer wants to receive messages for the current LSN. For example, physical replication can be interrupted in the middle of a transaction that affects more than one LSN.
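
As a toy illustration of why polling the client while writing matters, consider the standalone sketch below. It is not walsender code: sketch_send_transaction is a hypothetical function, and the client's messages are modelled as a string in which character i is what arrives just before change i is sent ('c' for CopyDone, '-' for nothing).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Send up to nchanges changes of one large transaction and return how many
 * were sent. client_msgs[i] is the message that arrives just before change i
 * ('c' = CopyDone, '-' = nothing). If poll_client is false, client messages
 * are ignored until the whole transaction has been written.
 */
static int
sketch_send_transaction(const char *client_msgs, int nchanges, bool poll_client)
{
    int  sent = 0;
    bool copy_done = false;

    assert(client_msgs != NULL);
    assert(nchanges >= 0 && strlen(client_msgs) >= (size_t) nchanges);

    for (int i = 0; i < nchanges; i++)
    {
        /* Polling here plays the role of checking for replies while writing. */
        if (poll_client && client_msgs[i] == 'c')
            copy_done = true;
        if (copy_done)
            break;              /* stop streaming: the client asked us to */
        sent++;
    }
    return sent;
}
```

With polling, a CopyDone that arrives before the third change stops the stream after two changes; without polling, all changes are sent before the request is noticed.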

Maybe I misunderstand the documentation, but I want to reuse the same connection without reopening it, because opening a new connection takes too long. Is this a valid use case, or is CopyDone just a side effect of the copy protocol, so that to finish replication one must always send a Terminate packet and reopen the connection?


