Re: logical replication empty transactions
От | Peter Smith |
---|---|
Тема | Re: logical replication empty transactions |
Дата | |
Msg-id | CAHut+PukQ8yyfSmDCNisVHxSjA632SmzSP_krQg-NJBdZ7gJaQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: logical replication empty transactions (Ajin Cherian <itsajin@gmail.com>) |
Ответы |
Re: logical replication empty transactions
(Ajin Cherian <itsajin@gmail.com>)
|
Список | pgsql-hackers |
On Fri, Aug 13, 2021 at 9:01 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > > Let's first split the patch for prepared and non-prepared cases as > > that will help to focus on each of them separately. BTW, why haven't > > you considered implementing point 1b as explained by Andres in his > > email [1]? I think we can send a keepalive message in case of > > synchronous replication when we skip an empty transaction, otherwise, > > it might delay in responding to transactions synchronous_commit mode. > > I think in the tests done in the thread, it might not have been shown > > because we are already sending keepalives too frequently. But what if > > someone disables wal_sender_timeout or kept it to a very large value? > > See WalSndKeepaliveIfNecessary. The other thing you might want to look > > at is if the reason for frequent keepalives is the same as described > > in the email [2]. > > > > I have tried to address the comment here by modifying the > ctx->update_progress callback function (WalSndUpdateProgress) provided > for plugins. I have added an option > by which the callback can specify if it wants to send keep_alives. And > when the callback is called with that option set, walsender updates a > flag force_keep_alive_syncrep. > The Walsender in the WalSndWaitForWal for loop, checks this flag and > if synchronous replication is enabled, then sends a keep alive. > Currently this logic > is added as an else to the current logic that is already there in > WalSndWaitForWal, which is probably considered unnecessary and a > source of the keep alive flood > that you talked about. So, I can change that according to how that fix > shapes up there. I have also added an extern function in syncrep.c > that makes it possible > for walsender to query if synchronous replication is turned on. > > The reason I had to turn on a flag and rely on the WalSndWaitForWal to > send the keep alive in its next iteration is because I tried doing > this directly when a > commit is skipped but it didn't work. The reason for this is that when > the commit is being decoded the sentptr at the moment is at the commit > LSN and the keep alive > will be sent for the commit LSN but the syncrep wait is waiting for > end_lsn of the transaction which is the next LSN. So, sending a keep > alive at the moment the > commit is decoded doesn't seem to solve the problem of the waiting > synchronous reply. > > > Few other miscellaneous comments: > > 1. > > static void > > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, > > ReorderBufferTXN *txn, > > - XLogRecPtr commit_lsn) > > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, > > + TimestampTz prepare_time) > > { > > + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; > > + > > OutputPluginUpdateProgress(ctx); > > > > + /* > > + * If the BEGIN PREPARE was not yet sent, then it means there were no > > + * relevant changes encountered, so we can skip the COMMIT PREPARED > > + * message too. > > + */ > > + if (txndata) > > + { > > + bool skip = !txndata->sent_begin_txn; > > + pfree(txndata); > > + txn->output_plugin_private = NULL; > > > > How is this supposed to work after the restart when prepared is sent > > before the restart and we are just sending commit_prepared after > > restart? Won't this lead to sending commit_prepared even when the > > corresponding prepare is not sent? Can we think of a better way to > > deal with this? > > > > I have tried to resolve this by adding logic in worker,c to silently > ignore spurious commit_prepareds. But this change required checking if > the prepare exists on the > subscriber before attempting the commit_prepared but the current API > that checks this requires prepare time and transaction end_lsn. But > for this I had to > change the protocol of commit_prepared, and I understand that this > would break backward compatibility between subscriber and publisher > (you have raised this issue as well). > I am not sure how else to handle this, let me know if you have any > other ideas. One option could be to have another API to check if the > prepare exists on the subscriber with > the prepared 'gid' alone, without checking prepare_time or end_lsn. > Let me know if this idea works. > > I have left out the patch 0002 for prepared transactions until we > arrive at a decision on how to address the above issue. > > Peter, > I have also addressed the comments you've raised on patch 0001, please > have a look and confirm. I have reviewed the v13-0001 patch. Apply / build / test was all OK Below are my code review comments. ////////// Comments for v13-0001 ===================== 1. Patch comment => Probably this comment should include some description for the new "keepalive" logic as well. ------ 2. src/backend/replication/syncrep.c - new function @@ -330,6 +330,18 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) } /* + * Check if Sync Rep is enabled + */ +bool +SyncRepEnabled(void) +{ + if (SyncRepRequested() && ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) + return true; + else + return false; +} + 2a. Function comment => Why abbreviations in the comment? Why not say "synchronous replication" instead of "Sync Rep". ~~ 2b. if/else => Remove the if/else. e.g. return SyncRepRequested() && ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined; ~~ 2c. Call the new function => There is some existing similar code in SyncRepWaitForLSN(), e.g. if (!SyncRepRequested() || !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) return; Now that you have a new function you maybe can call it from here, e.g. if (!SyncRepEnabled()) return; ------ 3. src/backend/replication/walsender.c - whitespace + if (send_keep_alive) + force_keep_alive_syncrep = true; + + => Extra blank line? ------ 4. src/backend/replication/walsender.c - call keepalive if (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && !waiting_for_ping_response) + { WalSndKeepalive(false); + } + else + { + if (force_keep_alive_syncrep && SyncRepEnabled()) + WalSndKeepalive(false); + } 4a. Move the SynRepEnabled() call => I think it is not necessary to call the SynRepEnabled() here. Instead, it might be better if this is called back when you assign the force_keep_alive_syncrep flag. So change the WalSndUpdateProgress, e.g. BEFORE if (send_keep_alive) force_keep_alive_syncrep = true; AFTER force_keep_alive_syncrep = send_keep_alive && SyncRepEnabled(); Note: Also, that assignment also deserves a big comment to say what it is doing. ~~ 4b. Change the if/else => If you make the change for 4a. then perhaps the keepalive if/else is overkill and could be changed.e.g. if (force_keep_alive_syncrep || MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && !waiting_for_ping_response) WalSndKeepalive(false); ------ Kind Regards, Peter Smith. Fujitsu Australia
В списке pgsql-hackers по дате отправления:
Следующее
От: "houzj.fnst@fujitsu.com"Дата:
Сообщение: RE: Skipping logical replication transactions on subscriber side