On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> Hi Wang,
> Thanks for working on this. One of our customer faced a similar
> situation when running BDR with PostgreSQL.
>
> I tested your patch and it solves the problem.
>
> Please find some review comments below
Thanks for your testing and comments.
> +/*
> + * Helper function for ReorderBufferProcessTXN for updating progress.
> + */
> +static inline void
> +ReorderBufferUpdateProgress(ReorderBuffer *rb, ReorderBufferTXN *txn,
> + ReorderBufferChange *change)
> +{
> + LogicalDecodingContext *ctx = rb->private_data;
> + static int changes_count = 0;
>
> It's not easy to know that a variable is static when reading the code which
> uses it. So it's easy to interpret code wrong. I would probably track it
> through logical decoding context itself OR through a global variable like other
> places where we track the last timestamps. But there's more below on this.
I'm not sure if we need to add global variables or member variables for a
cumulative count that is only used here. How would you feel if I add some
comments when declaring this static variable?
> +
> + if (!ctx->update_progress)
> + return;
> +
> + Assert(!ctx->fast_forward);
> +
> + /* set output state */
> + ctx->accept_writes = false;
> + ctx->write_xid = txn->xid;
> + ctx->write_location = change->lsn;
> + ctx->end_xact = false;
>
> This patch reverts many of the changes of the previous commit which tried to
> fix this issue i.e. 55558df2374. end_xact was introduced by the same commit but
> without much explanation of that in the commit message. Its only user,
> WalSndUpdateProgress(), is probably making a wrong assumption as well.
>
> * We don't have a mechanism to get the ack for any LSN other than end
> * xact LSN from the downstream. So, we track lag only for end of
> * transaction LSN.
>
> IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which is
> sent downstream through a keep alive message. Downstream may
> acknowledge this
> LSN. So we do get ack for any LSN, not just commit LSN.
>
> So I propose removing end_xact as well.
We didn't track the lag during a transaction because it could make the
calculations of lag functionality inaccurate. If we track every lsn, it could
fail to record important lsn information because of
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
Please see details in [1] and [2].
Regards,
Wang Wei
[1] -
https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] -
https://www.postgresql.org/message-id/OS3PR01MB627514AE0B3040D8F55A68B99EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com