On Mon, Dec 26, 2022 at 1:22 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, December 23, 2022 5:20 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > I noticed a CFbot failure in one of the new testcases in 015_stream.pl which
> > comes from old 032_xx.pl. It's because I slightly adjusted the change size in a
> > transaction in last version which cause the transaction's size not to exceed the
> > decoding work mem, so the transaction is not being applied as expected as
> > streaming transactions(it is applied as a non-stremaing transaction) which cause
> > the failure. Attach the new version patch which fixed this miss.
> >
>
> Since the GUC used to force stream changes has been committed, I removed that
> patch from the patch set here and rebased the testcases based on that commit.
> Here is the rebased patch set.
>
Thank you for updating the patches. Here are some comments for 0001
and 0002 patches:
I think it'd be better to write logs when the leader enters the
serialization mode. It would be helpful for investigating issues.
---
+ if (!pa_can_start(xid))
+ return;
+
+ /* First time through, initialize parallel apply worker state
hashtable. */
+ if (!ParallelApplyTxnHash)
+ {
+ HASHCTL ctl;
+
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(TransactionId);
+ ctl.entrysize = sizeof(ParallelApplyWorkerEntry);
+ ctl.hcxt = ApplyContext;
+
+ ParallelApplyTxnHash = hash_create("logical
replication parallel apply workershash",
+
16, &ctl,
+
HASH_ELEM |HASH_BLOBS | HASH_CONTEXT);
+ }
+
+ /*
+ * It's necessary to reread the subscription information
before assigning
+ * the transaction to a parallel apply worker. Otherwise, the
leader may
+ * not be able to reread the subscription information if streaming
+ * transactions keep coming and are handled by parallel apply workers.
+ */
+ maybe_reread_subscription();
pa_can_start() checks if the skiplsn is an invalid xid or not, and
then maybe_reread_subscription() could update the skiplsn to a valid
value. As the comments in pa_can_start() says, it won't work. I think
we should call maybe_reread_subscription() in
apply_handle_stream_start() before calling pa_allocate_worker().
---
+static inline bool
+am_leader_apply_worker(void)
+{
+ return (!OidIsValid(MyLogicalRepWorker->relid) &&
+ !isParallelApplyWorker(MyLogicalRepWorker));
+}
How about using !am_tablesync_worker() instead of
!OidIsValid(MyLogicalRepWorker->relid) for better readability?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com