> On 02/13/2021 11:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 12, 2021 at 10:00 PM <er@xs4all.nl> wrote:
> >
> > > On 02/12/2021 1:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Feb 12, 2021 at 6:04 PM Erik Rijkers <er@xs4all.nl> wrote:
> > > >
> > > > I am seeing errors in replication in a test program that I've been running for years with very little change (since 2017, really [1]).
> >
> > Hi,
> >
> > Here is a test program. Careful, it deletes stuff. And it will need some changes:
> >
>
> Thanks for sharing the test. I think I have found the problem.
> Actually, it was an existing code problem exposed by the commit
> ce0fdbfe97. In pgoutput_begin_txn(), we were sometimes sending the
> prepare_write ('w') message but then the actual message was not being
> sent. This was the case when we didn't find the origin of a txn. This
> can happen after that commit because we have now started using origins
> for tablesync workers as well and those origins are removed once the
> tablesync workers are finished. We might want to change the behavior
> related to the origin messages as indicated in the comments but for
> now, fixing the existing code.
>
> Can you please test if the attached fixes the problem at your end as well?
> [fix_origin_message_1.patch]
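
For anyone following along, here is a toy model of the failure mode as I understand it from the description above. It is only a sketch and not the pgoutput code or the patch: ToyStream, the toy_* helpers, the origin ids (0 standing in for "no origin", 1 for an origin that still exists) and both begin_txn_* functions are made-up names for illustration. It shows a frame being opened before the origin lookup, so a failed lookup leaves a frame with no payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAMES 8

/* Toy output stream: records the payload size of every flushed frame. */
typedef struct
{
    size_t payload_len[MAX_FRAMES]; /* payload bytes per flushed frame */
    int    nframes;                 /* number of frames flushed so far */
    size_t pending;                 /* payload bytes in the open frame */
} ToyStream;

/* Open a new frame (stands in for the prepare_write step). */
static void toy_prepare_write(ToyStream *s) { s->pending = 0; }

/* Append payload to the currently open frame. */
static void toy_put(ToyStream *s, const char *payload)
{
    s->pending += strlen(payload);
}

/* Flush the open frame, even if nothing was put into it. */
static void toy_write(ToyStream *s)
{
    assert(s->nframes < MAX_FRAMES);
    s->payload_len[s->nframes++] = s->pending;
}

/* Hypothetical origin lookup: only origin 1 still exists. */
static bool toy_lookup_origin(int origin_id, const char **name)
{
    if (origin_id == 1)
    {
        *name = "origin_1";
        return true;
    }
    return false;               /* e.g. origin dropped after tablesync */
}

/* Buggy shape: the origin frame is opened before the lookup succeeds. */
static void begin_txn_buggy(ToyStream *s, int origin_id)
{
    const char *name = NULL;

    toy_prepare_write(s);
    toy_put(s, "BEGIN");
    if (origin_id != 0)
    {
        toy_write(s);           /* message boundary */
        toy_prepare_write(s);   /* frame opened unconditionally */
        if (toy_lookup_origin(origin_id, &name))
            toy_put(s, name);
    }
    toy_write(s);               /* flushes an empty frame on failed lookup */
}

/* Fixed shape: open the origin frame only once the lookup has succeeded. */
static void begin_txn_fixed(ToyStream *s, int origin_id)
{
    const char *name = NULL;

    toy_prepare_write(s);
    toy_put(s, "BEGIN");
    if (origin_id != 0 && toy_lookup_origin(origin_id, &name))
    {
        toy_write(s);           /* message boundary */
        toy_prepare_write(s);
        toy_put(s, name);
    }
    toy_write(s);
}

/* Count flushed frames that carry no payload. */
static int count_empty_frames(const ToyStream *s)
{
    int empty = 0;

    for (int i = 0; i < s->nframes; i++)
        if (s->payload_len[i] == 0)
            empty++;
    return empty;
}
```

In this model, calling begin_txn_buggy() on a zero-initialized ToyStream with an origin id that no longer resolves leaves two frames, the second one empty, while begin_txn_fixed() leaves a single BEGIN frame.
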
I just compiled a binary from HEAD and a binary from HEAD+patch.
HEAD is still broken; your patch rescues it, so yes, fixed.
Maybe a test (check or check-world) should be added that runs a second replica? (Assuming that would have caught this bug.)
Thanks,
Erik Rijkers
>
> --
> With Regards,
> Amit Kapila.