Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Дата
Msg-id CAD21AoC1E09F_swzOPQQE9P4tf3ADr0cyATesGHA3JrySAytNA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > This patch should have the fix for the issue that Shi yu reported. Shi
> > > > yu, could you please test it again with this patch?
> > > >
> > >
> > > Can you explain the cause of the failure and your fix for the same?
> >
> > @@ -1694,6 +1788,8 @@ out:
> >     /* be tidy */
> >     if (ondisk)
> >         pfree(ondisk);
> > +   if (catchange_xip)
> > +       pfree(catchange_xip);
> >
> > Regarding the above code in the previous version patch, looking at the
> > generated assembler code shared by Shi yu offlist, I realized that the
> > “if (catchange_xip)” is removed (folded) by gcc optimization. This is
> > because we dereference catchange_xip before null-pointer check as
> > follow:
> >
> > +   /* copy catalog modifying xacts */
> > +   sz = sizeof(TransactionId) * catchange_xcnt;
> > +   memcpy(ondisk_c, catchange_xip, sz);
> > +   COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
> > +   ondisk_c += sz;
> >
> > Since sz is 0 in this case, memcpy doesn’t do anything actually.
> >
> > By checking the assembler code, I’ve confirmed that gcc does the
> > optimization for these code and setting
> > -fno-delete-null-pointer-checks flag prevents the if statement from
> > being folded. Also, I’ve confirmed that adding the check if
> > "catchange.xcnt > 0” before the null-pointer check also can prevent
> > that. Adding a check  if "catchange.xcnt > 0” looks more robust. I’ve
> > added a similar check for builder->committed.xcnt as well for
> > consistency. builder->committed.xip could have no transactions.
> >
>
> Good work. I wonder without comments this may create a problem in the
> future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> freeing the memory any less robust. Also, for consistency, we can use
> a similar check based on xcnt in the SnapBuildRestore to free the
> memory in the below code:
> + /* set catalog modifying transactions */
> + if (builder->catchange.xip)
> + pfree(builder->catchange.xip);

I would hesitate to add comments about preventing the particular
optimization. I think we do null-pointer-check-then-pfree many place.
It seems to me that checking the array length before memcpy is more
natural than checking both the array length and the array existence
before pfree.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [PoC] Let libpq reject unexpected authentication requests
Следующее
От: "osumi.takamichi@fujitsu.com"
Дата:
Сообщение: RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns