Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Дата
Msg-id CAD21AoC=jJEaNx5ersq_nxUQtuE2KcH_Rip0rNOWnvoYwVdSOg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Thank you for the comments!

>
> At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > Good work. I wonder without comments this may create a problem in the
> > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > freeing the memory any less robust. Also, for consistency, we can use
> > a similar check based on xcnt in the SnapBuildRestore to free the
> > memory in the below code:
> > + /* set catalog modifying transactions */
> > + if (builder->catchange.xip)
> > + pfree(builder->catchange.xip);
>
> But xip must be positive there.  We can add a comment explains that.
>

Yes, if we add the comment for it, probably we need to explain a gcc's
optimization but it seems to be too much to me.

>
> +        * Array of transactions and subtransactions that had modified catalogs
> +        * and were running when the snapshot was serialized.
> +        *
> +        * We normally rely on HEAP2_NEW_CID and XLOG_XACT_INVALIDATIONS records to
> +        * know if the transaction has changed the catalog. But it could happen that
> +        * the logical decoding decodes only the commit record of the transaction.
> +        * This array keeps track of the transactions that have modified catalogs
>
> (Might be only me, but) "track" makes me think that xids are added and
> removed by activities. On the other hand the array just remembers
> catalog-modifying xids in the last life until the all xids in the list
> gone.
>
> +        * and were running when serializing a snapshot, and this array is used to
> +        * add such transactions to the snapshot.
> +        *
> +        * This array is set once when restoring the snapshot, xids are removed
>
> (So I want to add "only" between "are removed").
>
> +        * from the array when decoding xl_running_xacts record, and then eventually
> +        * becomes empty.

Agreed. WIll fix.

>
>
> +       catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);
>
> catchange_xip is allocated in the current context, but ondisk is
> allocated in builder->context.  I see it kind of inconsistent (even if
> the current context is same with build->context).

Right. I thought that since the lifetime of catchange_xip is short,
until the end of SnapBuildSerialize() function we didn't need to
allocate it in builder->context. But given ondisk, we need to do that
for catchange_xip as well. Will fix it.

>
>
> +       if (builder->committed.xcnt > 0)
> +       {
>
> It seems to me comitted.xip is always non-null, so we don't need this.
> I don't strongly object to do that, though.

But committed.xcnt could be 0, right? We don't need to copy anything
by calling memcpy with size = 0 in this case. Also, it looks more
consistent with what we do for catchange_xcnt.

>
> -        * Remove TXN from its containing list.
> +        * Remove TXN from its containing lists.
>
> The comment body only describes abut txn->nodes. I think we need to
> add that for catchange_node.

Will add.

>
>
> +       Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns));
>
> (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..).
> (xcnt == rb->catchange_ntxns) is not what should be checked here. The
> assert just requires that catchange_txns and catchange_ntxns are
> consistent so it should be checked just after dlist_empty.. I think.
>

If we want to check if catchange_txns and catchange_ntxns are
consistent, should we check (xcnt == rb->catchange_ntxns) as well, no?
This function requires the caller to use rb->catchange_ntxns as the
length of the returned array. I think this assertion ensures that the
actual length of the array is consistent with the length we
pre-calculated.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Costing elided SubqueryScans more nearly correctly
Следующее
От: Morris de Oryx
Дата:
Сообщение: Re: System column support for partitioned tables using heap