Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Дата
Msg-id CAA4eK1Jt4v37D4XBOu=h+tu+wzijpjh1Bw5h-_eeLEkRbxrSEQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Okay, I've attached an updated patch that does the above idea. Could
> > you please do the performance tests again to see if the idea can help
> > reduce the overhead, Shi yu?
> >
>
> While reviewing the patch for HEAD, I have changed a few comments. See
> attached, if you agree with these changes then include them in the
> next version.
>

I have another comment on this patch:
SnapBuildPurgeOlderTxn()
{
...
+ if (surviving_xids > 0)
+ memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
+ surviving_xids * sizeof(TransactionId))
...

For this code to hit, we must have a situation where one or more of
the xacts in this array must be still running. And, if that is true,
we would not have started from the restart point where the
corresponding snapshot (that contains the still running xacts) has
been serialized because we advance the restart point to not before the
oldest running xacts restart_decoding_lsn. This may not be easy to
understand so let me take an example to explain. Say we have two
transactions t1 and t2, and both have made catalog changes. We want a
situation where one of those gets purged and the other remains in
builder->catchange.xip array. I have tried variants of the below
sequence to see if I can get into the required situation but am not
able to make it.

Session-1
Checkpoint -1;
T1
DDL

Session-2
T2
DDL

Session-3
Checkpoint-2;
pg_logical_slot_get_changes()
 -- Here when we serialize the snapshot corresponding to
CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
as catalog-changing xacts.

Session-1
T1
Commit;

Checkpoint;
pg_logical_slot_get_changes()
 -- Here we will restore from Checkpoint-1's serialized snapshot and
won't be able to move restart_point to Checkpoint-2 because T2 is
still open.

Now, as per my understanding, it is only possible to move
restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
which case we will never have that in surviving_xids array after the
purge.

It is possible I am missing something here. Do let me know your thoughts.

-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrey Borodin
Дата:
Сообщение: Re: [Commitfest 2022-07] Patch Triage: Waiting on Author
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Hash index build performance tweak from sorting