Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Дата	5 июля 2022 г. 11:00:06
Msg-id	CAA4eK1KMsU5PFHOvTD=3jHQ6aPa8N39eGwAVMVA6S0sXw1kMdw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Список	pgsql-hackers

Дерево обсуждения

On Mon, Jul 4, 2022 at 6:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached three POC patches:
> >
>
> I think it will be a good idea if you can add a short commit message
> at least to say which patch is proposed for HEAD and which one is for
> back branches. Also, it would be good if you can add some description
> of the fix in the commit message. Let's remove poc* from the patch
> name.
>
> Review poc_add_running_catchanges_xacts_to_serialized_snapshot
> =====================================================

Few more comments:
1.
+
+ /* This array must be sorted in xidComparator order */
+ TransactionId *xip;
+ } catchanges;
 };

This array contains the transaction ids for subtransactions as well. I
think it is better mention the same in comments.

2. Are we anytime removing transaction ids from catchanges->xip array?
If not, is there a reason for the same? I think we can remove it
either at commit/abort or even immediately after adding the xid/subxid
to committed->xip array.

3.
+ if (readBytes != sz)
+ {
+ int save_errno = errno;
+
+ CloseTransientFile(fd);
+
+ if (readBytes < 0)
+ {
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m", path)));
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("could not read file \"%s\": read %d of %zu",
+ path, readBytes, sz)));
+ }

This is the fourth instance of similar error handling code in
SnapBuildRestore(). Isn't it better to extract this into a separate
function?

4.
+TransactionId *
+ReorderBufferGetCatalogChangesXacts(ReorderBuffer *rb, size_t *xcnt_p)
+{
+ HASH_SEQ_STATUS hash_seq;
+ ReorderBufferTXNByIdEnt *ent;
+ TransactionId *xids;
+ size_t xcnt = 0;
+ size_t xcnt_space = 64; /* arbitrary number */
+
+ xids = (TransactionId *) palloc(sizeof(TransactionId) * xcnt_space);
+
+ hash_seq_init(&hash_seq, rb->by_txn);
+ while ((ent = hash_seq_search(&hash_seq)) != NULL)
+ {
+ ReorderBufferTXN *txn = ent->txn;
+
+ if (!rbtxn_has_catalog_changes(txn))
+ continue;

It would be better to allocate memory the first time we have to store
xids. There is a good chance that many a time this function will do
just palloc without having to store any xid.

5. Do you think we should do some performance testing for a mix of
ddl/dml workload to see if it adds any overhead in decoding due to
serialize/restore doing additional work? I don't think it should add
some meaningful overhead but OTOH there is no harm in doing some
testing of the same.

-- 
With Regards,
Amit Kapila.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns