RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()

From: Hayato Kuroda (Fujitsu)
Subject: RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()
Msg-id: TYCPR01MB120776F4E85D6066B2FE096C2F5222@TYCPR01MB12077.jpnprd01.prod.outlook.com
In reply to: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()  (ocean_li_996 <ocean_li_996@163.com>)
Responses: Re:RE: Re:BUG #18369: logical decoding core on AssertTXNLsnOrder()  (ocean_li_996 <ocean_li_996@163.com>)
List: pgsql-bugs
Dear Haiyang Li,

Thanks for reporting. I could reproduce the issue on PG12-15.
I think you already know the reason, but let me share my analysis for
confirmation. The root cause is that temporary tables were not taken into account.

## Premise

INSERT/UPDATE/DELETE operations on temporary tables are not recorded in WAL files,
but an xid is still assigned for such operations.
So a transaction may have no related WAL records
if it modifies only temp tables.

Basically, such transactions are not decoded.

## Found issue

### Empty transaction is decoded on PG14 and PG15

However, there is a path that creates a ReorderBufferTXN for empty transactions,
which was introduced by 6b77048e5. The conditions are:

1. There are sub transactions which modify only temp tables, and
2. the top transaction modifies the catalog.

The call stack leading to this is below (innermost call first).

```
ReorderBufferTXNByXid(create = true, create_as_top = true)
ReorderBufferXidSetCatalogChanges()    // for sub transactions
SnapBuildXidSetCatalogChanges()        // for top transaction
DecodeCommit()                         // for top transaction
```

This path was introduced by 6b77048e5.
Previously, the call to ReorderBufferXidSetCatalogChanges() for sub transactions
was skipped if they had no catalog changes or had not been decoded yet.
However, the commit ensures that sub transactions are always marked as containing
catalog changes, and this also forces such a transaction to be decoded even if it
is empty.

### Assertion failure

The empty transactions are created as top transactions. At that time,
AssertTXNLsnOrder() is called to ensure that the first_lsn of each top transaction
is strictly higher than that of the previous one. But they can be the same if there
are two or more empty transactions. This leads to the assertion failure.
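
To make the failure mode concrete, here is a minimal, self-contained C model of the ordering check. It is not the PostgreSQL code: `ToyTxn` and `toy_lsn_order_ok()` are hypothetical names, LSNs are assumed to be non-zero, and the only thing taken from the description above is that each top transaction's first_lsn must be strictly higher than the previous one's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a top transaction entry; not ReorderBufferTXN. */
typedef struct ToyTxn
{
    uint32_t    xid;
    uint64_t    first_lsn;      /* LSN at which the entry was created */
} ToyTxn;

/*
 * Return true if first_lsn is strictly increasing over the array.
 * This models the condition described above, nothing more.
 */
bool
toy_lsn_order_ok(const ToyTxn *txns, size_t ntxns)
{
    uint64_t    prev_first_lsn = 0; /* 0 stands for "no previous LSN" */

    assert(txns != NULL || ntxns == 0);

    for (size_t i = 0; i < ntxns; i++)
    {
        /* An equal first_lsn is rejected as well as a lower one. */
        if (txns[i].first_lsn <= prev_first_lsn)
            return false;
        prev_first_lsn = txns[i].first_lsn;
    }
    return true;
}
```

In this model, entries with first_lsn values 0x1000, 0x3000, 0x3000 make the check return false. That corresponds to two empty transactions that end up with the same first_lsn, presumably because both are created while the same commit record is decoded.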

### Considerations on PG12 and PG13

The same failure can occur on PG12 and PG13, but the background is a bit different.
343afa967 removed a ReorderBufferAssignChild() call from SnapBuildXidSetCatalogChanges().
That call caused empty transactions to be marked as sub transactions, so there was
no problem in the past. After the commit, the assignment no longer happens, so the
empty transactions are created as top transactions.

## Possible solutions

I think there are several solutions.
Note that I assume here that the fix should be almost the same for all versions.

* Relax the condition in AssertTXNLsnOrder(). If the decoded transaction is empty,
  its first_lsn can be allowed to be the same as the previous one.
  Please see the attached file for my consideration.
* Create the ReorderBufferTXN as a sub transaction when we are in this path.
  You have already shared this approach. However, note that it needs to
  extend the ReorderBufferXidSetCatalogChanges function, which breaks ABI
  compatibility [1].
* Avoid calling ReorderBufferXidSetCatalogChanges() if the target transaction
  has not been decoded. A concern is that ReorderBuffer does not provide an API
  for checking whether a transaction has already been decoded.
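
As a rough illustration of the first option, the sketch below relaxes a toy ordering check so that an empty transaction may share its first_lsn with the previous entry. This is only a model under stated assumptions, not the attached patch and not PostgreSQL code: `RelaxedTxn`, its `is_empty` flag, and `relaxed_lsn_order_ok()` are hypothetical, and LSNs are assumed to be non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a top transaction entry; not ReorderBufferTXN. */
typedef struct RelaxedTxn
{
    uint32_t    xid;
    uint64_t    first_lsn;      /* LSN at which the entry was created */
    bool        is_empty;       /* assumed flag: no decoded changes */
} RelaxedTxn;

/*
 * Return true if the ordering holds under the relaxed rule:
 * non-empty transactions need a strictly higher first_lsn than the
 * previous entry, empty ones may have the same first_lsn.
 */
bool
relaxed_lsn_order_ok(const RelaxedTxn *txns, size_t ntxns)
{
    uint64_t    prev_first_lsn = 0; /* 0 stands for "no previous LSN" */

    assert(txns != NULL || ntxns == 0);

    for (size_t i = 0; i < ntxns; i++)
    {
        const RelaxedTxn *cur = &txns[i];

        if (cur->is_empty)
        {
            /* Equality is tolerated, going backwards is not. */
            if (cur->first_lsn < prev_first_lsn)
                return false;
        }
        else if (cur->first_lsn <= prev_first_lsn)
            return false;

        prev_first_lsn = cur->first_lsn;
    }
    return true;
}
```

With this rule, two empty entries at the same first_lsn pass, while two non-empty entries at the same first_lsn are still rejected, so the check keeps its value for ordinary transactions.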

I will keep analyzing and share further updates if I find anything.
Thoughts?

[1]: https://wiki.postgresql.org/wiki/Committing_checklist#Maintaining_ABI_compatibility_while_backpatching

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

