Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables

Поиск
Список
Период
Сортировка
От feichanghong
Тема Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Дата
Msg-id tencent_74E3C9448DE302814689E2DD2E83BC93FF05@qq.com
обсуждение исходный текст
Ответ на Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Список pgsql-hackers
Thank you for your attention and suggestions.

On Jul 6, 2024, at 00:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

<feichanghong@qq.com> writes:
PostgreSQL maintains a list of temporary tables for 'on commit
drop/delete rows' via an on_commits list in the session. Once a
transaction accesses a temp table or namespace, the
XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag is set. Before committing, the
PreCommit_on_commit_actions function truncates all 'commit delete
rows' temp tables, even those not accessed in the current transaction.
Commit performance can degrade if there are many such temp tables.

Hmm.  I can sympathize with wanting to improve the performance of
this edge case, but it is an edge case: you are the first to
complain about it.  You cannot trash the performance of more typical
cases in order to get there ...
In the attached patch (based on HEAD):
- A Bloom filter (can also be a list or hash table) maintains
the temp tables accessed by the current transaction.

... and I'm afraid this proposal may do exactly that.  Our bloom
filters are pretty heavyweight objects, so making one in situations
where it buys nothing is likely to add a decent amount of overhead.
(I've not tried to quantify that for this particular patch.)
Yes, this is an edge case, but we have more than one customer facing the issue,
and unfortunately, they are not willing to modify their service code.
We should indeed avoid negatively impacting typical cases:
- Each connection requires an extra 1KB for the filter (the original bloom filter
  implementation had a minimum of 1MB, which I've adjusted to this smaller value).
- The filter is reset at the start of each transaction, which is unnecessary for
  sessions that do not access temporary tables.
- In the PreCommit_on_commit_actions function, each 'on commit delete rows'
  temporary table has to be filtered through the bloom filter, which incurs some
  CPU overhead. However, this might be negligible compared to the IO cost of
  truncation.

Adding a threshold for using the bloom filter is a good idea. We can create the
bloom filter only when the current number of OnCommitItems exceeds the threshold
at the start of a transaction, which should effectively avoid affecting typical
cases. I will provide a new patch later to implement this.

I wonder if we could instead add marker fields to the OnCommitItem
structs indicating whether their rels were touched in the current
transaction, and use those to decide whether we need to truncate.
Adding a flag to OnCommitItem to indicate whether the temp table was accessed
by the current transaction is feasible. But, locating the OnCommitItem by relid
efficiently when opening a relation may require an extra hash table to map relids
to OnCommitItems.

Another possibility is to make the bloom filter only when the
number of OnCommitItems exceeds some threshold (compare d365ae705).

BTW, I wonder if we could improve PreCommit_on_commit_actions by
having it just quit immediately if XACT_FLAGS_ACCESSEDTEMPNAMESPACE
is not set.  I think that must be set if any ON COMMIT DROP tables
have been made, so there should be nothing to do if not.  In normal
cases that's not going to buy much because the OnCommitItems list
is short, but in your scenario maybe it could win.
I also think when XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set, it's unnecessary
to iterate over on_commits (unless I'm overlooking something), which would be
beneficial for the aforementioned scenarios as well.

Best Regards,
Fei Changhong

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Matthias van de Meent
Дата:
Сообщение: Re: First draft of PG 17 release notes
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Improving PL/Tcl's error context reports