Discussion: Optimize commit performance with a large number of 'on commit delete rows' temp tables

Hi hackers,
  
# Background

PostgreSQL maintains a list of temporary tables for 'on commit
drop/delete rows' via an on_commits list in the session. Once a
transaction accesses a temp table or namespace, the
XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag is set. Before committing, the
PreCommit_on_commit_actions function truncates all 'commit delete
rows' temp tables, even those not accessed in the current transaction.
Commit performance can degrade if there are many such temp tables.
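
For reference, the commit-time logic in question looks roughly like this (a simplified excerpt in the spirit of PreCommit_on_commit_actions in src/backend/commands/tablecmds.c, with ON COMMIT DROP handling and subtransaction bookkeeping elided):
```
void
PreCommit_on_commit_actions(void)
{
    ListCell   *l;
    List       *oids_to_truncate = NIL;

    foreach(l, on_commits)
    {
        OnCommitItem *oc = (OnCommitItem *) lfirst(l);

        if (oc->oncommit == ONCOMMIT_DELETE_ROWS &&
            (MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
        {
            /* Queued for truncation whether or not this transaction
             * ever touched this particular table. */
            oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
        }
    }

    /* One truncate pass over every 'on commit delete rows' temp table */
    if (oids_to_truncate != NIL)
        heap_truncate(oids_to_truncate);
}
```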

In practice, users created many 'on commit delete rows' temp tables in a
session, but each transaction accessed only a few of them. Because access
frequency varied, they were reluctant to switch to 'on commit drop'.

Below is an example showing the effect of the number of temp tables
on commit performance:
```
-- 100
DO $$
BEGIN
    FOR i IN 1..100 LOOP
        EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) ON COMMIT DELETE ROWS', i);
    END LOOP;
END;
$$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.325 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.330 ms
```

```
-- 1000
DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) ON COMMIT DELETE ROWS', i);
    END LOOP;
END;
$$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 10.939 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 10.955 ms
```

```
-- 10000
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) ON COMMIT DELETE ROWS', i);
    END LOOP;
END;
$$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 110.253 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 175.875 ms
```

# Solution

An intuitive solution is to truncate, at commit time, only the temp
tables that the current transaction has accessed.

In the attached patch (based on HEAD):
- A Bloom filter (a list or hash table would also work) tracks the
  temp tables accessed by the current transaction.
- Only temp tables that pass the Bloom filter check need truncation.
  False positives may occur, but they are acceptable: at worst they
  cause an unneeded truncate.
- The Bloom filter is reset at the start of each transaction,
  indicating that no temp tables have been accessed by the current
  transaction yet.
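
In outline, the mechanism could look like the sketch below (a minimal sketch of the idea, not the patch itself; it reuses the bloom filter API from src/include/lib/bloomfilter.h, and the function names and sizing are illustrative):
```
#include "postgres.h"
#include "lib/bloomfilter.h"

/* Filter of temp-table OIDs accessed by the current transaction */
static bloom_filter *accessed_temp_rels = NULL;

/* At transaction start: an empty filter means "nothing accessed yet" */
static void
AtStart_TempRelFilter(void)
{
    if (accessed_temp_rels != NULL)
        bloom_free(accessed_temp_rels);
    /* sizing is illustrative; the patch shrinks the filter to ~1KB */
    accessed_temp_rels = bloom_create(1000, 1, 0);
}

/* Wherever a temp relation is opened, remember its OID */
static void
RememberTempRelAccess(Oid relid)
{
    bloom_add_element(accessed_temp_rels,
                      (unsigned char *) &relid, sizeof(Oid));
}

/* At commit: a miss in the filter is authoritative, so the table was
 * never touched and must still be empty; truncation can be skipped. */
static bool
TempRelPossiblyAccessed(Oid relid)
{
    return !bloom_lacks_element(accessed_temp_rels,
                                (unsigned char *) &relid, sizeof(Oid));
}
```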

After optimization, the performance for the same case is as
follows:
```
-- 100
DO $$
BEGIN
    FOR i IN 1..100 LOOP
        EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) ON COMMIT DELETE ROWS', i);
    END LOOP;
END;
$$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.447 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.453 ms
```

```
-- 1000
DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) ON COMMIT DELETE ROWS', i);
    END LOOP;
END;
$$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.531 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 0.567 ms
```

```
-- 10000
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        EXECUTE format('CREATE TEMP TABLE temp_table_%s (id int) ON COMMIT DELETE ROWS', i);
    END LOOP;
END;
$$;
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.370 ms
postgres=# insert into temp_table_1 select 1;
INSERT 0 1
Time: 1.362 ms
```

I'm hoping for some suggestions from hackers.

Best Regards,
Fei Changhong
 
Attachments
"=?ISO-8859-1?B?ZmVpY2hhbmdob25n?=" <feichanghong@qq.com> writes:
> PostgreSQL maintains a list of temporary tables for 'on commit
> drop/delete rows' via an on_commits list in the session. Once a
> transaction accesses a temp table or namespace, the
> XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag is set. Before committing, the
> PreCommit_on_commit_actions function truncates all 'commit delete
> rows' temp tables, even those not accessed in the current transaction.
> Commit performance can degrade if there are many such temp tables.

Hmm.  I can sympathize with wanting to improve the performance of
this edge case, but it is an edge case: you are the first to
complain about it.  You cannot trash the performance of more typical
cases in order to get there ...

> In the attached patch (based on HEAD):
> - A Bloom filter (can also be a list or hash table) maintains
> the temp tables accessed by the current transaction.

... and I'm afraid this proposal may do exactly that.  Our bloom
filters are pretty heavyweight objects, so making one in situations
where it buys nothing is likely to add a decent amount of overhead.
(I've not tried to quantify that for this particular patch.)

I wonder if we could instead add marker fields to the OnCommitItem
structs indicating whether their rels were touched in the current
transaction, and use those to decide whether we need to truncate.

Another possibility is to make the bloom filter only when the
number of OnCommitItems exceeds some threshold (compare d365ae705).

BTW, I wonder if we could improve PreCommit_on_commit_actions by
having it just quit immediately if XACT_FLAGS_ACCESSEDTEMPNAMESPACE
is not set.  I think that must be set if any ON COMMIT DROP tables
have been made, so there should be nothing to do if not.  In normal
cases that's not going to buy much because the OnCommitItems list
is short, but in your scenario maybe it could win.

            regards, tom lane



Thank you for your attention and suggestions.

On Jul 6, 2024, at 00:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "feichanghong" <feichanghong@qq.com> writes:
>> PostgreSQL maintains a list of temporary tables for 'on commit
>> drop/delete rows' via an on_commits list in the session. Once a
>> transaction accesses a temp table or namespace, the
>> XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag is set. Before committing, the
>> PreCommit_on_commit_actions function truncates all 'commit delete
>> rows' temp tables, even those not accessed in the current transaction.
>> Commit performance can degrade if there are many such temp tables.

> Hmm.  I can sympathize with wanting to improve the performance of
> this edge case, but it is an edge case: you are the first to
> complain about it.  You cannot trash the performance of more typical
> cases in order to get there ...
>
>> In the attached patch (based on HEAD):
>> - A Bloom filter (can also be a list or hash table) maintains
>> the temp tables accessed by the current transaction.
>
> ... and I'm afraid this proposal may do exactly that.  Our bloom
> filters are pretty heavyweight objects, so making one in situations
> where it buys nothing is likely to add a decent amount of overhead.
> (I've not tried to quantify that for this particular patch.)
Yes, this is an edge case, but we have more than one customer facing the issue,
and unfortunately they are not willing to modify their service code.
We should indeed avoid negatively impacting typical cases:
- Each connection requires an extra 1KB for the filter (the original bloom filter
  implementation had a minimum size of 1MB, which I've adjusted down to this
  smaller value).
- The filter is reset at the start of each transaction, which is unnecessary
  work for sessions that do not access temporary tables.
- In the PreCommit_on_commit_actions function, each 'on commit delete rows'
  temporary table has to be checked against the bloom filter, which incurs some
  CPU overhead. However, this is likely negligible compared to the I/O cost of
  truncation.

Adding a threshold for using the bloom filter is a good idea. We can create the
bloom filter only when the current number of OnCommitItems exceeds the threshold
at the start of a transaction, which should effectively avoid affecting typical
cases. I will provide a new patch later to implement this.
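
Assuming the filter is (re)built at transaction start, the threshold check might look roughly like this (a sketch; ON_COMMITS_FILTER_THRESHOLD is the name used in the later patch, the rest is illustrative):
```
#define ON_COMMITS_FILTER_THRESHOLD 64

static void
AtStart_TempRelFilter(void)
{
    if (accessed_temp_rels != NULL)
    {
        bloom_free(accessed_temp_rels);
        accessed_temp_rels = NULL;
    }

    /* A NULL filter means commit falls back to truncating everything,
     * exactly as today, so sessions with short on_commits lists pay
     * nothing for the optimization. */
    if (list_length(on_commits) >= ON_COMMITS_FILTER_THRESHOLD)
        accessed_temp_rels = bloom_create(1000, 1, 0);
}
```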

> I wonder if we could instead add marker fields to the OnCommitItem
> structs indicating whether their rels were touched in the current
> transaction, and use those to decide whether we need to truncate.
Adding a flag to OnCommitItem to indicate whether the temp table was accessed
by the current transaction is feasible. But locating the OnCommitItem by relid
efficiently when opening a relation may require an extra hash table mapping
relids to OnCommitItems; see the sketch below.
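
For illustration, such a design might look like this (the accessed_in_xact field and the lookup table are hypothetical additions; the other OnCommitItem fields are the existing ones from tablecmds.c):
```
#include "postgres.h"
#include "utils/hsearch.h"

typedef struct OnCommitItem
{
    Oid         relid;              /* relid of relation */
    OnCommitAction oncommit;        /* what to do at end of xact */
    SubTransactionId creating_subid;
    SubTransactionId deleting_subid;
    bool        accessed_in_xact;   /* hypothetical: touched this xact? */
} OnCommitItem;

/* Hypothetical relid -> OnCommitItem map, so that opening a relation
 * can set the marker in O(1) instead of scanning the on_commits list. */
typedef struct OnCommitLookupEntry
{
    Oid         relid;              /* hash key */
    OnCommitItem *item;
} OnCommitLookupEntry;

static HTAB *on_commit_lookup = NULL;

static void
InitOnCommitLookup(void)
{
    HASHCTL     ctl;

    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(OnCommitLookupEntry);
    on_commit_lookup = hash_create("on_commits lookup", 256, &ctl,
                                   HASH_ELEM | HASH_BLOBS);
}
```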

> Another possibility is to make the bloom filter only when the
> number of OnCommitItems exceeds some threshold (compare d365ae705).
>
> BTW, I wonder if we could improve PreCommit_on_commit_actions by
> having it just quit immediately if XACT_FLAGS_ACCESSEDTEMPNAMESPACE
> is not set.  I think that must be set if any ON COMMIT DROP tables
> have been made, so there should be nothing to do if not.  In normal
> cases that's not going to buy much because the OnCommitItems list
> is short, but in your scenario maybe it could win.
I also think when XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set, it's unnecessary
to iterate over on_commits (unless I'm overlooking something), which would be
beneficial for the aforementioned scenarios as well.
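
Concretely, that early exit would be roughly the following (a sketch; as Tom notes, its correctness hinges on the flag being set whenever any ON COMMIT table exists):
```
void
PreCommit_on_commit_actions(void)
{
    /* Fast path: if this transaction never touched a temporary
     * relation, no ON COMMIT action can be required. */
    if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE) == 0)
        return;

    /* ... existing scan of on_commits ... */
}
```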

Best Regards,
Fei Changhong

The patch in the attachment, compared to the previous one, adds a threshold for
using the bloom filter. The current ON_COMMITS_FILTER_THRESHOLD is set to 64,
which may not be the optimal value. Perhaps this threshold could be configured
as a GUC parameter?
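
If it were a GUC, the entry might look something like this (a hypothetical config_int entry for src/backend/utils/misc/guc_tables.c; the name, category, and limits are illustrative):
```
{
    {"on_commits_filter_threshold", PGC_USERSET, RESOURCES_MEM,
        gettext_noop("Sets the number of ON COMMIT DELETE ROWS temp tables "
                     "above which accessed tables are tracked in a filter."),
        NULL
    },
    &on_commits_filter_threshold,
    64, 1, INT_MAX,
    NULL, NULL, NULL
},
```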

Best Regards,
Fei Changhong
Attachments
Hi feichanghong
    Thanks for updating the patch. I think the threshold could be configured as
a GUC parameter: PostgreSQL has too many hardcoded constants whose code comments
say they may later be turned into parameters. Now that more and more applications
that previously used Oracle are migrating to PostgreSQL, scenarios that make
heavy use of temporary tables will become more common. Oracle's global temporary
tables are optimized for this business pattern and work well there, so migrating
to PostgreSQL runs into very tricky performance issues. I'm sure the patch has
value.

Best Regards

Hi wenhui,

Thank you for your suggestions. I have supplemented some performance tests.

Here is the TPS performance data for different numbers of temporary tables
under different thresholds, as compared with the head (98347b5a). The testing
tool used is pgbench, with the workload being to insert into one temporary
table (when the number of temporary tables is 0, the workload is SELECT 1):

| table num     | 0            | 1            | 5           | 10          | 100         | 1000        |
|---------------|--------------|--------------|-------------|-------------|-------------|-------------|
| head 98347b5a | 39912.722209 | 10064.306268 | 7452.071689 | 5641.487369 | 1073.203851 | 114.530958  |
| threshold 1   | 40332.367414 | 7078.117192  | 7044.951156 | 7020.249434 | 6893.652062 | 5826.597260 |
| threshold 5   | 40173.562744 | 10017.532933 | 7023.770203 | 7024.283577 | 6919.769315 | 5806.314494 |

Here is the TPS performance data for different numbers of temporary tables
at a threshold of 5, compared with the head (commit 98347b5a). The testing tool
is pgbench, with the workload being to insert into all temporary tables:

| table num     | 1           | 5           | 10          | 100        | 1000      |
|---------------|-------------|-------------|-------------|------------|-----------|
| head 98347b5a | 7243.945042 | 3627.290594 | 2262.594766 | 297.856756 | 27.745808 |
| threshold 5   | 7287.764656 | 3130.814888 | 2038.308763 | 288.226032 | 27.705149 |

According to test results, the patch does cause some performance loss with
fewer temporary tables, but benefits are substantial when many temporary tables
are used. The specific threshold could be set to 10 (HDDs may require a smaller
one).

I've provided two patches in the attachments, both with a default threshold of 10.
One has the threshold configured as a GUC parameter, while the other is hardcoded
to 10.

Best Regards,
Fei Changhong
Attachments
Hi feichanghong
    I don't think it's acceptable for a patch that fixes one problem to introduce
a performance regression elsewhere. Could we instead take Tom's suggestion to
optimize PreCommit_on_commit_actions? Otherwise I think it misses the forest for
the trees.



Best Regards,

Hi wenhui,

On Jul 8, 2024, at 12:18, wenhui qiu <qiuwenhuifx@gmail.com> wrote:

> Hi feichanghong
>     I don't think it's acceptable for a patch that fixes one problem to
> introduce a performance regression elsewhere. Could we instead take Tom's
> suggestion to optimize PreCommit_on_commit_actions? Otherwise I think it
> misses the forest for the trees.

You're right, any performance regression is certainly unacceptable. That's why
we've introduced a threshold: the bloom filter optimization is only applied when
the number of temporary tables exceeds it. Test data also reveals that with a
threshold of 10, barring cases where all temporary tables are implicated in a
transaction, there's hardly any performance loss.

"Improve PreCommit_on_commit_actions by having it just quit immediately if
XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set" can only reduce the overhead of
traversing the OnCommitItem list but still doesn't address the issue of
temporary table truncation.

Looking forward to more suggestions!


Best Regards,
Fei Changhong

Hi feichanghong
    I think adding an intercept this way is better than implementing a global
temp table. There is an effort to implement a global temporary table
(https://www.postgresql.org/message-id/1A1A6EDC-D0EC-47B0-BD21-C2ACBAEA65E4@alibaba-inc.com);
you could consult with its authors, who work at Alibaba.


Best Regards,


Hi wenhui,

I carefully analyzed the reason for the performance regression with fewer
temporary tables in the previous patch (v1-0002-): the k_hash_funcs determined
by the bloom_create function was 10 (MAX_HASH_FUNCS), which led to excessive
calculation overhead for the bloom filter.

Based on the calculation formula for the bloom filter, when the number of items
is 100 and k_hash_funcs is 2, the false positive rate for a 1KB bloom filter is
0.0006096; when the number of items is 1000, the false positive rate is
0.048929094. Therefore, a k_hash_funcs of 2 already achieves a decent false
positive rate while effectively reducing the computational overhead of the
bloom filter.
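
These numbers can be reproduced with the standard approximation p = (1 - e^(-k*n/m))^k, taking "1KB" as m = 8000 bits (a small standalone check, not part of the patch):
```
#include <math.h>
#include <stdio.h>

int
main(void)
{
    const double m = 8000.0;        /* filter size in bits (1KB) */
    const double k = 2.0;           /* number of hash functions */
    const double ns[] = {100.0, 1000.0};

    for (int i = 0; i < 2; i++)
    {
        double p = pow(1.0 - exp(-k * ns[i] / m), k);

        /* prints ~0.0006096 for n=100 and ~0.048929094 for n=1000 */
        printf("n=%.0f: p=%.9f\n", ns[i], p);
    }
    return 0;
}
```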


I have re-implemented a bloom_create_v2 function to create a bloom filter with
a specified number of hash functions and a specified memory size.

From the test data below, the new patch in the attachment (v1-0003-) does not
lead to a performance regression in any scenario. Furthermore, the default
threshold can be lowered to 2.


Here is the TPS performance data for different numbers of temporary tables
under different thresholds, as compared with the head (98347b5a). The testing
tool used is pgbench, with the workload being to insert into one temporary
table (when the number of temporary tables is 0, the workload is SELECT 1):

| table num       | 0            | 1            | 2           | 5           | 10          | 100         | 1000        |
|-----------------|--------------|--------------|-------------|-------------|-------------|-------------|-------------|
| head (98347b5a) | 39912.722209 | 10064.306268 | 9183.871298 | 7452.071689 | 5641.487369 | 1073.203851 | 114.530958  |
| threshold 2     | 40097.047974 | 10009.598155 | 9982.172866 | 9955.651235 | 9999.338901 | 9785.626296 | 8278.828828 |


Here is the TPS performance data for different numbers of temporary tables
at a threshold of 2, compared with the head (commit 98347b5a). The testing tool
is pgbench, with the workload being to insert into all temporary tables:

| table num       | 1           | 2           | 5           | 10          | 100        | 1000      |
|-----------------|-------------|-------------|-------------|-------------|------------|-----------|
| head (98347b5a) | 7243.945042 | 5734.545012 | 3627.290594 | 2262.594766 | 297.856756 | 27.745808 |
| threshold 2     | 7289.171381 | 5740.849676 | 3626.135510 | 2207.439931 | 293.145036 | 27.020953 |



I have previously researched the implementation of the Global Temp Table (GTT)
you mentioned, and it has been used in Alibaba Cloud's PolarDB (link [1]).
A GTT can prevent truncation of temporary tables that have not been accessed by
the current session (those not in the OnCommitItem list), but GTTs that have
been accessed by the current session still need to be truncated at commit time.
Therefore, GTTs would also require the optimization in the above patch.

[1] https://www.alibabacloud.com/help/en/polardb/polardb-for-oracle/using-global-temporary-tables?spm=a3c0i.23458820.2359477120.1.66e16e9bUpV7cK


Best Regards,
Fei Changhong
Attachments