Discussion: temp table on commit delete rows performance issue

temp table on commit delete rows performance issue

From: Floris Van Nee
Date:
Hi hackers,

I'm looking for some input on an issue I've observed. A common pattern
I've seen is using temporary tables to put data in before updating the
real tables. Something roughly like:

On session start:
CREATE TEMP TABLE temp_t1 (...) ON COMMIT DELETE ROWS;

On update:
BEGIN;
COPY temp_t1 FROM STDIN (FORMAT BINARY);
INSERT INTO t1 (SELECT * FROM temp_t1 ...) ON CONFLICT DO UPDATE SET ...;
-- potentially some other operations on temp table to put data into real table t1
COMMIT;

This pattern starts to break down under high concurrency. The
ON COMMIT DELETE ROWS clause performs a truncate at every commit, which
is fairly expensive and scales poorly. It's especially noticeable under
the following circumstances:
- a high max_connections setting
- a high number of temp tables per session
- concurrent writers committing at fairly short intervals
The impact is twofold: TPS on the primary degrades, and the WAL replay
process on the replica becomes completely overloaded (100% CPU even
though not much WAL is being generated).

A very simple pgbench example that showcases the degradation (run
with max_connections=2000 to make it clearly visible):

test_truncate.sql

begin;
set client_min_messages=warning;
create temp table if not exists ut0 (a int, b text, c text) on commit delete rows;
commit;

test_no_truncate.sql

begin;
set client_min_messages=warning;
create temp table if not exists ut0 (a int, b text, c text); -- do not do anything on commit
commit;

pgbench -n -f test_truncate.sql -T 10 postgres -c 1 -j 1
pgbench (18devel)
tps = 2693.806376 (without initial connection time)

pgbench -n -f test_truncate.sql -T 10 postgres -c 10 -j 10
pgbench (18devel)
tps = 12119.358458 (without initial connection time)

pgbench -n -f test_no_truncate.sql -T 10 postgres -c 1 -j 1
pgbench (18devel)
tps = 22685.877281 (without initial connection time)

pgbench -n -f test_no_truncate.sql -T 10 postgres -c 10 -j 10
pgbench (18devel)
tps = 200359.458154 (without initial connection time)

For test_truncate.sql with 10 threads, the WAL replay process on the replica spins at 100% CPU.

The reason seems to be that ON COMMIT DELETE ROWS largely follows the
same code path as a regular TRUNCATE.
This means:
* taking an exclusive lock on the temp table (WAL logged)
* invalidating the relcache (WAL logged)

On the replica, these exclusive locks are very expensive to replay; the
startup process ends up spending most of its cycles trying to acquire
LWLocks via the non-fast-path. On the primary, I believe the degradation
under concurrency comes from the relcache invalidation.
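
This is easy to see on the replica with a query like the following quick
sketch (pg_stat_activity has had backend_type and the wait_event columns
since v10; output will vary):

select pid, wait_event_type, wait_event
from pg_stat_activity
where backend_type = 'startup';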

There doesn't seem to be a good reason for this behavior: the replica
has nothing to do for a temporary table truncation, so this lock doesn't
need to be WAL logged.
The generated WAL consists almost entirely of this pattern:
LOCK + INVALIDATION + COMMIT, and the replica spins at 100% CPU trying
to replay it.
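
Output like the records below can be reproduced with pg_waldump over the
relevant range, roughly like this (path and LSNs are from my run, so
adjust to yours):

pg_waldump -p $PGDATA/pg_wal -s 0/3AFFE270 -e 0/3AFFE4C0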

rmgr: Standby     len (rec/tot):     42/    42, tx:    8154875, lsn: 0/3AFFE270, prev 0/3AFFE218, desc: LOCK xid 8154875 db 5 rel 18188
rmgr: Standby     len (rec/tot):     42/    42, tx:    8154875, lsn: 0/3AFFE2A0, prev 0/3AFFE270, desc: LOCK xid 8154875 db 5 rel 18191
rmgr: Standby     len (rec/tot):     42/    42, tx:    8154875, lsn: 0/3AFFE2D0, prev 0/3AFFE2A0, desc: LOCK xid 8154875 db 5 rel 18192
rmgr: Transaction len (rec/tot):     62/    62, tx:    8154875, lsn: 0/3AFFE300, prev 0/3AFFE2D0, desc: INVALIDATION ; inval msgs: relcache 18191 relcache 18192
rmgr: Transaction len (rec/tot):     82/    82, tx:    8154875, lsn: 0/3AFFE340, prev 0/3AFFE300, desc: COMMIT 2024-07-16 12:56:42.878147 CEST; inval msgs: relcache 18191 relcache 18192
rmgr: Standby     len (rec/tot):     42/    42, tx:    8154876, lsn: 0/3AFFE398, prev 0/3AFFE340, desc: LOCK xid 8154876 db 5 rel 18188
rmgr: Standby     len (rec/tot):     42/    42, tx:    8154876, lsn: 0/3AFFE3C8, prev 0/3AFFE398, desc: LOCK xid 8154876 db 5 rel 18191
rmgr: Standby     len (rec/tot):     42/    42, tx:    8154876, lsn: 0/3AFFE3F8, prev 0/3AFFE3C8, desc: LOCK xid 8154876 db 5 rel 18192
rmgr: Transaction len (rec/tot):     62/    62, tx:    8154876, lsn: 0/3AFFE428, prev 0/3AFFE3F8, desc: INVALIDATION ; inval msgs: relcache 18191 relcache 18192
rmgr: Transaction len (rec/tot):     82/    82, tx:    8154876, lsn: 0/3AFFE468, prev 0/3AFFE428, desc: COMMIT 2024-07-16 12:56:42.878544 CEST; inval msgs: relcache 18191 relcache 18192

Have people observed this behavior before? Would the community be
open to accepting a patch that changes the behavior of ON COMMIT DELETE
ROWS so that the lock taken and/or the relcache invalidation are no
longer WAL logged?
I think:
* An exclusive lock still needs to be taken on the primary, but it does
not need to be WAL logged
* The relcache invalidation should not be necessary at all (it currently
happens only for the TOAST table and its index, not for the temp table
itself)
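
As an aside, a possible stopgap (not a fix) is to drop the ON COMMIT
clause and clear the table with a plain DELETE, trading the truncate for
dead tuples that are only reclaimed by opportunistic pruning or a manual
VACUUM. A minimal sketch:

create temp table temp_t1 (...);  -- no ON COMMIT action

begin;
copy temp_t1 from stdin (format binary);
insert into t1 (select * from temp_t1 ...) on conflict do update set ...;
delete from temp_t1;  -- no WAL for temp table data; no exclusive lock, no relcache inval
commit;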

-Floris




Re: temp table on commit delete rows performance issue

From: Aleksander Alekseev
Date:
Hi,

> I'm looking for some input on an issue I've observed. A common pattern
> I've seen is using temporary tables to put data in before updating the
> real tables. Something roughly like:
>
> On session start:
> CREATE TEMP TABLE temp_t1 (...) ON COMMIT DELETE ROWS;
>
> On update:
> BEGIN;
> COPY temp_t1 FROM STDIN (FORMAT BINARY);
> INSERT INTO t1 (SELECT * FROM temp_t1 ...) ON CONFLICT DO UPDATE SET ...;
> -- potentially some other operations on temp table to put data into real table t1
> COMMIT;
>
> This pattern starts to break down under high concurrency. The
> ON COMMIT DELETE ROWS clause performs a truncate at every commit, which
> is fairly expensive and scales poorly. It's especially noticeable under
> the following circumstances:
> - a high max_connections setting
> - a high number of temp tables per session
> - concurrent writers committing at fairly short intervals
> The impact is twofold: TPS on the primary degrades, and the WAL replay
> process on the replica becomes completely overloaded (100% CPU even
> though not much WAL is being generated).
>
> [...]

I didn't investigate your particular issue, but generally speaking
creating a table, even a temporary one, is an expensive operation.

Note that it's far from being just a separate file on disk. It affects
catalog tables, shared buffers, all the corresponding locks, etc. If
you have indexes on a temporary table it makes the situation even
worse. Sooner or later VACUUM will happen for your bloated catalog,
and this is not fun under heavy load.
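
To illustrate the catalog footprint, a quick sketch (the table name is
just an example; exact row counts vary with the definition):

select count(*) from pg_attribute;        -- note the count
create temp table t_demo (a int, b text); -- adds pg_class, pg_type, pg_depend and pg_attribute rows
select count(*) from pg_attribute;        -- grows by the user columns plus the system columns
drop table t_demo;                        -- old catalog rows stay behind as dead tuples until VACUUM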

Is there any particular reason why you don't want to simply change the
target table directly? If you do it in a transaction you are safe.

-- 
Best regards,
Aleksander Alekseev



RE: temp table on commit delete rows performance issue

From: Floris Van Nee
Date:
> 
> I didn't investigate your particular issue, but generally speaking creating a
> table, even a temporary one, is an expensive operation.
> 
> Note that it's far from being just a separate file on disk. It affects catalog
> tables, shared buffers, all the corresponding locks, etc. If you have indexes
> on a temporary table it makes the situation even worse. Sooner or later
> VACUUM will happen for your bloated catalog, and this is not fun under
> heavy load.

Yes, creation is expensive, but note that this test case does not create
the temp table for every transaction. It creates it once, at the start of
the connection, so there will be no catalog bloat because the connections
are generally long-lived. The part that remains expensive (and which was
unexpected to me) is the truncate, which does happen for every transaction.

> 
> Is there any particular reason why you don't want to simply change the target
> table directly? If you do it in a transaction you are safe.
> 

This is a fair point which I didn't explain in my earlier message. We find that for
various use cases the temp table approach is a really convenient way of interacting
with Postgres. You can easily binary COPY whatever data you want into it, without
worrying much about data size, and handle it from there on the database side.
Without it, you'd need to resort to passing the data as query parameters, which
has complications such as the limit on the number of bound parameters per query.
That's certainly usable as well, but the binary COPY into a temp table is quite convenient.
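
For example, from psql the whole round trip can be as simple as the
following (the file name is illustrative):

begin;
\copy temp_t1 from 'batch.bin' with (format binary)
insert into t1 (select * from temp_t1 ...) on conflict do update set ...;
commit;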

-Floris