[BUG] Excessive memory usage with update on STORED generated columns.
| От | Anton A. Melnikov |
|---|---|
| Тема | [BUG] Excessive memory usage with update on STORED generated columns. |
| Дата | |
| Msg-id | ddc34dbd-1efc-4710-824c-e101e7eb63e6@postgrespro.ru обсуждение исходный текст |
| Ответы |
Re: [BUG] Excessive memory usage with update on STORED generated columns.
Re: [BUG] Excessive memory usage with update on STORED generated columns. |
| Список | pgsql-hackers |
Hi!
My colleagues found that a queries like that:
\timing
DROP TABLE IF EXISTS t;
CREATE TABLE t (
id int,
a int,
b int,
g text GENERATED ALWAYS AS (
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0')
) STORED
);
INSERT INTO t
SELECT 1, 100, 0
FROM generate_series(1, 1000000);
UPDATE t SET id = 2; -- < problem query
lead to excessive memory consumption up to 10Gb in this example and
query execution time up to ~1,5min.
Bisect shows that the problem appeared after commit 83ea6c540
(Virtual generated columns).
Before this commit the update query took only ~8s and the memory
consumption did not exceed 150Mb for this backend.
MemoryContextStats reports only a small amount of memory usage, while
malloc_stats() confirms large allocations outside PostgreSQL memory
contexts.
With help of massif tool i found repeated allocations originating from:
ExecInitGenerated
→ build_column_default
→ stringToNode
This indicates that generated expressions are reparsed multiple times,
once per row to be updated instead of being reused.
There is a problem call stack during UPDATE t SET id = 2;
execution: see attached bt.txt, please.
Before the above-mentioned commit, ExecInitGenerated() was effectively
invoked once per ResultRelInfo, so this behavior was not observable.
I would like to propose a fix that add a caching of the the parsed
expression trees (Node *) in ResultRelInfo, so that build_column_default()
and stringToNode() are executed at most once per attribute per query.
With this fix, the query execution time
and memory consumption return to normal:
postgres=# UPDATE t SET id = 2;
UPDATE 1000000
Time: 11522,621 ms (00:11,523)
A patch for this approach for current master is attached here.
Would be glad for any feedback.
Best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения
В списке pgsql-hackers по дате отправления: