Thread: BUG #19363: PostgreSQL shared memory exhaustion during query execution involving views and parallel workers

BUG #19363: PostgreSQL shared memory exhaustion during query execution involving views and parallel workers

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      19363
Logged by:          Jinhui Lai
Email address:      jinhui.lai@qq.com
PostgreSQL version: 18.1
Operating system:   Ubuntu 22.04
Description:

Hi, PostgreSQL developers:

Thanks for reading my report. I may have found a bug. You can reproduce it
as follows:

CREATE TABLE t0(c0 REAL);
CREATE TABLE t1(c0 REAL);
CREATE TABLE t2(c0 REAL) WITH (parallel_workers=1);
CREATE TABLE t3(c0 REAL) WITH (parallel_workers=2);
CREATE VIEW v0(c0, c1, c2) AS (SELECT t0.c0, t1.c0, t2.c0 FROM t0, t1, t2);
SELECT t3.c0 FROM t3
WHERE NOT EXISTS (SELECT 1 FROM v0 WHERE t3.c0 = v0.c0);
ERROR:  could not resize shared memory segment "/PostgreSQL.3961899888" to 33624064 bytes: No space left on device

When executing a query with a NOT EXISTS subquery referencing a view that
joins multiple tables, PostgreSQL fails with a shared memory allocation
error ("could not resize shared memory segment ... No space left on
device"). The issue appears to be related to parallel query execution: it
occurs when one of the underlying tables (t3) has parallel_workers set to
2 while another table (t2) has parallel_workers set to 1. The failure
manifests during the query's execution phase, when shared memory is
allocated for the parallel workers, suggesting a possible resource leak
or inadequate cleanup of shared memory segments when parallel queries
involve views. The error persists despite sufficient available system
memory, indicating an internal PostgreSQL resource management issue
rather than a system-level memory constraint.
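
As a side note, the plan the server chooses can be inspected without
triggering the error, since plain EXPLAIN (without ANALYZE) only plans
the query and does not execute it:

EXPLAIN SELECT t3.c0 FROM t3
WHERE NOT EXISTS (SELECT 1 FROM v0 WHERE t3.c0 = v0.c0);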

Best regards,
Jinhui Lai


On Sat, Dec 27, 2025 at 12:35 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> CREATE TABLE t0(c0 REAL);
> CREATE TABLE t1(c0 REAL);
> CREATE TABLE t2(c0 REAL) WITH (parallel_workers=1);
> CREATE TABLE t3(c0 REAL) WITH (parallel_workers=2);
> CREATE VIEW v0(c0, c1, c2) AS (SELECT t0.c0, t1.c0, t2.c0 FROM t0, t1, t2);
> SELECT t3.c0 FROM t3
> WHERE NOT EXISTS (SELECT 1 FROM v0 WHERE t3.c0 = v0.c0);
> ERROR:  could not resize shared memory segment "/PostgreSQL.3961899888" to 33624064 bytes: No space left on device

I don't think this is a resource leak issue.  Looking at a snippet of the
plan:

->  Parallel Hash Anti Join  (cost=281992257.25..320092889.87 rows=531 width=4)
      Hash Cond: (t3.c0 = t0.c0)
      ->  Parallel Seq Scan on t3  (cost=0.00..20.62 rows=1062 width=4)
      ->  Parallel Hash  (cost=121969796.25..121969796.25 rows=9753750000 width=4)
            ->  Nested Loop  (cost=0.00..121969796.25 rows=9753750000 width=4)

The executor needs to build a hash table for 9.7 billion rows.  That
requires a lot of memory.  In practice, the executor splits the work
into 8192 batches, but that still results in about 1.2 million rows per
batch.  To manage that many rows, the executor allocates 4194304 buckets,
which means it needs 32 MB just for the bucket array, assuming you are
on a 64-bit system.  I guess your available shared memory was less than
32 MB at that moment.
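
To spell out that arithmetic (a quick sanity check runnable from psql;
the 8-byte bucket pointer size assumes a 64-bit build):

SELECT round(9753750000::numeric / 8192) AS rows_per_batch,       -- ~1.19 million
       pg_size_pretty(4194304::bigint * 8) AS bucket_array_size;  -- 32 MB

The 32 MB of bucket pointers also lines up with the 33624064-byte resize
in the error message.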

I kind of wonder why the planner did not choose a Right Anti Join to
build the hash table on the smaller side.  This is likely due to a
cost estimation issue.  However, I don't feel very motivated to debug
a cost estimation issue on tables that have never been ANALYZEd.
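
For anyone who does want to dig in, the natural first step would be to
collect statistics and then look at the plan again:

ANALYZE t0;
ANALYZE t1;
ANALYZE t2;
ANALYZE t3;
EXPLAIN SELECT t3.c0 FROM t3
WHERE NOT EXISTS (SELECT 1 FROM v0 WHERE t3.c0 = v0.c0);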

- Richard



Richard Guo <guofenglinux@gmail.com> writes:
> The executor needs to build a hash table for 9.7 billion rows.  That
> requires a lot of memory.  In practice, the executor splits the work
> into 8192 batches, but that still results in about 1.2 million rows per
> batch.  To manage that many rows, the executor allocates 4194304 buckets,
> which means it needs 32 MB just for the bucket array, assuming you are
> on a 64-bit system.  I guess your available shared memory was less than
> 32 MB at that moment.

On my machine, the query does complete, but it's ridiculously slow:
about 15 sec to compute an empty result.  AFAICT nearly all of that
is being spent in hash table setup.
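
For anyone reproducing this on a machine with enough shared memory,
EXPLAIN (ANALYZE) breaks the runtime down per plan node and should show
the hash build dominating:

EXPLAIN (ANALYZE, COSTS OFF)
SELECT t3.c0 FROM t3
WHERE NOT EXISTS (SELECT 1 FROM v0 WHERE t3.c0 = v0.c0);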

What I'm wondering is why this plan was accepted at all.  Does PHJ
not consider work_mem or a similar limit on hash table size?
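
For reference, the knobs that normally bound hash table memory:

SHOW work_mem;              -- per-node memory budget, 4MB by default
SHOW hash_mem_multiplier;   -- hash tables may use work_mem times this, 2.0 by default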

            regards, tom lane



I wrote:
> What I'm wondering is why this plan was accepted at all.  Does PHJ
> not consider work_mem or a similar limit on hash table size?

Ah, found it.  See

https://www.postgresql.org/message-id/2380165.1766871097%40sss.pgh.pa.us

            regards, tom lane