Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
| From | Frank Heikens |
|---|---|
| Subject | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date | |
| Msg-id | 540A0AD2-7CAA-4535-82F7-FC9055437B20@elevarq.com |
| In reply to | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-performance |
Hi,
I've been investigating a performance problem with user-defined aggregates
that use array_append as the transition function (SFUNC = array_append,
STYPE = bigint[]). These aggregates become catastrophically slow under
HashAggregate when many groups are present, as the reproducer below demonstrates.
Root cause
----------
hash_agg_check_limits() calls MemoryContextMemAllocated(ctx, true) after
every new group addition. The recursive flag causes it to traverse all
child memory contexts, which is O(C) per call.
array_append creates an expanded-array object per group, each owning a
private AllocSet child context. With N groups the walk therefore visits
O(N) child contexts per call, so the total traversal cost across all group
additions is O(N^2). This completely dominates the actual aggregate
computation once the group count reaches tens of thousands.
Reproducer (Tom's test case)
----------------------------
CREATE AGGREGATE array_agg(
BASETYPE = bigint, SFUNC = array_append,
STYPE = bigint[], INITCOND = '{}');
CREATE TABLE array_agg_test(
product_id bigint NOT NULL,
region_id bigint NOT NULL,
available boolean NOT NULL);
INSERT INTO array_agg_test
SELECT generate_series(1, 50000),
(ARRAY[1,2,3,4])[floor(random()*4)+1], true;
-- (repeat 3 more times with region arrays [11..14], [111..114], [1111..1114])
VACUUM ANALYZE array_agg_test;
SET work_mem = '200MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_id, array_agg(region_id)
FROM array_agg_test GROUP BY product_id;
-- 50K groups, ~4 rows per group
-- Without patch: ~5 seconds
-- With patch: ~150 ms
-- Built-in array_agg: ~100 ms
Fix
---
The attached patch throttles the recursive memory check: once the group
count exceeds 1024, the full check only runs every 1024th new group.
Below 1024 groups, behavior is unchanged.
This caps the spill-detection latency to at most 1024 groups' worth of
memory. For typical work_mem settings, that's a few megabytes of
potential overshoot -- well within the existing tolerance of the check,
which was already described as "imperfect" in its own comments.
The patch is purely local to hash_agg_check_limits() in nodeAgg.c:
about 15 lines of functional change plus comments.
Benchmarks (PG 19devel, debug build, -O0)
------------------------------------------
Tom Lane reproducer (50K groups):
custom array_agg 200MB: 4.6 s -> 152 ms (30x)
custom array_agg 4MB: 388 ms -> 283 ms (no regression)
built-in array_agg: 102 ms -> 103 ms (unchanged)
Synthetic (100K groups x 50 elems):
custom 256MB: 35.0 s -> 2.9 s (12x)
custom 4MB: 6.1 s -> 5.9 s (unchanged)
Extreme (500K groups x 2 elems):
custom 256MB: 672 s -> 1.7 s (395x)
Non-array aggregates (count, sum, string_agg): unchanged
Built-in array_agg: unchanged
Spill behavior (batches, disk usage): unchanged
Regards,
Frank Heikens
> On Apr 2, 2026, at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Scott Carey <scott.carey@algonomy.com> writes:
>> I have discovered the root cause.
>> This database is old. It pre-dates Postgres 8.4 which introduced
>> array_agg. Apparently, in some version prior to 8.4 array_agg was added
>> as a user function, defined as below for bigint:
>
>> create AGGREGATE array_agg(
>> BASETYPE = bigint,
>> SFUNC = array_append,
>> STYPE = bigint[],
>> INITCOND = '{}'
>> );
>
>> So if you create a test database and run the previous test, performance
>> will be fine and the query will be fast. Then run:
>> create AGGREGATE array_agg(BASETYPE = bigint, SFUNC = array_append,STYPE =
>> bigint[], INITCOND = '{}');
>
>> It will be slow and reproduce this behavior.
>
> Thank you for running that to ground! I confirm your results that v13
> and up are far slower for this example than v12 was.
>
>> Why would this run so much more slowly after updating from postgres 12 to
>> 17? It is a user defined aggregate, although maybe not as optimized as
>> the intrinsic one it shouldn't behave this way.
>
> I did some bisecting using the attached simplified test case, and found
> that the query execution time jumps from circa 60ms to circa 7500ms here:
>
> 1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit
> commit 1f39bce021540fde00990af55b4432c55ef4b3c7
> Author: Jeff Davis <jdavis@postgresql.org>
> Date: Wed Mar 18 15:42:02 2020 -0700
>
> Disk-based Hash Aggregation.
>
> While performing hash aggregation, track memory usage when adding new
> groups to a hash table. If the memory usage exceeds work_mem, enter
> "spill mode".
>
> (Times quoted are on a Mac M4 Pro, but in assert-enabled builds so
> maybe not directly comparable to production.)
>
> I'm bemused as to why: the test case has work_mem set high enough that
> we shouldn't be triggering spill mode, so why did this change affect
> it at all?
>
> regards, tom lane
>
> CREATE AGGREGATE array_agg(
> BASETYPE = bigint,
> SFUNC = array_append,
> STYPE = bigint[],
> INITCOND = '{}'
> );
>
> drop table if exists array_agg_test;
>
> create table array_agg_test(product_id bigint not null, region_id bigint
> not null, available boolean not null);
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[1,2,3,4])[floor(random()*4)+1] as region_id, true as available;
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[11,12,13,14])[floor(random()*4)+1] as region_id, true as available;
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[111,112,113,114])[floor(random()*4)+1] as region_id, true as
> available;
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[1111,1112,1113,1114])[floor(random()*4)+1] as region_id, true as
> available;
>
> vacuum analyze array_agg_test;
>
> \set ECHO all
>
> -- set hash_mem_multiplier = 2;
> set work_mem = '200MB';
>
> explain (analyze, buffers) select product_id, array_agg(region_id) from
> array_agg_test group by product_id;