Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17

From: Frank Heikens
Subject: Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
Date:
Msg-id: 540A0AD2-7CAA-4535-82F7-FC9055437B20@elevarq.com
In response to: Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
Hi,

I've been investigating a performance problem with user-defined aggregates
that use array_append as the transition function (SFUNC = array_append,
STYPE = bigint[]).  These aggregates become catastrophically slow under
HashAggregate when many groups are present, as the test case below
demonstrates.

Root cause
----------

hash_agg_check_limits() calls MemoryContextMemAllocated(ctx, true) after
every new group addition.  The recursive flag causes it to traverse all
child memory contexts, which is O(C) per call.

array_append creates an expanded-array object per group, each owning a
private AllocSet child context.  With N groups there are roughly N child
contexts, so each check costs O(N) and adding N groups costs O(N^2) in
total.  This completely dominates the actual aggregate computation once
the group count reaches tens of thousands.

Reproducer (Tom's test case)
----------------------------

  CREATE AGGREGATE array_agg(
      BASETYPE = bigint, SFUNC = array_append,
      STYPE = bigint[], INITCOND = '{}');

  CREATE TABLE array_agg_test(
      product_id bigint NOT NULL,
      region_id bigint NOT NULL,
      available boolean NOT NULL);

  INSERT INTO array_agg_test
  SELECT generate_series(1, 50000),
         (ARRAY[1,2,3,4])[floor(random()*4)+1], true;
  -- (repeat 3 more times with region arrays [11..14], [111..114], [1111..1114])

  VACUUM ANALYZE array_agg_test;
  SET work_mem = '200MB';

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT product_id, array_agg(region_id)
  FROM array_agg_test GROUP BY product_id;

  -- 50K groups, ~4 rows per group
  -- Without patch: ~5 seconds
  -- With patch: ~150 ms
  -- Built-in array_agg: ~100 ms

Fix
---

The attached patch throttles the recursive memory check: once the group
count exceeds 1024, the full check only runs every 1024th new group.
Below 1024 groups, behavior is unchanged.

This caps the spill-detection latency to at most 1024 groups' worth of
memory.  For typical work_mem settings, that's a few megabytes of
potential overshoot -- well within the existing tolerance of the check,
which was already described as "imperfect" in its own comments.

The patch is purely local to hash_agg_check_limits() in nodeAgg.c:
about 15 lines of functional change plus comments.
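For illustration, the throttling predicate might look roughly like this.
This is a sketch of the idea described above, not the actual patch; the
function name and the standalone form are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the throttling rule: below the threshold, the full
 * recursive memory check runs on every new group (unchanged behavior);
 * beyond it, the check runs only on every 1024th new group.
 */
#define HASHAGG_CHECK_INTERVAL 1024

static bool
should_run_full_check(uint64_t ngroups)
{
    if (ngroups <= HASHAGG_CHECK_INTERVAL)
        return true;            /* small hash tables: check every group */

    /* large hash tables: check periodically to avoid O(N^2) traversal */
    return (ngroups % HASHAGG_CHECK_INTERVAL) == 0;
}
```

With this rule, at most 1023 groups can be added between consecutive full
checks, which is what bounds the potential memory overshoot described above.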

Benchmarks (PG 19devel, debug build, -O0)
------------------------------------------

  Tom Lane reproducer (50K groups):
    custom array_agg  200MB:  4.6 s  ->  152 ms  (30x)
    custom array_agg  4MB:    388 ms ->  283 ms   (no regression)
    built-in array_agg:       102 ms ->  103 ms   (unchanged)

  Synthetic (100K groups x 50 elems):
    custom  256MB:  35.0 s ->  2.9 s  (12x)
    custom  4MB:     6.1 s ->  5.9 s  (unchanged)

  Extreme (500K groups x 2 elems):
    custom  256MB:  672 s  ->  1.7 s  (395x)

  Non-array aggregates (count, sum, string_agg): unchanged
  Built-in array_agg: unchanged
  Spill behavior (batches, disk usage): unchanged

Regards,
Frank Heikens



> On Apr 2, 2026, at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Scott Carey <scott.carey@algonomy.com> writes:
>> I have discovered the root cause.
>> This database is old.  It pre-dates Postgres 8.4 which introduced
>> array_agg.   Apparently, in some version prior to 8.4 array_agg was added
>> as a user function, defined as below for bigint:
>
>> create AGGREGATE array_agg(
>>      BASETYPE = bigint,
>>      SFUNC = array_append,
>>      STYPE = bigint[],
>>      INITCOND = '{}'
>> );
>
>> So if you create a test database and run the previous test, performance
>> will be fine and the query will be fast.  Then run:
>> create AGGREGATE array_agg(BASETYPE = bigint, SFUNC = array_append,STYPE =
>> bigint[], INITCOND = '{}');
>
>> It will be slow and reproduce this behavior.
>
> Thank you for running that to ground!  I confirm your results that v13
> and up are far slower for this example than v12 was.
>
>> Why would this run so much more slowly after updating from postgres 12 to
>> 17?   It is a user defined aggregate, although maybe not as optimized as
>> the intrinsic one it shouldn't behave this way.
>
> I did some bisecting using the attached simplified test case, and found
> that the query execution time jumps from circa 60ms to circa 7500ms here:
>
> 1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit
> commit 1f39bce021540fde00990af55b4432c55ef4b3c7
> Author: Jeff Davis <jdavis@postgresql.org>
> Date:   Wed Mar 18 15:42:02 2020 -0700
>
>    Disk-based Hash Aggregation.
>
>    While performing hash aggregation, track memory usage when adding new
>    groups to a hash table. If the memory usage exceeds work_mem, enter
>    "spill mode".
>
> (Times quoted are on a Mac M4 Pro, but in assert-enabled builds so
> maybe not directly comparable to production.)
>
> I'm bemused as to why: the test case has work_mem set high enough that
> we shouldn't be triggering spill mode, so why did this change affect
> it at all?
>
>                        regards, tom lane
>
> CREATE AGGREGATE array_agg(
>      BASETYPE = bigint,
>      SFUNC = array_append,
>      STYPE = bigint[],
>      INITCOND = '{}'
> );
>
> drop table if exists array_agg_test;
>
> create table array_agg_test(product_id bigint not null, region_id bigint
> not null, available boolean not null);
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[1,2,3,4])[floor(random()*4)+1] as region_id, true as available;
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[11,12,13,14])[floor(random()*4)+1] as region_id, true as available;
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[111,112,113,114])[floor(random()*4)+1] as region_id, true as
> available;
>
> insert into array_agg_test (product_id, region_id, available) SELECT
> generate_series(1, 50000) as product_id,
> (ARRAY[1111,1112,1113,1114])[floor(random()*4)+1] as region_id, true as
> available;
>
> vacuum analyze array_agg_test;
>
> \set ECHO all
>
> -- set hash_mem_multiplier = 2;
> set work_mem = "200MB";
>
> explain (analyze, buffers) select product_id, array_agg(region_id) from
> array_agg_test group by product_id;

