Re: Extremely slow HashAggregate in simple UNION query

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Extremely slow HashAggregate in simple UNION query
Дата
Msg-id 20190820173219.54dp463d7pkgg6un@alap3.anarazel.de
обсуждение исходный текст
Ответ на Extremely slow HashAggregate in simple UNION query  (Felix Geisendörfer <felix@felixge.de>)
Ответы Re: Extremely slow HashAggregate in simple UNION query  (Felix Geisendörfer <felix@felixge.de>)
Список pgsql-performance
Hi,

On 2019-08-20 17:11:58 +0200, Felix Geisendörfer wrote:
> today I debugged a query that was executing about 100x slower than expected, and was very surprised by what I found.
>
> I'm posting to this list to see if this might be an issue that should be fixed in PostgreSQL itself.
>
> Below is a simplified version of the query in question:
>
> SET work_mem='64MB';
> EXPLAIN ANALYZE
> SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
> UNION
> SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;
>
> HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual time=19.349..23.123 rows=1 loops=1)

FWIW, that's not a mis-estimate I'm getting on master ;).  Obviously
that doesn't actually address your concern...


> 1. The query overestimates the final output rows by a factor of 2 million. [1]

Right. There's not really that much we can do about that in
general. That'll always be possible. Although we can obviously improve
the estimates a good bit more.


> I'm certainly a novice when it comes to PostgreSQL internals, but I'm
> wondering if this could be fixed by taking a more dynamic approach for
> allocating HashAggregate hash tables?

Under-sizing the hashtable just out of caution will have add overhead to
a lot more common cases. That requires copying data around during
growth, which is far far from free. Or you can use hashtables that don't
need to copy, but they're also considerably slower in the more common
cases.


> 3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of the query execution instead of 45ms [5].

Well, there's plenty work that's not attributed to nodes. IIRC we don't
track executor startup/shutdown overhead on a per-node basis. So I don't
really think this is necessarily something that suspicious.  Which
indeed seems to be what's happening here (this is with 11, to be able to
hit the problem with your reproducer):

+   33.01%  postgres  postgres          [.] tuplehash_iterate
-   18.39%  postgres  libc-2.28.so      [.] __memset_avx2_erms
   - 90.94% page_fault
        __memset_avx2_erms
        tuplehash_allocate
        tuplehash_create
        BuildTupleHashTableExt
        build_hash_table
        ExecInitAgg
        ExecInitNode
        InitPlan
        standard_ExecutorStart

Greetings,

Andres Freund



В списке pgsql-performance по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Extremely slow HashAggregate in simple UNION query
Следующее
От: Felix Geisendörfer
Дата:
Сообщение: Re: Extremely slow HashAggregate in simple UNION query