Aggregate node doesn't include cost for sorting

Поиск
Список
Период
Сортировка
От David Geier
Тема Aggregate node doesn't include cost for sorting
Дата
Msg-id 2691f881-d8de-dd57-5793-381d7d13e3cc@gmail.com
обсуждение исходный текст
Ответы Re: Aggregate node doesn't include cost for sorting  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
Hi hackers,

The cost of an Aggregate node seems to be the same regardless of the 
input being pre-sorted or not. If however the input is not sorted, the 
Aggregate node must additionally perform a sort which can impact runtime 
significantly. Here is an example:

CREATE TABLE foo(col0 INT, col1 TEXT);
INSERT INTO foo SELECT generate_series(1, 100000), 
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' || md5(random()::TEXT);
CREATE INDEX foo_idx ON foo(col1);
SET max_parallel_workers_per_gather = 0;
SET enable_bitmapscan = FALSE;

-- Unsorted input. Aggregate node must additionally sort all rows.
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;

                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=2584.00..2584.01 rows=1 width=8) (actual 
time=1684.705..1684.809 rows=1 loops=1)
    ->  Seq Scan on foo  (cost=0.00..2334.00 rows=100000 width=71) 
(actual time=0.018..343.280 rows=100000 loops=1)
  Planning Time: 0.317 ms
  Execution Time: 1685.543 ms


-- Pre-sorted input. Aggregate node doesn't have to sort all rows.
SET enable_seqscan = FALSE;
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=6756.42..6756.43 rows=1 width=8) (actual 
time=819.028..819.041 rows=1 loops=1)
    ->  Index Only Scan using foo_idx on foo (cost=6506.42..6506.42 
rows=100000 width=71) (actual time=0.046..404.260 rows=100000 loops=1)
          Heap Fetches: 100000
          Heap Prefetches: 1334
  Planning Time: 0.438 ms
  Execution Time: 819.515 ms

The cost of the Aggregate node is in both cases the same (250.0) while 
its runtime is 1.3s in the unsorted case and 0.4s in the pre-sorted case.

Also, why does the Aggregate node sort itself? Why don't we instead emit 
an explicit Sort node in the plan so that it's visible to the user what 
is happening? As soon as there's also a GROUP BY in the query, a Sort 
node occurs in the plan. This seems inconsistent.

-- 
David Geier
(ServiceNow)




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Perform streaming logical transactions by background workers and parallel apply
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: on placeholder entries in view rule action query's range table