Re: Why don't we consider explicit Incremental Sort?

Поиск

Список

Период

Сортировка

От	Richard Guo
Тема	Re: Why don't we consider explicit Incremental Sort?
Дата	24 сентября 2024 г. 05:21:28
Msg-id	CAMbWs4-3Z-pcHCWq3gBVHp=43kqaF3Quhwe5cUg-x9O54Lv2CA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Why don't we consider explicit Incremental Sort? (Tomas Vondra <tomas@vondra.me>)
Список	pgsql-hackers

Дерево обсуждения

On Mon, Sep 23, 2024 at 10:01 PM Tomas Vondra <tomas@vondra.me> wrote:
> You don't need any skew. Consider this perfectly uniform dataset:
>
> create table t (a int, b int);
> insert into t select 100000 * random(), 100 * random()
>   from generate_series(1,1000000) s(i);
> create index on t (a);
> vacuum analyze;
> checkpoint;
>
> explain analyze select * from t order by a, b;

I think if we want to compare the performance of incremental sort vs.
full sort on this dataset, we need to ensure that other variables are
kept constant, ie we need to ensure that both methods use the same
subpath, whether it's Index Scan or Seq Scan.

This is especially true in the scenario addressed by this patch, as
for a mergejoin, we only consider outerrel's cheapest_total_path when
we assume that an explicit sort atop is needed.  That is to say, the
subpath has already been chosen when it comes to determine whether to
use incremental sort or full sort.

According to this theory, here is what I got on this same dataset:

-- incremental sort
explain analyze select * from t order by a, b;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=1.02..68838.65 rows=1000000 width=8) (actual
time=1.292..1564.265 rows=1000000 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 27022  Sort Method: quicksort  Average Memory:
25kB  Peak Memory: 25kB
   ->  Index Scan using t_a_idx on t  (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.408..1018.785 rows=1000000
loops=1)
 Planning Time: 0.998 ms
 Execution Time: 1602.861 ms
(7 rows)


-- full sort
set enable_incremental_sort to off;
set enable_seqscan to off;
explain analyze select * from t order by a, b;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=150576.54..153076.54 rows=1000000 width=8) (actual
time=1760.090..1927.598 rows=1000000 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   ->  Index Scan using t_a_idx on t  (cost=0.42..37244.20
rows=1000000 width=8) (actual time=0.531..1010.931 rows=1000000
loops=1)
 Planning Time: 0.980 ms
 Execution Time: 1970.287 ms
(6 rows)

So incremental sort outperforms full sort on this dataset.

Thanks
Richard

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Why don't we consider explicit Incremental Sort?