Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions

Поиск
Список
Период
Сортировка
От Dimitrios Apostolou
Тема Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Дата
Msg-id 69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net
обсуждение исходный текст
Ответ на SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions  (Dimitrios Apostolou <jimis@gmx.net>)
Ответы Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Список pgsql-general
On Fri, 10 May 2024, Dimitrios Apostolou wrote:

> I noticed that the following query is very very slow (too long to wait for it
> to finish):
>
> SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;

Update: even the simplest SELECT DISTINCT query shows similar behaviour:

EXPLAIN
SELECT DISTINCT workitem_n FROM test_runs_raw LIMIT 10;

  Limit  (cost=724518979.52..724518979.92 rows=10 width=4)
    ->  Unique  (cost=724518979.52..724518987.52 rows=200 width=4)
          ->  Sort  (cost=724518979.52..724518983.52 rows=1600 width=4)               Sort Key:
test_runs_raw.workitem_n
                ->  Gather  (cost=724518732.37..724518894.37 rows=1600 width=4)
                      Workers Planned: 4
                      ->  HashAggregate  (cost=724517732.37..724517734.37 rows=200 width=4)
                            Group Key: test_runs_raw.workitem_n
                            ->  Parallel Append  (cost=0.00..704131546.90 rows=8154474186 width=4)
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max9600k_pkey on
test_runs_raw__part_max9600ktest_runs_raw_480  (cost=0.57..1429238.50 rows=16811660 width=4) 
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max10140k_pkey on
test_runs_raw__part_max10140ktest_runs_raw_507  (cost=0.57..1081827.27 rows=12896836 width=4) 
                                  ->  Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475
(cost=0.00..2717185.06rows=32060806 width=4) 
                                  ->  Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559
(cost=0.00..2610814.95rows=30806095 width=4) 


It also takes ages to return, so I have to interrupt it.

I believe it should exit early, as soon as it finds 10 distinct values
(which should be rather easy even with parallel seqscans, given the
pattern followed when inserting the data).


Thanks,
Dimitris



В списке pgsql-general по дате отправления:

Предыдущее
От: Dimitrios Apostolou
Дата:
Сообщение: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Следующее
От: Dimitrios Apostolou
Дата:
Сообщение: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions