Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions

Поиск

Список

Период

Сортировка

От	Dimitrios Apostolou
Тема	Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Дата	10 мая 2024 г. 19:24:00
Msg-id	69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net обсуждение исходный текст
Ответ на	SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions (Dimitrios Apostolou <jimis@gmx.net>)
Ответы	Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Список	pgsql-general

Дерево обсуждения

On Fri, 10 May 2024, Dimitrios Apostolou wrote:

> I noticed that the following query is very very slow (too long to wait for it
> to finish):
>
> SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;

Update: even the simplest SELECT DISTINCT query shows similar behaviour:

EXPLAIN
SELECT DISTINCT workitem_n FROM test_runs_raw LIMIT 10;

  Limit  (cost=724518979.52..724518979.92 rows=10 width=4)
    ->  Unique  (cost=724518979.52..724518987.52 rows=200 width=4)
          ->  Sort  (cost=724518979.52..724518983.52 rows=1600 width=4)               Sort Key:
test_runs_raw.workitem_n
                ->  Gather  (cost=724518732.37..724518894.37 rows=1600 width=4)
                      Workers Planned: 4
                      ->  HashAggregate  (cost=724517732.37..724517734.37 rows=200 width=4)
                            Group Key: test_runs_raw.workitem_n
                            ->  Parallel Append  (cost=0.00..704131546.90 rows=8154474186 width=4)
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max9600k_pkey on
test_runs_raw__part_max9600ktest_runs_raw_480  (cost=0.57..1429238.50 rows=16811660 width=4) 
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max10140k_pkey on
test_runs_raw__part_max10140ktest_runs_raw_507  (cost=0.57..1081827.27 rows=12896836 width=4) 
                                  ->  Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475
(cost=0.00..2717185.06rows=32060806 width=4) 
                                  ->  Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559
(cost=0.00..2610814.95rows=30806095 width=4) 


It also takes ages to return, so I have to interrupt it.

I believe it should exit early, as soon as it finds 10 distinct values
(which should be rather easy even with parallel seqscans, given the
pattern followed when inserting the data).


Thanks,
Dimitris

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions