Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions

Поиск
Список
Период
Сортировка
От Dimitrios Apostolou
Тема Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Дата
Msg-id 3efce60b-48c0-160d-0444-474b02f76739@gmx.net
обсуждение исходный текст
Ответ на Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions  (Dimitrios Apostolou <jimis@gmx.net>)
Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-general
On Sat, 11 May 2024, David Rowley wrote:

> On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> Indeed that's an awful estimate, the table has more than 1M of unique
>> values in that column. Looking into pg_stat_user_tables, I can't see the
>> partitions having been vacuum'd or analyzed at all. I think they should
>> have been auto-analyzed, since they get a ton of INSERTs
>> (no deletes/updates though) and I have the default autovacuum settings.
>> Could it be that autovacuum starts, but never
>> finishes? I can't find something in the logs.
>
> It's not the partitions getting analyzed you need to worry about for
> an ndistinct estimate on the partitioned table. It's auto-analyze or
> ANALYZE on the partitioned table itself that you should care about.
>
> If you look at [1], it says "Tuples changed in partitions and
> inheritance children do not trigger analyze on the parent table."

Thanks

>
>> In any case, even after the planner decides to execute the terrible plan
>> with the parallel seqscans, why doesn't it finish right when it finds 10
>> distinct values?
>
> It will. It's just that Sorting requires fetching everything from its subnode.

Isn't it plain wrong to have a sort step in the plan than? The different
partitions contain different value ranges with no overlap, and the last
query I posted doesn't even contain an ORDER BY clause, just a DISTINCT
clause on an indexed column.

Even with bad estimates, even with seq scan instead of index scan, the
plan should be such that it concludes all parallel work as soon as it
finds the 10 distinct values. And this is actually achieved if I disable
parallel plans. Could it be a bug in the parallel plan generation?


Dimitris




В списке pgsql-general по дате отправления:

Предыдущее
От: Dimitrios Apostolou
Дата:
Сообщение: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Следующее
От: Dimitrios Apostolou
Дата:
Сообщение: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions