Re: Odd (slow) plan choice with min/max

Поиск
Список
Период
Сортировка
От Yoan SULTAN
Тема Re: Odd (slow) plan choice with min/max
Дата
Msg-id CAPJFe0Y10nUvdBWf9ti4g2RDojXMwGgSpS5tAn16_1wqWF4DwQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Odd (slow) plan choice with min/max  (Paul McGarry <paul@paulmcgarry.com>)
Список pgsql-performance
Another workaround could be :

explain analyze select min(risk_id),max(risk_id) from (select * from risk where time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00') t;  

in order to force the planner to use first the timestamp index.

However, I agree with you; we meet a planner bad behavior here.

Regards,
Yoan SULTAN

Le mar. 23 mars 2021 à 22:38, Paul McGarry <paul@paulmcgarry.com> a écrit :


On Wed, 24 Mar 2021 at 00:07, Rick Otten <rottenwindfish@gmail.com> wrote:

Yes, the columns are highly correlated, but that alone doesn't seem like it should be sufficient criteria to choose this plan.
Ie the selection criteria (1 day of data about a year ago) has a year+ worth of data after it and probably a decade of data before it, so anything walking a correlated index from top or bottom is going to have to walk past a lot of data before it gets to data that fits the criteria.


I assume you have a statistic on the correlated columns, ie `create statistic` ?

I didn't, but adding
======
CREATE STATISTICS risk_risk_id_time_correlation_stats ON risk_id,time FROM risk;
analyze risk;
======
doesn't seem to help.
I get the same plan before/after. Second run was faster, but just because data was hot.

 
If you can't use partitions on your date column, can you use partial indexes instead?   Or a functional index with min() over day and max() over day?

I don't particularly want to add more weird indexes to solve this one particular query. as the existing risk_id index should make it efficient enough if only the planner chose to use it. This is part of an archiving job, identifying sections of historical data, so not a query that needs to be super optimised, but essentially doing a full table scan backwards/forwards as it is now is doing a lot of unnecessary IO that would be best left free for more time sensitive queries.My count(() workaround works so we can use that.
I'm more interested in understanding why the planner makes what seems to be an obviously bad choice.

Paul 


--
Regards,
Yo.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Hannu Krosing
Дата:
Сообщение: Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)
Следующее
От: Geervan Hayatnagarkar
Дата:
Сообщение: High-volume writes - what is the max throughput possible