Re: a wrong index choose when statistics is out of date

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: a wrong index choose when statistics is out of date
Дата
Msg-id CAApHDvq9QWL6az9veW1-HaW4LwGenW0V8YyKXFLZYoaF_4AQ+Q@mail.gmail.com
обсуждение исходный текст
Ответ на a wrong index choose when statistics is out of date  (Andy Fan <zhihuifan1213@163.com>)
Ответы Re: a wrong index choose when statistics is out of date
Re: a wrong index choose when statistics is out of date
Re: a wrong index choose when statistics is out of date
Список pgsql-hackers
On Sun, 3 Mar 2024 at 20:08, Andy Fan <zhihuifan1213@163.com> wrote:
> The issue can be reproduced with the following steps:
>
> create table x_events (.., created_at timestamp, a int, b int);
>
> create index idx_1 on t(created_at, a);
> create index idx_2 on t(created_at, b);
>
> query:
> select * from t where create_at = current_timestamp and b = 1;
>
> index (created_at, a) rather than (created_at, b) may be chosen for the
> above query if the statistics think "create_at = current_timestamp" has
> no rows, then both index are OK, actually it is true just because
> statistics is out of date.

I don't think there's really anything too special about the fact that
the created_at column is always increasing. We commonly get 1-row
estimates after multiplying the selectivities from individual stats.
Your example just seems like yet another reason that this could
happen.

I've been periodically talking about introducing "risk" as a factor
that the planner should consider.  I did provide some detail in [1]
about the design that was in my head at that time.  I'd not previously
thought that it could also solve this problem, but after reading your
email, I think it can.

I don't think it would be right to fudge the costs in any way, but I
think the risk factor for IndexPaths could take into account the
number of unmatched index clauses and increment the risk factor, or
"certainty_factor" as it is currently in my brain-based design. That
way add_path() would be more likely to prefer the index that matches
the most conditions.

The exact maths to calculate the certainty_factor for this case I
don't quite have worked out yet. I plan to work on documenting the
design of this and try and get a prototype patch out sometime during
this coming southern hemisphere winter so that there's at least a full
cycle of feedback opportunity before the PG18 freeze.

We should do anything like add column options in the meantime. Those
are hard to remove once added.

David

[1] https://www.postgresql.org/message-id/CAApHDvo2sMPF9m%3Di%2BYPPUssfTV1GB%3DZ8nMVa%2B9Uq4RZJ8sULeQ%40mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amul Sul
Дата:
Сообщение: Re: Add system identifier to backup manifest
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Some shared memory chunks are allocated even if related processes won't start