Re: postgres chooses objectively wrong index
| От | Alexey Ermakov |
|---|---|
| Тема | Re: postgres chooses objectively wrong index |
| Дата | |
| Msg-id | 381e7d47-8021-458d-9a64-0cdb80b004f0@gmail.com обсуждение |
| Ответ на | Re: postgres chooses objectively wrong index (Merlin Moncure <mmoncure@gmail.com>) |
| Ответы |
Re: postgres chooses objectively wrong index
|
| Список | pgsql-performance |
On 2026-03-18 04:52, Merlin Moncure wrote:
On Tue, Mar 17, 2026 at 4:16 PM Alexey Ermakov <alexius.work@gmail.com> wrote:On 2026-03-18 03:01, Merlin Moncure wrote:I've been maintaining an airflow style orchestrator in pl/pgsql, and it's revealed a performance issue I just can't solve. There is a table, task, which may normally contain billions of rows, but only a tiny portion is interesting for specific reasons—a common pattern in task-type systems....I'm wondering if there are other tricks that might apply here, for example, multi column index statistics...curious if anyone has thoughts on that.Any suggestions?merlinHello. I think planner doesn't have information about distribution of async.task_execution_state(task) unless it's part of any full index. I would try to give that with extended statistics (postgresql 14+):
create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task; analyze async.task;If that won't help - please show distribution from pg_stats_ext view for extended statistic above.
This unfortunately fails, probably because the table type includes system columns (despite not using them).orchestrator_service_user@orchestrator=> create statistics task_stats (mcv) on (async.task_execution_state(task)) from async.task;
ERROR: statistics creation on system columns is not supportedThis would require some refactoring to fix.
Interesting... In that case functional index should help (as it also makes statistic for the planner):
create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));
analyze async.task;
Perhaps multicolumn index will also help for queries but hard to say without knowing distributions. We could check state distribution info after index creation and analyze with query like this:
select * from pg_stats where tablename = 'task_task_execution_state_idx' \gx
--
Alexey Ermakov
В списке pgsql-performance по дате отправления: