Re: Strange choice of general index over partial index

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Strange choice of general index over partial index
Дата
Msg-id CAMkU=1yDabjJSDZQtwc8QZ71tUxLs-RWTdAE8Q58-ugVmzK_TQ@mail.gmail.com
обсуждение исходный текст
Ответ на Strange choice of general index over partial index  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-performance
On Thu, Jan 15, 2015 at 2:30 PM, Josh Berkus <josh@agliodbs.com> wrote:
This is an obfuscation and mock up, but:

table files (
        id serial pk,
        filename text not null,
        state varchar(20) not null
        ... 18 more columns
)

index file_state on (state)
        (35GB in size)
index file_in_flight_state (state) where state in (
'waiting','assigning', 'processing' )
        (600MB in size)
... 10 more indexes

More important facts:
* state = 'done' 95% of the time.  thereform the partial index
represents only 5% of the table
* all indexes and the table are very bloated
* server has 128GB RAM
* Version 9.2.

Given this setup, I would expect the planner to *always* choose
file_in_flight_state over file_state for this query:

SELECT id, filename FROM files WHERE state = 'waiting';

... and yet it keeps selecting file_state based on extremely small
changes to the stats.   This is important because the same query, using
file_state, is 20X to 50X slower, because that index frequently gets
pushed out of memory.

What am I missing?  Or is this potentially a planner bug for costing?


I wonder if this could be related to 3e9960e9d935e7e7c12e78441, which first appeared in 9.2.3.

But I don't know why the small index *should* be better.  If this query is frequent, it should have no problem keeping just those leaf pages that contain the 'waiting' rows out of the full index in memory, without having to keep the 'done' leaf pages around.  And if it is not frequent, then it would have just as much problem keeping the smaller index in memory as it would a small portion of the large index.

Of course if it randomly switches back and forth, now you have to keep twice as much data in memory, the relevant parts of both indexes.

What is the point of having the full index at all, in this case?

Cheers,

Jeff

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: shared_buffers vs Linux file cache
Следующее
От: Mark Kirkwood
Дата:
Сообщение: Re: Strange choice of general index over partial index