Re: Query with order by and limit is very slow - wrong index used

Поиск
Список
Период
Сортировка
От Gregg Jaskiewicz
Тема Re: Query with order by and limit is very slow - wrong index used
Дата
Msg-id CAJY59_grzRJUBurDZniuVtj2OgUNhvpKDD9HR0RGuT7D2H_mxg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Query with order by and limit is very slow - wrong index used  (Nowak Michał <michal.nowak@me.com>)
Список pgsql-performance
2011/10/3 Nowak Michał <michal.nowak@me.com>:
>> How many rows do you have in that table?
>
> a9-dev=> select count(*) from records;
>  count
> ---------
> 3620311
> (1 row)


>
> a9-dev=> select source_id, count(*) from records where source_id =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'or source_id =
'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'group by source_id; 
>                      source_id                        | count
> --------------------------------------------------------+--------
> http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml    | 427254
> http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184
> (2 rows)

So the second one is roughly 27% of the table.  I don't know the
actual condition under which planner changes over the seqscan, but
that value seems quite common it seems.
The other thing planner is going to look at is the correlation, most
common values, most common frequencies.
In other words, if the value is 27% of all values, but is evenly
spread across - I think planner will go for seq scan regardless.

At the end of the day (afaik), index scan only pics pages for narrowed
down seqscan really. So imagine if your index scan returned all the
pages, you would still have to do a seqscan on all of them. Planner is
trying to avoid that by weighting the costs of both operations.
If it is too slow to run the current queries, you could try
normalizing the table by splitting source_id into separate one and
referencing it by an id. Very often what you'd find is that doing so
lowers I/O required, hence saves a lot of time in queries. Downside
is, that it is bit harder to add/update the tables. But that's where
triggers and procedures come handy.


>
>> ALTER TABLE records ALTER id SET source_id 1000; vacuum analyze verbose records;
> Did you mean ALTER TABLE records ALTER id SET STATISTICS 1000;?

Yup, that's what I meant. Sorry.


--
GJ

В списке pgsql-performance по дате отправления:

Предыдущее
От: Venkat Balaji
Дата:
Сообщение: Re: : Performance Improvement Strategy
Следующее
От: Nowak Michał
Дата:
Сообщение: Fwd: Query with order by and limit is very slow - wrong index used