Re: slow seqscan

From: Stephan Szabo
Subject: Re: slow seqscan
Date:
Msg-id: 20040421072156.K55057@megazone.bigpanda.com
In reply to: Re: slow seqscan (Edoardo Ceccarelli <eddy@axa.it>)
Responses: Re: slow seqscan
List: pgsql-performance
On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:

>
> > What happens if you go:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> > LOWER(testo));
> >
> > or even just:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
> >
> I wasn't able to make this 2 field index with lower:
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

That's a 7.4 feature, I think (and the two-column version may need extra
parentheses around the lower()). I think the only way to do something
equivalent in 7.3 is to make a function that concatenates the two in some
fashion after applying lower() to the one part, and then use that function
in the queries as well.  Plus, if you're not in "C" locale, I'm not sure
it'd help in 7.3 anyway.
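The 7.3-era workaround described above might look something like this (a
sketch only; the function and index names, and the '|' separator, are made
up for illustration):

```sql
-- Combine the two columns into one indexable expression.
CREATE FUNCTION rubric_lower_testo(text, text) RETURNS text AS
  'SELECT $1 || ''|'' || lower($2)'
  LANGUAGE sql IMMUTABLE;

-- 7.3 allows an index on a function of the table's columns.
CREATE INDEX annuncio400_rlt_idx
  ON annuncio400 (rubric_lower_testo(rubric, testo));

-- Queries must use the exact same expression for the index to apply:
SELECT * FROM annuncio400
 WHERE rubric_lower_testo(rubric, testo) = 'DD|' || lower('sometext');
```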

> >> But the strangest thing ever is that if I change the filter with
> >> another one that represent a smaller amount of data  it uses the
> >> index scan!!!
> >
> >
> > What's strange about that?  The less data is going to be retrieved,
> > the more likely postgres is to use the index.
> >
> can't understand this policy:
>
> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>  count
> -------
>   6753
> (1 row)
>
> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>  count
> -------
>   2165
> (1 row)
>
> so it's using the index on 2000 rows and not for 6000?  it's not that
> big difference, isn't it?

It's a question of how many pages it thinks it's going to have to retrieve
in order to handle the request.  If it needs (or thinks it needs) to
retrieve, say, 50% of the pages, then given a random_page_cost of 4, it's
going to expect the index scan to cost about twice as much: roughly
0.5 * 4 = 2 cost units per page of random I/O, versus 1 unit per page read
sequentially.

Generally speaking, one good way to compare is to run the query with
EXPLAIN ANALYZE, then change parameters like enable_seqscan and run it
with EXPLAIN ANALYZE again, and compare the estimated rows and costs.
That'll give an idea of how it expects the two versions of the query to
compare speed-wise.
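The procedure above might look like this in a psql session (a sketch, not
output from this thread):

```sql
-- Let the planner pick its preferred plan (here, the seqscan):
EXPLAIN ANALYZE SELECT * FROM annuncio400 WHERE rubric = 'DD';

-- Temporarily discourage sequential scans to force the index plan,
-- run the same query, and compare estimated vs. actual cost and rows:
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM annuncio400 WHERE rubric = 'DD';

-- Restore the default afterwards:
SET enable_seqscan = on;
```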
