Обсуждение: how to enforce index sub-select over filter+seqscan

Поиск
Список
Период
Сортировка

how to enforce index sub-select over filter+seqscan

От
Dmitry Teslenko
Дата:
Hello!

I have this table:

create table test (
    s1 varchar(255),
    s2 varchar(255),
    i1 integer,
    i2 integer,

... over 100 other fields

);

table contains over 8 million records

there's these indexes:

create index is1 on test (s1);
create index is2 on test (s2);
create index ii1 on test (i1);
create index ii2 on test (i2);
create index ii3 on test (i1, i2);

and then i run this query:

select
*
from (
    select  *
    from test
    where
        is1 = 'aa' or is2 = 'aa'
    )
where
    is1 = 1
    or (is1  = 1
        and is2 = 1)
    or (is1  = 2
        and is2 = 2)
    or (is1  = 3
        and is2 = 3)


where part of outer query can have different count of
    "or (is1  = N
        and is2 = M)"
expressions, lets name this number X.

When X is low planner chooses index scan using is1 and is2,
then BitmapAnd that with index scan using  ii1, ii2 or ii3.

But when X is big enough (> 15) planner chooses seqscan and filter on
i1, i2, s1, s2.
Seqscan is very slow and I want to avoid it. Subquery is very fast
and i don't know why postgres chooses that plan.

I know I can set enable_seqscan = off.
Is there other ways to enforce index usage?

postgres pg_class have right estimate of rowcount.

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

Re: how to enforce index sub-select over filter+seqscan

От
"Kevin Grittner"
Дата:
Dmitry Teslenko <dteslenko@gmail.com> wrote:

> Seqscan is very slow and I want to avoid it. Subquery is very fast
> and i don't know why postgres chooses that plan.
>
> I know I can set enable_seqscan = off.
> Is there other ways to enforce index usage?

If you come at it from that angle, you probably won't get the best
resolution.  PostgreSQL can see the alternative plans, and develops
estimated costs of running each.  It uses the one that it thinks
will be fastest.  If it's wrong, there's probably something wrong
with the statistics it uses for estimating, or with the costing
information.  (There are some cases where it's not able to
accurately estimate costs even if these are right, but let's check
the more common cases first.)

Please provide a little more information, like PostgreSQL version,
the postgresql.conf contents (excluding comments), OS, hardware, and
the EXPLAIN ANALYZE output of the query with and without
enable_seqscan = off.

Other useful ideas here:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

Re: how to enforce index sub-select over filter+seqscan

От
Merlin Moncure
Дата:
On Thu, Sep 23, 2010 at 10:26 AM, Dmitry Teslenko <dteslenko@gmail.com> wrote:
> Hello!
>
> I have this table:
>
> create table test (
>        s1 varchar(255),
>        s2 varchar(255),
>        i1 integer,
>        i2 integer,
>
> ... over 100 other fields
>
> );
>
> table contains over 8 million records
>
> there's these indexes:
>
> create index is1 on test (s1);
> create index is2 on test (s2);
> create index ii1 on test (i1);
> create index ii2 on test (i2);
> create index ii3 on test (i1, i2);
>
> and then i run this query:
>
> select
> *
> from (
>        select  *
>        from test
>        where
>                is1 = 'aa' or is2 = 'aa'
>        )
> where
>        is1 = 1
>        or (is1  = 1
>                and is2 = 1)
>        or (is1  = 2
>                and is2 = 2)
>        or (is1  = 3
>                and is2 = 3)

hm, I think you meant to say:
s1 = 'aa' or s2 = 'aa', i1 = 1 ... etc. details are important!

Consider taking the combination of 'correct' pair of i1 and i2 and
building a table with 'values' and joining to that:

select  * from test
  join
  (
    values (2,2),  (3,3), ...
  ) q(i1, i2) using(i1,i2)
  where  s1 = 'aa' or s2 = 'aa' or i1=1

merlin

Re: how to enforce index sub-select over filter+seqscan

От
Robert Haas
Дата:
On Thu, Sep 23, 2010 at 10:26 AM, Dmitry Teslenko <dteslenko@gmail.com> wrote:
> I know I can set enable_seqscan = off.
> Is there other ways to enforce index usage?

Not really, but I suspect random_page_cost and seq_page_cost might
help the planner make better decisions.  Is your data by any chance
mostly cached in memory?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company