Обсуждение: Determining which index to create

Поиск
Список
Период
Сортировка

Determining which index to create

От
Eric Cholet
Дата:
I have the following table :

 Attribute |           Type           | Modifier
-----------+--------------------------+----------
 motid     | integer                  | not null
 objid     | integer                  | not null
 date      | timestamp with time zone | not null

...with 140 million rows. For each distinct value of
motid there are many rows (with different objid/dates).
I would like to optimize the following query:

=> select * from dico_frs where motid=4742 order by date desc limit 10;

Creating an index on 'date' makes the query use that index:

Limit  (cost=0.00..17591.91 rows=10 width=16)
  ->  Index Scan Backward using dico_frs_date on dico_frs
(cost=0.00..20023641.63 rows=11382 width=16)

But it's still quite slow. I'm thinking an index on (motid, date desc)
would be best
but that doesn't seem to be possible. How can I optimize this query?

--
Eric Cholet


Re: Determining which index to create

От
Martijn van Oosterhout
Дата:
On Wed, Nov 21, 2001 at 12:23:07PM +0100, Eric Cholet wrote:
> I would like to optimize the following query:
>
> => select * from dico_frs where motid=4742 order by date desc limit 10;
>
> But it's still quite slow. I'm thinking an index on (motid, date desc)
> would be best
> but that doesn't seem to be possible. How can I optimize this query?

Indexes (at least btree ones) can be scanned in either forward or backward
directions. So an index on (motid,date) should be fine.
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

Re: Determining which index to create

От
Eric Cholet
Дата:
--On mercredi 21 novembre 2001 22:48 +1100 Martijn van Oosterhout
<kleptog@svana.org> wrote:

> On Wed, Nov 21, 2001 at 12:23:07PM +0100, Eric Cholet wrote:
>> I would like to optimize the following query:
>>
>> => select * from dico_frs where motid=4742 order by date desc limit 10;
>>
>> But it's still quite slow. I'm thinking an index on (motid, date desc)
>> would be best
>> but that doesn't seem to be possible. How can I optimize this query?
>
> Indexes (at least btree ones) can be scanned in either forward or backward
> directions. So an index on (motid,date) should be fine.

I should have mentionned I tried that, but it isn't being used:

=> \d dico_frs_motid_date
     Index "dico_frs_motid_date"
 Attribute |           Type
-----------+--------------------------
 motid     | integer
 date      | timestamp with time zone
btree

=> explain select * from dico_frs where motid=4742 order by date desc limit
10;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..17591.91 rows=10 width=16)
  ->  Index Scan Backward using dico_frs_date on dico_frs
(cost=0.00..20023641.63 rows=11382 width=16)

--
Eric Cholet


Re: Determining which index to create

От
Martijn van Oosterhout
Дата:
On Wed, Nov 21, 2001 at 12:53:09PM +0100, Eric Cholet wrote:
> I should have mentionned I tried that, but it isn't being used:
>
> => \d dico_frs_motid_date
>      Index "dico_frs_motid_date"
>  Attribute |           Type
> -----------+--------------------------
>  motid     | integer
>  date      | timestamp with time zone
> btree
>
> => explain select * from dico_frs where motid=4742 order by date desc limit
> 10;
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=0.00..17591.91 rows=10 width=16)
>   ->  Index Scan Backward using dico_frs_date on dico_frs
> (cost=0.00..20023641.63 rows=11382 width=16)

Well, it is doing the scan backwards, which is good. But it's not using the
index. If you drop dico_frs_date index, does it do it then?

Oh, and what version of postgres was this again?
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

Re: Determining which index to create

От
Eric Cholet
Дата:

--On jeudi 22 novembre 2001 01:04 +1100 Martijn van Oosterhout
<kleptog@svana.org> wrote:

> On Wed, Nov 21, 2001 at 12:53:09PM +0100, Eric Cholet wrote:
>> I should have mentionned I tried that, but it isn't being used:
>>
>> => \d dico_frs_motid_date
>>      Index "dico_frs_motid_date"
>>  Attribute |           Type
>> -----------+--------------------------
>>  motid     | integer
>>  date      | timestamp with time zone
>> btree
>>
>> => explain select * from dico_frs where motid=4742 order by date desc
>> limit  10;
>> NOTICE:  QUERY PLAN:
>>
>> Limit  (cost=0.00..17591.91 rows=10 width=16)
>>   ->  Index Scan Backward using dico_frs_date on dico_frs
>> (cost=0.00..20023641.63 rows=11382 width=16)
>
> Well, it is doing the scan backwards, which is good. But it's not using
> the index. If you drop dico_frs_date index, does it do it then?

=> explain select * from dico_frs where motid=4742 order by date desc limit
10;
NOTICE:  QUERY PLAN:

Limit  (cost=46172.25..46172.25 rows=10 width=16)
  ->  Sort  (cost=46172.25..46172.25 rows=11382 width=16)
        ->  Index Scan using dico_frs_motid_date on dico_frs
(cost=0.00..45405.39 rows=11382 width=16)


It's a bit better but still quite long, depending on how many rows for a
particular motid.
Dropping the "desc" in the "order by date" clause makes things much faster,
but I need the
results in reverse chronological order!

> Oh, and what version of postgres was this again?

=> select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.1.3 on i386-unknown-freebsd4.4, compiled by GCC 2.95.3
(1 row)


Thanks for your help,
--
Eric Cholet


Re: Determining which index to create

От
Stephan Szabo
Дата:
On Wed, 21 Nov 2001, Eric Cholet wrote:

> => explain select * from dico_frs where motid=4742 order by date desc limit
> 10;
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=46172.25..46172.25 rows=10 width=16)
>   ->  Sort  (cost=46172.25..46172.25 rows=11382 width=16)
>         ->  Index Scan using dico_frs_motid_date on dico_frs
> (cost=0.00..45405.39 rows=11382 width=16)
>
>
> It's a bit better but still quite long, depending on how many rows for a
> particular motid.
> Dropping the "desc" in the "order by date" clause makes things much faster,
> but I need the
> results in reverse chronological order!

Hmm, it looks like the sort is the expensive bit even though it's
estimating something low for it (relative to the index scan).
Have you tried setting sort_mem higher than the defaults (which are
really low) to see if it's just going out to disk for the sort.



Re: Determining which index to create

От
Martijn van Oosterhout
Дата:
On Wed, Nov 21, 2001 at 04:09:52PM +0100, Eric Cholet wrote:
> => explain select * from dico_frs where motid=4742 order by date desc limit
> 10;
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=46172.25..46172.25 rows=10 width=16)
>   ->  Sort  (cost=46172.25..46172.25 rows=11382 width=16)
>         ->  Index Scan using dico_frs_motid_date on dico_frs
> (cost=0.00..45405.39 rows=11382 width=16)

That's wrong. It doesn't seem to realise that a reverse scan on the index
would give the right answer. Note that that's only true because you're
selecting only a single motid. If there were multiple, a reverse scan would
definitly not be appropriate.

Not sure how to fix this though.

Functional index?
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

Re: Determining which index to create

От
Hiroshi Inoue
Дата:
Martijn van Oosterhout wrote:
>
> On Wed, Nov 21, 2001 at 04:09:52PM +0100, Eric Cholet wrote:
> > => explain select * from dico_frs where motid=4742 order by date desc limit
> > 10;
> > NOTICE:  QUERY PLAN:
> >
> > Limit  (cost=46172.25..46172.25 rows=10 width=16)
> >   ->  Sort  (cost=46172.25..46172.25 rows=11382 width=16)
> >         ->  Index Scan using dico_frs_motid_date on dico_frs
> > (cost=0.00..45405.39 rows=11382 width=16)
>
> That's wrong. It doesn't seem to realise that a reverse scan on the index
> would give the right answer. Note that that's only true because you're
> selecting only a single motid. If there were multiple, a reverse scan would
> definitly not be appropriate.

Please try
  select * from dico_frs where motid=4742 order by motid desc,
  date desc limit 10;

regards,
Hiroshi Inoue

Re: Determining which index to create

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> That's wrong. It doesn't seem to realise that a reverse scan on the index
> would give the right answer. Note that that's only true because you're
> selecting only a single motid.

Yeah, there's currently little connection between the planner's code that
recognizes "you could use this index to meet these WHERE conditions"
and the code that recognizes "you could use this index to produce this
sort ordering".  In particular it has no clue that exact equality
constraints on earlier index columns might allow it to consider the
indexscan result as being sorted by later index columns.

> Not sure how to fix this though.

Offhand it doesn't seem like a trivial change :-(

I'd be wary of expending a lot of planning cycles to detect this,
because it doesn't seem like a very common case.  But if we can find
a way to do it cheaply, it'd be a cool optimization.

            regards, tom lane

Re: Determining which index to create

От
Eric Cholet
Дата:
--On jeudi 22 novembre 2001 11:00 +0900 Hiroshi Inoue <Inoue@tpf.co.jp>
wrote:

> Martijn van Oosterhout wrote:
>>
>> On Wed, Nov 21, 2001 at 04:09:52PM +0100, Eric Cholet wrote:
>> > => explain select * from dico_frs where motid=4742 order by date desc
>> > limit 10;
>> > NOTICE:  QUERY PLAN:
>> >
>> > Limit  (cost=46172.25..46172.25 rows=10 width=16)
>> >   ->  Sort  (cost=46172.25..46172.25 rows=11382 width=16)
>> >         ->  Index Scan using dico_frs_motid_date on dico_frs
>> > (cost=0.00..45405.39 rows=11382 width=16)
>>
>> That's wrong. It doesn't seem to realise that a reverse scan on the index
>> would give the right answer. Note that that's only true because you're
>> selecting only a single motid. If there were multiple, a reverse scan
>> would definitly not be appropriate.
>
> Please try
>   select * from dico_frs where motid=4742 order by motid desc,
>   date desc limit 10;

Wow, I am speechless. Sub-second response time, whether the result set
is large or very small. Very impressive. I have resisted pressure to
use Oracle for this application, trusting open source software would do
an equivalent or better job (this table has 140 million records).

Thank you very much.

--
Eric Cholet