Обсуждение: Index not being used in MAX function (7.2.3)

Поиск

Список

Период

Сортировка

Index not being used in MAX function (7.2.3)

От

Paulo Jan

Дата:

10 июня 2003 г., 17:25:16

Hi all:

    I have here a table belonging to a message board (Phorum 3.3), and
there's an index in it that is not being used for reasons that I don't
understand. The table is:


                         Table "todocinetv"
    Column    |            Type             |      Modifiers
-------------+-----------------------------+----------------------
  id          | integer                     | not null default '0'
  datestamp   | timestamp without time zone | not null
  thread      | integer                     | not null default '0'
  parent      | integer                     | not null default '0'
  author      | character(37)               | not null default ''
  subject     | character(255)              | not null default ''
  email       | character(200)              | not null default ''
  attachment  | character(64)               | default ''
  host        | character(50)               | not null default ''
  email_reply | character(1)                | not null default 'N'
  approved    | character(1)                | not null default 'N'
  msgid       | character(100)              | not null default ''
  modifystamp | integer                     | not null default '0'
  userid      | integer                     | not null default '0'
Indexes: todocinetv_approved,
          todocinetv_author,
          todocinetv_datestamp,
          todocinetv_modifystamp,
          todocinetv_msgid,
          todocinetv_parent,
          todocinetv_subject,
          todocinetv_thread,
          todocinetv_userid,
          todocinetvpri_key


    And the index "todocinetvpri_key" is created on the primary key (id).
Yet when I do:

explain select max(id) from todocinetv;
NOTICE:  QUERY PLAN:

Aggregate  (cost=30939.22..30939.22 rows=1 width=4)
   ->  Seq Scan on todocinetv  (cost=0.00..30882.98 rows=22498 width=4)


    It doesn't use the index, and surely, it takes forever. I have tried
with VACUUM ANALYZE, and also dropping the index, creating it again and
VACUUMing it, and it never uses it. The only explanation I can come up
with is that the MAX() function doesn't use indices; I have tried with
tables in other databases (running Postgres 7.2.1), and it doesn't use
the indices in any of them.
    Is this the right behaviour? Or there is something else going on? The
table mentioned above is in a Postgres 7.2.3 server, while the other
ones that I used for testing were, as I said, in 7.2.1.



                    Paulo Jan.
                    DDnet.

Re: Index not being used in MAX function (7.2.3)

От

Dmitry Tkach

Дата:

10 июня 2003 г., 18:06:29

Yep. It's a "feature" :-)
I had hard time understanding why too...
Apparently, there is nothing special about max() aggregate from the
query planner's standpoint, compared to any other aggregate function. It
doesn't know its specifics, and thus is not able to figure out that all
it really needs is to grab the first entry from the index...

To mak it quicker do this instead:

select id from todocinetv order by id desc limit 1;

I hope, it helps...

Dima

Paulo Jan wrote:

> Hi all:
>
>     I have here a table belonging to a message board (Phorum 3.3), and
> there's an index in it that is not being used for reasons that I don't
> understand. The table is:
>
>
>                         Table "todocinetv"
>    Column    |            Type             |      Modifiers
> -------------+-----------------------------+----------------------
>  id          | integer                     | not null default '0'
>  datestamp   | timestamp without time zone | not null
>  thread      | integer                     | not null default '0'
>  parent      | integer                     | not null default '0'
>  author      | character(37)               | not null default ''
>  subject     | character(255)              | not null default ''
>  email       | character(200)              | not null default ''
>  attachment  | character(64)               | default ''
>  host        | character(50)               | not null default ''
>  email_reply | character(1)                | not null default 'N'
>  approved    | character(1)                | not null default 'N'
>  msgid       | character(100)              | not null default ''
>  modifystamp | integer                     | not null default '0'
>  userid      | integer                     | not null default '0'
> Indexes: todocinetv_approved,
>          todocinetv_author,
>          todocinetv_datestamp,
>          todocinetv_modifystamp,
>          todocinetv_msgid,
>          todocinetv_parent,
>          todocinetv_subject,
>          todocinetv_thread,
>          todocinetv_userid,
>          todocinetvpri_key
>
>
>     And the index "todocinetvpri_key" is created on the primary key
> (id). Yet when I do:
>
> explain select max(id) from todocinetv;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=30939.22..30939.22 rows=1 width=4)
>   ->  Seq Scan on todocinetv  (cost=0.00..30882.98 rows=22498 width=4)
>
>
>     It doesn't use the index, and surely, it takes forever. I have
> tried with VACUUM ANALYZE, and also dropping the index, creating it
> again and VACUUMing it, and it never uses it. The only explanation I
> can come up with is that the MAX() function doesn't use indices; I
> have tried with tables in other databases (running Postgres 7.2.1),
> and it doesn't use the indices in any of them.
>     Is this the right behaviour? Or there is something else going on?
> The table mentioned above is in a Postgres 7.2.3 server, while the
> other ones that I used for testing were, as I said, in 7.2.1.
>
>
>
>                     Paulo Jan.
>                     DDnet.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

Re: Index not being used in MAX function (7.2.3)

От

Jonathan Bartlett

Дата:

10 июня 2003 г., 18:27:30

Is your index a hash or btree?

Jon

On Tue, 10 Jun 2003, Paulo Jan wrote:

> Hi all:
>
>     I have here a table belonging to a message board (Phorum 3.3), and
> there's an index in it that is not being used for reasons that I don't
> understand. The table is:
>
>
>                          Table "todocinetv"
>     Column    |            Type             |      Modifiers
> -------------+-----------------------------+----------------------
>   id          | integer                     | not null default '0'
>   datestamp   | timestamp without time zone | not null
>   thread      | integer                     | not null default '0'
>   parent      | integer                     | not null default '0'
>   author      | character(37)               | not null default ''
>   subject     | character(255)              | not null default ''
>   email       | character(200)              | not null default ''
>   attachment  | character(64)               | default ''
>   host        | character(50)               | not null default ''
>   email_reply | character(1)                | not null default 'N'
>   approved    | character(1)                | not null default 'N'
>   msgid       | character(100)              | not null default ''
>   modifystamp | integer                     | not null default '0'
>   userid      | integer                     | not null default '0'
> Indexes: todocinetv_approved,
>           todocinetv_author,
>           todocinetv_datestamp,
>           todocinetv_modifystamp,
>           todocinetv_msgid,
>           todocinetv_parent,
>           todocinetv_subject,
>           todocinetv_thread,
>           todocinetv_userid,
>           todocinetvpri_key
>
>
>     And the index "todocinetvpri_key" is created on the primary key (id).
> Yet when I do:
>
> explain select max(id) from todocinetv;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=30939.22..30939.22 rows=1 width=4)
>    ->  Seq Scan on todocinetv  (cost=0.00..30882.98 rows=22498 width=4)
>
>
>     It doesn't use the index, and surely, it takes forever. I have tried
> with VACUUM ANALYZE, and also dropping the index, creating it again and
> VACUUMing it, and it never uses it. The only explanation I can come up
> with is that the MAX() function doesn't use indices; I have tried with
> tables in other databases (running Postgres 7.2.1), and it doesn't use
> the indices in any of them.
>     Is this the right behaviour? Or there is something else going on? The
> table mentioned above is in a Postgres 7.2.3 server, while the other
> ones that I used for testing were, as I said, in 7.2.1.
>
>
>
>                     Paulo Jan.
>                     DDnet.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>

Re: Index not being used in MAX function (7.2.3)

От

Ernest E Vogelsinger

Дата:

10 июня 2003 г., 21:52:58

At 18:39 10.06.2003, Paulo Jan said:
--------------------[snip]--------------------
>Hi all:
>
>       I have here a table belonging to a message board (Phorum 3.3), and
>there's an index in it that is not being used for reasons that I don't
>understand. The table is:
>
> ...
>
> select max(id) from todocinetv;
--------------------[snip]--------------------

Your question has already been answered, but:

I suspect that you're doing this to retrieve the row ID of a newly inserted
row. This may not be foolproof since others could already have inserted
rows in between your insertion and this ID lookup.

Assuming that you're using a sequence to provide the primary key (which you
should) you may safely query its current value:
    SELECT currval('todocinetv_id_seq') as "newid"

This is guaranteed to return the last value **for your connection** only,
regardless if the sequence has actually been incremented by others or not.
And it's lightning fast.

Just my 2c :)


--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/

Re: Index not being used in MAX function (7.2.3)

От

Paulo Jan

Дата:

11 июня 2003 г., 12:30:26

Jonathan Bartlett wrote:
> Is your index a hash or btree?
>
> Jon
>


    It's a btree, but anyway, I see that others have already answered my
question. So it's a "feature" and not a bug? Hrmpf.
    (BTW, the code I was running wasn't written by me; it was part of
Phorum, a PHP web posting board application. I'll try to patch it to
make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).



                    Paulo Jan.
                    DDnet.

doing VALID UNTIL programmatically in SQL ?

От

Karsten Hilbert

Дата:

11 июня 2003 г., 13:00:24

Hi all,

in a psql script for GnuMed (www.gnumed.org) I am using a
snippet like the following for setting up predefined test
accounts:

CREATE USER "test-doc"
    WITH PASSWORD 'test-doc'
    IN GROUP "gm-doctors", "gm-public"
    VALID UNTIL '2003-09-30'
;

I would like to constrain their validity to, say, six months. I
have tried but not found a way to tell the VALID UNTIL clause
something like

    now() + '6 months'::interval

Anyone have a suggestion (short of calculating in the client at
runtime and substituting) on how to do this in plain SQL ?

Thanks a lot,

Karsten Hilbert, MD
GnuMed i18n coordinator
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: doing VALID UNTIL programmatically in SQL ?

От

Tom Lane

Дата:

11 июня 2003 г., 13:32:24

Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> I have tried but not found a way to tell the VALID UNTIL clause
> something like
>     now() + '6 months'::interval

CREATE USER, like pretty much all utility statements in Postgres,
won't do any expression evaluation --- the parameters have to be
simple literal constants.

The only workaround I can think of is to do the CREATE USER inside
a helper plpgsql function that does the time calculation and then
uses EXECUTE to run the command with the appropriate values plugged
in.  It seems unlikely that this is a cleaner solution than doing it
in the client code ...

            regards, tom lane

Re: doing VALID UNTIL programmatically in SQL ?

От

"Jimmie H. Apsey"

Дата:

11 июня 2003 г., 13:52:04

I think it has to be done like this:  "now() + '@ 6 month' as
six_months_from_now;" which yields

six_months_from_now
------------------------
 2003-12-11 09:38:24-05
(1 row)

Jim Apsey
--------------------------------------------------------------------------------------
Karsten Hilbert wrote:

>Hi all,
>
>in a psql script for GnuMed (www.gnumed.org) I am using a
>snippet like the following for setting up predefined test
>accounts:
>
>CREATE USER "test-doc"
>    WITH PASSWORD 'test-doc'
>    IN GROUP "gm-doctors", "gm-public"
>    VALID UNTIL '2003-09-30'
>;
>
>I would like to constrain their validity to, say, six months. I
>have tried but not found a way to tell the VALID UNTIL clause
>something like
>
>    now() + '6 months'::interval
>
>Anyone have a suggestion (short of calculating in the client at
>runtime and substituting) on how to do this in plain SQL ?
>
>Thanks a lot,
>
>Karsten Hilbert, MD
>GnuMed i18n coordinator
>
>

Re: Index not being used in MAX function (7.2.3)

От

Chris Gamache

Дата:

11 июня 2003 г., 15:44:34

Wouldn't it make sense to optimize max() and min() for indexed columns? I don't
know if I'm barking up the wrong tree, but would it be possible to create an
aggregate (o_max, o_min) to make the query planner treat it differently from
other aggregates? IMO, (if possible...) this would be a more elegant solution
than SQL'ing around the "feature". If it is possible, it might be a nifty
contrib module, poised for inclusion in the production code. Any
takers/thoughts? :)

CG

--- Paulo Jan <admin@digital.ddnet.es> wrote:
> Jonathan Bartlett wrote:
> > Is your index a hash or btree?
> >
> > Jon
> >
>
>
>     It's a btree, but anyway, I see that others have already answered my
> question. So it's a "feature" and not a bug? Hrmpf.
>     (BTW, the code I was running wasn't written by me; it was part of
> Phorum, a PHP web posting board application. I'll try to patch it to
> make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).
>
>
>
>                     Paulo Jan.
>                     DDnet.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

Re: Index not being used in MAX function (7.2.3)

От

"listrec"

Дата:

11 июня 2003 г., 15:51:20

I think this is a wonderful idea. I often wondered why max(column) makes a
full table scan.

B.T.W.: If your max column is indexed, use

select column from table order by column desc limit 1

which gives you the maximum value in no time at all.

Detlef

-----Ursprungliche Nachricht-----
Von: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]Im Auftrag von Chris Gamache
Gesendet: Mittwoch, 11. Juni 2003 17:44
An: pgsql-general@postgresql.org
Betreff: Re: [GENERAL] Index not being used in MAX function (7.2.3)

Wouldn't it make sense to optimize max() and min() for indexed columns? I
don't
know if I'm barking up the wrong tree, but would it be possible to create an
aggregate (o_max, o_min) to make the query planner treat it differently from
other aggregates? IMO, (if possible...) this would be a more elegant
solution
than SQL'ing around the "feature". If it is possible, it might be a nifty
contrib module, poised for inclusion in the production code. Any
takers/thoughts? :)

CG

--- Paulo Jan <admin@digital.ddnet.es> wrote:
> Jonathan Bartlett wrote:
> > Is your index a hash or btree?
> >
> > Jon
> >
>
>
>     It's a btree, but anyway, I see that others have already answered my
> question. So it's a "feature" and not a bug? Hrmpf.
>     (BTW, the code I was running wasn't written by me; it was part of
> Phorum, a PHP web posting board application. I'll try to patch it to
> make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).
>
>
>
>                     Paulo Jan.
>                     DDnet.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

11 июня 2003 г., 15:59:32

On Wed, Jun 11, 2003 at 02:21:00PM +0200, Paulo Jan wrote:
>     It's a btree, but anyway, I see that others have already answered my
> question. So it's a "feature" and not a bug? Hrmpf.
>     (BTW, the code I was running wasn't written by me; it was part of
> Phorum, a PHP web posting board application. I'll try to patch it to
> make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).

Not to drag this out further, but you might want to hold off on that
patch. 7.4 is supposed to use indexes for max/min.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

Tom Lane

Дата:

11 июня 2003 г., 17:03:12

"Jim C. Nasby" <jim@nasby.net> writes:
> Not to drag this out further, but you might want to hold off on that
> patch. 7.4 is supposed to use indexes for max/min.

Where did you get that idea?

There's been no change in the basic problem, which is that no one has
proposed a reasonably general method of translating aggregates into
index manipulations.  Postgres has an extremely general, extensible
concept of aggregates, and we're not going to mess it up with some
poorly-designed hack.  But show me a clean design and implementation,
and it'll go in.

            regards, tom lane

Re: Index not being used in MAX function (7.2.3)

От

"Dann Corbit"

Дата:

11 июня 2003 г., 17:11:47

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, June 11, 2003 10:03 AM
> To: jim@nasby.net
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)
>
>
> "Jim C. Nasby" <jim@nasby.net> writes:
> > Not to drag this out further, but you might want to hold
> off on that
> > patch. 7.4 is supposed to use indexes for max/min.
>
> Where did you get that idea?
>
> There's been no change in the basic problem, which is that no
> one has proposed a reasonably general method of translating
> aggregates into index manipulations.  Postgres has an
> extremely general, extensible concept of aggregates, and
> we're not going to mess it up with some poorly-designed hack.
>  But show me a clean design and implementation, and it'll go in.

Is this a poorly designed hack:

    Select max(expression) from <join> where <filter>

Becomes:

    If (non_hashed_index_exists_on_expression) then
         /* Transform expression to: */
         select (expression) from <join> where <filter> order by
<expression> limit to 1 rows
      else
         do_what_you_are_doing_right_now
      endif
?

Re: Index not being used in MAX function (7.2.3)

От

"Dann Corbit"

Дата:

11 июня 2003 г., 17:15:20

> -----Original Message-----
> From: Dann Corbit
> Sent: Wednesday, June 11, 2003 10:12 AM
> To: Tom Lane; jim@nasby.net
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)
>
>
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: Wednesday, June 11, 2003 10:03 AM
> > To: jim@nasby.net
> > Cc: pgsql-general@postgresql.org
> > Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)
> >
> >
> > "Jim C. Nasby" <jim@nasby.net> writes:
> > > Not to drag this out further, but you might want to hold
> > off on that
> > > patch. 7.4 is supposed to use indexes for max/min.
> >
> > Where did you get that idea?
> >
> > There's been no change in the basic problem, which is that no
> > one has proposed a reasonably general method of translating
> > aggregates into index manipulations.  Postgres has an
> > extremely general, extensible concept of aggregates, and
> > we're not going to mess it up with some poorly-designed hack.
> >  But show me a clean design and implementation, and it'll go in.
>
> Is this a poorly designed hack:
>
>     Select max(expression) from <join> where <filter>
>
> Becomes:
>
>     If (non_hashed_index_exists_on_expression) then
>          /* Transform expression to: */
>          select (expression) from <join> where <filter> order
> by <expression> limit to 1 rows

"Order by <expression> DESC" for max and ASC for min.

>       else
>          do_what_you_are_doing_right_now
>       endif
> ?
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
http://www.postgresql.org/docs/faqs/FAQ.html

Re: Index not being used in MAX function (7.2.3)

От

Tom Lane

Дата:

11 июня 2003 г., 17:27:11

"Dann Corbit" <DCorbit@connx.com> writes:
> Is this a poorly designed hack:
>     Select max(expression) from <join> where <filter>

Well, to start with, any nonempty WHERE probably invalidates the
optimization, and I doubt it works if there's a join rather than a
single table involved.  But you're just handwaving --- the devil is in
the details.  How will you identify which aggregates are MIN and MAX
(no, I don't like the idea of relying on the names; remember we have
an extensible set of aggregates)?  What will you do when there are
multiple aggregates in the command --- or more generally, how complex a
context for the aggregate call are you going to be able to support?
Where exactly is this transformation going to occur in the planning
pipeline, and how many cycles will we waste checking for the possible
transform in cases where it doesn't apply?  Questions like these are
where we've gotten bogged down in the past.  You might care to consult
the pgsql-hackers archives for past discussions.

            regards, tom lane

Re: Index not being used in MAX function (7.2.3)

От

Jonathan Bartlett

Дата:

11 июня 2003 г., 17:44:57

I wonder if a macro system might be warranted - then have max be a macro
instead of an aggregate.  However, I don't know exactly how that would
work since it involves the whole statement.  Anyway, just an idea to
hopefully spur someone else's thinking cap :)

Jon

On Wed, 11 Jun 2003, Tom Lane wrote:

> "Dann Corbit" <DCorbit@connx.com> writes:
> > Is this a poorly designed hack:
> >     Select max(expression) from <join> where <filter>
>
> Well, to start with, any nonempty WHERE probably invalidates the
> optimization, and I doubt it works if there's a join rather than a
> single table involved.  But you're just handwaving --- the devil is in
> the details.  How will you identify which aggregates are MIN and MAX
> (no, I don't like the idea of relying on the names; remember we have
> an extensible set of aggregates)?  What will you do when there are
> multiple aggregates in the command --- or more generally, how complex a
> context for the aggregate call are you going to be able to support?
> Where exactly is this transformation going to occur in the planning
> pipeline, and how many cycles will we waste checking for the possible
> transform in cases where it doesn't apply?  Questions like these are
> where we've gotten bogged down in the past.  You might care to consult
> the pgsql-hackers archives for past discussions.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

11 июня 2003 г., 18:27:44

On Wed, Jun 11, 2003 at 10:44:22 -0700,
  Jonathan Bartlett <johnnyb@eskimo.com> wrote:
> I wonder if a macro system might be warranted - then have max be a macro
> instead of an aggregate.  However, I don't know exactly how that would
> work since it involves the whole statement.  Anyway, just an idea to
> hopefully spur someone else's thinking cap :)

I don't think that would work. There are going to be some cases
where the aggregate is better than the subselect (and not just when
there isn't an appropiate index). And in some cases distinct on order by
may be the best way to get what you want.

Re: Index not being used in MAX function (7.2.3)

От

"Dann Corbit"

Дата:

11 июня 2003 г., 18:42:42

> -----Original Message-----
> From: Bruno Wolff III [mailto:bruno@wolff.to]
> Sent: Wednesday, June 11, 2003 11:30 AM
> To: Jonathan Bartlett
> Cc: Tom Lane; Dann Corbit; jim@nasby.net; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)
>
>
> On Wed, Jun 11, 2003 at 10:44:22 -0700,
>   Jonathan Bartlett <johnnyb@eskimo.com> wrote:
> > I wonder if a macro system might be warranted - then have max be a
> > macro instead of an aggregate.  However, I don't know
> exactly how that
> > would work since it involves the whole statement.  Anyway, just an
> > idea to hopefully spur someone else's thinking cap :)
>
> I don't think that would work. There are going to be some
> cases where the aggregate is better than the subselect (and
> not just when there isn't an appropiate index). And in some
> cases distinct on order by may be the best way to get what you want.

Isn't that the optimizer's job to figure out?  The whole idea of SQL is
to abstract the queries and allow the optimizer to make all the smart
choices about plans and stuff.

I do realize that it is very "non-trivial" to implement, but min() and
max() are used so often it seems it might be useful.

Here are some "free to use" templates for statistical functions:
ftp://cap.connx.com/tournament_software/Kahan.Hpp
ftp://cap.connx.com/tournament_software/STATS.HPP

The Kahan template is an extremely accurate adder (does not lose
precision like direct summation).
The Stats template (which uses the Kahan adder) does all sorts of things
like skew, kurtosis, min, max, stddev, average, count, sum etc. all
simultaneously.
Our product uses a similar template to produce all kinds of useful
statistical information.  See:
http://www.connx.com/products/connx/Connx%208.8%20UserGuide/connxcdd32.h
tm
And look at the statistical functions book.

No, we don't do the optimization I have suggested for min/max, but I
hope to poke it into our tool set some day.  However, we do have a
function called "sortfirst()" and a function called "sortlast() " both
of which do perform the suggested optimizations [when possible].

Perhaps PostgreSQL could do something similar.

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

11 июня 2003 г., 19:46:20

On Wed, Jun 11, 2003 at 11:41:26 -0700,
  Dann Corbit <DCorbit@connx.com> wrote:
>
> Isn't that the optimizer's job to figure out?  The whole idea of SQL is
> to abstract the queries and allow the optimizer to make all the smart
> choices about plans and stuff.

In theory yes. My comment was specifically addressing the idea of using
a macro. I don't think this would work because subselects aren't
always the best way to do this.

There are other areas where how you write queries makes a big difference
in performance even though the queries are logically equivalent.

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

11 июня 2003 г., 20:08:52

On Wed, Jun 11, 2003 at 11:59:36 -0700,
  Dennis Gearon <gearond@cvc.net> wrote:
> I guess the question is, are other big iron data bases using indexes on
> MAX/MIN functions, and how are they doing it?

It is easier for them to do it because they don't have to worry about
a function named max not really being a maximum.

Re: Index not being used in MAX function (7.2.3)

От

Ang Chin Han

Дата:

12 июня 2003 г., 04:02:38

Tom Lane wrote:

> There's been no change in the basic problem, which is that no one has
> proposed a reasonably general method of translating aggregates into
> index manipulations.  Postgres has an extremely general, extensible
> concept of aggregates, and we're not going to mess it up with some
> poorly-designed hack.  But show me a clean design and implementation,
> and it'll go in.

Just a quick idea, in CREATE AGGREGATE, add optional parameters of:

1. ORDER BY ASC|DESC|USING operator
2. LIMIT {count}

And modify INITCOND param to:
    INITCOND = COUNT | initial_condition
where INITCOND = COUNT forces pgsql to get all row counts into INITCOND
first before calling sfunc or ffunc.

Still ugly, but things might be able to be generallized to:

-- returnme is a function that returns its parameter.
Or else make SFUNC optional and would by default return its param.

CREATE AGGREGATE max
   (BASETYPE=int, SFUNC=returnme, STYPE=int, INITCOND=NULL,
   ORDER BY DESC LIMIT 1);

CREATE AGGREGATE min
   (BASETYPE=int, SFUNC=returnme, STYPE=int, INITCOND=NULL,
   ORDER BY ASC LIMIT 1);

CREATE AGGREGATE count
   (BASETYPE=int, SFUNC=returnme, STYPE=int, INITCOND=COUNT,
   LIMIT 0);

The implementation would be:

SELECT min(foo) FROM bar
translates to:
SELECT (SELECT sfunc FROM bar ORDER BY foo ASC LIMIT 1) as min;
(or similar, if you get the idea)

SELECT baz, min(foo) FROM bar GROUP BY baz ORDER BY baz;
translates to:
SELECT
   baz,
   (SELECT sfunc FROM bar
      WHERE baz = highlevel_bar.baz
      ORDER BY foo ASC LIMIT 1) as min
   FROM bar
   ORDER BY baz;
-- Hoping that the subselect would automagically use an index if it
-- exists, like normal queries.

SELECT baz, count(*) FROM bar GROUP BY baz ORDER BY baz;
translates to:
SELECT
   baz,
   (SELECT __COUNT__ FROM bar
     WHERE baz = highlevel_bar.baz) as count
   FROM bar
   ORDER BY baz;

Note how the GROUP BY gets pushed into the subselect as a WHERE
condition, possibly allowing generic optimization of SELECT count(*).

Lots of hand waving in parts, but I hope I got the idea across. Can't
tell how much work is it to do without in depth knowledge of pgsql
internals, though. :(

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
  11:00am  up 168 days,  1:56,  9 users,  load average: 5.08, 5.05, 5.04

Вложения

msg-32091-61182.dat

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

12 июня 2003 г., 22:01:04

On Wed, Jun 11, 2003 at 01:02:57PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > Not to drag this out further, but you might want to hold off on that
> > patch. 7.4 is supposed to use indexes for max/min.
>
> Where did you get that idea?

I remember seeing something about someone committing the min/max to
ORDER BY/LIMIT transform, but obviously I must have been smoking
something. My bad.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

12 июня 2003 г., 22:17:30

Am I just being naive, or couldn't this be solved by adding min/max
boolean flags to pg_aggregates and the appropriate syntax to CREATE
AGGREGATE? That would just leave the simple matter of the index scanning
code </sarcasm>.

BTW, I recently tried to do something like this...

SELECT key, blah, foo, bar, scoring_function(blah) AS score INTO TEMP t1 FROM blah;
SELECT key, blah, foo, bar
    INTO TEMP info_for_max_scoring_entry_for_each_key
    FROM t1
    WHERE t1.score = (SELECT score FROM t1 AS inner_t1 WHERE
    inner_t1.key = t1.key ORDER BY score DESC LIMIT 1)
;

The performance was horrid. I ended up building a middle table using
SELECT key, max(score) INTO TEMP t2 FROM t1 GROUP BY key;

and joining with that to build the final table I wanted. So it seems the
ORDER/LIMIT hack doesn't work well at all except in limited situations.

On Wed, Jun 11, 2003 at 03:11:26PM -0500, Bruno Wolff III wrote:
> On Wed, Jun 11, 2003 at 11:59:36 -0700,
>   Dennis Gearon <gearond@cvc.net> wrote:
> > I guess the question is, are other big iron data bases using indexes on
> > MAX/MIN functions, and how are they doing it?
>
> It is easier for them to do it because they don't have to worry about
> a function named max not really being a maximum.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

12 июня 2003 г., 22:26:23

On Thu, Jun 12, 2003 at 12:08:15PM +0800, Ang Chin Han wrote:
> Just a quick idea, in CREATE AGGREGATE, add optional parameters of:
>
> 1. ORDER BY ASC|DESC|USING operator
> 2. LIMIT {count}

IMHO, I don't think it's right to focus on the ORDER BY/LIMIT hack. The
real issue here is that the best way to find a min is to grab the first
tuple in the index (granted, a bit tricker in pgsql due to MVCC), and
the best way to find a max is to grab the last tuple in the index. And
this extends beyond the simplest of min/max examples. For instance, this
technique should be used to solve grouped aggregates (assuming a
suitable index, of course), the only difference is that you don't use
the first/last tuple, you use the first/last one that matches your other
criteria.

I haven't read the source, but it seems to me what's lacking is the
ability to scan indexes in order to do this.

This becomes really important whenever pgsql gains the ability to use
multiple indexes per table (someone smack me if it can do this now and I
don't realize it), because then you could do something like

SELECT min(a), max(b), min(c), min(d)

and the query would be blazing fast if you had the right indexes on all
4 fields.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

Tom Lane

Дата:

12 июня 2003 г., 22:35:26

"Jim C. Nasby" <jim@nasby.net> writes:
> Am I just being naive, or couldn't this be solved by adding min/max
> boolean flags to pg_aggregates and the appropriate syntax to CREATE
> AGGREGATE?

I'd prefer to see a direct link to the associated sort operator
('<' for MIN or '>' for MAX).  But yeah, some addition to the system
catalogs seems essential if you don't want the code to be full of
unsupportable assumptions about aggregate and operator names.

> So it seems the ORDER/LIMIT hack doesn't work well at all except in
> limited situations.

No kidding.  This is one reason there hasn't been a huge push to get it
implemented: the actual usefulness of the hack is quite limited.  In
your example, I suspect the presence of the unrelated WHERE clause is
what makes it unhelpful.

            regards, tom lane

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

13 июня 2003 г., 15:22:34

On Thu, Jun 12, 2003 at 17:17:19 -0500,
  "Jim C. Nasby" <jim@nasby.net> wrote:
> Am I just being naive, or couldn't this be solved by adding min/max
> boolean flags to pg_aggregates and the appropriate syntax to CREATE
> AGGREGATE? That would just leave the simple matter of the index scanning
> code </sarcasm>.

There are other potential aggregates of this type. You would be better
off adding an operator (class?) than a flag.

>
> BTW, I recently tried to do something like this...
>
> SELECT key, blah, foo, bar, scoring_function(blah) AS score INTO TEMP t1 FROM blah;
> SELECT key, blah, foo, bar
>     INTO TEMP info_for_max_scoring_entry_for_each_key
>     FROM t1
>     WHERE t1.score = (SELECT score FROM t1 AS inner_t1 WHERE
>     inner_t1.key = t1.key ORDER BY score DESC LIMIT 1)
> ;
>
> The performance was horrid. I ended up building a middle table using
> SELECT key, max(score) INTO TEMP t2 FROM t1 GROUP BY key;
>
> and joining with that to build the final table I wanted. So it seems the
> ORDER/LIMIT hack doesn't work well at all except in limited situations.

Unless there was a combined index on key and score I would expect
the form you ended up using to be the fastest way to do it. With
a combined index, distinct on would probably be a bit faster (epsecially
if there were lots of values with the same key).

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

13 июня 2003 г., 15:35:09

On Thu, Jun 12, 2003 at 17:26:11 -0500,
  "Jim C. Nasby" <jim@nasby.net> wrote:
> On Thu, Jun 12, 2003 at 12:08:15PM +0800, Ang Chin Han wrote:
>
> This becomes really important whenever pgsql gains the ability to use
> multiple indexes per table (someone smack me if it can do this now and I

SMACK!

> don't realize it), because then you could do something like
>
> SELECT min(a), max(b), min(c), min(d)
>
> and the query would be blazing fast if you had the right indexes on all
> 4 fields.

And you can speed this up now using subselects.

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

13 июня 2003 г., 20:12:51

On Fri, Jun 13, 2003 at 10:37:46AM -0500, Bruno Wolff III wrote:
> On Thu, Jun 12, 2003 at 17:26:11 -0500,
>   "Jim C. Nasby" <jim@nasby.net> wrote:
> > On Thu, Jun 12, 2003 at 12:08:15PM +0800, Ang Chin Han wrote:
> >
> > This becomes really important whenever pgsql gains the ability to use
> > multiple indexes per table (someone smack me if it can do this now and I
>
> SMACK!

You can use multiple indexes per query plan node? IE: if you have
indexes on a and b, you can use both indexes when you do WHERE a=blah
and b=foo? That's what I was reffering to...
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

"scott.marlowe"

Дата:

13 июня 2003 г., 20:34:16

On Fri, 13 Jun 2003, Jim C. Nasby wrote:

> On Fri, Jun 13, 2003 at 10:37:46AM -0500, Bruno Wolff III wrote:
> > On Thu, Jun 12, 2003 at 17:26:11 -0500,
> >   "Jim C. Nasby" <jim@nasby.net> wrote:
> > > On Thu, Jun 12, 2003 at 12:08:15PM +0800, Ang Chin Han wrote:
> > >
> > > This becomes really important whenever pgsql gains the ability to use
> > > multiple indexes per table (someone smack me if it can do this now and I
> >
> > SMACK!
>
> You can use multiple indexes per query plan node? IE: if you have
> indexes on a and b, you can use both indexes when you do WHERE a=blah
> and b=foo? That's what I was reffering to...

In that instance you'd be better off with a multiple key index.

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

13 июня 2003 г., 20:55:37

On Fri, Jun 13, 2003 at 15:12:39 -0500,
  "Jim C. Nasby" <jim@nasby.net> wrote:
> On Fri, Jun 13, 2003 at 10:37:46AM -0500, Bruno Wolff III wrote:
> > On Thu, Jun 12, 2003 at 17:26:11 -0500,
> >   "Jim C. Nasby" <jim@nasby.net> wrote:
> > > On Thu, Jun 12, 2003 at 12:08:15PM +0800, Ang Chin Han wrote:
> > >
> > > This becomes really important whenever pgsql gains the ability to use
> > > multiple indexes per table (someone smack me if it can do this now and I
> >
> > SMACK!
>
> You can use multiple indexes per query plan node? IE: if you have
> indexes on a and b, you can use both indexes when you do WHERE a=blah
> and b=foo? That's what I was reffering to...

I think what you are describing doesn't make much sense. On any scan
you are only going to using one index (possibly over several columns).
But while processing a query you might be making use of several indexes
on a table. For example the following query should would fast if
appropiate indexes exist:
select (select col1 from tab order by col1),
  (select col2 from tab order by col2)

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

13 июня 2003 г., 22:37:01

On Fri, Jun 13, 2003 at 03:58:13PM -0500, Bruno Wolff III wrote:
> > You can use multiple indexes per query plan node? IE: if you have
> > indexes on a and b, you can use both indexes when you do WHERE a=blah
> > and b=foo? That's what I was reffering to...
>
> I think what you are describing doesn't make much sense. On any scan
> you are only going to using one index (possibly over several columns).

I believe MSSQL and Oracle support it; they scan the indexes then grab
the appropriate set of matching tuple addresses. The advantage you get
from this is you can get close to multi-key index performance without
using multi-key indexes. This is useful when you need to do lookups on
two different fields in a table, and also need to do lookups on both
fields. IE:

select ... where a=foo;
select ... where b=bar;
select ... where a=blah and b=baz;

I was strictly looking to the future when pgsql migth eventually support
this. :)
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

14 июня 2003 г., 13:24:45

On Fri, Jun 13, 2003 at 17:36:49 -0500,
  "Jim C. Nasby" <jim@nasby.net> wrote:
>
> I believe MSSQL and Oracle support it; they scan the indexes then grab
> the appropriate set of matching tuple addresses. The advantage you get
> from this is you can get close to multi-key index performance without
> using multi-key indexes. This is useful when you need to do lookups on
> two different fields in a table, and also need to do lookups on both
> fields. IE:
>
> select ... where a=foo;
> select ... where b=bar;
> select ... where a=blah and b=baz;
>
> I was strictly looking to the future when pgsql migth eventually support
> this. :)

I don't see how this would be useful in any of these examples. The first
two seem to be ones where one index scan would work. The third would
be handled by postgres using an index scan and a filter (assuming no
multikey index was available). I can't think of a circumtances where
doing two index scans and then joining the rows obtained from each
scan would be faster than the way postgres does it. If there was an 'or'
instead of an 'and' then unioning the two sets of results from index
scans would make sense. I tried a quick test of this and saw postgres
using a seq scan with a filter, but I might not have data with the
right set of properties to make this work. In theory you could do
a plain index scan for one half of the or and an index scan with a filter
(to remove what would be duplicates) on the other half of the or clause.
This would be a win in many common cases. I tried some stuff to see
what postgres does and it looks like it has a way to search two indexes
at once.

The following commands:

select version();

explain analyze
  select gameid from crate where areaid = 'JL005' or rate = 5034;

explain analyze
  (select gameid from crate where areaid = 'JL005') union all
  (select gameid from crate where rate = 5034 and areaid <> 'JL005');

generated the following output:

                                version
------------------------------------------------------------------------
 PostgreSQL 7.4devel on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Index Scan using crate_pkey, "temp" on crate  (cost=0.00..151.34 rows=38 width=7) (actual time=0.09..0.72 rows=72
loops=1)
   Index Cond: ((areaid = 'JL005'::text) OR (rate = 5034))
 Total runtime: 0.85 msec
(3 rows)

                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..151.32 rows=39 width=7) (actual time=0.08..0.73 rows=72 loops=1)
   ->  Subquery Scan "*SELECT* 1"  (cost=0.00..39.88 rows=10 width=7) (actual time=0.07..0.10 rows=2 loops=1)
         ->  Index Scan using crate_pkey on crate  (cost=0.00..39.88 rows=10 width=7) (actual time=0.07..0.09 rows=2
loops=1)
               Index Cond: (areaid = 'JL005'::text)
   ->  Subquery Scan "*SELECT* 2"  (cost=0.00..111.43 rows=29 width=7) (actual time=0.03..0.57 rows=70 loops=1)
         ->  Index Scan using "temp" on crate  (cost=0.00..111.43 rows=29 width=7) (actual time=0.03..0.44 rows=70
loops=1)
               Index Cond: (rate = 5034)
               Filter: (areaid <> 'JL005'::text)
 Total runtime: 0.96 msec
(9 rows)

The select with the union was an attempt to force the results of two index
scans to be combined. But if you look at the results of teh plan for the
simpler query you will see that postgres is doing an index scan with
an 'or' condition which suggests that it is doing pretty much the same thing
as the more complicated query, but more efficiently.

P.S.
In my example yesterday there were suppposed to be limit 1 clauses
in both subselects as the were supposed to be the equivalent of min
functions.

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

14 июня 2003 г., 15:23:33

On Sat, Jun 14, 2003 at 08:27:20AM -0500, Bruno Wolff III wrote:
> I don't see how this would be useful in any of these examples. The first
> two seem to be ones where one index scan would work. The third would
> be handled by postgres using an index scan and a filter (assuming no
> multikey index was available). I can't think of a circumtances where
> doing two index scans and then joining the rows obtained from each
> scan would be faster than the way postgres does it. If there was an 'or'

It would be useful if we didn't have to immediately consider MVCC info,
which requires hitting the tupples. Indexes are usually much narrower
than tables, even with metadata included, so you can scan two indexes
and and/or the results faster than scanning one index, then hitting all
the tuples.

I believe this could be done right now in pgsql, though I'm not sure if
it would be as useful since you'll still have to read all the tupples to
obtain the MVCC info (this is why I hope the MVCC info can eventually be
moved out of mainline storage).

> instead of an 'and' then unioning the two sets of results from index
> scans would make sense. I tried a quick test of this and saw postgres
> using a seq scan with a filter, but I might not have data with the
> right set of properties to make this work. In theory you could do
> a plain index scan for one half of the or and an index scan with a filter
> (to remove what would be duplicates) on the other half of the or clause.
> This would be a win in many common cases. I tried some stuff to see
> what postgres does and it looks like it has a way to search two indexes
> at once.
>
> The following commands:
>
> select version();
>
> explain analyze
>   select gameid from crate where areaid = 'JL005' or rate = 5034;
>
> explain analyze
>   (select gameid from crate where areaid = 'JL005') union all
>   (select gameid from crate where rate = 5034 and areaid <> 'JL005');
>
> generated the following output:
>
>                                 version
> ------------------------------------------------------------------------
>  PostgreSQL 7.4devel on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
> (1 row)
>
>                                                          QUERY PLAN
      
>
----------------------------------------------------------------------------------------------------------------------------
>  Index Scan using crate_pkey, "temp" on crate  (cost=0.00..151.34 rows=38 width=7) (actual time=0.09..0.72 rows=72
loops=1)
>    Index Cond: ((areaid = 'JL005'::text) OR (rate = 5034))
>  Total runtime: 0.85 msec
> (3 rows)
>
>                                                           QUERY PLAN
        
>
------------------------------------------------------------------------------------------------------------------------------
>  Append  (cost=0.00..151.32 rows=39 width=7) (actual time=0.08..0.73 rows=72 loops=1)
>    ->  Subquery Scan "*SELECT* 1"  (cost=0.00..39.88 rows=10 width=7) (actual time=0.07..0.10 rows=2 loops=1)
>          ->  Index Scan using crate_pkey on crate  (cost=0.00..39.88 rows=10 width=7) (actual time=0.07..0.09 rows=2
loops=1)
>                Index Cond: (areaid = 'JL005'::text)
>    ->  Subquery Scan "*SELECT* 2"  (cost=0.00..111.43 rows=29 width=7) (actual time=0.03..0.57 rows=70 loops=1)
>          ->  Index Scan using "temp" on crate  (cost=0.00..111.43 rows=29 width=7) (actual time=0.03..0.44 rows=70
loops=1)
>                Index Cond: (rate = 5034)
>                Filter: (areaid <> 'JL005'::text)
>  Total runtime: 0.96 msec
> (9 rows)
>
> The select with the union was an attempt to force the results of two index
> scans to be combined. But if you look at the results of teh plan for the
> simpler query you will see that postgres is doing an index scan with
> an 'or' condition which suggests that it is doing pretty much the same thing
> as the more complicated query, but more efficiently.

What indexes are on crate?
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Index not being used in MAX function (7.2.3)

От

Bruno Wolff III

Дата:

14 июня 2003 г., 15:38:21

On Sat, Jun 14, 2003 at 10:23:20 -0500,
  "Jim C. Nasby" <jim@nasby.net> wrote:
>
> It would be useful if we didn't have to immediately consider MVCC info,
> which requires hitting the tupples. Indexes are usually much narrower
> than tables, even with metadata included, so you can scan two indexes
> and and/or the results faster than scanning one index, then hitting all
> the tuples.

I don't think you are likely to see much gain from this as scanning
two indexes instead of one is likely to cost about as much as scanning
an index and looking at the tupples to see if they match the other
condition. Also if you look at both indexes you have to do a join
to connect the tuples back together and there is going to be some
cost to that.
>
> What indexes are on crate?

I had indexes on areaid, game and game, areaid. I created one of crate
for the test. I don't normally search on crate. I do do orders on it,
but generally after joins so an index isn't a great help. The table only
has about 16000 rows in it, so it is a pretty small sample.

I would actually be interested in hearing a comment from someone who
knows more just how a index condition with an OR is handled when the
two indexes aren't part of the same multicolumn index.

Re: Index not being used in MAX function (7.2.3)

От

Tom Lane

Дата:

14 июня 2003 г., 17:18:35

Bruno Wolff III <bruno@wolff.to> writes:
> I don't think you are likely to see much gain from this as scanning
> two indexes instead of one is likely to cost about as much as scanning
> an index and looking at the tupples to see if they match the other
> condition.

We have speculated about this in the past.  Several of the big
commercial DBs can do it, so it seems that at least some people find
it valuable.

I'd be inclined to look at it in combination with decoupling heap and
index scan order: that is, you traverse the index and make a list of
heap tuple TIDs that the index says to visit, then you visit those
tuples in heap storage order.  This gets rid of a lot of the
random-access overhead of the present indexscanning scheme, at the cost
of not producing sorted-by-the-indexkey output (but presumably the
planner could choose whether to do it this way or the old way).

The way this fits with multiple indexes is that you could gather tuple
TIDs from several indexes and intersect or union the lists before you
visit the heap.  I believe the common way to do this is to represent
the TID lists as sparse bitmaps and AND or OR the bitmaps.

> I would actually be interested in hearing a comment from someone who
> knows more just how a index condition with an OR is handled when the
> two indexes aren't part of the same multicolumn index.

Right now, the way we handle it is to test each tuple retrieved by an
indexscan against the conditions associated with the previous indexscans
(ie, the earlier OR terms), so that we can detect whether we already
returned the tuple in a previous scan.  This works but it's not
especially efficient, because if the OR terms overlap then each tuple in
the overlap is fetched multiple times.  Plus you have all that
computation done to recheck the conditions.

            regards, tom lane

Re: Index not being used in MAX function (7.2.3)

От

"Jim C. Nasby"

Дата:

16 июня 2003 г., 06:12:45

On Sat, Jun 14, 2003 at 01:18:15PM -0400, Tom Lane wrote:
> Bruno Wolff III <bruno@wolff.to> writes:
> > I don't think you are likely to see much gain from this as scanning
> > two indexes instead of one is likely to cost about as much as scanning
> > an index and looking at the tupples to see if they match the other
> > condition.
>
> We have speculated about this in the past.  Several of the big
> commercial DBs can do it, so it seems that at least some people find
> it valuable.
>
> I'd be inclined to look at it in combination with decoupling heap and
> index scan order: that is, you traverse the index and make a list of
> heap tuple TIDs that the index says to visit, then you visit those
> tuples in heap storage order.  This gets rid of a lot of the
> random-access overhead of the present indexscanning scheme, at the cost
> of not producing sorted-by-the-indexkey output (but presumably the
> planner could choose whether to do it this way or the old way).
>
> The way this fits with multiple indexes is that you could gather tuple
> TIDs from several indexes and intersect or union the lists before you
> visit the heap.  I believe the common way to do this is to represent
> the TID lists as sparse bitmaps and AND or OR the bitmaps.

That's my understanding as well. As to how useful it is, you just have
to consider the case of a very wide table and being able to narrow
things down as much as possible before hitting the table itself. If TID
is 6 bytes and you have 2 indexes that are each on int fields, that's 10
bytes per row, per index. Compare that to a table that's 100 bytes wide
or more and it's very easy to come out way ahead on the index side.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Index not being used in MAX function (7.2.3)

Вложения