Обсуждение: Performance of MIN() and MAX()
PGSQL 6.5.0, FreeBSD 3.2, Intel Pentium II 366MHz, 128 MB
The table below was filled with about 1,000,000 records. Then a bunch of
indexes was created and a VACUUM executed.
I was under impression that when max(<primary key>) is called, it should
just take the value from the index. I believe it should not do any kind of
scan. But, in fact, it scans the table.
select max(id) from ItemBars
takes well over 10 seconds to complete. It's almost instantaneous on MSSQL
or on Interbase. Something is clearly wrong. MAX() on the primary key should
not take so much time. Am I doing something wrong? Is it a known bug? If
it's a bug then it's a show stopper for us. How can I help fixing it?
CREATE TABLE ItemBars (       ID              SERIAL          PRIMARY KEY     ,       ItemID          INT
NOTNULL        ,       Interv          INT             NOT NULL        ,       StaTS           DATETIME        NOT NULL
      ,       EndTS           DATETIME        NOT NULL        ,       IsActive        BOOL            NOT NULL        ,
     Opn             FLOAT(7)        NOT NULL        ,       High            FLOAT(7)        NOT NULL        ,
Low            FLOAT(7)        NOT NULL        ,       Cls             FLOAT(7)        NOT NULL        ,       Vol
      INT             NOT NULL
 
);
Gene Sokolov.
			
		"Gene Sokolov" <hook@aktrad.ru> writes:
> I was under impression that when max(<primary key>) is called, it should
> just take the value from the index. I believe it should not do any kind of
> scan. But, in fact, it scans the table.
You are mistaken.  Postgres has no idea that min() and max() have any
semantics that have anything to do with indexes.  I would like to see
that optimization myself, but it's not a particularly easy thing to add
given the system structure and the emphasis on datatype extensibility.
> it's a show stopper for us.
You might be able to hack around the issue with queries like
SELECT x FROM table ORDER BY x LIMIT 1;
SELECT x FROM table ORDER BY x DESC LIMIT 1;
to get the min and max respectively.  The current 6.6 code will
implement these with indexscans, although I think 6.5 would not
unless given an additional cue, like a WHERE clause involving x...
        regards, tom lane
			
		From: Tom Lane <tgl@sss.pgh.pa.us>
> > I was under impression that when max(<primary key>) is called, it should
> > just take the value from the index. I believe it should not do any kind
of
> > scan. But, in fact, it scans the table.
>
> You are mistaken.  Postgres has no idea that min() and max() have any
> semantics that have anything to do with indexes.  I would like to see
> that optimization myself, but it's not a particularly easy thing to add
> given the system structure and the emphasis on datatype extensibility.
>
> > it's a show stopper for us.
>
> You might be able to hack around the issue with queries like
>
> SELECT x FROM table ORDER BY x LIMIT 1;
> SELECT x FROM table ORDER BY x DESC LIMIT 1;
It is a real show stopper. No luck completely, the indexes are ignored:
*************************************************************
[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
bars=> create index bars_id on itemsbars(id);
CREATE
bars=> explain select id from itemsbars order by id limit 1;
NOTICE:  QUERY PLAN:
Sort  (cost=44404.41 rows=969073 width=4) ->  Seq Scan on itemsbars  (cost=44404.41 rows=969073 width=4)
EXPLAIN
bars=> \d itemsbars
Table    = itemsbars
+--------------------+----------------------------------+-------+
|       Field        |              Type                | Length|
+--------------------+----------------------------------+-------+
| id                 | int4 not null default nextval('" |     4 |
| itemid             | int4 not null                    |     4 |
| interv             | int4 not null                    |     4 |
| stats              | datetime not null                |     8 |
| endts              | datetime not null                |     8 |
| isactive           | bool not null                    |     1 |
| opn                | float8 not null                  |     8 |
| high               | float8 not null                  |     8 |
| low                | float8 not null                  |     8 |
| cls                | float8 not null                  |     8 |
| vol                | int4 not null                    |     4 |
+--------------------+----------------------------------+-------+
Indices:  bars_complex2         bars_endts         bars_id         bars_interv         bars_itemid         bars_stats
     itemsbars_pkey
 
			
		"Gene Sokolov" <hook@aktrad.ru> writes:
> From: Tom Lane <tgl@sss.pgh.pa.us>
>> You might be able to hack around the issue with queries like
>> SELECT x FROM table ORDER BY x LIMIT 1;
>> SELECT x FROM table ORDER BY x DESC LIMIT 1;
> It is a real show stopper. No luck completely, the indexes are ignored:
> bars=> explain select id from itemsbars order by id limit 1;
> NOTICE:  QUERY PLAN:
> Sort  (cost=44404.41 rows=969073 width=4)
>   -> Seq Scan on itemsbars  (cost=44404.41 rows=969073 width=4)
Yes, you missed my comment that 6.5.* needs some help or it won't
consider an index scan at all.  This gives the right sort of plan:
regression=> explain select id from itemsbars where id > 0 order by id limit 1;
NOTICE:  QUERY PLAN:
Index Scan using itemsbars_id_key on itemsbars  (cost=21.67 rows=334 width=4)
The WHERE clause can be chosen so that it won't actually exclude
anything, but there has to be a WHERE clause that looks useful with
an index or an indexscan plan won't even get generated.  (Also,
the DESC case doesn't work in 6.5.*, unless you apply the backwards-
index-scan patch that Hiroshi posted a few weeks ago.)
This is fixed in current sources, BTW:
regression=> explain select id from itemsbars order by id limit 1;
NOTICE:  QUERY PLAN:
Index Scan using bars_id on itemsbars  (cost=62.00 rows=1000 width=4)
regression=> explain select id from itemsbars order by id desc ;
NOTICE:  QUERY PLAN:
Index Scan Backward using bars_id on itemsbars  (cost=62.00 rows=1000 width=4)
although we still need to do some rejiggering of the cost estimation
rules; current sources are probably *too* eager to use an indexscan.
        regards, tom lane
			
		
Tom Lane wrote in message <16225.937319132@sss.pgh.pa.us>...
>although we still need to do some rejiggering of the cost estimation
>rules; current sources are probably *too* eager to use an indexscan.
>
   I did some testing today on a 1.6 million row table of random integers
in the range of 0..32767.  Using explain I could get a "select max(f1)..."
down to a cost of about 30000 using a where clause of "f1 > 0"...
   After running the queries I achieved the following results (dual P133,
w/ 128 megs ram, IDE)...
   select max(f1) from t1 [68 seconds] [explain cost 60644.00]   select max(f1) from t1 where f1 > 0 [148 seconds]
[explaincost
 
30416.67]
   Knowing my data does have at least one value above 30000 I can apply a
better heuristic other than f1 > 0
   select max(f1) from t1 where f1 > 30000 [12.43 seconds] [explain cost
30416.67]
   Until more agg. function optimizations are implemented, programmers
might have to use the old melon to speed things up.
           Damond