Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> SELECT id, MIN(foo.time)
> FROM Foo foo
> GROUP BY foo.id;
> can one tell the query executor to do an index scan on Foo_id_idx
That is a considered plan type. For example, using the regression
test database:
regression=# set enable_seqscan TO 0;
SET
regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=0.00..1121.89 rows=1000 width=8) (actual time=19.12..1619.71 rows=100 loops=1) -> Group (cost=0.00..1096.89
rows=10000width=8) (actual time=0.56..1484.14 rows=10000 loops=1) -> Index Scan using tenk1_hundred on tenk1
(cost=0.00..1071.89rows=10000 width=8) (actual time=0.53..1194.64 rows=10000 loops=1)Total runtime: 1621.69 msec
(4 rows)
regression=# set enable_seqscan TO 1;
SET
regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=997.39..1072.39 rows=1000 width=8) (actual time=1162.52..1677.46 rows=100 loops=1) -> Group
(cost=997.39..1047.39rows=10000 width=8) (actual time=1157.33..1565.75 rows=10000 loops=1) -> Sort
(cost=997.39..1022.39rows=10000 width=8) (actual time=1157.29..1298.10 rows=10000 loops=1) Sort Key:
hundred -> Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=8) (actual time=0.16..277.69 rows=10000
loops=1)Totalruntime: 1695.83 msec
(6 rows)
The sort-based plan usually comes out ahead in estimated total cost, but
if you were using a cursor to read it then I believe the indexscan plan
would get chosen due to its much lower startup cost.
> If the answer is yes, there is a way, then how about if we do this:
> CREATE INDEX Foo_id_time_idx ON Foo(id, time);
> now, considering the same query, it could be executed even faster,
> because we could do an index scan on Foo_id_time_idx and only need
> to consider the first data tuple of every Foo.id group (because
> the ordering of Foo_id_time_idx guarantees that the MIN(time) is
> in the first tuple.
This won't happen because (a) the optimizer treats aggregates as black
boxes; there is no notion that some aggregates have a connection to the
sort ordering of some indexes; (b) the indexscan would still have to
pass over the other time values in each group.
regards, tom lane