Re: Q: will GROUP BY make use of an index to return tuples early?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Q: will GROUP BY make use of an index to return tuples early?
Дата
Msg-id 6664.1023649026@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Q: will GROUP BY make use of an index to return tuples early?  (Gunther Schadow <gunther@aurora.regenstrief.org>)
Список pgsql-sql
Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> SELECT id, MIN(foo.time)
>    FROM Foo foo
>   GROUP BY foo.id;
> can one tell the query executor to do an index scan on Foo_id_idx

That is a considered plan type.  For example, using the regression
test database:

regression=# set enable_seqscan TO 0;
SET
regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred;
                            QUERY PLAN                                                                  
 

---------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=0.00..1121.89 rows=1000 width=8) (actual time=19.12..1619.71 rows=100 loops=1)  ->  Group  (cost=0.00..1096.89
rows=10000width=8) (actual time=0.56..1484.14 rows=10000 loops=1)        ->  Index Scan using tenk1_hundred on tenk1
(cost=0.00..1071.89rows=10000 width=8) (actual time=0.53..1194.64 rows=10000 loops=1)Total runtime: 1621.69 msec
 
(4 rows)
regression=# set enable_seqscan TO 1;
SET
regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred;
                   QUERY PLAN                                                         
 

---------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=997.39..1072.39 rows=1000 width=8) (actual time=1162.52..1677.46 rows=100 loops=1)  ->  Group
(cost=997.39..1047.39rows=10000 width=8) (actual time=1157.33..1565.75 rows=10000 loops=1)        ->  Sort
(cost=997.39..1022.39rows=10000 width=8) (actual time=1157.29..1298.10 rows=10000 loops=1)              Sort Key:
hundred             ->  Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=8) (actual time=0.16..277.69 rows=10000
loops=1)Totalruntime: 1695.83 msec
 
(6 rows)

The sort-based plan usually comes out ahead in estimated total cost, but
if you were using a cursor to read it then I believe the indexscan plan
would get chosen due to its much lower startup cost.

> If the answer is yes, there is a way, then how about if we do this:
> CREATE INDEX Foo_id_time_idx ON Foo(id, time);
> now, considering the same query, it could be executed even faster,
> because we could do an index scan on Foo_id_time_idx and only need
> to consider the first data tuple of every Foo.id group (because
> the ordering of Foo_id_time_idx guarantees that the MIN(time) is
> in the first tuple.

This won't happen because (a) the optimizer treats aggregates as black
boxes; there is no notion that some aggregates have a connection to the
sort ordering of some indexes; (b) the indexscan would still have to
pass over the other time values in each group.
        regards, tom lane


В списке pgsql-sql по дате отправления:

Предыдущее
От: "Ian Cass"
Дата:
Сообщение: Re: select failure
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Support for distributed queries with semijoins could be possible, if ...