Обсуждение: agg/order-by question
Consider the explain for the following queries .. sample=# explain select a, count(*) from foo group by a order by a; QUERY PLAN -------------------------------------------------------------------------Aggregate (cost=69.83..77.33 rows=100 width=4) -> Group (cost=69.83..74.83 rows=1000 width=4) -> Sort (cost=69.83..72.33 rows=1000 width=4) Sort Key: a -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=4) (5 rows) sample=# explain select a, count(*) from foo group by a order by a desc; QUERY PLAN -------------------------------------------------------------------------------Sort (cost=80.65..80.90 rows=100 width=4) Sort Key: a -> Aggregate (cost=69.83..77.33 rows=100 width=4) -> Group (cost=69.83..74.83 rows=1000width=4) -> Sort (cost=69.83..72.33 rows=1000 width=4) Sort Key: a -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=4) (7 rows) In the first case pgsql doesn't have a Sort on top because the Sort below the Group produces the right "interesting order" (using the System-R term). In the second case however, since the order-by clause demands "desc" there is a Sort tagged on on top. Now, instead of doing this, isn't it better to just have a similar plan as in the first case, but just change the lower Sort to be descending ? It doesn't affect the Group and the Agg in any way .. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Sat, Jul 12, 2003 at 00:39:06 -0700, Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> wrote: > > Consider the explain for the following queries .. > > sample=# explain select a, count(*) from foo group by a order by a; > QUERY PLAN > ------------------------------------------------------------------------- > Aggregate (cost=69.83..77.33 rows=100 width=4) > -> Group (cost=69.83..74.83 rows=1000 width=4) > -> Sort (cost=69.83..72.33 rows=1000 width=4) > Sort Key: a > -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=4) > (5 rows) > > sample=# explain select a, count(*) from foo group by a order by a desc; > QUERY PLAN > ------------------------------------------------------------------------------- > Sort (cost=80.65..80.90 rows=100 width=4) > Sort Key: a > -> Aggregate (cost=69.83..77.33 rows=100 width=4) > -> Group (cost=69.83..74.83 rows=1000 width=4) > -> Sort (cost=69.83..72.33 rows=1000 width=4) > Sort Key: a > -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=4) > (7 rows) > > > In the first case pgsql doesn't have a Sort on top because the Sort > below the Group produces the right "interesting order" (using the > System-R term). In the second case however, since the order-by clause > demands "desc" there is a Sort tagged on on top. > > Now, instead of doing this, isn't it better to just have a similar > plan as in the first case, but just change the lower Sort to be > descending ? It doesn't affect the Group and the Agg in any way .. You might try this in 7.4. I am pretty sure a change was made a couple of weeks ago to let group by work with either sort order. Also hash aggragates have been available for quite a while in 7.4. This is a better plan when there are only a small number of distinct values.
>>>>> "Bruno" == Bruno Wolff, <Bruno> writes: Bruno> You might try this in 7.4. I am pretty sure a change was Bruno> made a couple of weeks ago to let group by workwith either Bruno> sort order. Also hash aggragates have been available for Bruno> quite a while in 7.4. This isa better plan when there are Bruno> only a small number of distinct values. Gotcha ! Thanks. TelegraphCQ is still on the 7.3.2 code base .. after doing one hellish merge in March, I'm not too eager to do another, although merging more often is likely going to be less painful. I knew about the hash-aggregates - we had set spilling of hash-aggregates to disk for large number of distinct values (with a crude form of recursive partitioning) as a course project for our undergraduate database class at Berkeley. When I get some time, I want to clean up my solution code and contribute it as a patch. I don't think that will be before the end of summer though. BTW, some systems prefer sorted grouped-aggregates to hashed grouped-aggregates - even for small distinct values. How it works is to just update the running aggregates in place in the sort tournament tree. The only requirement is to be able to compute aggregates of aggregates, so that partial aggs for the same distinct values across different runs can be merged. The advantage is that you get sorted grouped aggregation for the same cost of unsorted hash-grouped agg. The disadvantage is that you lose the modularity of the sort. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh