Re: Functional dependency in GROUP BY through JOINs

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Functional dependency in GROUP BY through JOINs
Дата
Msg-id 007301cdd42c$462a4fc0$d27eef40$@gmail.com
обсуждение исходный текст
Ответ на Re: Functional dependency in GROUP BY through JOINs  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 07 December 2012 06:22
> To: Simon Riggs
> Cc: David Rowley; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Functional dependency in GROUP BY through JOINs
> 
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 5 December 2012 23:37, David Rowley <dgrowleyml@gmail.com>
> wrote:
> >> Though this plan might not be quite as optimal as it could be as it
> >> performs the grouping after the join.
> 
> > PostgreSQL always calculates aggregation as the last step.
> 
> > It's a well known optimisation to push-down GROUP BY clauses to the
> > lowest level, but we don't do that, yet.
> 
> > You're right that it can make a massive difference to many queries.
> 
> In the case being presented here, it's not apparent to me that there's any
> advantage to be had at all.  You still need to aggregate over the rows
joining
> to each uniquely-keyed row.  So how exactly are you going to "push down
> the GROUP BY", and where does the savings come from?
> 

I guess the saving is, in at least this case it's because the join only
joins 10 rows on either side, but I think also because the grouping would
also be cheaper because it's done on an INT rather than CHAR().
But I'm thinking you're meaning the planner would have to know this is
cheaper and compare both versions of the plan. 

I should have showed the plan of the other nested query originally, so here
it is this time.

Version = 9.2.1


test=# EXPLAIN ANALYZE
test-# SELECT p.product_code,SUM(s.quantity)
test-# FROM products p
test-# INNER JOIN bigsalestable s ON p.productid = s.productid
test-# GROUP BY p.product_code;                                                             QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------HashAggregate  (cost=18926.22..18926.32 rows=10 width=150)
(actual
time=553.403..553.405 rows=10 loops=1)  ->  Hash Join  (cost=1.23..18676.22 rows=50000 width=150) (actual
time=0.041..324.970 rows=1000000 loops=1)        Hash Cond: (s.productid = p.productid)        ->  Seq Scan on
bigsalestables  (cost=0.00..14425.00 rows=1000000
 
width=8) (actual time=0.012..70.627 rows=1000000 loops=1)        ->  Hash  (cost=1.10..1.10 rows=10 width=150) (actual
time=0.013..0.013 rows=10 loops=1)              Buckets: 1024  Batches: 1  Memory Usage: 1kB              ->  Seq Scan
onproducts p  (cost=0.00..1.10 rows=10
 
width=150) (actual time=0.004..0.007 rows=10 loops=1)Total runtime: 553.483 ms
(8 rows)

test=# EXPLAIN ANALYZE
test-# SELECT p.product_code,s.quantity
test-# FROM products AS p
test-# INNER JOIN (SELECT productid,SUM(quantity) AS quantity FROM
bigsalestable GROUP BY productid) AS s ON p.productid = s.productid;
       QUERY PLAN
 
----------------------------------------------------------------------------
--------------------------------------------------------Hash Join  (cost=19426.22..19431.07 rows=10 width=154) (actual
time=295.548..295.557 rows=10 loops=1)  Hash Cond: (bigsalestable.productid = p.productid)  ->  HashAggregate
(cost=19425.00..19427.00rows=200 width=8) (actual
 
time=295.514..295.518 rows=10 loops=1)        ->  Seq Scan on bigsalestable  (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.004..59.330 rows=1000000 loops=1)  ->  Hash  (cost=1.10..1.10 rows=10 width=150) (actual
time=0.017..0.017
rows=10 loops=1)        Buckets: 1024  Batches: 1  Memory Usage: 1kB        ->  Seq Scan on products p
(cost=0.00..1.10rows=10 width=150)
 
(actual time=0.010..0.012 rows=10 loops=1)Total runtime: 295.612 ms
(8 rows)

Regards

David Rowley


>             regards, tom lane




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: pg_upgrade problem with invalid indexes
Следующее
От: "David Rowley"
Дата:
Сообщение: Re: Functional dependency in GROUP BY through JOINs