Thread: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is it isn't
obvious from the plan.

The query actually runs slightly slower in CVS than with 7.3, though it's hard
to compare because it seems to have done everything differently. Every hash
join has become a merge join and every merge join has become a hash join :/



SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as name,
       *
  FROM hier LEFT OUTER JOIN (
        SELECT distinct level_0_id, level_1_id
          FROM cache_foo JOIN foo_hier USING (foo_id)
         WHERE key_value = 839
           AND dist < 60
     ) AS cache ON (hier.hier_id = cache.level_1_id)
 WHERE level = 1
 ORDER BY 1,2



The CVS plan:

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3691.30..3691.49 rows=76 width=1584) (actual time=917.19..917.23 rows=31 loops=1)
   Sort Key: hier.level_0_id, (subplan)
   ->  Hash Left Join  (cost=2837.80..3688.92 rows=76 width=1584) (actual time=914.46..916.89 rows=31 loops=1)
         Hash Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Seq Scan on hier  (cost=0.00..63.86 rows=14 width=1576) (actual time=7.16..7.37 rows=31 loops=1)
               Filter: ("level" = 1)
         ->  Hash  (cost=2799.99..2799.99 rows=15122 width=16) (actual time=905.99..905.99 rows=0 loops=1)
               ->  Subquery Scan "cache"  (cost=2686.58..2799.99 rows=15122 width=16) (actual time=853.43..905.84 rows=31 loops=1)
                     ->  Unique  (cost=2686.58..2799.99 rows=15122 width=16) (actual time=853.41..905.68 rows=31 loops=1)
                           ->  Sort  (cost=2686.58..2724.38 rows=15122 width=16) (actual time=853.40..873.00 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Merge Join  (cost=123.56..1636.78 rows=15122 width=16) (actual time=248.54..723.14 rows=16440 loops=1)
                                       Merge Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using foo_hier_foo on foo_hier  (cost=0.00..1173.54 rows=45140 width=12) (actual time=0.04..234.30 rows=45140 loops=1)
                                       ->  Sort  (cost=123.56..123.73 rows=67 width=4) (actual time=248.43..267.31 rows=16441 loops=1)
                                             Sort Key: cache_foo.foo_id
                                             ->  Index Scan using idx_cache_foo on cache_foo  (cost=0.00..121.53 rows=67 width=4) (actual time=0.06..116.21 rows=16486 loops=1)
                                                   Index Cond: ((key_value = 839) AND (dist < 60::double precision))
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.01 rows=1 width=516) (actual time=0.03..0.05 rows=1 loops=31)
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 928.46 msec



The 7.3 plan:

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4209.04..4209.12 rows=31 width=85) (actual time=849.01..849.05 rows=31 loops=1)
   Sort Key: hier.level_0_id, (subplan)
   ->  Merge Join  (cost=4199.51..4208.27 rows=31 width=85) (actual time=846.77..848.69 rows=31 loops=1)
         Merge Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=8.38..8.42 rows=31 loops=1)
               Sort Key: hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=7.84..8.20 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=4134.88..4139.07 rows=1674 width=16) (actual time=837.74..837.78 rows=31 loops=1)
               Sort Key: "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3919.66..4045.23 rows=1674 width=16) (actual time=786.64..837.54 rows=31 loops=1)
                     ->  Unique  (cost=3919.66..4045.23 rows=1674 width=16) (actual time=786.63..837.38 rows=31 loops=1)
                           ->  Sort  (cost=3919.66..3961.52 rows=16743 width=16) (actual time=786.61..804.71 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Hash Join  (cost=1599.78..2745.06 rows=16743 width=16) (actual time=349.16..628.43 rows=16440 loops=1)
                                       Hash Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using idx_cache_foo on cache_foo  (cost=0.00..810.43 rows=16743 width=4) (actual time=0.07..144.44 rows=16486 loops=1)
                                             Index Cond: ((key_value = 839) AND (dist < 60::double precision))
                                       ->  Hash  (cost=746.40..746.40 rows=45140 width=12) (actual time=347.32..347.32 rows=0 loops=1)
                                             ->  Seq Scan on foo_hier  (cost=0.00..746.40 rows=45140 width=12) (actual time=0.05..222.63 rows=45140 loops=1)
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.03 rows=1 width=17) (actual time=0.03..0.03 rows=1 loops=31)
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 854.57 msec



--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is it isn't
> obvious from the plan.

> SELECT hier.level_0_id as parent_id,
>        (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as name,
>        *
>   FROM hier LEFT OUTER JOIN (
>         SELECT distinct level_0_id, level_1_id
>           FROM cache_foo JOIN foo_hier USING (foo_id)
>          WHERE key_value = 839
>            AND dist < 60
>      ) AS cache ON (hier.hier_id = cache.level_1_id)
>  WHERE level = 1
>  ORDER BY 1,2

Why would you expect hash aggregation to be used here?  There's no
aggregates ... nor even any GROUP BY.

A hash aggregation plan looks like this:

regression=# explain select ten, sum(unique1) from tenk1 group by ten;
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=508.00..508.02 rows=10 width=8)
   ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=8)
(2 rows)

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> The query actually runs slightly slower in CVS than with 7.3, though
> it's hard to compare because it seems to have done everything
> differently. Every hash join has become a merge join and every merge
> join has become a hash join :/

The estimated row counts for the bottom-level scans show some
differences, which probably account for the differing choice of plans.

Are these actually the same data with up-to-date ANALYZE stats in both
cases?  I do not recall that we've made any adjustments to the basic
statistical estimation routines since 7.3 ...

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is it isn't
> > obvious from the plan.
>
> > SELECT hier.level_0_id as parent_id,
> >        (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as name,
> >        *
> >   FROM hier LEFT OUTER JOIN (
> >         SELECT distinct level_0_id, level_1_id
> >           FROM cache_foo JOIN foo_hier USING (foo_id)
> >          WHERE key_value = 839
> >            AND dist < 60
> >      ) AS cache ON (hier.hier_id = cache.level_1_id)
> >  WHERE level = 1
> >  ORDER BY 1,2
>
> Why would you expect hash aggregation to be used here?  There's no
> aggregates ... nor even any GROUP BY.

Well, "SELECT distinct level_0_id, level_1_id"  is equivalent to a GROUP BY
level_0_id, level_1_id.
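
For instance, the cache subquery could be written either way; if the planner
treated them as the same thing it could consider a hash aggregate for both (a
minimal sketch, reusing the tables from the query above):

SELECT DISTINCT level_0_id, level_1_id           -- the DISTINCT form
  FROM cache_foo JOIN foo_hier USING (foo_id)
 WHERE key_value = 839 AND dist < 60;

SELECT level_0_id, level_1_id                    -- the equivalent GROUP BY form
  FROM cache_foo JOIN foo_hier USING (foo_id)
 WHERE key_value = 839 AND dist < 60
 GROUP BY level_0_id, level_1_id;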

Um, I think I grabbed the wrong query from the logs though, sorry. Here's a
better example from the actual code; in fact, I think it's what the above
query turned into after more work.

There's only a small decrease in speed from 7.3 to CVS now, but I was hoping
for a big speed increase from hash aggregates since most of the time is being
sunk into that sort. But it definitely isn't using them. I guess TNSTAAFL.


SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as name,
       *
  FROM hier LEFT OUTER JOIN (
        SELECT min(dist) AS mindist, count(distinct foo_id) AS num_foos, level_1_id
          FROM cache_foos JOIN foo_hier USING (foo_id)
         WHERE key_id = 839
           AND dist < 60
         GROUP BY level_0_id, level_1_id
     ) AS cache ON (hier.hier_id = cache.level_1_id)
 WHERE level = 1
 ORDER BY level_0_id;

CVS:

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3936.87..3937.06 rows=76 width=1596) (actual time=1047.13..1047.16 rows=31 loops=1)
   Sort Key: hier.level_0_id
   ->  Hash Left Join  (cost=2989.02..3934.50 rows=76 width=1596) (actual time=1044.90..1046.87 rows=31 loops=1)
         Hash Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Seq Scan on hier  (cost=0.00..63.86 rows=14 width=1576) (actual time=7.92..8.13 rows=31 loops=1)
               Filter: ("level" = 1)
         ->  Hash  (cost=2951.21..2951.21 rows=15122 width=24) (actual time=1033.78..1033.78 rows=0 loops=1)
               ->  Subquery Scan "cache"  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.66..1033.60 rows=31 loops=1)
                     ->  GroupAggregate  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.64..1033.40 rows=31 loops=1)
                           ->  Sort  (cost=2686.58..2724.38 rows=15122 width=24) (actual time=913.00..931.40 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Merge Join  (cost=123.56..1636.78 rows=15122 width=24) (actual time=280.80..779.05 rows=16440 loops=1)
                                       Merge Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using foo_hier_foo on foo_hier  (cost=0.00..1173.54 rows=45140 width=12) (actual time=0.04..225.13 rows=45140 loops=1)
                                       ->  Sort  (cost=123.56..123.73 rows=67 width=12) (actual time=280.69..302.62 rows=16441 loops=1)
                                             Sort Key: cache_foos.foo_id
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..121.53 rows=67 width=12) (actual time=0.05..128.19 rows=16486 loops=1)
                                                   Index Cond: ((key_id = 839) AND (dist < 60::double precision))
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.01 rows=1 width=516) (actual time=0.03..0.03 rows=1 loops=31)
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1058.63 msec


7.3:

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4103.84..4103.92 rows=31 width=97) (actual time=1033.79..1033.82 rows=31 loops=1)
   Sort Key: hier.level_0_id
   ->  Merge Join  (cost=4094.32..4103.08 rows=31 width=97) (actual time=1031.62..1033.54 rows=31 loops=1)
         Merge Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=7.92..7.96 rows=31 loops=1)
               Sort Key: hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=7.25..7.77 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=4029.69..4033.87 rows=1674 width=24) (actual time=1023.54..1023.58 rows=31 loops=1)
               Sort Key: "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3730.75..3940.04 rows=1674 width=24) (actual time=829.88..1023.35 rows=31 loops=1)
                     ->  Aggregate  (cost=3730.75..3940.04 rows=1674 width=24) (actual time=829.86..1023.14 rows=31 loops=1)
                           ->  Group  (cost=3730.75..3856.32 rows=16743 width=24) (actual time=822.88..940.44 rows=16440 loops=1)
                                 ->  Sort  (cost=3730.75..3772.61 rows=16743 width=24) (actual time=822.86..841.47 rows=16440 loops=1)
                                       Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                       ->  Hash Join  (cost=1410.87..2556.15 rows=16743 width=24) (actual time=347.17..662.63 rows=16440 loops=1)
                                             Hash Cond: ("outer".foo_id = "inner".foo_id)
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..810.43 rows=16743 width=12) (actual time=0.07..152.56 rows=16486 loops=1)
                                                   Index Cond: ((key_id = 839) AND (dist < 60::double precision))
                                             ->  Hash  (cost=746.40..746.40 rows=45140 width=12) (actual time=345.53..345.53 rows=0 loops=1)
                                                   ->  Seq Scan on foo_hier  (cost=0.00..746.40 rows=45140 width=12) (actual time=0.05..222.63 rows=45140 loops=1)
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.03 rows=1 width=17) (actual time=0.03..0.03 rows=1 loops=31)
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1039.57 msec





--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
>
> The estimated row counts for the bottom-level scans show some
> differences, which probably account for the differing choice of plans.
>
> Are these actually the same data with up-to-date ANALYZE stats in both
> cases?  I do not recall that we've made any adjustments to the basic
> statistical estimation routines since 7.3 ...

The exact same data. I just copied the data using pg_dumpall to do the upgrade
and haven't updated either database at all since. The analyze stats should be
pretty up-to-date but in any case should be the same for both assuming they
get copied in the dump as well.

--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> There's only a small decrease in speed from 7.3 to CVS now, but I was hoping
> for a big speed increase from hash aggregates since most of the time is being
> sunk into that sort. But it definitely isn't using them. I guess TNSTAAFL.

It looks like it's avoiding the hash choice because it thinks there will
be many groups, 15122 to be exact:

>                      ->  GroupAggregate  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.64..1033.40 rows=31 loops=1)

You could probably persuade it that hashed aggregation will be okay by
increasing sort_mem sufficiently.  But it would also be interesting to
see if the number-of-groups estimate can be improved ... 15122 is rather
badly off from the true value of 31 ...
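
For instance (a minimal sketch; the numbers here are illustrative, and
sort_mem is measured in kilobytes):

SET sort_mem = 65536;
EXPLAIN ANALYZE SELECT ... ;  -- rerun to see whether HashAggregate is chosen

ALTER TABLE foo_hier ALTER COLUMN level_1_id SET STATISTICS 100;
ANALYZE foo_hier;             -- more stats may improve the group estimate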

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> The exact same data. I just copied the data using pg_dumpall to do the upgrade
> and haven't updated either database at all since. The analyze stats should be
> pretty up-to-date but in any case should be the same for both assuming they
> get copied in the dump as well.

Uh, no, pg_dump doesn't copy pg_statistic; you must do a fresh ANALYZE
after loading the dump file.
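
That is, after reloading the dump into the new cluster, something like this is
needed in each database before the two installations' plans are comparable:

-- regenerate pg_statistic contents for every table
ANALYZE;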

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> You could probably persuade it that hashed aggregation will be okay by
> increasing sort_mem sufficiently.  But it would also be interesting to
> see if the number-of-groups estimate can be improved ... 15122 is rather
> badly off from the true value of 31 ...

I don't see how you're ever going to reliably come up with a good estimate for
this. Consider that it's not just the distribution of the column that matters,
but the distribution given the where clauses in effect. This depends on the
degree to which the expressions in the where clauses are independent of the
expressions we're grouping on, which is hard to predict in advance.

If the prediction is wrong is it just a performance penalty? The hash can
still proceed if it has to go to disk? In which case is there a way for me to
force it to use hashes if I know better than the optimizer?

I've run VACUUM FULL and ANALYZE on both databases again. The data should be
identical, as I've just copied the database and haven't updated anything. I
tested two example queries with both. It isn't using hash aggregates for
either, and the plans are still quite different.


This is the previous query except I've added hier.level_0_id to the join
clause in the hopes it would skip the redundant sort. It didn't work, though
the second sort is fast due to there being relatively few records coming out
of the group.

SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as name,
       *
  FROM hier LEFT OUTER JOIN (
        SELECT min(dist) AS mindist, count(distinct foo_id) AS num_foos, level_1_id, level_0_id
          FROM cache_foos JOIN foo_hier USING (foo_id)
         WHERE region_id = 839
           AND dist < 60
         GROUP BY level_0_id, level_1_id
     ) AS cache ON (hier.hier_id = cache.level_1_id and hier.level_0_id = cache.level_0_id)
 WHERE level = 1
 ORDER BY hier.level_0_id, hier.level_1_id

CVS:
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4091.65..4091.73 rows=31 width=101) (actual time=907.95..907.99 rows=31 loops=1)
   Sort Key: hier.level_0_id, hier.level_1_id
   ->  Merge Left Join  (cost=3964.22..4090.89 rows=31 width=101) (actual time=905.53..907.61 rows=31 loops=1)
         Merge Cond: (("outer".level_0_id = "inner".level_0_id) AND ("outer".hier_id = "inner".level_1_id))
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=7.61..7.64 rows=31 loops=1)
               Sort Key: hier.level_0_id, hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=6.47..7.36 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=3899.59..3899.90 rows=124 width=24) (actual time=897.76..897.80 rows=31 loops=1)
               Sort Key: "cache".level_0_id, "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3697.05..3895.28 rows=124 width=24) (actual time=771.57..897.28 rows=31 loops=1)
                     ->  GroupAggregate  (cost=3697.05..3895.28 rows=124 width=24) (actual time=771.55..897.03 rows=31 loops=1)
                           ->  Sort  (cost=3697.05..3736.57 rows=15809 width=24) (actual time=764.08..782.64 rows=16440 loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Hash Join  (cost=853.20..2594.49 rows=15809 width=24) (actual time=307.51..603.83 rows=16440 loops=1)
                                       Hash Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..752.94 rows=15808 width=12) (actual time=0.08..111.71 rows=16486 loops=1)
                                             Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                       ->  Hash  (cost=740.16..740.16 rows=45216 width=12) (actual time=306.11..306.11 rows=0 loops=1)
                                             ->  Seq Scan on foo_hier  (cost=0.00..740.16 rows=45216 width=12) (actual time=0.03..143.56 rows=45140 loops=1)
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.05 rows=2 width=17) (actual time=0.03..0.03 rows=1 loops=31)
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 915.22 msec


7.3:
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4056.43..4056.51 rows=31 width=101) (actual time=1036.93..1036.97 rows=31 loops=1)
   Sort Key: hier.level_0_id, hier.level_1_id
   ->  Merge Join  (cost=4047.39..4055.66 rows=31 width=101) (actual time=1034.53..1036.59 rows=31 loops=1)
         Merge Cond: (("outer".level_0_id = "inner".level_0_id) AND ("outer".hier_id = "inner".level_1_id))
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=10.15..10.18 rows=31 loops=1)
               Sort Key: hier.level_0_id, hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=9.59..9.94 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=3982.76..3986.82 rows=1624 width=24) (actual time=1024.23..1024.26 rows=31 loops=1)
               Sort Key: "cache".level_0_id, "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3693.23..3896.18 rows=1624 width=24) (actual time=835.96..1023.98 rows=31 loops=1)
                     ->  Aggregate  (cost=3693.23..3896.18 rows=1624 width=24) (actual time=835.95..1023.76 rows=31 loops=1)
                           ->  Group  (cost=3693.23..3815.00 rows=16236 width=24) (actual time=828.96..941.70 rows=16440 loops=1)
                                 ->  Sort  (cost=3693.23..3733.82 rows=16236 width=24) (actual time=828.93..848.83 rows=16440 loops=1)
                                       Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                       ->  Hash Join  (cost=1432.53..2557.77 rows=16236 width=24) (actual time=350.81..670.83 rows=16440 loops=1)
                                             Hash Cond: ("outer".foo_id = "inner".foo_id)
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..800.51 rows=16236 width=12) (actual time=0.08..159.95 rows=16486 loops=1)
                                                   Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                             ->  Hash  (cost=746.35..746.35 rows=45135 width=12) (actual time=349.15..349.15 rows=0 loops=1)
                                                   ->  Seq Scan on foo_hier  (cost=0.00..746.35 rows=45135 width=12) (actual time=0.06..219.15 rows=45140 loops=1)
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.10 rows=1 width=17) (actual time=0.03..0.03 rows=1 loops=31)
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1042.86 msec





This is another query, one that involves a DISTINCT ON and a GROUP BY. I guess
the plan for this one could be different because the JOIN no longer constrains
the join order. The order seems the same to me, though; it's the statistics
that seem to be different.


SELECT *
  FROM (
        SELECT DISTINCT ON (bar_id)
               bar_id, bar_location_id, num_foos, mindist
          FROM (
                SELECT bar_id, bar_location_id, count(distinct foo_id) as num_foos, min(dist) as mindist
                  FROM cache_foos
                 WHERE region_id = 839
                   AND dist < 60
                   AND foo_id is not null
                 GROUP BY bar_id, bar_location_id
                ) as x
         ORDER BY bar_id, mindist asc
      ) AS cache
   JOIN bar using (bar_id)
   JOIN bar_location using (bar_location_id)


CVS:

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3438.87..3815.85 rows=200 width=654) (actual time=470.72..533.88 rows=112 loops=1)
   Merge Cond: ("outer".bar_location_id = "inner".bar_location_id)
   ->  Index Scan using bar_location_pkey on bar_location  (cost=0.00..344.50 rows=11792 width=610) (actual time=0.04..59.41 rows=11757 loops=1)
   ->  Sort  (cost=3438.87..3439.37 rows=200 width=44) (actual time=435.51..435.68 rows=112 loops=1)
         Sort Key: "cache".bar_location_id
         ->  Merge Join  (cost=3319.28..3431.22 rows=200 width=44) (actual time=407.23..434.97 rows=112 loops=1)
               Merge Cond: ("outer".bar_id = "inner".bar_id)
               ->  Index Scan using bar_pkey on bar  (cost=0.00..99.52 rows=3768 width=20) (actual time=0.02..18.18 rows=3619 loops=1)
               ->  Sort  (cost=3319.28..3319.78 rows=200 width=20) (actual time=407.17..407.29 rows=112 loops=1)
                     Sort Key: "cache".bar_id
                     ->  Subquery Scan "cache"  (cost=3232.65..3311.64 rows=200 width=20) (actual time=405.71..406.80 rows=112 loops=1)
                           ->  Unique  (cost=3232.65..3311.64 rows=200 width=20) (actual time=405.70..406.24 rows=112 loops=1)
                                 ->  Sort  (cost=3232.65..3272.14 rows=15797 width=20) (actual time=405.69..405.84 rows=146 loops=1)
                                       Sort Key: bar_id, mindist
                                       ->  Subquery Scan x  (cost=1854.57..2131.02 rows=15797 width=20) (actual time=272.07..405.08 rows=146 loops=1)
                                             ->  GroupAggregate  (cost=1854.57..2131.02 rows=15797 width=20) (actual time=272.06..404.21 rows=146 loops=1)
                                                   ->  Sort  (cost=1854.57..1894.06 rows=15797 width=20) (actual time=270.37..289.31 rows=16440 loops=1)
                                                         Sort Key: bar_id, bar_location_id
                                                         ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..752.94 rows=15797 width=20) (actual time=0.05..118.41 rows=16440 loops=1)
                                                               Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                                               Filter: (foo_id IS NOT NULL)
 Total runtime: 540.00 msec


7.3:
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2236.97..2892.88 rows=162 width=159) (actual time=515.70..562.77 rows=112 loops=1)
   ->  Merge Join  (cost=2236.97..2397.80 rows=162 width=44) (actual time=515.58..555.70 rows=112 loops=1)
         Merge Cond: ("outer".bar_id = "inner".bar_id)
         ->  Index Scan using bar_pkey on bar  (cost=0.00..148.37 rows=3850 width=20) (actual time=0.05..29.76 rows=3619 loops=1)
         ->  Sort  (cost=2236.97..2237.37 rows=162 width=20) (actual time=515.49..515.64 rows=112 loops=1)
               Sort Key: "cache".bar_id
               ->  Subquery Scan "cache"  (cost=2222.92..2231.02 rows=162 width=20) (actual time=513.96..515.09 rows=112 loops=1)
                     ->  Unique  (cost=2222.92..2231.02 rows=162 width=20) (actual time=513.95..514.51 rows=112 loops=1)
                           ->  Sort  (cost=2222.92..2226.97 rows=1621 width=20) (actual time=513.94..514.11 rows=146 loops=1)
                                 Sort Key: bar_id, mindist
                                 ->  Subquery Scan x  (cost=1933.89..2136.50 rows=1621 width=20) (actual time=313.72..513.34 rows=146 loops=1)
                                       ->  Aggregate  (cost=1933.89..2136.50 rows=1621 width=20) (actual time=313.71..512.52 rows=146 loops=1)
                                             ->  Group  (cost=1933.89..2055.46 rows=16209 width=20) (actual time=311.41..422.29 rows=16440 loops=1)
                                                   ->  Sort  (cost=1933.89..1974.41 rows=16209 width=20) (actual time=311.39..329.79 rows=16440 loops=1)
                                                         Sort Key: bar_id, bar_location_id
                                                         ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..800.51 rows=16209 width=20) (actual time=0.05..181.92 rows=16440 loops=1)
                                                               Index Cond: ((region_id = 839) AND (dist < 60::double precision))
                                                               Filter: (foo_id IS NOT NULL)
   ->  Index Scan using bar_location_pkey on bar_location  (cost=0.00..3.04 rows=1 width=115) (actual time=0.04..0.04 rows=1 loops=112)
         Index Cond: ("outer".bar_location_id = bar_location.bar_location_id)
 Total runtime: 568.69 msec


[BTW I've had to search and replace on the plans at the request of the client;
 I hope I didn't lose any relevant information doing that]

--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> You could probably persuade it that hashed aggregation will be okay by
> increasing sort_mem sufficiently.  But it would also be interesting to
> see if the number-of-groups estimate can be improved ... 15122 is rather
> badly off from the true value of 31 ...

My sort_mem is already quite large, 53M. I think that's large enough even for
15,122 rows.


--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I don't see how you're ever going to reliably come up with a good
> estimate for this.

Well, it'll probably never be really good, but it's only been a few
weeks that we've had code that tried to do it at all; I'm not prepared
to write off the concept without even trying.  And I see it's not doing
that badly in your example, now that you've ANALYZEd --- 124 estimated
groups vs 31 actual is well within what would make me happy.

But with only 124 estimated groups, it's certainly not the size of the
hashtable that's dissuading it from hashing.  [ Looks at code... ]
Oh, here's the problem:

             * Executor doesn't support hashed aggregation with DISTINCT
             * aggregates.  (Doing so would imply storing *all* the input
             * values in the hash table, which seems like a certain loser.)

The count(distinct) you've got in there turns it off.

> If the prediction is wrong is it just a performance penalty? The hash can
> still proceed if it has to go to disk?

Long as you don't run out of swap space, sure ;-).  So the estimate only
really has to be right within a factor of (swap space)/sort_mem.

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> The count(distinct) you've got in there turns it off.

AAAH. That makes sense. Unfortunately I think I'm stuck with that; I'll have
to check, though.

It was always a bit mysterious to me how postgres could implement
count(distinct) without introducing a separate sort and aggregate for each
occurrence.

I suppose it could rewrite it into an inner grouping with the added column,
then an outer grouping without it with a normal count(). But then that would
only work with a single count(distinct). And I'm not clear it would win for
this example either since there could actually be a pretty large number of
distinct values on that column. hmm...
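
Something like this is what I mean (a sketch of the rewrite, untested; it
assumes the only DISTINCT aggregate is the count(distinct foo_id) from the
earlier query):

SELECT level_0_id, level_1_id,
       min(mindist) AS mindist,  -- min of per-foo minimums is the overall min(dist)
       count(*) AS num_foos      -- one inner row per distinct foo_id
  FROM (SELECT level_0_id, level_1_id, foo_id, min(dist) AS mindist
          FROM cache_foos JOIN foo_hier USING (foo_id)
         WHERE key_id = 839
           AND dist < 60
         GROUP BY level_0_id, level_1_id, foo_id) AS per_foo
 GROUP BY level_0_id, level_1_id;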

Thanks for all the help.

--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> It was always a bit mysterious to me how postgres could implement
> count(distinct) without introducing a separate sort and aggregate for each
> occurrence.

It can't.  There's a sort + uniq + aggregate process done behind the
scenes in the executor for each DISTINCT aggregate.  This doesn't show
up on the EXPLAIN output, because the planner has nothing to do with it.

I thought about doing this via a separate hashtable for each group ...
for about a minute.  The trouble is you have to run those things in
parallel if you're doing hashed aggregation, so the resources required
are really out of the question in most cases.  With the group approach,
the executor is only processing the values for one outer group at a
time, so it only has to run one inner sort + uniq + agg process at a
time.

I suppose we could consider introducing two implementations (hash or
sort/uniq) for a DISTINCT agg within the executor, but it's code that
remains unwritten...

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
There is a simple case that isn't being handled. A straight DISTINCT is
exactly equivalent to a GROUP BY on all the columns listed. Right now it's
still doing a Sort+Unique. The HashAgg seems to be about 2-3 times as fast for
me.

Actually a DISTINCT ON should also be possible to do as a hashagg as long as
there's no ORDER BY, though I guess that would be fairly uncommon since it's
not that useful.


slo=> explain select distinct id from tab;

                                QUERY PLAN
---------------------------------------------------------------------------
 Unique  (cost=3731.53..3932.49 rows=146 width=4)
   ->  Sort  (cost=3731.53..3832.01 rows=40192 width=4)
         Sort Key: id
         ->  Seq Scan on tab  (cost=0.00..657.92 rows=40192 width=4)



slo=> explain select  id from tab group by id;

                             QUERY PLAN
---------------------------------------------------------------------
 HashAggregate  (cost=758.40..758.40 rows=146 width=4)
   ->  Seq Scan on tab  (cost=0.00..657.92 rows=40192 width=4)

--
greg

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> There is a simple case that isn't being handled. A straight DISTINCT is
> exactly equivalent to a GROUP BY on all the columns listed. Right now it's
> still doing a Sort+Unique.

True.  The DISTINCT logic is so intertwined with ORDER BY that I
couldn't see an easy way to consider a hash-based alternative in the
planner.  I think it would require querytree changes propagating all the
way back to the parser to make this feasible at all, and then you'd
still need to think about how to make an honest cost-estimate comparison
of the alternatives (preferably without planning the whole query twice).
Anyone want to dig into it?

            regards, tom lane

Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3

From: Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > There is a simple case that isn't being handled. A straight DISTINCT is
> > exactly equivalent to a GROUP BY on all the columns listed. Right now it's
> > still doing a Sort+Unique.
>
> True.  The DISTINCT logic is so intertwined with ORDER BY that I
> couldn't see an easy way to consider a hash-based alternative in the
> planner.  I think it would require querytree changes propagating all the
> way back to the parser to make this feasible at all, and then you'd
> still need to think about how to make an honest cost-estimate comparison
> of the alternatives (preferably without planning the whole query twice).
> Anyone want to dig into it?

So my thinking is that DISTINCT and DISTINCT ON are really just special cases
of GROUP BY. Instead of having separate code paths for them, I would suggest
rewriting them to GROUP BY, using a "first()" aggregate function for the
DISTINCT ON case. Then let the normal logic handle it.
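
Such a first() aggregate is easy enough to define at the SQL level (a minimal
sketch; it relies on the anyelement polymorphism new in 7.4, and on the fact
that with a strict transition function and a null initial state the first
input value simply becomes the state and then sticks):

CREATE FUNCTION first_accum(anyelement, anyelement) RETURNS anyelement
    AS 'SELECT $1' LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE first (
    basetype = anyelement,
    sfunc = first_accum,
    stype = anyelement
);

-- so that, ignoring DISTINCT ON's ordering semantics,
--   SELECT DISTINCT ON (a) a, b FROM t
-- could be rewritten as
--   SELECT a, first(b) FROM t GROUP BY a;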

The only disadvantage is that if there is an index, it's possible to implement
DISTINCT ON more efficiently than with arbitrary aggregate functions. Is that
the case currently for the Unique node?

If we have a way of marking arbitrary aggregate functions as only requiring
the first or last record then we could notice that the only aggregate in
rewritten DISTINCT ON queries is first() which only requires the first record
of each value and optimize the index scan. As a bonus this would solve the
min()/max() issue as well.

--
greg