Re: bad select performance fixed by forbidding hash joins

From: Tom Lane
Subject: Re: bad select performance fixed by forbidding hash joins
Msg-id: 14955.933693143@sss.pgh.pa.us
List: pgsql-hackers
A few weeks ago, George Young <gry@ll.mit.edu> complained that the
following query:

> select os.name,r.run_name,ro.status from opset_steps os,runs r,
> run_opsets ro where (ro.status=3 or ro.status=1) and
> ro.opset_id=os.opset_id and ro.run_id=r.run_id and
> ro.opset_ver=os.opset_ver and r.status=1;

had horrible performance when executed via the system's preferred plan,

> Hash Join  (cost=1793.58 rows=14560 width=38)
>   ->  Hash Join  (cost=1266.98 rows=14086 width=24)
>         ->  Seq Scan on run_opsets ro  (cost=685.51 rows=13903 width=8)
>         ->  Hash  (cost=70.84 rows=1389 width=16)
>               ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
>   ->  Hash  (cost=47.43 rows=374 width=14)
>         ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)

I have looked into this, and it seems that the problem is this: the
innermost hash join between run_opsets and opset_steps is being done
on the join clause ro.opset_ver=os.opset_ver.  In George's data,
the opset_ver columns only have about 14 distinct values, with a
very strong bias towards the values 1,2,3.  This means that the vast
majority of the opset_steps entries go into only three hash buckets,
and the vast majority of the probes from run_opsets search one of those
same three buckets, so that most of the run_opsets rows are being
compared to almost a third of the opset_steps rows, not just a small
fraction of them.  Almost all of the runtime of the query is going into
the tuple comparison tests :-(
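The effect described above can be made concrete with a small simulation: count the tuple comparisons a naive one-pass hash join performs for a skewed inner column versus a flat one. This is an illustrative sketch, not PostgreSQL's actual hash join code; the row counts mirror the EXPLAIN output above, and the bias toward the values 1, 2, 3 is approximated at 90%.

```python
import random
from collections import defaultdict

def hashjoin_comparisons(inner_vals, probe_vals, nbuckets=1024):
    """Count tuple comparisons in a simple hash join: each probe row is
    compared against every inner row that landed in its bucket."""
    buckets = defaultdict(int)
    for v in inner_vals:
        buckets[hash(v) % nbuckets] += 1
    return sum(buckets[hash(v) % nbuckets] for v in probe_vals)

random.seed(42)
# Skewed inner column: ~90% of 1389 opset_steps rows hold 1, 2, or 3,
# with ~14 distinct values overall (as in George's opset_ver data).
skewed = [random.choice([1, 2, 3]) if random.random() < 0.9
          else random.randint(4, 14) for _ in range(1389)]
# Flat inner column of the same size, all values distinct.
flat = list(range(1389))
# 13903 probe rows from run_opsets, drawn with matching distributions.
probes_skewed = [random.choice(skewed) for _ in range(13903)]
probes_flat = [random.choice(flat) for _ in range(13903)]

# Most probes against the skewed column scan a bucket holding roughly a
# third of the inner rows; against the flat column, a bucket holds only
# a row or two, so the comparison count is orders of magnitude smaller.
print(hashjoin_comparisons(skewed, probes_skewed))
print(hashjoin_comparisons(flat, probes_flat))
```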

It seems clear that we want the system not to risk using a hashjoin
unless it has good evidence that the inner table's column has a fairly
flat distribution.  I'm thinking that the right sort of check would be
to check whether the "disbursion" statistic set by VACUUM ANALYZE is
fairly small, maybe 0.01 or less (but not zero, which would suggest
that VACUUM ANALYZE has never been run).  This would roughly
correspond to the most common value appearing not more than 1% of the
time, so that we can be sure at least 100 different hashbuckets will
be used.  Comments?  Is that value too small?
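The proposed check reduces to a one-line predicate. The sketch below is only a paraphrase of the idea, not actual planner code; the function name and threshold parameter are made up for illustration.

```python
def hashjoin_allowed(disbursion, threshold=0.01):
    """Illustrative version of the proposed check: treat the column's
    disbursion statistic (set by VACUUM ANALYZE) as roughly the frequency
    of its most common value.  Zero means VACUUM ANALYZE has never run,
    so refuse rather than guess; a value <= 0.01 means the most common
    value fills at most ~1% of the table, so at least ~100 hash buckets
    should see real use."""
    return 0 < disbursion <= threshold

print(hashjoin_allowed(0.005))  # True: flat distribution, hash join OK
print(hashjoin_allowed(0.3))    # False: heavy skew like opset_ver
print(hashjoin_allowed(0.0))    # False: no statistics available
```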

This change is likely to reduce the optimizer's willingness to use
hashjoins by a *lot*, especially if we make the threshold too small.
If you'd like to see what kind of disbursion numbers you get on your
own data, try something like

    select relname, attname, attdisbursion
    from pg_class, pg_attribute
    where attrelid = pg_class.oid
      and relkind = 'r' and attnum > 0
    order by relname, attname;

after a vacuum analyze.
        regards, tom lane

PS: George, in the meantime I bet your query would run fine if the
system would only choose the opset_id clause instead of opset_ver
to do the hashjoin with --- opset_id has far better distribution.
I'm guessing that it thinks the two clauses are equally attractive
and is just choosing whichever one it happens to process first (or
last?).  You might try rearranging the order of the WHERE clauses
as a stopgap solution...

