Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
Дата
Msg-id AANLkTikhJMusPNB1D-xDexTGdgu=vHWrVS_+UYVVT28_@mail.gmail.com
обсуждение исходный текст
Ответ на HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?  (Scott Carey <scott@richrelevance.com>)
Ответы Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On Mon, Oct 18, 2010 at 9:40 PM, Scott Carey <scott@richrelevance.com> wrote:
> 8.4.5
>
> I consistently see HashJoin plans that hash the large table, and scan the small table.  This is especially puzzling
insome cases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and
scanthe big one? 
>
> Here is one case I saw just recently
>
>               Hash Cond: ((a.e_id)::text = (ta.name)::text)
>               ->  Index Scan using c_a_s_e_id on a  (cost=0.00..8.21 rows=14 width=27)
>                     Index Cond: (id = 12)
>               ->  Hash  (cost=89126.79..89126.79 rows=4825695 width=74)
>                     ->  Seq Scan on p_a_1287446030 tmp  (cost=0.00..89126.79 rows=4825695 width=74)
>                           Filter: (id = 12)

Can we have the complex EXPLAIN output here, please?  And the query?
For example, this would be perfectly sensible if the previous line
started with "Hash Semi Join" or "Hash Anti Join".

rhaas=# explain select * from little where exists (select * from big
where big.a = little.a);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Semi Join  (cost=3084.00..3478.30 rows=10 width=4)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

I'm also a bit suspicious of the fact that the hash condition has a
cast to text on both sides, which implies, to me anyway, that the
underlying data types are not text.  That might mean that the query
planner doesn't have very good statistics, which might mean that the
join selectivity estimates are wackadoo, which can apparently cause
this problem:

rhaas=# explain select * from little, big where little.a = big.a;
                        QUERY PLAN
-----------------------------------------------------------------------
 Hash Join  (cost=3084.00..3577.00 rows=2400 width=8)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..34.00 rows=2400 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

rhaas=# analyze;
ANALYZE
rhaas=# explain select * from little, big where little.a = big.a;
                            QUERY PLAN
-------------------------------------------------------------------
 Hash Join  (cost=1.23..1819.32 rows=10 width=8)
   Hash Cond: (big.a = little.a)
   ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
         ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
(5 rows)

This doesn't appear to make a lot of sense, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Select count(*), the sequel
Следующее
От: Robert Haas
Дата:
Сообщение: Re: odd postgresql performance (excessive lseek)