why hash on the primary key?

Поиск
Список
Период
Сортировка
От Robert Haas
Тема why hash on the primary key?
Дата
Msg-id 603c8f070811281155u50b1edb9i85e4045677f729e3@mail.gmail.com
обсуждение исходный текст
Ответы Re: why hash on the primary key?  ("Adam Rich" <adam.r@sbcglobal.net>)
Re: why hash on the primary key?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
I'm seeing a lot of plans in my database that look like this:

portal=# explain select * from foo i, foo j where i.id = j.id;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=769.87..2159.36 rows=13283 width=264)
   Hash Cond: (i.id = j.id)
   ->  Seq Scan on foo i  (cost=0.00..343.83 rows=13283 width=132)
   ->  Hash  (cost=343.83..343.83 rows=13283 width=132)
         ->  Seq Scan on foo j  (cost=0.00..343.83 rows=13283 width=132)

It seems very strange for the planner to decide to build an in-memory
hash table on a column that is already indexed (the primary key, no
less!).  But this is happening A LOT - I often see plans where a
majority of the joins are executed this way (and they're not all
self-joins either...).  It seems like the planner is concluding that
it's going to need most or all of the pages in the table anyway, and
that building a hash table as it goes is quicker than reading the
index pages in from disk.  On a simple query like the above, setting
enable_seqscan to off or random_page_cost to 1 generates the expected
plan:

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2534.24 rows=13283 width=264)
   Merge Cond: (i.id = j.id)
   ->  Index Scan using foo_pkey on foo i  (cost=0.00..1167.50
rows=13283 width=132)
   ->  Index Scan using foo_pkey on foo j  (cost=0.00..1167.50
rows=13283 width=132)
(4 rows)

Experimentation shows this is actually about 25% faster.  But, this is
kind of a blunt instrument, and my attempts to fiddle with various
parameters have not been real succesful in generating better plans for
more complicated examples.

Any suggestions/explanations?

...Robert

В списке pgsql-general по дате отправления:

Предыдущее
От: Ron Mayer
Дата:
Сообщение: Re: Very large tables
Следующее
От: "Adam Rich"
Дата:
Сообщение: Re: why hash on the primary key?