why hash on the primary key?

From: "Robert Haas"
I'm seeing a lot of plans in my database that look like this:

portal=# explain select * from foo i, foo j where i.id = j.id;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=769.87..2159.36 rows=13283 width=264)
   Hash Cond: (i.id = j.id)
   ->  Seq Scan on foo i  (cost=0.00..343.83 rows=13283 width=132)
   ->  Hash  (cost=343.83..343.83 rows=13283 width=132)
         ->  Seq Scan on foo j  (cost=0.00..343.83 rows=13283 width=132)

It seems very strange for the planner to decide to build an in-memory
hash table on a column that is already indexed (the primary key, no
less!).  But this is happening A LOT - I often see plans where a
majority of the joins are executed this way (and they're not all
self-joins either...).  It seems like the planner is concluding that
it's going to need most or all of the pages in the table anyway, and
that building a hash table as it goes is quicker than reading the
index pages in from disk.  On a simple query like the above, setting
enable_seqscan to off or random_page_cost to 1 generates the expected
plan:

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2534.24 rows=13283 width=264)
   Merge Cond: (i.id = j.id)
   ->  Index Scan using foo_pkey on foo i  (cost=0.00..1167.50 rows=13283 width=132)
   ->  Index Scan using foo_pkey on foo j  (cost=0.00..1167.50 rows=13283 width=132)
(4 rows)

Experimentation shows this is actually about 25% faster.  But this is
kind of a blunt instrument, and my attempts to fiddle with various
parameters have not been very successful in generating better plans for
more complicated examples.
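
For reference, the session-level overrides I'm talking about are just:

portal=# set enable_seqscan to off;   -- forbid sequential scans outright
portal=# set random_page_cost to 1;   -- price random page reads the same as sequential ones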

Any suggestions/explanations?

...Robert

Re: why hash on the primary key?

From: "Adam Rich"
> I'm seeing a lot of plans in my database that look like this:
> It seems very strange for the planner to decide to build an in-memory
> hash table on a column that is already indexed (the primary key, no
> less!).  But this is happening A LOT - I often see plans where a
> majority of the joins are executed this way (and they're not all
> self-joins either...).  It seems like the planner is concluding that
> it's going to need most or all of the pages in the table anyway, and
> that building a hash table as it goes is quicker than reading the
> index pages in from disk.  On a simple query like the above, setting
> enable_seqscan to off or random_page_cost to 1 generates the expected
> plan:
> Experimentation shows this is actually about 25% faster.  But this is
> kind of a blunt instrument, and my attempts to fiddle with various
> parameters have not been very successful in generating better plans for
> more complicated examples.
>
> Any suggestions/explanations?
>
> ...Robert

Could you send the output of these two queries using "explain analyze"
instead of plain explain?

Re: why hash on the primary key?

From: Tom Lane
"Robert Haas" <robertmhaas@gmail.com> writes:
> It seems very strange for the planner to decide to build an in-memory
> hash table on a column that is already indexed (the primary key, no
> less!).

What's strange about it?  A probe into an in-memory hashtable is a lot
cheaper than a probe into an index, so this type of plan makes plenty
of sense if the hashtable will fit in RAM and there are going to be a
lot of probes.  (Where "a lot" means "enough to amortize the cost of
building the hashtable", of course.)

> Experimentation shows this is actually about 25% faster.

Well, that just says your cost parameters need a bit of adjustment
if you'd like the planner to get the crossover point exactly right.

            regards, tom lane
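
Whether the hashtable "fits in RAM" is, in practice, a question of
work_mem: the executor keeps the hash entirely in memory only within
that per-operation budget, and splits it into batches on disk
otherwise.  A quick way to experiment, with a purely illustrative
value:

portal=# show work_mem;
portal=# set work_mem = '64MB';   -- illustrative; a bigger budget lets larger hash tables stay in memory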

Re: why hash on the primary key?

From: "Robert Haas"
> Could you send the output of these two queries using "explain analyze"
> instead of plain explain?

portal=# explain analyze select * from foo i, foo j where i.id = j.id;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=747.87..2127.36 rows=13283 width=272) (actual time=68.434..205.536 rows=13283 loops=1)
   Hash Cond: (i.id = j.id)
   ->  Seq Scan on foo i  (cost=0.00..315.83 rows=13283 width=136) (actual time=0.024..23.349 rows=13283 loops=1)
   ->  Hash  (cost=315.83..315.83 rows=13283 width=136) (actual time=68.353..68.353 rows=13283 loops=1)
         ->  Seq Scan on foo j  (cost=0.00..315.83 rows=13283 width=136) (actual time=0.009..23.839 rows=13283 loops=1)
 Total runtime: 223.390 ms
(6 rows)
portal=# set enable_seqscan to false;
SET
portal=# explain analyze select * from foo i, foo j where i.id = j.id;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..2310.24 rows=13283 width=272) (actual time=0.272..149.317 rows=13283 loops=1)
   Merge Cond: (i.id = j.id)
   ->  Index Scan using foo_pkey on foo i  (cost=0.00..1055.50 rows=13283 width=136) (actual time=0.205..29.714 rows=13283 loops=1)
   ->  Index Scan using foo_pkey on foo j  (cost=0.00..1055.50 rows=13283 width=136) (actual time=0.025..37.534 rows=13283 loops=1)
 Total runtime: 166.364 ms
(5 rows)
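
(That's 223.390 ms for the hash join versus 166.364 ms for the merge
join, a ratio of about 0.75 - consistent with the "about 25% faster"
figure from before.)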

Re: why hash on the primary key?

From: "Robert Haas"
> What's strange about it?  A probe into an in-memory hashtable is a lot
> cheaper than a probe into an index, so this type of plan makes plenty
> of sense if the hashtable will fit in RAM and there are going to be a
> lot of probes.  (Where "a lot" means "enough to amortize the cost of
> building the hashtable", of course.)

Hmm...  it didn't occur to me that the index probe itself might be
more expensive than a hash probe.  Is that due to concurrency control,
or are you talking about the need to possibly read index pages in from
disk?

>> Experimentation shows this is actually about 25% faster.
>
> Well, that just says your cost parameters need a bit of adjustment
> if you'd like the planner to get the crossover point exactly right.

Any sense of which ones might be worth fiddling with?

...Robert

Re: why hash on the primary key?

From: Tom Lane
"Robert Haas" <robertmhaas@gmail.com> writes:
>> What's strange about it?  A probe into an in-memory hashtable is a lot
>> cheaper than a probe into an index, so this type of plan makes plenty
>> of sense if the hashtable will fit in RAM and there are going to be a
>> lot of probes.  (Where "a lot" means "enough to amortize the cost of
>> building the hashtable", of course.)

> Hmm...  it didn't occur to me that the index probe itself might be
> more expensive than a hash probe.  Is that due to concurrency control,
> or are you talking about the need to possibly read index pages in from
> disk?

Both, plus the loss of sequentiality of access to the table itself.

>>> Experimentation shows this is actually about 25% faster.
>>
>> Well, that just says your cost parameters need a bit of adjustment
>> if you'd like the planner to get the crossover point exactly right.

> Any sense of which ones might be worth fiddling with?

random_page_cost, effective_cache_size, maybe the cpu_xxx parameters.

            regards, tom lane
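
To make those concrete, experiments along the following lines are
typical; the values are illustrative starting points rather than
recommendations, and each change is worth re-checking with EXPLAIN
ANALYZE:

portal=# set random_page_cost = 2.0;         -- default is 4.0; lower it when random reads are usually cached
portal=# set effective_cache_size = '4GB';   -- rough size of shared_buffers plus the OS cache
portal=# set cpu_tuple_cost = 0.02;          -- default 0.01; the cpu_* knobs shift the CPU-vs-I/O balance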

Re: why hash on the primary key?

From: "Robert Haas"
>>> Well, that just says your cost parameters need a bit of adjustment
>>> if you'd like the planner to get the crossover point exactly right.
>
>> Any sense of which ones might be worth fiddling with?
>
> random_page_cost, effective_cache_size, maybe the cpu_xxx parameters.

I fiddled with this some more on a more complex query with more than
20 joins.  It appears that the culprit is from_collapse_limit.  If I
raise from_collapse_limit, most of the hash joins turn into Nested
Loops or Nested Loop Left Joins with Index Scans.  The estimated cost
of the new plan is about 20% of the original, but planning takes so
much longer that the total elapsed time ends up 60% higher.  This is
really the pits.
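
For the record, the experiment was just this (the exact value is
illustrative; anything comfortably above the join count works):

portal=# show from_collapse_limit;       -- default is 8
portal=# set from_collapse_limit = 24;   -- high enough to cover the 20-plus-way join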

I had hoped that the work Simon Riggs was doing on join removal would
hit 8.4, but that doesn't seem likely at this point.  The problem is
that the application is a web app where the user can select which
columns they want to see.  Most of the time only about 6-8 of the 40+
available columns are selected, and the joins needed only to generate
the unselected columns could be tossed (which would hopefully both
avoid unnecessary join steps and speed up planning for the remaining
joins).
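
To illustrate with made-up table names: in a query like

select o.id, o.total
from orders o
left join customers c on c.id = o.customer_id;

no column of customers is ever referenced, and c.id is unique, so the
left join can neither add nor remove rows and could in principle be
dropped by the planner entirely.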

...Robert