Thread: a few crazy ideas about hash joins


a few crazy ideas about hash joins

From:
Robert Haas
Date:
While investigating some performance problems recently I've had cause
to think about the way PostgreSQL uses hash joins.  So here are a few
thoughts.  Some of these have been brought up before.

1. When the hash is not expected to spill to disk, it preserves the
pathkeys of the outer side of the join.  If the optimizer were allowed
to assume that, it could produce significantly more efficient query
plans in some cases.  The problem is what to do if we start executing
the query and find out that we have more stuff to hash than we expect,
such that we need multiple batches?  Now the results won't be sorted.
I think we could handle this as follows: Don't count on the hash join
to preserve pathkeys unless it helps, and only rely on it when it
seems as if the hash table will still fit even if it turns out to be,
say, three times as big as expected.  But if you are counting on the
hash join to preserve pathkeys, then pass that information to the
executor.  When the executor is asked to perform a hash join, it will
first hash the inner side of the relation.  At that point, we know
whether we've successfully gotten everything into a single batch, or
not.  If we have, perform the join normally.  If the worst has
happened and we've gone multi-batch, then perform the join and sort
the output before returning it.  The performance will suck, but at
least you'll get the right answer.

Previous in-passing reference to this idea here:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
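
To make the "only rely on it if the table would still fit at, say, three
times the estimate" rule in idea 1 concrete, here is a minimal standalone
C sketch of that planner-side gate.  The 3x safety factor and the 4MB
work_mem figure are just the assumptions from the paragraph above, not
anything the planner actually does today:

#include <stdbool.h>
#include <stdio.h>

/* Only promise sorted output from the hash join when the hash table would
 * still fit in work_mem even if the size estimate is off by the chosen
 * safety factor (3x here, per the idea above). */
static bool hash_preserves_outer_order(double est_inner_bytes,
                                       double work_mem_bytes,
                                       double safety_factor)
{
    return est_inner_bytes * safety_factor <= work_mem_bytes;
}

int main(void)
{
    double work_mem = 4.0 * 1024 * 1024;    /* pretend work_mem is 4MB */

    printf("1MB inner rel: %s\n",
           hash_preserves_outer_order(1e6, work_mem, 3.0)
               ? "rely on outer order" : "do not rely on it");
    printf("2MB inner rel: %s\n",
           hash_preserves_outer_order(2e6, work_mem, 3.0)
               ? "rely on outer order" : "do not rely on it");
    return 0;
}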

2. Consider building the hash table lazily.  I often see query planner
pick a hash join over a nested inner indexscan because it thinks that
it'll save enough time making hash probes rather than index probes to
justify the time spent building the hash table up front.  But
sometimes the relation that's being hashed has a couple thousand rows,
only a tiny fraction of which will ever be retrieved from the hash
table.  We can predict when this is going to happen because n_distinct
for the outer column will be much less than the size of the inner rel.
In that case, we could consider starting with an empty hash table
that effectively acts as a cache.  Each time a value is probed, we
look it up in the hash table.  If there's no entry, we use an index
scan to find the matching rows and insert them into the hash table.
Negative results must also be cached.
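
As a standalone illustration of what such a cache might look like (this is
not executor code; fake_index_scan() below is a stand-in for a real
inner-side index scan, and the fixed-size chained table is purely for
brevity):

#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS 256

typedef struct Entry
{
    int key;
    int nmatches;               /* 0 means a cached negative result */
    struct Entry *next;
} Entry;

static Entry *buckets[NBUCKETS];

/* Stand-in for an index scan of the inner relation: count the inner rows
 * whose join column equals key.  (A real version would fetch the tuples.) */
static int
fake_index_scan(int key)
{
    static const int inner_keys[] = {1, 1, 2, 5, 5, 5, 8};
    int n = 0;

    for (size_t i = 0; i < sizeof(inner_keys) / sizeof(inner_keys[0]); i++)
        if (inner_keys[i] == key)
            n++;
    printf("  (index scan for key %d)\n", key);
    return n;
}

static Entry *
probe(int key)
{
    unsigned int h = (unsigned int) key % NBUCKETS;
    Entry *e;

    for (e = buckets[h]; e != NULL; e = e->next)
        if (e->key == key)
            return e;           /* cache hit, possibly a negative one */

    e = malloc(sizeof(Entry));  /* miss: go to the index exactly once */
    e->key = key;
    e->nmatches = fake_index_scan(key);
    e->next = buckets[h];
    buckets[h] = e;
    return e;
}

int main(void)
{
    int outer[] = {5, 5, 3, 5, 3, 1};   /* few distinct values get probed */

    for (int i = 0; i < 6; i++)
        printf("outer key %d -> %d inner match(es)\n",
               outer[i], probe(outer[i])->nmatches);
    return 0;
}

Here six probes trigger only three index scans, and the repeated lookups of
key 3 hit the cached negative entry instead of going back to the index.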

3. Avoid building the exact same hash table twice in the same query.
This happens more often than you'd think.  For example, a table may have
two columns creator_id and last_updater_id which both reference person
(id).  If you're considering a hash join between paths A and B, you
could conceivably check whether what is essentially a duplicate of B
has already been hashed somewhere within path A.  If so, you can reuse
that same hash table at zero startup-cost.
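
A toy sketch of the saving, using the hypothetical creator_id /
last_updater_id example above: the inner side (person.id) is hashed once and
the same table is probed for both join clauses.  This is just the shape of
the idea, not planner or executor code:

#include <stdio.h>
#include <stdbool.h>

#define NB 64                   /* plenty of slots for the toy data */

typedef struct
{
    int  ids[NB];
    bool used[NB];
} IdHash;

static void
insert_id(IdHash *h, int id)
{
    unsigned int b = (unsigned int) id % NB;    /* open addressing, no deletes */

    while (h->used[b] && h->ids[b] != id)
        b = (b + 1) % NB;
    h->used[b] = true;
    h->ids[b] = id;
}

static bool
contains_id(const IdHash *h, int id)
{
    unsigned int b = (unsigned int) id % NB;

    while (h->used[b])
    {
        if (h->ids[b] == id)
            return true;
        b = (b + 1) % NB;
    }
    return false;
}

int main(void)
{
    IdHash person = {{0}, {false}};
    int person_ids[] = {10, 11, 12};
    struct { int creator_id, last_updater_id; } doc[] = {{10, 11}, {99, 10}};

    for (int i = 0; i < 3; i++)
        insert_id(&person, person_ids[i]);      /* hash person.id once... */

    for (int i = 0; i < 2; i++)                 /* ...probe it for both columns */
        printf("row %d: creator %s, last updater %s\n", i,
               contains_id(&person, doc[i].creator_id) ? "found" : "missing",
               contains_id(&person, doc[i].last_updater_id) ? "found" : "missing");
    return 0;
}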

4. As previously discussed, avoid hashing for distinct and then
hashing the results for a hash join on the same column with the same
operators.

http://archives.postgresql.org/message-id/4136ffa0902191346g62081081v8607f0b92c206f0a@mail.gmail.com

Thoughts on the value and/or complexity of implementation of any of these?

...Robert


Re: a few crazy ideas about hash joins

From:
Heikki Linnakangas
Date:
Robert Haas wrote:
> While investigating some performance problems recently I've had cause
> to think about the way PostgreSQL uses hash joins.  So here are a few
> thoughts.  Some of these have been brought up before.
> 
> 1. When the hash is not expected to spill to disk, it preserves the
> pathkeys of the outer side of the join.  If the optimizer were allowed
> to assume that, it could produce significantly more efficient query
> plans in some cases.  The problem is what to do if we start executing
> the query and find out that we have more stuff to hash than we expect,
> such that we need multiple batches?  Now the results won't be sorted.
> I think we could handle this as follows: Don't count on the hash join
> to preserve pathkeys unless it helps, and only rely on it when it
> seems as if the hash table will still fit even if it turns out to be,
> say, three times as big as expected.  But if you are counting on the
> hash join to preserve pathkeys, then pass that information to the
> executor.  When the executor is asked to perform a hash join, it will
> first hash the inner side of the relation.  At that point, we know
> whether we've successfully gotten everything into a single batch, or
> not.  If we have, perform the join normally.  If the worst has
> happened and we've gone multi-batch, then perform the join and sort
> the output before returning it.  The performance will suck, but at
> least you'll get the right answer.
> 
> Previous in-passing reference to this idea here:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php

Hmm, instead of sorting the output if the worst happens, a final merge 
step as in a merge sort would be enough.
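
Since each batch's output would still be in the outer side's order, a
standalone sketch of that final pass could look like the following (arrays
stand in for the per-batch output tapes; this is not the executor's actual
tape code):

#include <stdio.h>
#include <limits.h>

#define NBATCH 3

int main(void)
{
    /* Each batch's join output, already ordered on the outer sort key;
     * INT_MAX marks the end of a batch (a stand-in for end-of-tape). */
    int batch[NBATCH][4] = {{1, 5, 9, INT_MAX},
                            {2, 3, 8, INT_MAX},
                            {4, 6, 7, INT_MAX}};
    int pos[NBATCH] = {0, 0, 0};

    for (;;)
    {
        int best = -1;

        for (int b = 0; b < NBATCH; b++)        /* pick the smallest head */
            if (batch[b][pos[b]] != INT_MAX &&
                (best < 0 || batch[b][pos[b]] < batch[best][pos[best]]))
                best = b;
        if (best < 0)
            break;                              /* every batch is drained */
        printf("%d ", batch[best][pos[best]++]);
    }
    printf("\n");                               /* prints 1..9 in order */
    return 0;
}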

> 2. Consider building the hash table lazily.  I often see query planner
> pick a hash join over a nested inner indexscan because it thinks that
> it'll save enough time making hash probes rather than index probes to
> justify the time spent building the hash table up front.  But
> sometimes the relation that's being hashed has a couple thousand rows,
> only a tiny fraction of which will ever be retrieved from the hash
> table.  We can predict when this is going to happen because n_distinct
> for the outer column will be much less than the size of the inner rel.
>  In that case, we could consider starting with an empty hash table
> that effectively acts as a cache.  Each time a value is probed, we
> look it up in the hash table.  If there's no entry, we use an index
> scan to find the matching rows and insert them into the hash table.
> Negative results must also be cached.

Yeah, that would be quite nice. One problem is that our ndistinct 
estimates are not very accurate.

> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often than you'd think.  For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id).  If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A.  If so, you can reuse
> that same hash table at zero startup-cost.

That seems like a quite simple thing to do. But would it work for a 
multi-batch hash table?

> 4. As previously discussed, avoid hashing for distinct and then
> hashing the results for a hash join on the same column with the same
> operators.

This seems essentially an extension of idea 3.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Fri, Apr 3, 2009 at 1:48 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Robert Haas wrote:
>>
>> While investigating some performance problems recently I've had cause
>> to think about the way PostgreSQL uses hash joins.  So here are a few
>> thoughts.  Some of these have been brought up before.
>>
>> 1. When the hash is not expected to spill to disk, it preserves the
>> pathkeys of the outer side of the join.  If the optimizer were allowed
>> to assume that, it could produce significantly more efficient query
>> plans in some cases.  The problem is what to do if we start executing
>> the query and find out that we have more stuff to hash than we expect,
>> such that we need multiple batches?  Now the results won't be sorted.
>> I think we could handle this as follows: Don't count on the hash join
>> to preserve pathkeys unless it helps, and only rely on it when it
>> seems as if the hash table will still fit even if it turns out to be,
>> say, three times as big as expected.  But if you are counting on the
>> hash join to preserve pathkeys, then pass that information to the
>> executor.  When the executor is asked to perform a hash join, it will
>> first hash the inner side of the relation.  At that point, we know
>> whether we've successfully gotten everything into a single batch, or
>> not.  If we have, perform the join normally.  If the worst has
>> happened and we've gone multi-batch, then perform the join and sort
>> the output before returning it.  The performance will suck, but at
>> least you'll get the right answer.
>>
>> Previous in-passing reference to this idea here:
>> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
>
> Hmm, instead of sorting the output if the worst happens, a final merge
> step as in a merge sort would be enough.

Yeah - good point.

>> 2. Consider building the hash table lazily.  I often see query planner
>> pick a hash join over a nested inner indexscan because it thinks that
>> it'll save enough time making hash probes rather than index probes to
>> justify the time spent building the hash table up front.  But
>> sometimes the relation that's being hashed has a couple thousand rows,
>> only a tiny fraction of which will ever be retrieved from the hash
>> table.  We can predict when this is going to happen because n_distinct
>> for the outer column will be much less than the size of the inner rel.
>>  In that case, we could consider starting with an empty hash table
>> that effectively acts as a cache.  Each time a value is probed, we
>> look it up in the hash table.  If there's no entry, we use an index
>> scan to find the matching rows and insert them into the hash table.
>> Negative results must also be cached.
>
> Yeah, that would be quite nice. One problem is that our ndistinct estimates
> are not very accurate.

Well, the right solution to that problem is to fix our ndistinct estimates.  :-)

>> 3. Avoid building the exact same hash table twice in the same query.
>> This happens more often than you'd think.  For example, a table may have
>> two columns creator_id and last_updater_id which both reference person
>> (id).  If you're considering a hash join between paths A and B, you
>> could conceivably check whether what is essentially a duplicate of B
>> has already been hashed somewhere within path A.  If so, you can reuse
>> that same hash table at zero startup-cost.
>
> That seems like a quite simple thing to do. But would it work for a
> multi-batch hash table?

I think not.

>> 4. As previously discussed, avoid hashing for distinct and then
>> hashing the results for a hash join on the same column with the same
>> operators.
>
> This seems essentially an extension of idea 3.

In theory, yes, but in practice, this case is easy to detect for an
arbitrary inner rel, and the other case is not.

Off to JDCon...

...Robert


Re: a few crazy ideas about hash joins

From:
Kenneth Marshall
Date:
On Fri, Apr 03, 2009 at 08:03:33AM -0400, Robert Haas wrote:
> On Fri, Apr 3, 2009 at 1:48 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > Robert Haas wrote:
> >>
> >> While investigating some performance problems recently I've had cause
> >> to think about the way PostgreSQL uses hash joins.  So here are a few
> >> thoughts.  Some of these have been brought up before.
> >>
> >> 1. When the hash is not expected to spill to disk, it preserves the
> >> pathkeys of the outer side of the join.  If the optimizer were allowed
> >> to assume that, it could produce significantly more efficient query
> >> plans in some cases.  The problem is what to do if we start executing
> >> the query and find out that we have more stuff to hash than we expect,
> >> such that we need multiple batches?  Now the results won't be sorted.
> >> I think we could handle this as follows: Don't count on the hash join
> >> to preserve pathkeys unless it helps, and only rely on it when it
> >> seems as if the hash table will still fit even if it turns out to be,
> >> say, three times as big as expected.  But if you are counting on the
> >> hash join to preserve pathkeys, then pass that information to the
> >> executor.  When the executor is asked to perform a hash join, it will
> >> first hash the inner side of the relation.  At that point, we know
> >> whether we've successfully gotten everything into a single batch, or
> >> not.  If we have, perform the join normally.  If the worst has
> >> happened and we've gone multi-batch, then perform the join and sort
> >> the output before returning it.  The performance will suck, but at
> >> least you'll get the right answer.
> >>
> >> Previous in-passing reference to this idea here:
> >> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
> >
> > Hmm, instead of sorting the output if the worst happens, a final merge
> > step as in a merge sort would be enough.
> 
> Yeah - good point.
> 
> >> 2. Consider building the hash table lazily.  I often see query planner
> >> pick a hash join over a nested inner indexscan because it thinks that
> >> it'll save enough time making hash probes rather than index probes to
> >> justify the time spent building the hash table up front.  But
> >> sometimes the relation that's being hashed has a couple thousand rows,
> >> only a tiny fraction of which will ever be retrieved from the hash
> >> table.  We can predict when this is going to happen because n_distinct
> >> for the outer column will be much less than the size of the inner rel.
> >>  In that case, we could consider starting with an empty hash table
> >> that effectively acts as a cache.  Each time a value is probed, we
> >> look it up in the hash table.  If there's no entry, we use an index
> >> scan to find the matching rows and insert them into the hash table.
> >> Negative results must also be cached.
> >
> > Yeah, that would be quite nice. One problem is that our ndistinct estimates
> > are not very accurate.
> 
> Well, the right solution to that problem is to fix our ndistinct estimates.  :-)

Also, as seen in the hash index build performance patch, it would be better
to set the initial hash size bigger than needed to avoid the inter-page
shuffle if the guess is wrong.
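
A rough standalone sketch of that sizing policy; the 2x headroom factor and
the fill factor here are arbitrary illustration values, not what the hash
index code actually uses:

#include <stdio.h>

static unsigned int
next_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Pick an initial bucket count with enough headroom that a modestly wrong
 * row estimate does not force entries to be moved around later. */
static unsigned int
initial_buckets(double estimated_rows, double fillfactor)
{
    double target = (estimated_rows * 2.0) / fillfactor;   /* 2x headroom */

    if (target < 16)
        target = 16;
    return next_pow2((unsigned int) target);
}

int main(void)
{
    printf("est 1000 rows  -> %u buckets\n", initial_buckets(1000, 0.75));
    printf("est 50000 rows -> %u buckets\n", initial_buckets(50000, 0.75));
    return 0;
}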

Ken



Re: a few crazy ideas about hash joins

From:
"Lawrence, Ramon"
Date:
> While investigating some performance problems recently I've had cause
> to think about the way PostgreSQL uses hash joins.  So here are a few
> thoughts.  Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the
> pathkeys of the outer side of the join.  If the optimizer were allowed
> to assume that, it could produce significantly more efficient query
> plans in some cases.

This is definitely possible, but you will have to dynamically modify the
execution path if the hash join ends up being more than one batch.

> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often than you'd think.  For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id).  If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A.  If so, you can reuse
> that same hash table at zero startup-cost.
> 4. As previously discussed, avoid hashing for distinct and then
> hashing the results for a hash join on the same column with the same
> operators.
>
> Thoughts on the value and/or complexity of implementation of any of
> these?

I would be interested in working with you on any of these changes to
hash join if you decide to pursue them.   I am especially interested in
looking at the hash aggregation code and potentially improving its
efficiency.

We have implemented a multi-way hash join (can join more than 2 tables
at a time) which may help with cases #3 and #4.  Performance results
look very good, and we are planning on building a patch for this over
the summer.

--
Ramon Lawrence


Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Fri, Apr 3, 2009 at 11:24 AM, Lawrence, Ramon <ramon.lawrence@ubc.ca> wrote:
> I would be interested in working with you on any of these changes to
> hash join if you decide to pursue them.   I am especially interested in
> looking at the hash aggregation code and potentially improving its
> efficiency.

Wow, that would be great!

> We have implemented a multi-way hash join (can join more than 2 tables
> at a time) which may help with cases #3 and #4.  Performance results
> look very good, and we are planning on building a patch for this over
> the summer.

I'd be interested in hearing how you cope with this on the planner end of things.

...Robert


Re: a few crazy ideas about hash joins

From:
Simon Riggs
Date:
On Thu, 2009-04-02 at 22:08 -0400, Robert Haas wrote:

> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often than you'd think.  For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id).  If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A.  If so, you can reuse
> that same hash table at zero startup-cost.

This is also interesting because there is potential to save memory
through that approach, which allows us to allocate work_mem higher and
avoid multi-batch altogether.

I would be especially interested in using a shared memory hash table
that *all* backends can use - if the table is mostly read-only, as
dimension tables often are in data warehouse applications. That would
give zero startup cost and significantly reduced memory.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: a few crazy ideas about hash joins

From:
Greg Stark
Date:
>> 1. When the hash is not expected to spill to disk, it preserves the
>> pathkeys of the outer side of the join.  If the optimizer were allowed
>> to assume that, it could produce significantly more efficient query
>> plans in some cases.
>
> This is definitely possible, but you will have to dynamically modify the
> execution path if the hash join ends up being more than one batch.

Yeah, this item seems to be another case where having a "conditional"
branch in the plan would be valuable. What's interesting is that it's
branching based on whether the hash has spilled into batches rather
than whether the number of rows is greater or less than some breakeven
point. It looks like these branch nodes are going to need to know
more than just the generic plan parameters. They're going to need to
know about specifics like "sort has spilled to disk" or "hash has
spilled into batches" etc.


I like the idea of coalescing hash tables. I'm not sure the order in
which the planner decides on things is conducive to being able to make
good decisions about it though. Even if we did it afterwards without
adjusting the planner it might still be worthwhile though.

Incidentally a similar optimization is possible for materialize or
even sorts. They may not come up nearly so often since you would
normally want to go around adding indexes if you're repeatedly sorting
on the same columns. But it might not be any harder to get them all
in one swoop.

--
greg


Re: a few crazy ideas about hash joins

From:
Greg Stark
Date:
On Fri, Apr 3, 2009 at 5:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> I would be especially interested in using a shared memory hash table
> that *all* backends can use - if the table is mostly read-only, as
> dimension tables often are in data warehouse applications. That would
> give zero startup cost and significantly reduced memory.

I think that's a non-starter due to visibility issues and handling
inserts and updates. Even just reusing a hash from one execution in a
later execution of the same plan would be tricky since we would have
to expire it if the snapshot changes.

Alternately, you could say that what you describe is addressed by hash
indexes. The fact that they're not great performers compared to
in-memory hashes comes down to dealing with updates and vacuum, which
is pretty much the same issue.

Hm.  I wonder if we need a whole class of index algorithms to deal
specifically with read-only tables. A hash table on a read-only table
could spend a lot of time to generate a perfect or near-perfect hash
function and then pack the hash table very densely without any bucket
chains. That might make it a big winner over a dynamic structure which
has to deal with handling inserts and so on. I'm assuming that if you
mark the table read-write it just marks the index invalid and you have
to rebuild it later once you've marked the table read-only again.
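
As a standalone toy of the idea (not a proposal for an actual index AM):
brute-force a seed for a simple mixing hash until no two keys collide,
after which the table can be packed with one slot per key and no bucket
chains.  A real implementation would use a proper perfect-hash
construction rather than this naive seed search, which is only workable
for small key sets:

#include <stdio.h>
#include <stdbool.h>

/* A simple integer mixing hash parameterised by a seed. */
static unsigned int
mix(unsigned int key, unsigned int seed)
{
    unsigned int h = key * 2654435761u ^ seed;

    h ^= h >> 15;
    h *= 2246822519u;
    h ^= h >> 13;
    return h;
}

/* Search for a seed such that mix(key, seed) % nslots is distinct for every
 * key, so the table needs exactly one slot per key and no chains. */
static bool
find_perfect_seed(const unsigned int *keys, int nkeys,
                  unsigned int nslots, unsigned int *seed_out)
{
    for (unsigned int seed = 0; seed < 1000000; seed++)
    {
        bool used[1024] = {false};      /* assumes nslots <= 1024 */
        bool ok = true;

        for (int i = 0; i < nkeys && ok; i++)
        {
            unsigned int slot = mix(keys[i], seed) % nslots;

            if (used[slot])
                ok = false;
            used[slot] = true;
        }
        if (ok)
        {
            *seed_out = seed;
            return true;
        }
    }
    return false;
}

int main(void)
{
    unsigned int keys[] = {3, 17, 42, 99, 1234, 5678};
    unsigned int seed;

    if (find_perfect_seed(keys, 6, 8, &seed))   /* 6 keys packed into 8 slots */
        printf("collision-free seed: %u\n", seed);
    return 0;
}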

-- 
greg


Re: a few crazy ideas about hash joins

From:
Simon Riggs
Date:
On Fri, 2009-04-03 at 18:03 +0100, Greg Stark wrote:

> I wonder if we need a whole class of index algorithms to deal
> specifically with read-only tables

I think we can drop the word "index" from the sentence as well.

"Read-only" isn't an isolated case. Often you find many read-only tables
alongside rapidly changing tables. So even the busiest of databases can
benefit from read-only optimisations. So I want MVCC *and* read only,
not MVCC everywhere (or MVCC nowhere if customer changes horses to get
read-only benefits elsewhere).

Having changes to those tables cause much heavier additional work is OK,
if judged on a cost/benefit basis. So the case I care about ought to be
called "read-mostly" but we're talking write:read ratios of millions:1.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: a few crazy ideas about hash joins

From:
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote: 
> "Read-only" isn't an isolated case. Often you find many read-only
> tables alongside rapidly changing tables. So even the busiest of
> databases can benefit from read-only optimisations.
> Having changes to those tables cause much heavier additional work is
> OK, if judged on a cost/benefit basis. So the case I care about ought
> to be called "read-mostly" but we're talking write:read ratios of
> millions:1.

We have tables which are frequently JOINed to other tables in complex
SELECT statements, but which are only modified as part of a software
release.  It would be OK with us if switching between modifiable and
read-only actually took as much time as, for example, a CLUSTER command,
if it gave us a performance benefit in these complex queries while in
read-only mode.
-Kevin


Re: a few crazy ideas about hash joins

From:
"Lawrence, Ramon"
Date:
> > I would be especially interested in using a shared memory hash table
> > that *all* backends can use - if the table is mostly read-only, as
> > dimension tables often are in data warehouse applications. That
> > would give zero startup cost and significantly reduced memory.
>
> I think that's a non-starter due to visibility issues and handling
> inserts and updates. Even just reusing a hash from one execution in a
> later execution of the same plan would be tricky since we would have
> to expire it if the snapshot changes.

If your data set is nearly read-only, materialized views would be a
better way to go and would require no hash join changes.

The idea of perfect hash functions for dimension tables is very
interesting.  If the data set is near static, it is possible to compute
them once in a few minutes time for a million tuple table and then
re-use them until they change.  The research has shown it is possible,
but I do not know if anyone has actually implemented it in a real DBMS.
An implementation could be something to try if there is interest.

--
Ramon Lawrence


Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Fri, Apr 3, 2009 at 12:55 PM, Greg Stark <stark@enterprisedb.com> wrote:
>>> 1. When the hash is not expected to spill to disk, it preserves the
>>> pathkeys of the outer side of the join.  If the optimizer were allowed
>>> to assume that, it could produce significantly more efficient query
>>> plans in some cases.
>>
>> This is definitely possible, but you will have to dynamically modify the
>> execution path if the hash join ends up being more than one batch.
>
> Yeah, this item seems to be another case where having a "conditional"
> branch in the plan would be valuable. What's interesting is that it's
> branching based on whether the hash has spilled into batches rather
> than whether the number of rows is greater or less than some breakeven
> point. It looks like these branch nodes are going to need to know
> more than just the generic plan parameters. They're going to need to
> know about specifics like "sort has spilled to disk" or "hash has
> spilled into batches" etc.

In this particular case, the operation that has to be performed is
more specific than "sort", it's "merge this set of sorted tapes".  So
it's more subtle than just inserting a sort node.

> I like the idea of coalescing hash tables. I'm not sure the order in
> which the planner decides on things is conducive to being able to make
> good decisions about it though. Even if we did it afterwards without
> adjusting the planner it might still be worthwhile though.

I don't see why hash_inner_and_outer can't walk the outer path looking
for suitable hashes to reuse.  I think the question is how aggressive
we want to be in performing that search.  If we limit the search to
hashes on base tables without restriction conditions, we'll probably
catch most of the interesting cases, but obviously not all of them.
If we look for hashes on baserels with restriction conditions or
hashes on joinrels, etc., we might pick up a few more cases, but at
the expense of increased planning time.

The advantage of searching in the executor, I suppose, is that the
checks don't have to be as cheap, since you're only checking the plan
that has already won, rather than lots and lots of potential plans.
On the other hand, your costing will be less accurate, which could
lead to bad decisions elsewhere.

> Incidentally a similar optimization is possible for materialize or
> even sorts. They may not come up nearly so often since you would
> normally want to go around adding indexes if you're repeatedly sorting
> on the same columns. Biut it might not be any harder to get them all
> in one swoop.

At least in my experience, sort and materialize nodes are pretty rare,
so I'm not sure it would be worth the time it would take to search for
them.  In most cases it seems to be cheaper to hash the inner and
outer paths than to sort even one of them and then merge-join the
result.  When I do get these sorts of paths, they tend to be in
larger, more complex queries where there's less chance of useful
reuse.  But my experience might not be representative...

...Robert


Re: a few crazy ideas about hash joins

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I don't see why hash_inner_and_outer can't walk the outer path looking
> for suitable hashes to reuse.  I think the question is how aggressive
> we want to be in performing that search.

Correct, but you've got the details all wrong.  The real problem is that
the planner might discard a join path hash(A,B) at level 2 because it
loses compared to, say, merge(A,B).  But when we get to level three,
perhaps hash(hash(A,B),C) would've been the best plan due to synergy
of the two hashes.  We'll never find that out unless we keep the
"inferior" hash path around.  We can certainly do that; the question
is what's it going to cost us to allow more paths to survive to the
next join level.  (And I'm afraid the answer may be "plenty"; it's a
combinatorial explosion we're looking at here.)

>> Incidentally a similar optimization is possible for materialize or
>> even sorts.

The planner already goes to great lengths to avoid extra sorts for
multiple levels of mergejoin ... that's what all the "pathkey" thrashing
is about.  It's pretty expensive.
        regards, tom lane


Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't see why hash_inner_and_outer can't walk the outer path looking
>> for suitable hashes to reuse.  I think the question is how aggressive
>> we want to be in performing that search.
>
> Correct, but you've got the details all wrong.  The real problem is that
> the planner might discard a join path hash(A,B) at level 2 because it
> loses compared to, say, merge(A,B).  But when we get to level three,
> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> of the two hashes.  We'll never find that out unless we keep the
> "inferior" hash path around.  We can certainly do that; the question
> is what's it going to cost us to allow more paths to survive to the
> next join level.  (And I'm afraid the answer may be "plenty"; it's a
> combinatorial explosion we're looking at here.)

That would be crazy.  I think doing it the way I suggested is correct,
just not guaranteed to catch every case.  The reality is that even if
we took Greg Stark's suggestion of detecting this situation only in
the executor, we'd still get some benefit out of this.  If we take my
intermediate approach, we'll catch more cases where this is a win.
What you're suggesting here would catch every conceivable case, but at
the expense of what I'm sure would be an unacceptable increase in
planning time for very limited benefit.

...Robert


Re: a few crazy ideas about hash joins

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Correct, but you've got the details all wrong.  The real problem is that
>> the planner might discard a join path hash(A,B) at level 2 because it
>> loses compared to, say, merge(A,B).  But when we get to level three,
>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
>> of the two hashes.  We'll never find that out unless we keep the
>> "inferior" hash path around.  We can certainly do that; the question
>> is what's it going to cost us to allow more paths to survive to the
>> next join level.  (And I'm afraid the answer may be "plenty"; it's a
>> combinatorial explosion we're looking at here.)

> That would be crazy.  I think doing it the way I suggested is correct,
> just not guaranteed to catch every case.  The reality is that even if
> we took Greg Stark's suggestion of detecting this situation only in
> the executor, we'd still get some benefit out of this.  If we take my
> intermediate approach, we'll catch more cases where this is a win.
> What you're suggesting here would catch every conceivable case, but at
> the expense of what I'm sure would be an unacceptable increase in
> planning time for very limited benefit.

Maybe, maybe not.  I've seen plenty of plans that have several
mergejoins stacked up on top of each other with no intervening sorts.
There is 0 chance that the planner would have produced that if it
thought that it had to re-sort at each level; something else would have
looked cheaper.  I think that your proposals will end up getting very
little of the possible benefit, because the planner will fail to choose
plan trees in which the optimization can be exploited.
        regards, tom lane


Re: a few crazy ideas about hash joins

From:
Grzegorz Jaskiewicz
Date:
On 3 Apr 2009, at 19:44, Lawrence, Ramon wrote:

>>> I would be especially interested in using a shared memory hash table
>>> that *all* backends can use - if the table is mostly read-only, as
>>> dimension tables often are in data warehouse applications. That
>>> would give zero startup cost and significantly reduced memory.
>>
>> I think that's a non-starter due to visibility issues and handling
>> inserts and updates. Even just reusing a hash from one execution in a
>> later execution of the same plan would be tricky since we would have
>> to expire it if the snapshot changes.
>
> If your data set is nearly read-only, materialized views would be a
> better way to go and would require no hash join changes.

I think what he means is that some of the tables in the join are
effectively read-only, so materialized views are a no-no here, unless
you mean just partial ones.
I have to say that, frankly, I have the same problem, and my schemas
could plausibly benefit from the changes discussed here.



Re: a few crazy ideas about hash joins

From:
Dimitri Fontaine
Date:
Hi,

On 3 Apr 2009, at 22:29, Tom Lane wrote:
> Correct, but you've got the details all wrong.  The real problem is
> that
> the planner might discard a join path hash(A,B) at level 2 because it
> loses compared to, say, merge(A,B).  But when we get to level three,
> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> of the two hashes.  We'll never find that out unless we keep the
> "inferior" hash path around.  We can certainly do that; the question
> is what's it going to cost us to allow more paths to survive to the
> next join level.  (And I'm afraid the answer may be "plenty"; it's a
> combinatorial explosion we're looking at here.)

I remember doing a board game simulation project at school, using
alpha-beta algorithms with cuts (pruning), and minimax too as an
optimisation. Those are heuristics, but you can decide to run them on
the full set of possible trees when you want a global optimum rather
than a local one.

Now, I don't know the specifics of the planner code, but would it be
possible to use a minimax kind of heuristic? Then a planner effort GUC
would allow users to choose whether they want to risk the "plenty"
combinatorial explosion in some requests.

It could be that the planner already is smarter than this of course,
and I can't even say I'd be surprised about it, but still trying...
--
dim

http://en.wikipedia.org/wiki/Minimax

Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Correct, but you've got the details all wrong.  The real problem is that
>>> the planner might discard a join path hash(A,B) at level 2 because it
>>> loses compared to, say, merge(A,B).  But when we get to level three,
>>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
>>> of the two hashes.  We'll never find that out unless we keep the
>>> "inferior" hash path around.  We can certainly do that; the question
>>> is what's it going to cost us to allow more paths to survive to the
>>> next join level.  (And I'm afraid the answer may be "plenty"; it's a
>>> combinatorial explosion we're looking at here.)
>
>> That would be crazy.  I think doing it the way I suggested is correct,
>> just not guaranteed to catch every case.  The reality is that even if
>> we took Greg Stark's suggestion of detecting this situation only in
>> the executor, we'd still get some benefit out of this.  If we take my
>> intermediate approach, we'll catch more cases where this is a win.
>> What you're suggesting here would catch every conceivable case, but at
>> the expense of what I'm sure would be an unacceptable increase in
>> planning time for very limited benefit.
>
> Maybe, maybe not.  I've seen plenty of plans that have several
> mergejoins stacked up on top of each other with no intervening sorts.
> There is 0 chance that the planner would have produced that if it
> thought that it had to re-sort at each level; something else would have
> looked cheaper.  I think that your proposals will end up getting very
> little of the possible benefit, because the planner will fail to choose
> plan trees in which the optimization can be exploited.

Well, I'm all ears if you have suggestions for improvement.  For
sorts, we use PathKeys to represent the ordering of each path and keep
the paths for each set of pathkeys.  By analogy, we could maintain a
list of PathHash structures for each path representing the tables that
had already been hashed.  add_path() would then have to consider both
the PathHash structures and the PathKey structures before concluding
that a path was definitely worse than some path previously found.  At
each level of the join tree, we'd need to truncate PathHash structures
that provably have no further use (e.g. on a base table that does not
appear again above the level of the join already planned) to avoid
keeping around paths that appeared to be better only because we didn't
know that the paths they have hashed are worthless in practice.  Maybe
that wouldn't even be that expensive, actually, because there will be
lots of cases where you know the relevant table doesn't appear
elsewhere in the query and not save any extra paths.  But I think we'd
have to write the code and benchmark it to really know.
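
To sketch that dominance test in isolation (with stand-in types, not the
planner's real Path or add_path structures): a path could only be discarded
if some surviving path is no more expensive and also provides at least the
same orderings and the same already-built hashes:

#include <stdio.h>
#include <stdbool.h>

typedef struct SketchPath
{
    double  total_cost;
    int    *pathkeys;       /* ids of sort orderings this path provides */
    int     npathkeys;
    int    *pathhashes;     /* ids of inner rels already hashed within it */
    int     npathhashes;
} SketchPath;

static bool
id_subset(const int *sub, int nsub, const int *super, int nsuper)
{
    for (int i = 0; i < nsub; i++)
    {
        bool found = false;

        for (int j = 0; j < nsuper && !found; j++)
            found = (sub[i] == super[j]);
        if (!found)
            return false;
    }
    return true;
}

/* newpath may be thrown away only if oldpath is no more expensive AND
 * provides every ordering and every reusable hash that newpath provides. */
static bool
dominates(const SketchPath *oldpath, const SketchPath *newpath)
{
    return oldpath->total_cost <= newpath->total_cost &&
           id_subset(newpath->pathkeys, newpath->npathkeys,
                     oldpath->pathkeys, oldpath->npathkeys) &&
           id_subset(newpath->pathhashes, newpath->npathhashes,
                     oldpath->pathhashes, oldpath->npathhashes);
}

int main(void)
{
    int keys[] = {1};           /* say: ordered by person.id */
    int hashes[] = {7};         /* say: person already hashed inside */
    SketchPath cheap = {100.0, NULL, 0, NULL, 0};
    SketchPath useful = {120.0, keys, 1, hashes, 1};

    /* cheap is cheaper but provides neither the ordering nor the reusable
     * hash, so it must not knock the other path out; both survive. */
    printf("cheap dominates useful? %s\n",
           dominates(&cheap, &useful) ? "yes" : "no");
    return 0;
}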

I guess the reason I'm not too worked up about this is because my
experience is that the planner nearly always prefers hash joins on
small tables, even when an index is present - the queries I'm worried
about optimizing don't need any additional encouragement to use hash
joins; they're doing it already.  But certainly it doesn't hurt to see
how many cases we can pick up.

...Robert


Re: a few crazy ideas about hash joins

From:
Bruce Momjian
Date:
Are there any TODOs here?

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Correct, but you've got the details all wrong.  The real problem is that
> >>> the planner might discard a join path hash(A,B) at level 2 because it
> >>> loses compared to, say, merge(A,B).  But when we get to level three,
> >>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> >>> of the two hashes.  We'll never find that out unless we keep the
> >>> "inferior" hash path around.  We can certainly do that; the question
> >>> is what's it going to cost us to allow more paths to survive to the
> >>> next join level.  (And I'm afraid the answer may be "plenty"; it's a
> >>> combinatorial explosion we're looking at here.)
> >
> >> That would be crazy.  I think doing it the way I suggested is correct,
> >> just not guaranteed to catch every case.  The reality is that even if
> >> we took Greg Stark's suggestion of detecting this situation only in
> >> the executor, we'd still get some benefit out of this.  If we take my
> >> intermediate approach, we'll catch more cases where this is a win.
> >> What you're suggesting here would catch every conceivable case, but at
> >> the expense of what I'm sure would be an unacceptable increase in
> >> planning time for very limited benefit.
> >
> > Maybe, maybe not.  I've seen plenty of plans that have several
> > mergejoins stacked up on top of each other with no intervening sorts.
> > There is 0 chance that the planner would have produced that if it
> > thought that it had to re-sort at each level; something else would have
> > looked cheaper.  I think that your proposals will end up getting very
> > little of the possible benefit, because the planner will fail to choose
> > plan trees in which the optimization can be exploited.
> 
> Well, I'm all ears if you have suggestions for improvement.  For
> sorts, we use PathKeys to represent the ordering of each path and keep
> the paths for each set of pathkeys.  By analogy, we could maintain a
> list of PathHash structures for each path representing the tables that
> had already been hashed.  add_path() would then have to consider both
> the PathHash structures and the PathKey structures before concluding
> that a path was definitely worse than some path previously found.  At
> each level of the join tree, we'd need to truncate PathHash structures
> that provably have no further use (e.g. on a base table that does not
> appear again above the level of the join already planned) to avoid
> keeping around paths that appeared to be better only because we didn't
> know that the paths they have hashed are worthless in practice.  Maybe
> that wouldn't even be that expensive, actually, because there will be
> lots of cases where you know the relevant table doesn't appear
> elsewhere in the query and not save any extra paths.  But I think we'd
> have to write the code and benchmark it to really know.
> 
> I guess the reason I'm not too worked up about this is because my
> experience is that the planner nearly always prefers hash joins on
> small tables, even when an index is present - the queries I'm worried
> about optimizing don't need any additional encouragement to use hash
> joins; they're doing it already.  But certainly it doesn't hurt to see
> how many cases we can pick up.
> 
> ...Robert
> 

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Are there any TODOs here?

I'd say that all of the items listed in my original email could be
TODOs.  I'm planning to work on as many of them as I have time for.
Ramon Lawrence is also working on some related ideas, as discussed
upthread.  AFAICS no one has expressed the idea that anything that's
been talked about is a bad idea, so it's just a question of finding
enough round tuits.

...Robert


Re: a few crazy ideas about hash joins

From:
Bruce Momjian
Date:
Robert Haas wrote:
> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > Are there any TODOs here?
> 
> I'd say that all of the items listed in my original email could be
> TODOs.  I'm planning to work on as many of them as I have time for.
> Ramon Lawrence is also working on some related ideas, as discussed
> upthread.  AFAICS no one has expressed the idea that anything that's
> been talked about is a bad idea, so it's just a question of finding
> enough round tuits.

OK, would you please add them to the Index/Hash section of the TODO
list;  I am afraid I will not do them justice.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Tue, Apr 7, 2009 at 5:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > Are there any TODOs here?
>>
>> I'd say that all of the items listed in my original email could be
>> TODOs.  I'm planning to work on as many of them as I have time for.
>> Ramon Lawrence is also working on some related ideas, as discussed
>> upthread.  AFAICS no one has expressed the idea that anything that's
>> been talked about is a bad idea, so it's just a question of finding
>> enough round tuits.
>
> OK, would you please add them to the Index/Hash section of the TODO
> list;  I am afraid I will not do the justice.

I think perhaps Optimizer / Executor would be more appropriate, since
these are not about hash indices but rather about hash joins.  I will
look at doing that.

Also I think the last item under Index / Hash is actually NOT done,
and belongs in the main index section rather than Index / Hash.

The first item in the Index / Hash section doesn't really look like a
TODO, or at the very least it's unclear what the action item is
supposed to be.

...Robert


Re: a few crazy ideas about hash joins

From:
Bruce Momjian
Date:
Robert Haas wrote:
> On Tue, Apr 7, 2009 at 5:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > Are there any TODOs here?
> >>
> >> I'd say that all of the items listed in my original email could be
> >> TODOs.  I'm planning to work on as many of them as I have time for.
> >> Ramon Lawrence is also working on some related ideas, as discussed
> >> upthread.  AFAICS no one has expressed the idea that anything that's
> >> been talked about is a bad idea, so it's just a question of finding
> >> enough round tuits.
> >
> > OK, would you please add them to the Index/Hash section of the TODO
> > list;  I am afraid I will not do them justice.
> 
> I think perhaps Optimizer / Executor would be more appropriate, since
> these are not about hash indices but rather about hash joins.  I will
> look at doing that.

Yes, please.

> Also I think the last item under Index / Hash is actually NOT done,
> and belongs in the main index section rather than Index / Hash.

Yep, I didn't realize that editing "Index" also does the subsections,
while editing the subsections doesn't edit the upper level.

> The first item in the Index / Hash section doesn't really look like a
> TODO, or at the very least it's unclear what the action item is
> supposed to be.

Yep, remove, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: a few crazy ideas about hash joins

From:
Chris Dunlop
Date:
On 2009-04-03, Simon Riggs <simon@2ndQuadrant.com> wrote:
>
> On Fri, 2009-04-03 at 18:03 +0100, Greg Stark wrote:
>
>> I wonder if we need a whole class of index algorithms to deal
>> specifically with read-only tables
>
> I think we can drop the word "index" from the sentence as well.
>
> "Read-only" isn't an isolated case. Often you find many read-only tables
> alongside rapidly changing tables. So even the busiest of databases can
> benefit from read-only optimisations. So I want MVCC *and* read only,
> not MVCC everywhere (or MVCC nowhere if customer changes horses to get
> read-only benefits elsewhere).
>
> Having changes to those tables cause much heavier additional work is OK,
> if judged on a cost/benefit basis. So the case I care about ought to be
> called "read-mostly" but we're talking write:read ratios of millions:1.

For the record and in case anyone gets interested in following up the
idea of "read only" tables, see also:

http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/76366



Re: a few crazy ideas about hash joins

From:
Robert Haas
Date:
On Tue, Apr 7, 2009 at 5:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I think perhaps Optimizer / Executor would be more appropriate, since
>> these are not about hash indices but rather about hash joins.  I will
>> look at doing that.
>
> Yes, please.

Done.  See what you think...

>> Also I think the last item under Index / Hash is actually NOT done,
>> and belongs in the main index section rather than Index / Hash.
>
> Yep, I didn't realize that editing "Index" also does the subsections,
> while editing the subsections doesn't edit the upper level.

Heh.  I'd write some webapps to do some of these things, but I haven't
been able to interest anyone in providing me with a
postgresql.org-based hosting arrangement.

>> The first item in the Index / Hash section doesn't really look like a
>> TODO, or at the very least it's unclear what the action item is
>> supposed to be.
>
> Yep, remove, thanks.

...Robert