Re: Hash Join cost estimates

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Hash Join cost estimates
Дата
Msg-id 1364599244.1187.186.camel@sussancws0025
обсуждение исходный текст
Ответ на Re: Hash Join cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Hash Join cost estimates  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-hackers
On Fri, 2013-03-29 at 16:37 -0400, Tom Lane wrote: 
> Jeff Davis <pgsql@j-davis.com> writes:
> > Yes, I have run into this issue (or something very similar). I don't
> > understand why the bucketsize even matters much -- assuming few hash
> > collisions, we are not actually evaluating the quals any more times than
> > necessary. So why all of the hashjoin-specific logic in determining the
> > number of qual evaluations? The only reason I can think of is to model
> > the cost of comparing the hashes themselves.
> 
> I think the point is that there may *not* be few hash collisions ...

In Stephen's case the table was only 41KB, so something still seems off.
Maybe we should model the likelihood of a collision based on the
cardinalities (assuming a reasonably good hash function)?

Also, I think I found an important assumption that seems dubious (in
comment for estimate_hash_bucketsize()):

"If the other relation in the join has a key distribution similar to
this one's, then the most-loaded buckets are exactly those that will be
probed most often.  Therefore, the "average" bucket size for costing
purposes should really be taken as something close to the "worst case"
bucket size.  We try to estimate this by adjusting the fraction if there
are too few distinct data values, and then scaling up by the ratio of
the most common value's frequency to the average frequency."

But the key distribution is not necessarily similar at all... the large
table might have many more distinct values.

Stephen, do you think this could explain your problem?

Regards,Jeff Davis






В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Add parallel pg_dump option.
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Getting to 9.3 beta