Re: Hash Join cost estimates

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Hash Join cost estimates
Дата
Msg-id 20130330205242.GZ4361@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: Hash Join cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I think the point is that there may *not* be few hash collisions ...

Right, but that's actually all entirely based on concerns over there
being duplicates (hence why we consider the MCVs and ndistinct), which
makes *some* sense given that we currently have a single linked-list in
each bucket into which any dups are placed.

It occurs to me that we could get away from this by having a 2-level
system.  We hash to a bucket which contains a linked list of linked
lists.  The bucket list would be for actual collisions (which we don't
consider at all in the current bucket estimating) and each of those
entries would be a linked list of duplicates for that entry in the
bucket.  This would add some small cost for building the hash, since we
would have to step through each entry in the bucket to determine if the
entry being added is a new entry or not, but not very much unless we're
worried about lots of collisions happening (which I certainly hope isn't
the case).  Hash tables are generally expected to take more effort to
build initially anyway, so I don't see a problem putting more logic
there.  Also, we could skip checking each entry in the bucket when the
input is known to be unique and instead just skip to the end of the
bucket since the new entry can't match any existing.

We could still work through the bucketing logic and add some costing to
that case for those situations where we are hashing on only one key of a
multi-key join and we expect a lot of duplicates to exist.  I'm not sure
how much that happens though- I would hope that we would use a composite
hash key most of the time that we have multi-key joins that use hash
tables.

Thoughts?
Thanks,
    Stephen

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Hash Join cost estimates
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: Hash Join cost estimates