Re: tweaking NTUP_PER_BUCKET

From	Tomas Vondra
Subject	Re: tweaking NTUP_PER_BUCKET
Date	3 July 2014, 21:50:42
Msg-id	53B5A5FA.4050705@fuzzy.cz
In reply to	Re: tweaking NTUP_PER_BUCKET (Stephen Frost <sfrost@snowman.net>)
Responses	Re: tweaking NTUP_PER_BUCKET (Tomas Vondra <tv@fuzzy.cz>)
List	pgsql-hackers


Hi Stephen,

On 3.7.2014 20:10, Stephen Frost wrote:
> Tomas,
> 
> * Tomas Vondra (tv@fuzzy.cz) wrote:
>> However it's likely there are queries where this may not be the case,
>> i.e. where rebuilding the hash table is not worth it. Let me know if you
>> can construct such query (I wasn't).
> 
> Thanks for working on this! I've been thinking on this for a while
> and this seems like it may be a good approach. Have you considered a
> bloom filter over the buckets..? Also, I'd suggest you check the

I know you've experimented with it, but I haven't looked into that yet.

> archives from about this time last year for test cases that I was
> using which showed cases where hashing the larger table was a better
> choice- those same cases may also show regression here (or at least
> would be something good to test).

Good idea, I'll look at the test cases - thanks.

> Have you tried to work out what a 'worst case' regression for this 
> change would look like? Also, how does the planning around this
> change? Are we more likely now to hash the smaller table (I'd guess
> 'yes' just based on the reduction in NTUP_PER_BUCKET, but did you
> make any changes due to the rehashing cost?)?

The case I was thinking about is an underestimated cardinality of the
inner table combined with a small outer table. That'd lead to a large
hash table and very few lookups (thus making the rehash inefficient),
i.e. something like this:
 Hash Join
    Seq Scan on small_table (rows=100) (actual rows=100)
    Hash
       Seq Scan on bad_estimate (rows=100) (actual rows=1000000000)
           Filter: ((a < 100) AND (b < 100))

But I wasn't able to reproduce this reasonably, because in practice
that'd lead to a nested loop or something like that (which is a planning
issue, impossible to fix in hashjoin code).
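
To make the trade-off above concrete, here is a rough toy cost model in
Python. It is only a sketch under stated assumptions, not the hashjoin
code: the helper names are hypothetical, the bucket count is assumed to
be the estimate divided by NTUP_PER_BUCKET rounded up to a power of two,
a rehash is assumed to touch every inner tuple once, and a lookup is
assumed to walk one chain of average length. Cache effects, batching and
memory limits are ignored.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n (at least 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def choose_nbuckets(ntuples: int, ntup_per_bucket: int) -> int:
    """Hypothetical sizing rule: aim for ntup_per_bucket tuples per bucket."""
    wanted = -(-ntuples // ntup_per_bucket)  # integer ceiling division
    return next_pow2(wanted)


def probe_cost(ntuples: int, nbuckets: int, nlookups: int) -> float:
    """Tuples visited by nlookups probes, each walking an average-length chain."""
    return nlookups * (ntuples / nbuckets)


def rehash_pays_off(estimated: int, actual: int, nlookups: int,
                    ntup_per_bucket: int = 1):
    """Compare probing the undersized table with rehashing first (toy model)."""
    small = choose_nbuckets(estimated, ntup_per_bucket)  # sized from the estimate
    large = choose_nbuckets(actual, ntup_per_bucket)     # sized from actual rows
    cost_without = probe_cost(actual, small, nlookups)
    # Assumption: the rehash moves every inner tuple exactly once.
    cost_with = actual + probe_cost(actual, large, nlookups)
    return cost_with < cost_without, cost_without, cost_with


if __name__ == "__main__":
    # Numbers from the plan above: estimate 100, actual 1e9, 100 outer rows.
    print(rehash_pays_off(100, 10**9, 100))
    # The same bad estimate, but with more outer rows probing the table.
    print(rehash_pays_off(100, 10**9, 1000))
```

In this model the break-even point is roughly when the number of lookups
reaches the original bucket count, so with only 100 outer rows the rehash
does not recover its own cost, while with 1000 outer rows it does.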

Tomas
