Re: A better way than tweaking NTUP_PER_BUCKET

From	Stephen Frost
Subject	Re: A better way than tweaking NTUP_PER_BUCKET
Date	26 June 2013, 15:51:08
Msg-id	20130626155058.GD5952@tamriel.snowman.net
In reply to	Re: A better way than tweaking NTUP_PER_BUCKET (Atri Sharma <atri.jiit@gmail.com>)
Responses	Re: A better way than tweaking NTUP_PER_BUCKET
List	pgsql-hackers

* Atri Sharma (atri.jiit@gmail.com) wrote:
> My point is that I would like to help in the implementation, if possible. :)

Feel free to go ahead and implement it.  I'm not sure when I'll have a
chance to (probably not in the next week or two anyway).  Unfortunately,
the bigger issue here is really about testing the results and
determining whether it's actually faster/better with various data sets
(including ones which have duplicates).  I've got one test data set
which has some interesting characteristics (for one thing, hashing the
"large" side and then seq-scanning the "small" side is actually faster
than going the other way, which is quite 'odd' in my view for a hashing
system): http://snowman.net/~sfrost/test_case2.sql

You might also look at the other emails that I sent regarding this
subject and NTUP_PER_BUCKET.  Having someone confirm what I saw with
respect to changing that parameter would be nice, and it would be a good
comparison point against any kind of pre-filtering that we're doing.

One thing that re-reading the bloom filter description reminded me of is
that it's at least conceivable that we could take the existing hash
functions for each data type and do double-hashing or perhaps seed the
value to be hashed with additional data to produce an "independent" hash
result to use.  Again, these are all things that need to be tested and
measured to see if they improve overall performance.
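
As a rough illustration of the double-hashing idea above, here is a
minimal Python sketch (not PostgreSQL code).  Everything named in it is
an assumption made for the example: base_hash is a hypothetical stand-in
for a data type's existing hash function, built on BLAKE2b only so the
sketch is self-contained, and the seed constant, filter size and hash
count are arbitrary.  A seeded call provides the second, "independent"
hash, and the k bit positions are derived as h1 + i*h2 modulo the filter
size.

```python
import hashlib
from typing import Iterator


def base_hash(data: bytes, seed: int = 0) -> int:
    """Hypothetical stand-in for a per-type hash function.

    The seed is fed in as a BLAKE2b salt, so different seeds give
    hash values that behave as if they were independent.
    """
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Bloom filter whose k probes come from two hashes (double hashing)."""

    def __init__(self, nbits: int, nhashes: int) -> None:
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, data: bytes) -> Iterator[int]:
        h1 = base_hash(data)
        # Second hash: same function, arbitrary seed.  Forcing it odd
        # means it is never 0, and with a power-of-two nbits the probes
        # for one key stay distinct.
        h2 = base_hash(data, seed=0x9E3779B9) | 1
        for i in range(self.nhashes):
            yield (h1 + i * h2) % self.nbits

    def add(self, data: bytes) -> None:
        for pos in self._positions(data):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, data: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(data)
        )


# Usage: build the filter from one side of the join, then use it to
# skip probe values that cannot match before touching the hash table.
bloom = BloomFilter(nbits=1024, nhashes=3)
for key in (b"alpha", b"beta", b"gamma"):
    bloom.add(key)
survivors = [k for k in (b"alpha", b"delta") if bloom.might_contain(k)]
```

Whether a pre-filter like this pays for itself ahead of the hash table
probe is exactly what would still need to be measured.
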

Thanks,
    Stephen
