Hi,
On 10/27/2017 05:22 PM, Sokolov Yura wrote:
>
> Hi, Tomas
>
> BRIN bloom index is a really cool feature, that definitely should be in
> core distribution (either in contrib or builtin)!!!
>
> Small suggestion for algorithm:
>
> It is well known practice not to calculate whole hash function for every
> point, but use double hashing approach.
> Example of proving double hashing approach for bloom filters:
> https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
>
> So, instead of:
> for (i = 0; i < filter->nhashes; i++)
> {
> /* compute the hashes with a seed, used for the bloom filter */
> uint32 h = ((uint32)DatumGetInt64(hash_uint32_extended(value,
> i))) % filter->nbits;
>
> /* set or check bit h */
> }
>
> such construction could be used:
> /* compute the hashes, used for the bloom filter */
> uint32 big_h = ((uint32)DatumGetInt64(hash_uint32_extended(value, i)));
> uint32 h = big_h % filter->nbits;
> /* ensures d is never >= filter->nbits */
> uint32 d = big_h % (filter->nbits - 1);
>
> for (i = 0; i < filter->nhashes; i++)
> {
> /* set or check bit h */
>
> /* compute next bit h */
> h += d++;
> if (h >= filter->nbits) h -= filter->nbits;
> if (d == filter->nbits) d = 0;
> }
>
> Modulo of one 64bit result by two coprime numbers (`nbits` and `nbits-1`)
> gives two good-quality functions usable for double hashing.
>
OK, thanks for the suggestion.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers