Re: tweaking NTUP_PER_BUCKET

From: Tomas Vondra
Subject: Re: tweaking NTUP_PER_BUCKET
Date:
Msg-id: 53B95A44.7010402@fuzzy.cz
In reply to: Re: tweaking NTUP_PER_BUCKET (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On 6.7.2014 06:47, Stephen Frost wrote:
> * Greg Stark (stark@mit.edu) wrote:
>> Last time was we wanted to use bloom filters in hash joins to
>> filter out tuples that won't match any of the future hash batches
>> to reduce the amount of tuples that need to be spilled to disk.
>> However the problem was that it was unclear for a given amount of
>> memory usage how to pick the right size bloom filter and how to
>> model how much it would save versus how much it would cost in
>> reduced hash table size.
> 
> Right. There's only one hash function available, really, (we don't 
> currently support multiple hash functions..), unless we want to try
> and re-hash the 32bit hash value that we get back (not against trying
> that, but it isn't what I'd start with) and it would hopefully be
> sufficient for this.

I don't really see a need to add more hash functions. Using the hash
value itself as an input to the bloom filter seems just fine to me.

According to [1] the expected number of collisions is
  n - k + k * (1 - 1/k)^n

where 'k' is the number of possible values (2^32 in our case) and 'n'
is the number of inserts (i.e. values inserted into the table). With
1GB of work_mem and ~50B per row, that's ~20M rows. According to the
formula, that's ~50k collisions, i.e. a false positive rate of about
0.25%. That seems more than sufficient for the bloom filter, and if a
0.25% false positive rate makes it inefficient, I'd say the gains are
not that great anyway. (Unless my calculations are somehow flawed, of
course.)

[1] https://www.cs.duke.edu/courses/cps102/spring10/Lectures/L-18.pdf
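
Just to spell out the arithmetic (a quick sanity check in Python; the
1GB work_mem and ~50B per row figures are the rough assumptions used
above, nothing more):

    import math

    k = 2 ** 32        # number of possible 32-bit hash values
    n = 20_000_000     # ~1GB work_mem / ~50B per row

    # expected number of inserted values colliding with an earlier one
    collisions = n - k + k * math.pow(1 - 1 / k, n)

    print(round(collisions))       # ~46k
    print(100 * collisions / n)    # ~0.23%, i.e. roughly the 0.25% above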

>> I think it just required some good empirical tests and hash join
>> heavy workloads to come up with some reasonable guesses. We don't
>> need a perfect model just some reasonable bloom filter size that
>> we're pretty sure will usually help more than it hurts.
> 
> This would help out a lot of things, really.. Perhaps what Tomas is 
> developing regarding test cases would help here also.

The test cases might be reused, I guess. However, I still don't have a
clear idea of how exactly the bloom filter would be implemented. In the
threads I found, it was suggested to use per-bucket filters, which seems
really strange to me. I see two options:

(a) global filter
   - bloom filter sized according to estimates from the planner
   - ... so we're screwed if the estimates are significantly wrong
   - create the filter in ExecHashTableCreate
   - insert all tuples in ExecHashTableInsert (all batches)

(b) per-batch filter
   - may be built after the batching is completed
   - we have reliable numbers at this point (not just estimates)
   - per-batch filters may be smaller (only a fraction of tuples)
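For illustration, here's a minimal toy sketch of what such a per-batch
filter might look like (Python with made-up names, not the actual
executor code); the bloom positions are derived from the 32-bit hash
value the join already computes:

    import random

    class BloomFilter:
        def __init__(self, nbits, nhashes):
            self.nbits = nbits
            self.nhashes = nhashes
            self.bits = bytearray((nbits + 7) // 8)

        def _positions(self, hashvalue):
            # derive the probe positions from the single 32-bit hash by
            # double hashing; a real implementation might prefer to
            # re-hash the value to get a second independent hash instead
            h1 = hashvalue
            h2 = (((hashvalue >> 16) ^ (hashvalue << 16)) & 0xFFFFFFFF) | 1
            for i in range(self.nhashes):
                yield (h1 + i * h2) % self.nbits

        def add(self, hashvalue):
            for pos in self._positions(hashvalue):
                self.bits[pos >> 3] |= 1 << (pos & 7)

        def might_contain(self, hashvalue):
            return all(self.bits[pos >> 3] & (1 << (pos & 7))
                       for pos in self._positions(hashvalue))

    # toy stand-ins for the 32-bit hash values of one batch
    inner_hashes = [random.getrandbits(32) for _ in range(100_000)]
    outer_hashes = [random.getrandbits(32) for _ in range(100_000)]

    # build the filter once the inner batch is complete ...
    f = BloomFilter(nbits=8 * len(inner_hashes), nhashes=5)
    for h in inner_hashes:
        f.add(h)

    # ... and probe it before writing outer tuples to the batch file;
    # with this random, non-matching data every hit is a false positive
    spilled = sum(1 for h in outer_hashes if f.might_contain(h))
    print(spilled, "of", len(outer_hashes), "would still be spilled")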

So the per-batch filter seems to be a better approach. The question is
the sizing of the bloom filter. According to [2] there are three rules
of thumb for sizing bloom filters:
  (1) 1B per item, to get a ~2% error rate
  (2) the number of hash functions should be ~0.7 * the bits per item
      (~5 when using 1B per item)
  (3) the number of hashes dominates the performance (more to compute,
      more memory accesses)

When working with 20M rows, this means ~20MB and 5 hash functions. Not
that bad, although we don't know how expensive it is to build the filter.

[2] http://corte.si/posts/code/bloom-filter-rules-of-thumb/index.html
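
To make the sizing concrete (a sketch only; the 2% target and the
textbook formulas below are my assumptions, not something the planner
currently provides):

    import math

    ntuples = 20_000_000     # estimated rows on the inner side
    error_rate = 0.02        # rule (1): ~1 byte per item gives ~2%

    # textbook sizing: m = -n * ln(p) / ln(2)^2,  k = (m/n) * ln(2)
    nbits = math.ceil(-ntuples * math.log(error_rate) / math.log(2) ** 2)
    nhashes = round(nbits / ntuples * math.log(2))

    print(nbits / 8 / (1024 * 1024))    # ~19.4 MB, i.e. ~1 byte per row
    print(nhashes)                      # 6, close to the ~5 from rule (2)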

However, this is somewhat flawed because it assumes the 20M hashed
values are distinct, so if the inner table contains duplicate keys, we
might use a smaller bloom filter. But we don't know that value
(ndistinct estimates are rather unreliable, although we might use a
hyperloglog counter to get it).
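
A toy version of that idea (an exact distinct count stands in for the
hyperloglog counter here, just to keep the example self-contained):

    import math
    import random

    # 1M hash values with heavy duplication (e.g. a skewed join key)
    hashes = [random.getrandbits(32) % 50_000 for _ in range(1_000_000)]

    ndistinct = len(set(hashes))    # in practice, a HLL estimate

    # size the filter for the distinct values, not the raw tuple count
    nbits = math.ceil(-ndistinct * math.log(0.02) / math.log(2) ** 2)

    print(ndistinct)            # ~50k distinct values
    print(nbits / 8 // 1024)    # ~50 kB instead of ~1 MB for 1M rows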

Also, there are cases when we know there will be no misses (e.g. a join
on a FK), and in that case it's pointless to build the bloom filter. It
would be nice to get a flag from the planner saying whether to build it,
or a threshold for when to build it (e.g. build the filter only if the
rowcount is > 10k).

The two patches I posted (tweaking nbuckets dynamically) might also
benefit from similar planner information, although it's not as critical
for them.

regards
Tomas


