Re: WIP: bloom filter in Hash Joins with batches

From: Tomas Vondra
Subject: Re: WIP: bloom filter in Hash Joins with batches
Msg-id: 567C2374.2010006@2ndquadrant.com
In reply to: Re: WIP: bloom filter in Hash Joins with batches  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers

On 12/24/2015 02:51 PM, Simon Riggs wrote:
> On 17 December 2015 at 16:00, Tomas Vondra <tomas.vondra@2ndquadrant.com
> <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
>     On 12/17/2015 11:44 AM, Simon Riggs wrote:
>
>
>         My understanding is that the bloom filter would be ineffective
>         in any of
>         these cases
>         * Hash table is too small
>
>
>     Yes, although it depends what you mean by "too small".
>
>     Essentially if we can do with a single batch, then it's cheaper to
>     do a single lookup in the hash table instead of multiple lookups in
>     the bloom filter. The bloom filter might still win if it fits into
>     L3 cache, but that seems rather unlikely.
>
>         * Bloom filter too large
>
>
>     Too large with respect to what?
>
>     One obvious problem is that the bloom filter is built for all
>     batches at once, i.e. for all tuples, so it may be so big it won't
>     fit into work_mem (or will take a significant part of it).
>     Currently it's not accounted for, but that'll need to change.
>
>
> The benefit seems to be related to caching, or at least that memory
> speed is critical. If the hash table is too small, or the bloom
> filter too large, then there would be no benefit from performing the
> action (Lookup Bloom then maybe Lookup Hash) compared with just
> doing (Lookup Hash).
>
> So the objective must be to get a Bloom Filter that is small enough
> that it lives in a higher/faster level of cache than the main Hash
> table. Or possibly that we separate that into a two stage process so
> that the first level can be applied by a GPU and then later checked
> against hash outside of a GPU.

I don't think that's quite true.

Had the primary goal been to replace expensive hashtable lookups with 
cheap bloom filter checks, then sure - the size of the bloom filter 
would be crucial (namely getting the whole bloom filter into L3, which 
is an order of magnitude faster compared to RAM).

But that's not what the patch does - it aims to reduce the amount of 
data that needs to be serialized/deserialized when batching is 
necessary, trading those costs for the bloom filter overhead. The size 
of the bloom filter may still matter, but it's far less important, 
because the (de)serialization overhead is much more expensive than the 
hash lookup.

FWIW I've been trying to use the bloom filters in the "traditional" way 
(doing "cheap" bloom checks before the hashtable lookups), but I've been 
unable to get any measurable benefit. Either I was doing it wrong 
somehow, or perhaps NTUP_PER_BUCKET=1 made the hash table too fast - it 
essentially means a single lookup is enough to see whether the entry 
exists, while the bloom filter needs several probes, depending on the 
desired false positive rate.
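
To make the probe-count point concrete, here is a minimal self-contained 
bloom filter in plain C. It's a generic sketch with made-up sizing, not 
the implementation from the patch: bloom_contains() has to touch k=4 
distinct bit positions, each potentially on a different cache line, while 
with NTUP_PER_BUCKET=1 the hash table usually answers the same question 
with a single bucket access.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BLOOM_NBITS   (1 << 20)    /* 2^20 bits = 128 kB, power of two */
    #define BLOOM_NHASHES 4            /* k probes, roughly the p ~ 5% range */

    static uint8_t bloom[BLOOM_NBITS / 8];

    /* derive the i-th probe position from one 32-bit hash (double hashing) */
    static inline uint32_t
    bloom_pos(uint32_t hashvalue, int i)
    {
        uint32_t h2 = hashvalue * 0x9e3779b1u + 0x7f4a7c15u;

        return (hashvalue + (uint32_t) i * h2) & (BLOOM_NBITS - 1);
    }

    static void
    bloom_add(uint32_t hashvalue)
    {
        for (int i = 0; i < BLOOM_NHASHES; i++)
        {
            uint32_t pos = bloom_pos(hashvalue, i);

            bloom[pos / 8] |= (uint8_t) (1 << (pos % 8));
        }
    }

    static bool
    bloom_contains(uint32_t hashvalue)
    {
        /* k probes, each potentially a separate cache line */
        for (int i = 0; i < BLOOM_NHASHES; i++)
        {
            uint32_t pos = bloom_pos(hashvalue, i);

            if (!(bloom[pos / 8] & (1 << (pos % 8))))
                return false;          /* definitely not on the inner side */
        }
        return true;                   /* probably there, false positives possible */
    }

    int
    main(void)
    {
        /* pretend these are hash values of the inner-side join keys */
        for (uint32_t i = 0; i < 100000; i++)
            bloom_add(i * 2654435761u);

        /*
         * Outer side: a tuple whose hash fails this check can be dropped
         * right away, without being written to (and later read back from)
         * a batch file.
         */
        printf("added key:  %d\n", bloom_contains(12345u * 2654435761u)); /* 1 */
        printf("unseen key: %d\n", bloom_contains(0xdeadbeefu)); /* 0, unless a false positive */
        return 0;
    }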

So I concluded that the attempt to use the bloom filter that way is 
futile, and that the batching case seems like the only scenario where 
the bloom filters may still be useful.

There are a number of additional problems with sizing the bloom filter 
to fit into L3:

(1) We have no idea how much L3 there is, although this might be fixed 
by adding a GUC, reading it from /proc/cpuinfo or similar (see the 
sketch below), or just using some reasonable default (say, most Xeon 
CPUs today have ~20 MB).

(2) The L3 is not per-process but shared by all processes, so we don't 
really have all of the L3, but perhaps just L3/N where N is the number 
of backends.

(3) The size of the bloom filter depends on the number of items it 
should handle and the false positive rate. I've used 5% in the patch, 
but maybe it'd make sense to increase this in order to get a smaller 
filter?
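
As for (1), on Linux with glibc something like the following could at 
least provide a starting value. This is just a sketch - _SC_LEVEL3_CACHE_SIZE 
is glibc-specific, not POSIX, and may well return 0 or -1, in which case 
we'd fall back to a GUC or a hard-coded default anyway:

    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* glibc-specific; returns 0 or -1 when the value is unknown */
        long l3 = sysconf(_SC_LEVEL3_CACHE_SIZE);

        if (l3 > 0)
            printf("L3 cache: %ld kB\n", l3 / 1024);
        else
            printf("L3 size unknown, fall back to a GUC / default (~20 MB)\n");

        return 0;
    }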

Also, the bloom filter does not compete with the whole hash table, but 
"just" with the "buckets" as that's sufficient to determine if there's a 
tuple for a key or not.

Let's do some math for a hash table with 100k rows (so a fairly small 
hash table):
   * nbuckets = 131072 (2^17), so 1MB
   * bloom filter (5% false positive rate): 76kB, k=4 (see [1])

So both the buckets and bloom fit into L3, and the bloom filter is 
unlikely to speed things up.

The difference between the buckets and the bloom filter is about 10x, 
and both sizes scale roughly linearly (with 1M items it'll be ~10MB vs. 
760kB), so there clearly is an area where the bloom filter fits into L3 
while the buckets don't.
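
FWIW the numbers above are just the standard formulas for an optimally 
sized bloom filter (the same math as the calculator in [1]); a quick way 
to reproduce them (build with -lm):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double p = 0.05;                   /* target false positive rate */
        long   nvalues[] = {100000, 1000000};

        for (int i = 0; i < 2; i++)
        {
            long   n = nvalues[i];

            /* optimal filter size and number of hash functions */
            double m = -n * log(p) / (log(2) * log(2));   /* bits */
            double k = (m / n) * log(2);                  /* probes */

            printf("n = %7ld: %4.0f kB, k = %.1f\n", n, m / 8.0 / 1024.0, k);
        }

        /* prints ~76 kB / k=4.3 for 100k items, ~761 kB for 1M items */
        return 0;
    }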

Sadly I haven't been able to exploit that, but I'll try again.

In any case, none of this can help unless the bloom filter really 
eliminates many lookups.

[1] http://hur.st/bloomfilter?n=100000&p=0.05


>
> I think you also need to consider whether we use a hash bloom filter or
> just simply apply an additional range predicate. The latter idea is
> similar to my earlier thoughts here
> http://www.postgresql.org/message-id/CA+U5nMLYf2cgbq+YUw-ArLBTcPrqanBf5QiFEC-PBRJCFzOngg@mail.gmail.com

Interesting, but I don't really see how this is related to bloom filters?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


