Bloom Filter lookup for hash joins

Поиск
Список
Период
Сортировка
От Atri Sharma
Тема Bloom Filter lookup for hash joins
Дата
Msg-id CAOeZVif-R-iLF966wEipk5By-KhzVLOqpWqurpaK3p5fYw-Rdw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Bloom Filter lookup for hash joins  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Bloom Filter lookup for hash joins  (Ants Aasma <ants@cybertec.at>)
Re: Bloom Filter lookup for hash joins  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
Hi All,

I have been researching bloom filters and discussed it on IRC with
RhodiumToad and David Fetter, and they pointed me to the various
places that could potentially have bloom filters, apart from the
places that already have them currently.

I have been reading the current implementation of hash joins, and in
ExecScanHashBucket, which I understand is the actual lookup function,
we could potentially look at a bloom filter per bucket. Instead of
actually looking up each hash value for the outer relation, we could
just check the corresponding bloom filter for that bucket, and if we
get a positive, then lookup the actual values i.e. continue with our
current behaviour (since we could be looking at a false positive).

This doesn't involve a lot of new logic. We need to add a bloom filter
in HashJoinState and set it when the hash table is being made. Also,
one thing we need to look at is the set of hash functions being used
for the bloom filters. This can be a point of further discussion.

The major potential benefit we could gain is that bloom filters never
give false negatives. So, we could save a lot of lookup where the
corresponding bloom filter gives a negative.

This approach can also be particularly useful for hash anti joins,
where we look for negatives. Since bloom filters can easily be stored
and processed, by straight bit operations, we could be looking at a
lot of saving here.

I am not sure if we already do something like this. Please direct me
to the relevant parts in the code if we already do.

Regards,

Atri

--
Regards,

Atri
l'apprenant



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Reduce maximum error in tuples estimation after vacuum.
Следующее
От: Robins Tharakan
Дата:
Сообщение: Re: Add more regression tests for dbcommands