Re: Hash Joins vs. Bloom Filters / take 2

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Hash Joins vs. Bloom Filters / take 2
Дата
Msg-id CAH2-Wz=FZ8r=qDX3hz5ctaDNts1+bdeVNYhXB+m+dr79XYTajQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hash Joins vs. Bloom Filters / take 2  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: Hash Joins vs. Bloom Filters / take 2
Список pgsql-hackers
On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I've worked a lot with bloom filters, and for large false positive
> rates and large sets (multi-million entries), you get bloom filter
> sizes of about 10 bits per distinct item.

It's generally true that you need 9.6 bits per element to get a 1%
false positive rate. 1% could be considered much too low here.

Do we need to eliminate 99% of all hash join probes (that find nothing
to join on) to make this Bloom filter optimization worthwhile?
Personally, I doubt it.

> That's efficient, but it's not magic. It can still happen that the
> whole set can't fit in work_mem with an acceptable false positive
> rate.

A merge join is always going to be the better choice when extremely
memory constrained.

-- 
Peter Geoghegan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Claudio Freire
Дата:
Сообщение: Re: Hash Joins vs. Bloom Filters / take 2
Следующее
От: "R, Siva"
Дата:
Сообщение: Duplicate Item Pointers in Gin index