Re: Hash Joins vs. Bloom Filters / take 2

Поиск

Список

Период

Сортировка

От	Peter Geoghegan
Тема	Re: Hash Joins vs. Bloom Filters / take 2
Дата	21 февраля 2018 г. 02:23:58
Msg-id	CAH2-Wz=FZ8r=qDX3hz5ctaDNts1+bdeVNYhXB+m+dr79XYTajQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Hash Joins vs. Bloom Filters / take 2 (Claudio Freire <klaussfreire@gmail.com>)
Ответы	Re: Hash Joins vs. Bloom Filters / take 2
Список	pgsql-hackers

Дерево обсуждения

On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I've worked a lot with bloom filters, and for large false positive
> rates and large sets (multi-million entries), you get bloom filter
> sizes of about 10 bits per distinct item.

It's generally true that you need 9.6 bits per element to get a 1%
false positive rate. 1% could be considered much too low here.

Do we need to eliminate 99% of all hash join probes (that find nothing
to join on) to make this Bloom filter optimization worthwhile?
Personally, I doubt it.

> That's efficient, but it's not magic. It can still happen that the
> whole set can't fit in work_mem with an acceptable false positive
> rate.

A merge join is always going to be the better choice when extremely
memory constrained.

-- 
Peter Geoghegan

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Hash Joins vs. Bloom Filters / take 2