Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)
Дата
Msg-id CAH2-WzkUEKmYAH21F=WsBODYSCw8JZhx86VFrVQTHasXTS_MLg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Tue, Sep 19, 2017 at 1:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 09/19/2017 06:03 PM, Peter Geoghegan wrote:
>> I believe that parallelism makes the use of Bloom filter a lot more
>> compelling, too. Obviously this is something that wasn't taken into
>> consideration in 2015.
>>
>
> I haven't thought about it from that point of view. Can you elaborate
> why that would be the case? Sorry if this was explained earlier in this
> thread (I don't see it in the history, though).

Well, IPC and locking shared state to protect the state's structure is
potentially a big bottleneck for parallel hash join. I think that
Bloom filters were first used in distributed databases in the 1980s,
where a network round trip could be saved, which this is a little
like. That's why my guess is that Bloom filtering will be more
valuable when parallelism is used.

I think that right deep hash joins make this really compelling, if and
when they allow you to build multiple Bloom filters that can be
combined from multiple right deep hash table builds. I think you can
do fancy things like reduce the amount of I/O against a star schema
fact table considerably. You can use one Bloom filter (built against
some dimension table) to drive a bitmap index scan on a fact table
index, and then another Bloom filter (built against some other
dimension table) to drive another bitmap index scan. The executor then
does a bitmap AND to combine the two for a bitmap heap scan on the
fact table.

(Maybe this technique doesn't necessarily use a Bloom filter; it could
be some other type of bitmap.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: [HACKERS] why not parallel seq scan for slow functions
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] coverage analysis improvements