Re: [HACKERS] Boom filters for hash joins (was: A design for amcheck heapam verification)

From: Tomas Vondra
Subject: Re: [HACKERS] Boom filters for hash joins (was: A design for amcheck heapam verification)
Msg-id 9b9fd273-18e7-2b07-7aa1-4b00ab59b8d1@2ndquadrant.com
In reply to: Re: [HACKERS] Boom filters for hash joins (was: A design for amcheck heapam verification)  (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: [HACKERS] Boom filters for hash joins (was: A design for amcheck heapam verification)  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
Hi,

On 09/19/2017 02:55 AM, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 5:13 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> On Mon, Sep 18, 2017 at 2:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Sep 18, 2017 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Uh, why does the planner need to be involved at all?
>>>
>>> Because it loses if the Bloom filter fails to filter anything.  That's
>>> not at all far-fetched; consider SELECT * FROM a.x, b.x WHERE a.x =
>>> b.x given a foreign key on a.x referencing b(x).
>>
>> Wouldn't a merge join be a lot more likely in this case anyway? Low
>> selectivity hash joins with multiple batches are inherently slow; the
>> wasted overhead of using a bloom filter may not matter.
>>
>> Obviously this is all pretty speculative. I suspect that this could be
>> true, and it seems worth investigating that framing of the problem
>> first.
> 
> ISTR Tomas Vondra doing some experiments with this a few years ago and
> finding that it was, in fact, a problem.
> 

You seem to have a better memory than I do, but you're right: I did some
experiments with this in 2015. The WIP patch and the discussion are here:
 https://www.postgresql.org/message-id/5670946E.8070705@2ndquadrant.com

The whole idea was that with a Bloom filter we can reduce the number of
tuples (from the outer relation) written to batches.
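
The following is a minimal sketch of that idea. It uses Python rather than PostgreSQL's C and assumes integer join keys. It also simplifies batch assignment to `key % nbatch` instead of the hash-value bits the executor actually uses. `BloomFilter` and `spill_outer` are hypothetical names for illustration, not code from the 2015 patch. The filter is populated with every inner key while the hash table is built. An outer tuple headed for a later batch is written out only if the filter says its key might be present.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over integer join keys (illustrative sketch)."""

    def __init__(self, nbits: int, nhashes: int) -> None:
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key: int):
        # Derive k bit positions by double hashing one 128-bit digest.
        digest = hashlib.blake2b(str(key).encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step
        for i in range(self.nhashes):
            yield (h1 + i * h2) % self.nbits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: int) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def spill_outer(inner_keys, outer_keys, nbatch, use_bloom=True,
                nbits=8 * 1024, nhashes=3):
    """Return the outer keys that would be written to batch files 1..nbatch-1.

    Hypothetical simplification: batch = key % nbatch. Batch 0 outer tuples
    are probed against the in-memory hash table, so they are never spilled.
    """
    bloom = BloomFilter(nbits, nhashes)
    for k in inner_keys:          # built while loading the inner relation
        bloom.add(k)

    spilled = []
    for k in outer_keys:
        if k % nbatch == 0:
            continue              # current batch: no temp-file write needed
        if use_bloom and not bloom.might_contain(k):
            continue              # cannot match any inner tuple: skip the write
        spilled.append(k)
    return spilled
```

A Bloom filter has no false negatives, so skipping a tuple the filter rejects never loses a join match. False positives only mean that some useless tuples still get written. The 8 * 1024 bits and 3 hash functions are arbitrary choices for the example.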

The patch is fairly simple, and did not try to push the Bloom filters down
to scan nodes or anything like that. It might be a meaningful first step,
though, particularly for selective joins (where only a small number of
rows from the outer relation have a match in the hash table).
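
The selectivity argument can be made concrete with a back-of-the-envelope estimate. This is an illustration, not a result from the 2015 experiments. It assumes a fraction `match_fraction` of outer rows have a partner in the inner relation. It also uses the standard approximation (1 - e^(-k*n/m))^k for the false-positive rate of an m-bit filter holding n keys with k hash functions. The function names are hypothetical.

```python
import math


def bloom_fp_rate(nbits: int, nkeys: int, nhashes: int) -> float:
    """Approximate false-positive rate of a Bloom filter (standard formula)."""
    return (1.0 - math.exp(-nhashes * nkeys / nbits)) ** nhashes


def expected_spilled(n_outer: float, match_fraction: float, fp_rate: float) -> float:
    """Expected outer tuples still written to batch files with the filter.

    Matching tuples always pass; non-matching ones pass only as false positives.
    """
    matching = n_outer * match_fraction
    false_positives = n_outer * (1.0 - match_fraction) * fp_rate
    return matching + false_positives
```

With 1,000,000 outer rows, 1% of them matching, and a 1% false-positive rate, about 19,900 rows are still written instead of 1,000,000. In the foreign-key case Robert quoted, `match_fraction` is 1.0 and nothing is filtered, so building and probing the filter is pure overhead.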

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
