Re: WIP: Hash Join-Filter Pruning using Bloom Filters

From: Lawrence, Ramon
Subject: Re: WIP: Hash Join-Filter Pruning using Bloom Filters
Msg-id: 6EEA43D22289484890D119821101B1DF2C16F0@exchange20.mercury.ad.ubc.ca
In reply to: Re: WIP: Hash Join-Filter Pruning using Bloom Filters ("Jonah H. Harris" <jonah.harris@gmail.com>)
List: pgsql-hackers
> -----Original Message-----
> From: Jonah H. Harris [mailto:jonah.harris@gmail.com]
> I have a new patch which does not create a bloom filter unless it sees
> that the hash join is going to batch.  I'll send it along later
> tonight.
>
> Currently it's additional space not accounted for by work_mem.
> Additionally, it's a good amount more space than is required.  This is
> fixed in the newer patch as well.

I think the bloom filter will also improve the performance of in-memory
joins.  The basic trade-off in that case is the time to probe multiple
entries in a hash table bucket (the target number of tuples per bucket
currently defaults to 10) versus the cost of building and probing the
bloom filter.  The bloom filter should win in this case as long as there
are tuples in the probe relation that cannot find a match in the build
relation.
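To make the trade-off concrete, here is a minimal bloom filter sketch in C. It is not the code from the patch: it assumes the filter is keyed on the 32-bit hash value the hash join already computes for each tuple, and all names (BloomFilter, mix32, bloom_add, bloom_might_contain) are hypothetical. Deriving the k bit positions by double hashing (h1 + i*h2) with the MurmurHash3 fmix32 finalizer as the second hash is a swapped-in choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bloom filter over 32-bit join-key hash values. */
typedef struct BloomFilter
{
    uint64_t   *words;          /* bit array of nbits bits */
    uint32_t    nbits;          /* power of two, so positions can be masked */
    int         nhashes;        /* k: number of bits set per key */
} BloomFilter;

/* MurmurHash3 fmix32 finalizer, used here to derive a second hash. */
static uint32_t
mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bU;
    x ^= x >> 13;
    x *= 0xc2b2ae35U;
    x ^= x >> 16;
    return x;
}

static BloomFilter *
bloom_create(uint32_t nbits, int nhashes)
{
    BloomFilter *bf;

    assert(nbits >= 64 && (nbits & (nbits - 1)) == 0 && nhashes > 0);
    bf = malloc(sizeof(BloomFilter));
    if (bf == NULL)
        return NULL;
    bf->words = calloc(nbits / 64, sizeof(uint64_t));   /* all bits clear */
    if (bf->words == NULL)
    {
        free(bf);
        return NULL;
    }
    bf->nbits = nbits;
    bf->nhashes = nhashes;
    return bf;
}

static void
bloom_free(BloomFilter *bf)
{
    free(bf->words);
    free(bf);
}

/* Build phase: called once per build-relation tuple. */
static void
bloom_add(BloomFilter *bf, uint32_t hashvalue)
{
    /* an odd step keeps the k positions distinct modulo a power of two */
    uint32_t    h2 = mix32(hashvalue) | 1;

    for (int i = 0; i < bf->nhashes; i++)
    {
        uint32_t    bit = (hashvalue + (uint32_t) i * h2) & (bf->nbits - 1);

        bf->words[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/* Probe phase: false means the key is definitely not in the build side. */
static bool
bloom_might_contain(const BloomFilter *bf, uint32_t hashvalue)
{
    uint32_t    h2 = mix32(hashvalue) | 1;

    for (int i = 0; i < bf->nhashes; i++)
    {
        uint32_t    bit = (hashvalue + (uint32_t) i * h2) & (bf->nbits - 1);

        if ((bf->words[bit / 64] & (UINT64_C(1) << (bit % 64))) == 0)
            return false;       /* no match possible: skip the bucket walk */
    }
    return true;                /* possible match or false positive: probe */
}
```

In the probe loop, a hash join would call bloom_might_contain before walking the bucket; a false result lets the probe tuple be discarded without touching the hash table, while a true result falls back to the normal bucket scan, so correctness does not depend on the filter.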

My suggestion would be to keep it enabled for all joins.  If possible,
it would be valuable to estimate what percentage of tuples the bloom
filter will filter out.  A simple estimate would be the percentage of
the build table that is involved in the join.  For instance, the good
test cases had between 40% and 90% of the customer relation filtered
out, and a corresponding percentage of the probe relation, lineitem, was
filtered out by the bloom filter.  The bad case used all of customer, so
the bloom filter stopped no probe tuples.
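The simple estimate suggested above could be computed roughly as follows. This is a hypothetical sketch: it assumes probe-side join keys are spread evenly over the whole build-side key domain, so the unmatched fraction of probe tuples equals the fraction of the build relation that was filtered out, and it discounts that by the filter's false-positive rate. The function names are illustrative only.

```c
#include <assert.h>

/*
 * Hypothetical estimate of the fraction of probe tuples a bloom filter
 * would stop.
 *
 * build_fraction: fraction of the build relation (e.g. customer) that
 *     survives its own restrictions and is inserted into the filter.
 * fpr: the filter's false-positive rate; for m bits, k hash functions and
 *     n inserted keys it is approximately (1 - e^(-k*n/m))^k.
 *
 * Assumes probe-side join keys are uniformly spread over all build keys,
 * so (1 - build_fraction) of probe tuples have no match, and all but a
 * fraction fpr of those are rejected by the filter.
 */
static double
estimate_filtered_fraction(double build_fraction, double fpr)
{
    assert(build_fraction >= 0.0 && build_fraction <= 1.0);
    assert(fpr >= 0.0 && fpr <= 1.0);
    return (1.0 - build_fraction) * (1.0 - fpr);
}

/* Expected number of probe tuples that skip the hash table probe. */
static double
estimate_probes_saved(double probe_rows, double build_fraction, double fpr)
{
    return probe_rows * estimate_filtered_fraction(build_fraction, fpr);
}
```

With build_fraction = 1.0 (the bad case, all of customer) the estimate is 0, which matches the observation that the filter stopped no probe tuples; with 40% to 90% of customer filtered out, build_fraction is 0.6 down to 0.1 and the estimate is close to 40% to 90% when the false-positive rate is small.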

For testing, it would be useful to track the number and percentage of
probe tuples for which the bloom filter avoids a probe.  You could
further record how many of these tuples were in the in-memory batch
versus the on-disk batches.  These statistics may help you tune the
bloom filter for all cases.
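A minimal sketch of such instrumentation follows; the struct and function names are hypothetical. It assumes, as in PostgreSQL's hash join, that batch 0 is the in-memory batch and higher-numbered batches are spilled to disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-join counters for bloom filter effectiveness. */
typedef struct BloomProbeStats
{
    uint64_t    probes;             /* probe tuples checked against the filter */
    uint64_t    filtered;           /* probe tuples rejected by the filter */
    uint64_t    filtered_inmem;     /* ...of which were in batch 0 (in memory) */
    uint64_t    filtered_ondisk;    /* ...of which were in batches > 0 (on disk) */
} BloomProbeStats;

/* Record one filter check; passed_filter is bloom_might_contain's result. */
static void
bloom_stats_record(BloomProbeStats *st, bool passed_filter, int batchno)
{
    assert(batchno >= 0);
    st->probes++;
    if (!passed_filter)
    {
        st->filtered++;
        if (batchno == 0)
            st->filtered_inmem++;
        else
            st->filtered_ondisk++;
    }
}

/* Percentage of probe tuples for which the filter avoided a probe. */
static double
bloom_stats_filtered_pct(const BloomProbeStats *st)
{
    if (st->probes == 0)
        return 0.0;
    return 100.0 * (double) st->filtered / (double) st->probes;
}
```

Printing these counters at the end of the join (for example alongside the batch count) would show directly whether a given test case is a good or a bad one for the filter.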

--
Ramon Lawrence
