Re: WIP: Hash Join-Filter Pruning using Bloom Filters

Поиск
Список
Период
Сортировка
От Lawrence, Ramon
Тема Re: WIP: Hash Join-Filter Pruning using Bloom Filters
Дата
Msg-id 6EEA43D22289484890D119821101B1DF2C16EF@exchange20.mercury.ad.ubc.ca
обсуждение исходный текст
Ответ на Re: WIP: Hash Join-Filter Pruning using Bloom Filters  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Ответы Re: WIP: Hash Join-Filter Pruning using Bloom Filters  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Список pgsql-hackers
> -----Original Message-----
> On Sun, Nov 2, 2008 at 10:49 PM, Jonah H. Harris
> <jonah.harris@gmail.com> wrote:
> It's effective as-is for a preliminary patch.  The GUC code is the
> least of my worries.
>
> > Can you provide some figures on the performance impact of the bloom
> filter?

I have tested the Bloom filter patch.  It compiles cleanly against HEAD.

As indicated, the performance improvements for hash join are good,
especially when the build table is filtered with a selection condition.
Performance improvements range from a couple of percent up to 20% for
multi-batch joins.  Note that the bloom filter will slightly slow
queries where the filter has no benefit.

I have not looked at the actual implementation of the Bloom filter, but
will proceed to do that next.  One issue to be considered is how the
space used for the bloom filter is related to the work_mem allocated to
the join. That is, does the bloom filter consume some of the work_mem
space or is it treated as additional memory allocated to the join.

Experimental Results (in ms)
Query    Time With Filter    Time Without Filter    % Faster
1    2,166            2,648                18%
2    1,665            1,772                6%
3    5,308            6,374                17%
4    63,690        75,715            15%
5    87,864        81,552            -8%
6    12,492        11,696            -7%


Query 1: (provided by patch author) = 190,000 results
CREATE TABLE t1 (id INTEGER PRIMARY KEY, x INTEGER); CREATE TABLE t2 (id
INTEGER PRIMARY KEY, x INTEGER); INSERT INTO t1 (SELECT ge, ge % 100
FROM generate_series(1, 1000000) ge); INSERT INTO t2 (SELECT * FROM t1);
VACUUM ANALYZE;
SELECT COUNT(*) FROM t1, t2WHERE t1.id = t2.id      AND t1.x < 30      AND t2.x > 10;
The next five queries are on the TPCH 1GB data set.

Query 2: (in-memory join with restrictive build filter) = 56,110 results
select count(*) from customer, orders where c_custkey = o_custkey and
c_nationkey = 10;

Query 3: (multi-batch join with less restrictive build filter) = 841,767
results
select count(*) from customer, orders where c_custkey = o_custkey and
c_nationkey > 10;

Query 4: (large multi-batch join) = 3,215,402 results
select count(*) from orders, lineitem where o_orderkey = l_orderkey and
o_totalprice > 150000;

Query 5: (large multi-batch join with no filter - hard case for BLOOM
filter) = 6,000,003 results
select count(*) from orders, lineitem where o_orderkey = l_orderkey;

Query 6: (large probe, in-memory build with no filter - hard case for
BLOOM filter) = 6,000,003 results
select count(*) from supplier, lineitem where s_suppkey = l_suppkey;

All tests were run 4 times and the times were averaged.  The initial run
time was discarded to deal with buffering issues.

--
Dr. Ramon Lawrence
University of British Columbia Okanagan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: gram.y=>preproc.y
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Patch for ISO-8601-Interval Input and output.