Re: WIP: bloom filter in Hash Joins with batches

From Peter Geoghegan
Subject Re: WIP: bloom filter in Hash Joins with batches
Msg-id CAM3SWZQkrQZTKvXkcGsBDginH+KODsuPXAhqhOW5zCKZBTtCTQ@mail.gmail.com
In response to Re: WIP: bloom filter in Hash Joins with batches  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WIP: bloom filter in Hash Joins with batches  (Peter Geoghegan <pg@heroku.com>)
Re: WIP: bloom filter in Hash Joins with batches  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> So, this seems to bring reasonable speedup, as long as the selectivity is
> below 50%, and the data set is sufficiently large.

What about semijoins? Apparently they can use bloom filters
particularly effectively. Have you considered them as a special case?
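
To make the semijoin case concrete, here is a minimal sketch in Python rather than executor code. It is not taken from the patch: `BloomFilter`, `hash_semijoin`, and the filter sizes are hypothetical names and values chosen for illustration. The idea it shows is that a semijoin only needs to know whether a match exists, so a negative bloom filter lookup lets a probe row be discarded without touching the hash table at all.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter; the sizes are illustrative, not tuned."""

    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits = nbits              # assumed to be a multiple of 8
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, key):
        # Derive nhashes bit positions from one digest (double hashing).
        digest = hashlib.blake2b(repr(key).encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


def hash_semijoin(outer_rows, inner_rows, key):
    """Return the outer rows with at least one inner match, plus the
    number of probe rows rejected by the bloom filter alone."""
    table = set()
    bloom = BloomFilter()
    for row in inner_rows:              # build phase
        k = key(row)
        table.add(k)
        bloom.add(k)

    result = []
    skipped = 0
    for row in outer_rows:              # probe phase
        k = key(row)
        if not bloom.might_contain(k):
            skipped += 1                # definite miss, no hash table lookup
            continue
        if k in table:                  # a semijoin only needs existence
            result.append(row)
    return result, skipped
```

In this sketch the key can be a tuple of several columns, e.g. `key=lambda r: (r[0], r[1])`, so a multi-attribute join condition goes through the same filter as a single-column one.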

Also, have you considered Hash join conditions with multiple
attributes as a special case? I'm thinking of cases like this:

regression=# set enable_mergejoin = off;
SET
regression=# explain analyze select * from tenk1 o join tenk2 t on
o.twenty = t.twenty and t.hundred = o.hundred;
                                                      QUERY PLAN
──────────────────────────────────────────────────────────────────────
 Hash Join  (cost=595.00..4103.00 rows=50000 width=488) (actual time=12.086..1026.194 rows=1000000 loops=1)
   Hash Cond: ((o.twenty = t.twenty) AND (o.hundred = t.hundred))
   ->  Seq Scan on tenk1 o  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.017..4.212 rows=10000 loops=1)
   ->  Hash  (cost=445.00..445.00 rows=10000 width=244) (actual time=12.023..12.023 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 2824kB
         ->  Seq Scan on tenk2 t  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.006..3.453 rows=10000 loops=1)
 Planning time: 0.567 ms
 Execution time: 1116.094 ms
(8 rows)

(Note that while the optimizer has a slight preference for a merge
join in this case, the plan I show here is a bit faster on my
machine.)


--
Peter Geoghegan


