Re: New style of hash join proposal

Поиск
Список
Период
Сортировка
От Gregory Stark
Тема Re: New style of hash join proposal
Дата
Msg-id 87ve3lcx29.fsf@oxford.xeocode.com
обсуждение исходный текст
Ответ на Re: New style of hash join proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: New style of hash join proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> We currently execute a lot of joins as Nested Loops which would be more
>> efficient if we could batch together all the outer keys and execute a single
>> inner bitmap index scan for all of them together.
>
> Please give an example of what you're talking about that you think we
> can't do now.

So basically given a simple join like:

postgres=# explain analyze select * from a where i in (select i from b);
QUERYPLAN                                                 
 
------------------------------------------------------------------------------------------------------------Hash IN
Join (cost=3.25..181.75 rows=100 width=4) (actual time=0.704..44.262 rows=100 loops=1)  Hash Cond: (a.i = b.i)  ->  Seq
Scanon a  (cost=0.00..140.00 rows=10000 width=4) (actual time=0.034..20.864 rows=10000 loops=1)  ->  Hash
(cost=2.00..2.00rows=100 width=4) (actual time=0.530..0.530 rows=100 loops=1)        ->  Seq Scan on b
(cost=0.00..2.00rows=100 width=4) (actual time=0.013..0.236 rows=100 loops=1)Total runtime: 44.550 ms
 
(6 rows)

Note that we're doing a full sequential scan of "a" even though we've already
finished hashing "b" and know full well which keys we'll need. If we have an
index on "a" and "b" is sufficiently smaller than "a", as in this case, then
we could do a bitmap index scan on "a" and pull out just those keys.

In a simple case like this you could hack it by doing a query like:

postgres=# explain analyze select * from a where i = any (array(select i from b));
          QUERY PLAN                                                   
 
---------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on a  (cost=40.59..64.01 rows=10 width=4) (actual time=2.193..2.535 rows=100 loops=1)  Recheck Cond: (i = ANY
($0)) InitPlan    ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=4) (actual time=0.024..0.279 rows=100 loops=1)
-> Bitmap Index Scan on ai  (cost=0.00..38.59 rows=10 width=0) (actual time=2.155..2.155 rows=100 loops=1)        Index
Cond:(i = ANY ($0))Total runtime: 2.829 ms
 
(7 rows)

This is effectively an equivalent plan but it runs 20x faster. 

But constructing an array is pointless, we could just do a hash_seq_search
directly and in any case I'm not sure it's always possible to transform the
query in this way. 

I was thinking we could pass the hash itself as a parameter to the bitmap
index scan kind of like how we pass the bitmap structures up from bitmap index
scans to bitmap heap scans and get a plan like:
                                                QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------Hash IN
Join Hash Cond: (a.i = b.i)  ->  Bitmap Heap Scan on a        Recheck Cond: (a.i = ANY ($0))        ->  Bitmap Index
Scanon ai              Index Cond: (i = ANY ($0))  ->  Hash        ->  Seq Scan on b
 

The really promising thing about this is it would actually simplify the work
in the case where the hash overflows. Instead of having to set aside tuples
for subsequent batches we would know we got all the matching records in the
index scan. So we can throw out the hash and start a new one for each batch.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Zeugswetter Andreas OSB SD"
Дата:
Сообщение: Re: Remove hacks for old bad qsort() implementations?
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Remove hacks for old bad qsort() implementations?