Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] Boom filters for hash joins (was: A design for amcheckheapam verification)
Дата
Msg-id CA+TgmoaqvJCkeUbhh0x3GLHJ5uY8xmqLbncjSB3UKavxvAzjnQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: [HACKERS] Boom filters for hash joins (was: A design for amcheck heapam verification)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Tue, Aug 29, 2017 at 10:22 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Indeed.  Thank you for working on this!  To summarise a couple of
> ideas that Peter and I discussed off-list a while back:  (1) While
> building the hash table for a hash join we could build a Bloom filter
> per future batch and keep it in memory, and then while reading from
> the outer plan we could skip writing tuples out to future batches if
> there is no chance they'll find a match when read back in later (works
> only for inner joins and only pays off in inverse proportion to the
> join's selectivity);

You could try to use this with LEFT JOINs also, I think.  If you find
that a row can't match later, don't write it to the batch file;
instead, return a null-extended row immediately.

> (2) We could push a Bloom filter down to scans
> (many other databases do this, and at least one person has tried this
> with PostgreSQL and found it to pay off[1]).

I think the hard part is going to be figuring out a query planner
framework for this, because pushing down the Bloom filter down to the
scan changes the cost and the row-count of the scan.  This is
essentially a less-tractable version of the problem of deciding how
many parallel workers to use; in that case, at least, the number of
works must be an integer, so we could resort to trying them all, if
the costing model were otherwise good enough.  But here the thing that
matters is selectivity, which is a continuous rather than discrete.
Consider:

SELECT * FROM A LEFT JOIN (B INNER JOIN C ON B.x = C.x) ON A.y = B.y
LEFT JOIN D ON A.z > D.z;

Since the A-D join is not an equijoin, we'll have to implement it as a
nested loop; a case could also arise where a nested loop is the only
efficient strategy even for an equijoin, for example if D is huge.

Now, suppose the join to B-C is going to knock out 90% of the rows and
the join to D is going to knock out 50% of the rows, and that these
two percentages are independent of each other.  Normally, it would be
a slam-dunk to perform the B-C join first, but the Bloom filter
muddies the waters, because it might well be the case that what we
should really do is build the Bloom filter, apply it, then do the A-D
join, and do the A-(B-C) join last.  However, in order for the planner
to find that plan, it's going to need to be able to cost the nestloop
with an outer path that corresponds to a scan of A with the Bloom
filter applied.  And where is that path going to come from?

Note that it can't use that Bloom-filtered path for A unconditionally
because we might not opt to implement the A-(B-C) join as a hash join
at all; we have to build paths for NestLoop(A,D) and
NestLoop(Bloom(A),D) and we have to keep track of the fact that the
latter path has a dependency on a future hash join.  Things get
rapidly more complicated if you've got 10 or so different joins.  We
have to consider generate scan paths for the driving table with every
combination of Bloom filters that could be generated by subsequent
joins, and then all of those scan paths have to be carried through all
of the possible joinrels that can be generated later.  I think it's
not crazy to think that this could increase the cost of planning a
factor by 2^(# of joins).

Presumably some other method needs to be found, but I'm not clear what
that other method is.  Our planner is fundamentally bottom-up, and
it's not at all clear how to make it capable of changing its mind
later about something like the number of rows that a scan of A will
return, or how much that scan will cost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: [HACKERS] GnuTLS support
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Reporting query on crash even if completed