> -----Original Message-----
> > Minor question on this patch. AFAICS there is another patch that seems
> > to be aiming at exactly the same use case. Jonah's Bloom filter patch.
> >
> > Shouldn't we have a dust off to see which one is best? Or at least a
> > discussion to test whether they overlap? Perhaps you already did that
> > and I missed it because I'm not very tuned in on this thread.
> >
> > --
> > Simon Riggs www.2ndQuadrant.com
> > PostgreSQL Training, Services and Support
>
> We haven't had that discussion AFAIK, and definitely should. First
> glance suggests they could coexist peacefully, with proper coaxing. If
> I understand things properly, Jonah's patch filters tuples early in
> the join process, and this patch tries to ensure that hash join
> batches are kept in RAM when they're most likely to be used. So
> they're orthogonal in purpose, and the patches actually apply *almost*
> cleanly together. Jonah, any comments? If I continue to have some time
> to devote, and get through all I think I can do to review this patch,
> I'll gladly look at Jonah's too, FWIW.
>
> - Josh

The skew patch and the Bloom filter patch are orthogonal, and both can be
applied. The Bloom filter patch is a great idea, and the technique is used
in many other database systems. You can use the TPC-H data set to
demonstrate that the Bloom filter patch will significantly improve the
performance of multi-batch joins (with or without data skew).

Any query that filters the build table before joining it with the probe
table will show improvements with a Bloom filter. For example:

    select * from customer, orders
    where customer.c_nationkey = 10
      and customer.c_custkey = orders.o_custkey;

A Bloom filter built on the filtered customer tuples would let us avoid
probing with orders tuples that cannot possibly find a match because of
the selection criteria. This is especially beneficial for multi-batch
joins, where an orders tuple must be written to disk if its corresponding
customer batch is not the in-memory batch: the filter lets us discard such
a tuple instead of spilling it.
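
To make the idea concrete, here is a minimal sketch in Python, not code from
either patch or from PostgreSQL's hash join. Everything in it is an
assumption for illustration: the hypothetical BloomFilter class (SHA-256
with double hashing), the hypothetical partition_probe helper, batch
assignment by hash(key) % num_batches, and the synthetic customer/orders
rows standing in for TPC-H. It shows the build side inserting the join keys
of the filtered customer tuples into the filter, and the probe side dropping
orders tuples the filter rules out before they are either probed or written
to a batch file.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter over join keys (illustrative sketch only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def partition_probe(probe_rows, key_of, bloom, num_batches, in_memory_batch=0):
    """Hypothetical helper: split probe tuples into (probe now, spilled, dropped count)."""
    probe_now, spilled, dropped = [], {}, 0
    for row in probe_rows:
        key = key_of(row)
        if not bloom.might_contain(key):
            dropped += 1  # cannot match any filtered build tuple: neither probed nor spilled
            continue
        batch = hash(key) % num_batches
        if batch == in_memory_batch:
            probe_now.append(row)  # would probe the in-memory hash table immediately
        else:
            spilled.setdefault(batch, []).append(row)  # would be written to disk
    return probe_now, spilled, dropped


# Synthetic stand-ins for the TPC-H tables (not real TPC-H data).
customers = [(custkey, custkey % 25) for custkey in range(1, 101)]  # (c_custkey, c_nationkey)
orders = [(orderkey, orderkey % 100 + 1) for orderkey in range(1000)]  # (o_orderkey, o_custkey)

# Build side: apply c_nationkey = 10, then insert surviving join keys into the filter.
build = [c for c in customers if c[1] == 10]
bloom = BloomFilter()
for custkey, _ in build:
    bloom.add(custkey)

probe_now, spilled, dropped = partition_probe(orders, lambda o: o[1], bloom, num_batches=4)
```

Because a Bloom filter has no false negatives, every order whose o_custkey
belongs to one of the customers in nation 10 survives the filter. Most of
the other orders are dropped instead of being written to batch files, and
any that slip through are false positives that the real hash-table probe
would still reject.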

I have no experience reviewing patches, but I would be happy to help
contribute to and review the Bloom filter patch as best I can.
--
Dr. Ramon Lawrence
Assistant Professor, Department of Computer Science, University of
British Columbia Okanagan
E-mail: ramon.lawrence@ubc.ca