Hi,
On 05/04/2016 11:02 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, May 4, 2016 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My other design-level complaint is that basing this on foreign keys is
>>> fundamentally the wrong thing. What actually matters is the unique index
>>> underlying the FK; that is, if we have "a.x = b.y" and there's a
>>> compatible unique index on b.y, we can conclude that no A row will match
>>> more than one B row, whether or not an explicit FK relationship has been
>>> declared. So we should drive this off unique indexes instead of FKs,
>>> first because we will find more cases, and second because the planner
>>> already examines indexes and doesn't need any additional catalog lookups
>>> to get the required data. (IOW, the relcache additions that were made in
>>> this patch series should go away too.)
>
>> Without prejudice to anything else in this useful and detailed review,
>> I have a question about this. A unique index proves that no A row
>> will match more than one B row, and I agree that deriving that from
>> unique indexes is sensible. However, ISTM that an FK provides
>> additional information: we know that, modulo filter conditions on B,
>> every A row will match *exactly* one row B row, which can prevent us
>> from *underestimating* the size of the join product. A unique index
>> can't do that.
>
> Very good point, but unless I'm missing something, that is not what the
> current patch does. I'm not sure offhand whether that's an important
> estimation failure mode currently, or if it is whether it would be
> sensible to try to implement that rule entirely separately from the "at
> most one" aspect, or if it isn't sensible, whether that's a sufficiently
> strong reason to confine the "at most one" logic to working only with FKs
> and not with bare unique indexes.
FWIW it's a real-world problem with multi-column FKs. As David pointed
out upthread, a nice example of this issue is Q9 in the TPC-H bench,
where the underestimate leads to HashAggregate and then OOM failure.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services