Re: planner missing a trick for foreign tables w/OR conditions

From: Tom Lane
Subject: Re: planner missing a trick for foreign tables w/OR conditions
Date:
Msg-id: 12558.1387301313@sss.pgh.pa.us
In reply to: Re: planner missing a trick for foreign tables w/OR conditions  (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: planner missing a trick for foreign tables w/OR conditions
Re: planner missing a trick for foreign tables w/OR conditions
List: pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Dec 16, 2013 at 6:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The hard part is not extracting the partial qual.  The hard part is
>> trying to make sure that adding this entirely-redundant scan qual doesn't
>> catastrophically degrade join size estimates.

> OK, I had a feeling that's where the problem was likely to be.  Do you
> have any thoughts about a more principled way of solving this problem?
> I mean, off-hand, it's not clear to me that the comments about this
> being a MAJOR HACK aren't overstated.

Well, the business about injecting the correction by adjusting a cached
selectivity is certainly a hack, but it's not one that I think is urgent
to get rid of; I don't foresee anything that's likely to break it soon.

> I might be missing something, but I suspect it works fine if every
> path for the relation is generating the same rows.

I had been thinking it would fall down if there are several OR conditions
affecting different collections of rels, but after going through the math
again, I'm now thinking I was wrong and it does in fact work out.  As you
say, we do depend on all paths generating the same rows, but since the
extracted single-rel quals are inserted as plain baserestrictinfo quals,
that'll be true.

A bigger potential objection is that we're opening ourselves to larger
problems with estimation failures due to correlated qual conditions, but
again I'm finding that the math doesn't bear that out.  It's reasonable
to assume that our estimate for the extracted qual will be better than
our estimate for the OR as a whole, so our adjusted size estimates for
the filtered base relations are probably OK.  And the adjustment to the
OR clause selectivity means that the size estimate for the join comes
out exactly the same.  We'll actually be better off than with what is
likely to happen now, which is that people manually extract the simplified
condition and insert it into the query explicitly.  PG doesn't realize
that that's redundant and so will underestimate the join size.
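To make the arithmetic above concrete, here is a minimal Python sketch of the row estimate for a two-relation join `a JOIN b` with one OR clause, assuming the planner simply multiplies selectivities it believes independent. The function names are hypothetical illustrations, not PostgreSQL planner functions, and the clamp to 1.0 is an added assumption to cover inconsistent estimates.

```python
def join_rows_plain(rows_a, rows_b, sel_or):
    """Join estimate with only the original OR clause applied at the join."""
    return rows_a * rows_b * sel_or


def join_rows_with_extraction(rows_a, rows_b, sel_or, sel_extracted):
    """Join estimate when a qual on `a` implied by the OR is pushed down as a
    base restriction and the OR's cached selectivity is divided by it."""
    if not 0.0 < sel_extracted <= 1.0:
        raise ValueError("sel_extracted must be in (0, 1]")
    filtered_a = rows_a * sel_extracted  # base rel shrinks by the extracted qual
    # Correct the OR selectivity so the redundant qual is not counted twice.
    # Clamping at 1.0 is an assumption for inconsistent estimates.
    adjusted_or = min(1.0, sel_or / sel_extracted)
    return filtered_a * rows_b * adjusted_or


def join_rows_manual_redundant(rows_a, rows_b, sel_or, sel_extracted):
    """Join estimate when the user writes the same implied qual by hand: the
    planner treats it as independent of the OR and multiplies both in."""
    return rows_a * sel_extracted * rows_b * sel_or
```

With 10,000 rows in `a`, 1,000 in `b`, an OR selectivity of 0.0001 and an extracted-qual selectivity of 0.02, the first two functions both give about 1,000 rows, while the hand-written redundant qual drives the estimate down to about 20.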

So at this point I'm pretty much talked into it.  We could eliminate the
dependence on indexes entirely, and replace this code with a step that
simply tries to pull single-base-relation quals out of ORs wherever it can
find one.  You could argue that the produced quals would sometimes not be
worth testing for, but we could apply a heuristic that says to forget it
unless the estimated selectivity of the extracted qual is less than,
I dunno, 0.5 maybe.  (I wonder if it'd be worth inserting a check that
there's not already a manually-generated equivalent clause, too ...)
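A rough Python sketch of the extraction step described above, assuming the OR has already been flattened into arms that are AND-lists of leaf conditions, each tagged with the set of relations it references and an estimated selectivity. `Cond` and `extract_single_rel_qual` are hypothetical names, not planner code; the independence formulas for AND/OR and the plain string comparison used for the "already present" check are simplifying assumptions, and the 0.5 cutoff is only the tentative figure from the text.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Cond:
    """Hypothetical leaf condition: referenced rels, SQL text, selectivity."""
    rels: frozenset
    text: str
    selec: float


def extract_single_rel_qual(or_arms, rel, threshold=0.5, existing=()):
    """Given an OR whose arms are AND-lists of Cond, return (text, selec) for
    a qual on `rel` implied by the OR, or None if none is worth adding."""
    arm_texts, arm_selecs = [], []
    for arm in or_arms:
        mine = [c for c in arm if c.rels == frozenset({rel})]
        if not mine:
            # This arm says nothing about rel alone, so the OR as a whole
            # implies no restriction on it.
            return None
        arm_texts.append(" AND ".join(c.text for c in mine))
        # Independence assumption within an arm: AND selectivity = product.
        arm_selecs.append(prod(c.selec for c in mine))
    text = " OR ".join(f"({t})" for t in arm_texts)
    # Independence assumption across arms: P(A or B) = 1 - (1 - a)(1 - b).
    selec = 1 - prod(1 - s for s in arm_selecs)
    # Heuristic cutoff, plus a crude check for a manually written duplicate.
    if selec >= threshold or text in existing:
        return None
    return text, selec
```

For `(a.x = 1 AND b.y = 2) OR (a.x = 3 AND b.z = 4)`, this would yield `(a.x = 1) OR (a.x = 3)` for `a` and `(b.y = 2) OR (b.z = 4)` for `b`; for any relation missing from one of the arms it returns `None`, since that arm places no restriction on it.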

A very nice thing about this is we could do this step ahead of relation
size estimate setting and thus remove the redundant work that currently
happens in set_plain_rel_size when the optimization fires.  Which is
another aspect of the current code that's a hack, so getting rid of it
would be a net reduction in hackiness.
        regards, tom lane


