Re: two seperate queries run faster than queries ORed

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: two seperate queries run faster than queries ORed
Дата
Msg-id 7925.1079988249@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: two seperate queries run faster than queries ORed  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
Список pgsql-performance
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Well, you have to be careful on the combination to not give the wrong
> answers if there's a row with u.status=3 that matches a row d.status=3.

We could in theory handle that using something similar to the method
currently used for "OR" indexscans (that is, rather than doing either
UNION- or UNION-ALL-like processing, we drop tuples from later scans
that meet the qual tests of the earlier scans).  However I don't see any
clean way in the current planner to cost out both approaches and pick
the cheaper one.  It looks to me like we'd have to do over the *entire*
join planning process each way, which is ugly as well as unreasonably
expensive.  The problem is that the OR approach only wins when the
component clauses of the OR can drop down to lower levels of the plan
tree if they are considered separately.  But a plan tree with a
restriction at a low level and one without it are two different things,
and the dynamic-programming approach we use to build up join plans
doesn't yield the same solutions.  (As indeed it shouldn't, since the
whole point of Joseph's example is to get fundamentally different plans
for the two parts of the OR.)

We could possibly approach it heuristically, that is examine the clauses
and try to guess whether it's better to split them apart or not.  But
even assuming that we punt on that part of the problem, it seems like a
mess.  For instance suppose that there are additional relations in the
query that aren't mentioned in the OR clause.  The planner may want to
join some of those relations in advance of forming the join that the OR
itself describes.  Pushing down different parts of the OR might cause
the best join path to change.  How could you merge multiple scans if
some include extra relations and some don't?

In short, I see how such a plan could be executed, but I don't see any
effective approach for generating the plan ...

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Joseph Shraibman
Дата:
Сообщение: Re: two seperate queries run faster than queries ORed together
Следующее
От: "Subbiah, Stalin"
Дата:
Сообщение: Databases Vs. Schemas