Re: IN or EXISTS

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: IN or EXISTS
Дата
Msg-id 1316741520.31091.7.camel@sussancws0025
обсуждение исходный текст
Ответ на Re: IN or EXISTS  (Craig Ringer <ringerc@ringerc.id.au>)
Список pgsql-performance
On Wed, 2011-08-31 at 09:33 +0800, Craig Ringer wrote:
> On the other hand, the `IN' subquery is uncorrelated needs only run
> once, where the `EXISTS' subquery is correlated and has to run once for
> every outer record.

If the EXISTS looks semantically similar to an IN (aside from NULL
semantics), then it can be made into a semijoin. It doesn't require
re-executing any part of the plan.

I don't think there are any cases where [NOT] IN is an improvement, am I
mistaken?

> Another complication is the possible presence of NULL in an IN list.
> Getting NULLs in `IN' lists is a common source of questions on this
> list, because people are quite surprised by how it works. EXISTS avoids
> the NULL handling issue (and in the process demonstrates how woefully
> inconsistent SQL's handling of NULL really is).

Absolutely. The NULL behavior of IN is what makes it hard to optimize,
and therefore you should use EXISTS instead if the semantics are
suitable.

> Theoretically the query planner could transform:
>
> SELECT * from y WHERE y.id IN (SELECT DISTINCT z.y_id FROM z WHERE
> z.y_id IS NOT NULL);
>
> into:
>
> SELECT * FROM y WHERE EXISTS (SELECT 1 FROM z WHERE z.y_id = y.id)
>
> ... or vice versa depending on which it thought would be faster.

Although those two queries are semantically the same (I think), a lot of
very similar pairs of queries are not equivalent. For instance, if it
was a NOT IN you couldn't change that to a NOT EXISTS.

Regards,
    Jeff Davis


В списке pgsql-performance по дате отправления:

Предыдущее
От: Michael Viscuso
Дата:
Сообщение: Re: Query optimization using order by and limit
Следующее
От: Gunnlaugur Þór Briem
Дата:
Сообщение: Re: Constraint exclusion on UNION ALL subqueries with WHERE conditions