Обсуждение: Automatic optimization of IN clauses via INNER JOIN

Поиск
Список
Период
Сортировка

Automatic optimization of IN clauses via INNER JOIN

От
Thomas Hamilton
Дата:
Apparently the latest version of MySQL has solved this problem:
http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/

But I am running PostgreSQL v8.3 and am observing generally that SELECT ... WHERE ... IN (a, b, c, ...) is much slower
thanSELECT ... INNER JOIN (SELECT a UNION ALL SELECT b UNION ALL SELECT c ...) 

Why doesn't the optimizer automatically transform IN clauses to INNER JOINs in this fashion?





Re: Automatic optimization of IN clauses via INNER JOIN

От
Tom Lane
Дата:
Thomas Hamilton <thomashamilton76@yahoo.com> writes:
> Apparently the latest version of MySQL has solved this problem:
http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/
> But I am running PostgreSQL v8.3 and am observing generally that SELECT ... WHERE ... IN (a, b, c, ...) is much
slowerthan SELECT ... INNER JOIN (SELECT a UNION ALL�SELECT b UNION ALL SELECT c ...) 

> Why doesn't the optimizer automatically transform IN clauses to INNER JOINs in this fashion?

Did you read all the comments on that three-year-old article?

            regards, tom lane

Re: Automatic optimization of IN clauses via INNER JOIN

От
Thomas Hamilton
Дата:
Yes, I see the one note that running Analyze can improve the performance.

But in our testing under the same optimization and conditions INNER JOIN is significantly outperforming IN.



----- Original Message ----
From: Tom Lane tgl@sss.pgh.pa.us

Thomas Hamilton <thomashamilton76@yahoo.com> writes:
> Apparently the latest version of MySQL has solved this problem:
http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/
> But I am running PostgreSQL v8.3 and am observing generally that SELECT ... WHERE ... IN (a, b, c, ...) is much
slowerthan SELECT ... INNER JOIN (SELECT a UNION ALL SELECT b UNION ALL SELECT c ...) 

> Why doesn't the optimizer automatically transform IN clauses to INNER JOINs in this fashion?

Did you read all the comments on that three-year-old article?

            regards, tom lane





Re: Automatic optimization of IN clauses via INNER JOIN

От
Tom Lane
Дата:
Thomas Hamilton <thomashamilton76@yahoo.com> writes:
> But in our testing�under the same optimization and conditions INNER JOIN is significantly outperforming IN.

[ shrug... ]  You haven't provided any details, so it's impossible to
offer any useful advice.

            regards, tom lane

Re: Automatic optimization of IN clauses via INNER JOIN

От
Robert Haas
Дата:
On Thu, Dec 17, 2009 at 10:23 AM, Thomas Hamilton
<thomashamilton76@yahoo.com> wrote:
> Apparently the latest version of MySQL has solved this problem:
http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/
>
> But I am running PostgreSQL v8.3 and am observing generally that SELECT ... WHERE ... IN (a, b, c, ...) is much
slowerthan SELECT ... INNER JOIN (SELECT a UNION ALL SELECT b UNION ALL SELECT c ...) 

That's certainly not MY observation.  It would be interesting to see
what's going on in your case but you'll need to provide more details.

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

...Robert

Re: Automatic optimization of IN clauses via INNER JOIN

От
Grzegorz Jaśkiewicz
Дата:
On Thu, Dec 17, 2009 at 6:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 17, 2009 at 10:23 AM, Thomas Hamilton
> <thomashamilton76@yahoo.com> wrote:
>> Apparently the latest version of MySQL has solved this problem:
http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/
>>
>> But I am running PostgreSQL v8.3 and am observing generally that SELECT ... WHERE ... IN (a, b, c, ...) is much
slowerthan SELECT ... INNER JOIN (SELECT a UNION ALL SELECT b UNION ALL SELECT c ...) 
>
> That's certainly not MY observation.  It would be interesting to see
> what's going on in your case but you'll need to provide more details.
>
> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>

I asked the same question many times, and answer was always the same -
there's no point in doing that...
well... I've been asked by folks at work, the same thing (for typical
engineer, grasping the idea of join can be hard sometimes...).




--
GJ

Re: Automatic optimization of IN clauses via INNER JOIN

От
Craig Ringer
Дата:
On 17/12/2009 11:57 PM, Tom Lane wrote:
> Thomas Hamilton<thomashamilton76@yahoo.com>  writes:
>> But in our testing under the same optimization and conditions INNER JOIN is significantly outperforming IN.
>
> [ shrug... ]  You haven't provided any details, so it's impossible to
> offer any useful advice.

In other words: can we discuss this with reference to a specific case?
Please provide your queries, your EXPLAIN ANALYZE output, and other
relevant details as per:

   http://wiki.postgresql.org/wiki/SlowQueryQuestions

I'd be interested in knowing whether the planner can perform such
transformations and if so why it doesn't myself. I have the vague
feeling there may be semantic differences in the handling of NULL but I
can't currently seem to puzzle them out.

--
Craig Ringer

Re: Automatic optimization of IN clauses via INNER JOIN

От
Robert Haas
Дата:
On Thu, Dec 17, 2009 at 9:20 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 17/12/2009 11:57 PM, Tom Lane wrote:
>>
>> Thomas Hamilton<thomashamilton76@yahoo.com>  writes:
>>>
>>> But in our testing under the same optimization and conditions INNER JOIN
>>> is significantly outperforming IN.
>>
>> [ shrug... ]  You haven't provided any details, so it's impossible to
>> offer any useful advice.
>
> In other words: can we discuss this with reference to a specific case?
> Please provide your queries, your EXPLAIN ANALYZE output, and other relevant
> details as per:
>
>  http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
> I'd be interested in knowing whether the planner can perform such
> transformations and if so why it doesn't myself. I have the vague feeling
> there may be semantic differences in the handling of NULL but I can't
> currently seem to puzzle them out.

NOT IN is the only that really kills you as far as optimization is
concerned.  IN can be transformed to a join.  NOT IN forces a NOT
(subplan)-type plan, which bites - hard.

...Robert

Re: Automatic optimization of IN clauses via INNER JOIN

От
Grzegorz Jaśkiewicz
Дата:
On Fri, Dec 18, 2009 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> NOT IN is the only that really kills you as far as optimization is
> concerned.  IN can be transformed to a join.  NOT IN forces a NOT
> (subplan)-type plan, which bites - hard.

in a well designed database (read: not abusing NULLs) - it can be done
with joins too.



--
GJ

Re: Automatic optimization of IN clauses via INNER JOIN

От
Robert Haas
Дата:
2009/12/18 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> On Fri, Dec 18, 2009 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> NOT IN is the only that really kills you as far as optimization is
>> concerned.  IN can be transformed to a join.  NOT IN forces a NOT
>> (subplan)-type plan, which bites - hard.
>
> in a well designed database (read: not abusing NULLs) - it can be done
> with joins too.

But not by PostgreSQL, or so I believe.

...Robert

Re: Automatic optimization of IN clauses via INNER JOIN

От
Grzegorz Jaśkiewicz
Дата:
2009/12/18 Robert Haas <robertmhaas@gmail.com>:
> 2009/12/18 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>> On Fri, Dec 18, 2009 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> NOT IN is the only that really kills you as far as optimization is
>>> concerned.  IN can be transformed to a join.  NOT IN forces a NOT
>>> (subplan)-type plan, which bites - hard.
>>
>> in a well designed database (read: not abusing NULLs) - it can be done
>> with joins too.
>
> But not by PostgreSQL, or so I believe.

using left join ?



--
GJ

Re: Automatic optimization of IN clauses via INNER JOIN

От
Robert Haas
Дата:
2009/12/18 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> 2009/12/18 Robert Haas <robertmhaas@gmail.com>:
>> 2009/12/18 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
>>> On Fri, Dec 18, 2009 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>>> NOT IN is the only that really kills you as far as optimization is
>>>> concerned.  IN can be transformed to a join.  NOT IN forces a NOT
>>>> (subplan)-type plan, which bites - hard.
>>>
>>> in a well designed database (read: not abusing NULLs) - it can be done
>>> with joins too.
>>
>> But not by PostgreSQL, or so I believe.
>
> using left join ?

If at least one column in the subselect is strict, you can rewrite it
that way yourself, but the optimizer won't do it. I wish it did, but I
don't wish it badly enough to have written the code myself, and
apparently neither does anyone else.

...Robert

Re: Automatic optimization of IN clauses via INNER JOIN

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> If at least one column in the subselect is strict, you can rewrite it
> that way yourself, but the optimizer won't do it. I wish it did, but I
> don't wish it badly enough to have written the code myself, and
> apparently neither does anyone else.

I was thinking about this earlier today.  It's a bit of a PITA because
we need the information very early in the planner, before it's done much
analysis.  So for example we might find ourselves duplicating the work
that will happen later to determine which tables are nullable by outer
joins.  I think this would be all right as long as we ensure that it's
only done when there's a chance for a win (ie, no extra cycles if
there's not actually a NOT IN present).  It could still be an
unpleasantly large amount of new code though.

Wouldn't we need to enforce that *all* columns of the subselect are
non-null, rather than *any*?

            regards, tom lane

Re: Automatic optimization of IN clauses via INNER JOIN

От
Robert Haas
Дата:
On Fri, Dec 18, 2009 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If at least one column in the subselect is strict, you can rewrite it
>> that way yourself, but the optimizer won't do it. I wish it did, but I
>> don't wish it badly enough to have written the code myself, and
>> apparently neither does anyone else.
>
> I was thinking about this earlier today.  It's a bit of a PITA because
> we need the information very early in the planner, before it's done much
> analysis.  So for example we might find ourselves duplicating the work
> that will happen later to determine which tables are nullable by outer
> joins.  I think this would be all right as long as we ensure that it's
> only done when there's a chance for a win (ie, no extra cycles if
> there's not actually a NOT IN present).  It could still be an
> unpleasantly large amount of new code though.

I haven't looked at the code (I'm not even sure where you're thinking
this would need to happen) but is there any way that we can do this
and usefully hold onto the results for future use?

> Wouldn't we need to enforce that *all* columns of the subselect are
> non-null, rather than *any*?

[ thinks about it ]

Yes.

...Robert