Обсуждение: BUG #4926: too few pathkeys for mergeclauses

Поиск
Список
Период
Сортировка

BUG #4926: too few pathkeys for mergeclauses

От
"Roman Kononov"
Дата:
The following bug has been logged online:

Bug reference:      4926
Logged by:          Roman Kononov
Email address:      kononov@ftml.net
PostgreSQL version: 8.4.0
Operating system:   Linux x86_64
Description:        too few pathkeys for mergeclauses
Details:

test=# create table junk(i int);
CREATE TABLE
test=# select * from junk left outer join (select coalesce(i,1) as x,
coalesce(i,2) as y from junk) t on coalesce(i,3)=x and coalesce(i,4)=y and
coalesce(i,5)=x;
ERROR:  too few pathkeys for mergeclauses
test=# select * from junk left outer join (select coalesce(i,1) as x,
coalesce(i,2) as y from junk) t on coalesce(i,3)=x and coalesce(i,4)=y and
coalesce(i,5)=y;
 i | x | y
---+---+---
(0 rows)

I think, the first query should be handled the same way second is.

Re: BUG #4926: too few pathkeys for mergeclauses

От
Greg Stark
Дата:
On Thu, Jul 16, 2009 at 9:07 PM, Roman Kononov<kononov@ftml.net> wrote:
>
> test=3D# create table junk(i int);
> CREATE TABLE
> test=3D# select * from junk left outer join (select coalesce(i,1) as x,
> coalesce(i,2) as y from junk) t on coalesce(i,3)=3Dx and coalesce(i,4)=3D=
y and
> coalesce(i,5)=3Dx;
> ERROR: =A0too few pathkeys for mergeclauses

Thanks for the bug report. That's definitely not supposed to be
happening. It's always nice when it's easy to reproduce the problem
like this.

--=20
greg
http://mit.edu/~gsstark/resume.pdf

Re: BUG #4926: too few pathkeys for mergeclauses

От
Heikki Linnakangas
Дата:
Greg Stark wrote:
> On Thu, Jul 16, 2009 at 9:07 PM, Roman Kononov<kononov@ftml.net> wrote:
>> test=# create table junk(i int);
>> CREATE TABLE
>> test=# select * from junk left outer join (select coalesce(i,1) as x,
>> coalesce(i,2) as y from junk) t on coalesce(i,3)=x and coalesce(i,4)=y and
>> coalesce(i,5)=x;
>> ERROR:  too few pathkeys for mergeclauses
>
> Thanks for the bug report. That's definitely not supposed to be
> happening. It's always nice when it's easy to reproduce the problem
> like this.

Yep. This can be further reduced into this:

CREATE TABLE a (i integer);
CREATE TABLE b (x integer, y integer);

select * from a left outer join b on i=x and i=y and i=x;

The planner is choosing a merge join, where the outer side (table a) is
sorted by (i), and the inner side is sorted by (x, y). But that doesn't
work with the merge condition (i=x AND i=y AND i=x).

Version 8.3 has the same bug, apparently introduced along with the
equivalence classes. In 8.2, the merge condition is reduced into (i=x
AND i=y), IOW the planner eliminates the duplicate condition. I believe
8.2 would otherwise have the same problem as well.

I can see two different things that you could say is at fault here:

1. We no longer eliminate the duplicate condition, but the
find_mergeclauses_for_pathkeys() + make_inner_pathkeys_for_merge()
combination relies on there being no duplicates. We should try harder to
eliminate duplicates in left join clauses.

2. make_inner_pathkeys_for_merge() should have created sort order (x, y,
x) for the inner side.

The first solution is what we probably want, to avoid unnecessary work
at execution time.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #4926: too few pathkeys for mergeclauses

От
Heikki Linnakangas
Дата:
Heikki Linnakangas wrote:
> 2. make_inner_pathkeys_for_merge() should have created sort order (x, y,
> x) for the inner side.

On further thought, that would make no sense. Sort order (x, y) is
always equivalent to (x, y, x).

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #4926: too few pathkeys for mergeclauses

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Version 8.3 has the same bug, apparently introduced along with the
> equivalence classes. In 8.2, the merge condition is reduced into (i=x
> AND i=y), IOW the planner eliminates the duplicate condition. I believe
> 8.2 would otherwise have the same problem as well.

The fact that 8.2 eliminates the redundant condition is more or less
accidental, I think.  It's using equal() to detect duplicate
RestrictInfos coming up from the two input relations for the join,
while later versions rely on pointer equality for that.  You can fool
8.2 by commuting the duplicate condition, but it still doesn't fail:

regression=# explain select * from a left outer join b on i=x and i=y and x=i;
                           QUERY PLAN
-----------------------------------------------------------------
 Merge Left Join  (cost=285.12..325.93 rows=2140 width=12)
   Merge Cond: ((a.i = b.x) AND (a.i = b.y) AND (a.i = b.x))
   ->  Sort  (cost=149.78..155.13 rows=2140 width=4)
         Sort Key: a.i
         ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=4)
   ->  Sort  (cost=135.34..140.19 rows=1940 width=8)
         Sort Key: b.x, b.y
         ->  Seq Scan on b  (cost=0.00..29.40 rows=1940 width=8)
(8 rows)

I think what this case may show is simply that the consistency
checking I added to create_mergejoin_plan in 8.3 is too strict.
Not quite convinced yet though.

Another possible solution for this particular case is to allow the
equivclass code to deduce x=y as an equivalence class, that is
the plan should enforce that check at the scan of b and then just
have one sort key for the merge.  Not sure how complicated that is,
however, and in any case it may not fix every possible failure
case for create_mergejoin_plan.

            regards, tom lane

Re: BUG #4926: too few pathkeys for mergeclauses

От
Tom Lane
Дата:
I wrote:
> I think what this case may show is simply that the consistency
> checking I added to create_mergejoin_plan in 8.3 is too strict.
> Not quite convinced yet though.

After further review I think that is the correct approach to take.

The proximate cause of the problem is that
find_mergeclauses_for_pathkeys is selecting an order for the merge
clauses that corresponds to a noncanonical pathkey list for the
inner relation (to wit, x, y, x).  While it would be possible in
this particular example to put the clauses in x, x, y order instead,
I don't think that is necessarily possible in every case.  The clause
ordering is constrained by the outer pathkeys and what we have here
is a demonstration that the inner pathkeys needn't match the outer
ones one-to-one.  So you could have a clause that references an inner
pathkey that is also referenced by some earlier clause that matches
a different outer pathkey, and there won't be any way to make them
adjacent.

By the time the plan gets to create_mergejoin_plan, the inner pathkey
list has been reduced to canonical form (x, y), but *this does not
represent any actual change in sort order*.  (Which is why there's
no actual bug in 8.2 and before, which blithely generate plans that
involve such "incorrect" mergeclause orderings.)  So I think we should
just weaken the checks in create_mergejoin_plan to allow such cases,
ie, each mergeclause should be allowed to match any already-used
inner pathkey.

The other approach we could possibly take is to have
find_mergeclauses_for_pathkeys reject candidate mergeclauses that
produce out-of-order inner pathkeys, but that would break at least
this Assert at joinpath.c:272:

        /* Should have used them all... */
        Assert(list_length(cur_mergeclauses) == list_length(mergeclause_list));

and it'd be rather expensive to test for anyway.

            regards, tom lane

Re: BUG #4926: too few pathkeys for mergeclauses

От
Tom Lane
Дата:
"Roman Kononov" <kononov@ftml.net> writes:
> Description:        too few pathkeys for mergeclauses

I've applied a patch for this.  Thanks for the report.

            regards, tom lane