Обсуждение: [HACKERS] propose to pushdown qual into EXCEPT

Поиск
Список
Период
Сортировка

[HACKERS] propose to pushdown qual into EXCEPT

От
"Armor"
Дата:
    Recently, we find PG fails to generate an effective plan for following SQL:
    select * from (select * from table1 execpt select * from table2) as foo where foo.a > 0;
    Because PG does not pushdown qual to the none of the subquery. And I check the source code, find some comments in src/backend/optimizer/path/allpaths.c, which says "If the     subquery contains EXCEPT or EXCEPT ALL set ops we cannot push quals into it, because that could change the results".
    However, for this case, I think we can pushdown qual to the  left most subquery of EXCEPT, just like other database does. And we can get an more effective plan such as:
postgres=# explain select * from (select * from table1 except select * from table2) as foo where foo.a > 0;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Subquery Scan on foo  (cost=0.00..118.27 rows=222 width=8)
   ->  HashSetOp Except  (cost=0.00..116.05 rows=222 width=12)
         ->  Append  (cost=0.00..100.98 rows=3013 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..45.78 rows=753 width=12)
                     ->  Seq Scan on table1  (cost=0.00..38.25 rows=753 width=8)
                           Filter: (a > 0)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..55.20 rows=2260 width=12)
                     ->  Seq Scan on table2  (cost=0.00..32.60 rows=2260 width=8)
(8 rows)

   And the attached patch is a draft, it works for this case.
    
------------------
Jerry Yu
https://github.com/scarbrofair
 
Вложения

Re: [HACKERS] propose to pushdown qual into EXCEPT

От
Tom Lane
Дата:
"Armor" <yupengstone@qq.com> writes:
>     Because PG does not pushdown qual to the none of the subquery. And I check the source code, find some comments in
src/backend/optimizer/path/allpaths.c,which says "If the     subquery contains EXCEPT or EXCEPT ALL set ops we cannot
pushquals into it, because that could change the results". 
>     However, for this case, I think we can pushdown qual to the  left most subquery of EXCEPT, just like other
databasedoes. 

That is not an adequate argument for such a change being okay.  Postgres,
with its extensible set of datatypes, has to be much more careful about
the semantic soundness of optimizations than some other DBs do.

The existing behavior here dates to commit 0201dac1c319599a, which was
inspired by this thread:
https://www.postgresql.org/message-id/flat/46C15C39FEB2C44BA555E356FBCD6FA48879A6%40m0114.s-mxs.net
(BTW, several of the concrete examples discussed in that thread no longer
apply because of later changes.  But the problem still exists.)

We had convinced ourselves that pushing down a qual into UNION and
INTERSECT cases is okay even if the qual can distinguish rows that the
setop comparisons see as equal, because you would get results consistent
with the setop having chosen some legitimate set of representative row(s)
for each group of "duplicate" rows.  However that did not seem to apply to
EXCEPT, or at least I wasn't convinced enough to risk it.  I'm still not,
without a worked-out argument as to why it's okay.  In particular I'd like
to see a proof that establishes (a) why it is or is not okay to push into
the right-hand side of EXCEPT, and (b) whether ALL does or does not make a
difference.

Now you could argue that we threw all these fine points out the window
with respect to window functions in commit d222585a9f7a18f2, so maybe
it's okay to do it with respect to EXCEPT as well.  But that would lead
to deciding it's okay to push into both sides of EXCEPT, which is still
not what this patch does.  Anyway I'm not very pleased by the idea that
we'd hold EXCEPT to a weaker semantic standard than UNION and INTERSECT.
        regards, tom lane



Re: [HACKERS] propose to pushdown qual into EXCEPT

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> "Armor" <yupengstone@qq.com> writes:
> >     Because PG does not pushdown qual to the none of the subquery. And I check the source code, find some comments
insrc/backend/optimizer/path/allpaths.c, which says "If the     subquery contains EXCEPT or EXCEPT ALL set ops we
cannotpush quals into it, because that could change the results".
 
> >     However, for this case, I think we can pushdown qual to the  left most subquery of EXCEPT, just like other
databasedoes.
 
> 
> That is not an adequate argument for such a change being okay.  Postgres,
> with its extensible set of datatypes, has to be much more careful about
> the semantic soundness of optimizations than some other DBs do.

Can we use the record_image_ops mechanism here?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] propose to pushdown qual into EXCEPT

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> That is not an adequate argument for such a change being okay.  Postgres,
>> with its extensible set of datatypes, has to be much more careful about
>> the semantic soundness of optimizations than some other DBs do.

> Can we use the record_image_ops mechanism here?

Don't see how that helps?

But I went through the logic and think I convinced myself that pushdown
to LHS only is okay, and no more questionable than what happens for
INTERSECT.  Argue thus:

Assume a group of non-distinct-according-to-the-setop rows includes m rows
from left-hand side, n rows from right-hand side, of which the upper qual
would pass m' <= m and n' <= n rows respectively.

UNION: unoptimized setop produces 1 row (since we may assume the group is
nonempty).  That row might or might not pass the upper qual, so we get
0 or 1 row out.  User can be certain about what will happen only if
m'=m and n'=n (row must pass qual) or m'=n'=0 (row must not pass qual).

After optimizing by pushing qual down: we get one row if m' > 0 or n' > 0.
This is equivalent to the unoptimized behavior if we assume that the setop
always magically selects a representative row passing the qual if
possible, so OK.

INTERSECT: unoptimized setop produces 1 row if m>0 and n>0, which again
might or might not pass the upper qual, so 0 or 1 row out.

Optimized case: we get 1 row if m' > 0 and n' > 0.  This is equivalent to
unoptimized behavior if you allow that the representative row could have
been taken from either input.  (For example, if m' = m, n > 0, and n' = 0,
producing 0 rows is legit only if the hypothetical representative row was
taken from RHS.)  That's legal per SQL, though I'm unsure offhand if our
implementation actually would do that.

INTERSECT ALL: unoptimized setop produces min(m,n) rows, which might or
might not pass the upper qual, so anywhere from 0 to min(m,n) rows out.
(It is probably possible to set a tighter upper bound when m', n' are
small, but I've not bothered to reason that out.)

Optimized case: we get min(m', n') rows out, which seems fine; again
it could be produced by the unoptimized implementation selecting
representative row(s) that happen to have that many members passing the
upper qual.

But actually, our implementation doesn't produce some random collection
of min(m,n) elements of the row group.  What you get is min(m,n) copies
of a single representative row; therefore, assuming the upper qual is
nonvolatile, either they all pass or none do, so the actual output is
either 0 or min(m,n) duplicates.  (The spec says "Let R be a row that is a
duplicate of some row in ET1 or of some row in ET2 or both.  Let m be the
number of duplicates of R in ET1 and let n be the number of duplicates of
R in ET2, where m >= 0 and n >= 0.  [For INTERSECT ALL] the number of
duplicates of R that T contains is the minimum of m and n."  So depending
on what you want to assume that "duplicates of R" means, you could argue
that the spec actually requires our implementation's behavior here.
It certainly doesn't appear to forbid it.)  The optimized output is
therefore distinguishable from unoptimized, because it could produce a
number of rows that the unoptimized implementation cannot.  But nobody's
complained about it, and it's hard to see there being a legitimate beef.

EXCEPT: unoptimized setop produces 1 row if m>0 and n=0, which again
might or might not pass the upper qual, so 0 or 1 row out.

If we were to push qual into both sides: we'd get one row if m' > 0
and n' = 0.  So if the qual passes some of LHS and none of RHS, we'd
get a row, breaking SQL semantics.

If we push into LHS only: we get one row if m' > 0 and n = 0.  Certainly
then m > 0.  So this is equivalent to behavior where the setop magically
selects a representative LHS row passing the qual.  OTOH, if m' = 0, then
the setop wouldn't produce a row, but if it had that row would have failed
the upper qual, so behavior is still equivalent.

EXCEPT ALL: unoptimized setop produces max(m-n,0) rows, which might or
might not pass the upper qual, so anywhere from 0 to max(m-n,0) rows out.
(Again it is probably possible to set a tighter upper bound when m', n'
are small.)

Again, pushing into RHS would allow n to be reduced, perhaps allowing more
rows to be returned than SQL would allow, so we can't do that.

If we push into LHS only: we get max(m'-n,0) rows.  Again, it would be
possible for the unoptimized setop to select its representative rows in
such a way that this happens (it just has to prefer LHS rows passing the
qual over those that don't).

Again, this doesn't quite match reality of the current setop
implementation, since it returns max(m-n,0) identical rows, not that
many random members of the current group.  But it's still hard to believe
anyone would have a legitimate beef about it.


tl;dr: I now think what the patch proposes to do is legit.  There's a heck
of a lot of work to be done on the comments it's falsified, though.
        regards, tom lane



Re: [HACKERS] propose to pushdown qual into EXCEPT

От
Tom Lane
Дата:
I wrote:
> tl;dr: I now think what the patch proposes to do is legit.  There's a heck
> of a lot of work to be done on the comments it's falsified, though.

Hmm, wait a minute.  I mentioned before that what nodeSetOp.c actually
returns is N copies of the same representative tuple.  That in itself
doesn't break the proposed optimization, or at least so I argued ---
but the real question is which representative tuple does it pick?
The answer, as noted in the file header comment, is
* Note that SetOp does no qual checking nor projection.  The delivered* output tuples are just copies of the
first-to-arrivetuple in each* input group.
 

In HASHED mode, the first-to-arrive tuple must be from the lefthand
input, which would mean that it's passed the pushed-down qual, so
all is well.  (If no LHS tuples exist in a given group, then EXCEPT
won't output anything, so the fact that it could have collected a
representative tuple from the RHS doesn't matter.)

However, in SORTED mode, I don't see that there's anything particularly
guaranteeing the order in which tuples arrive within a sort group.
If the sort isn't stable, and I don't think all our sorting paths are,
it would be possible to return an RHS tuple as the representative one.

This breaks the proposed optimization because it would become possible
to return a tuple that doesn't pass the pushed-down qual at all.

There are at least two ways this could be dealt with.  We could add the
flag column as a low-order sort column so that it's still guaranteed that
LHS tuples arrive before RHS ones within a group.  (This'd complicate
matters in generate_nonunion_path because now the sort keys would be
different from the setop node's own grouping keys, but it's certainly
possible.)  Or we could fix it at runtime by complicating
setop_retrieve_direct so that it replaces the representative tuple
with the first LHS tuple when that arrives.  Either way, though,
more work is needed than just hacking the qual pushdown logic.
        regards, tom lane