Обсуждение: Proving IS NOT NULL inference for ScalarArrayOpExpr's

Поиск
Список
Период
Сортировка

Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
I've recently been investigating improving our plans for queries like:
    SELECT * FROM t WHERE t.foo IN (1, 2..1000);
where the table "t" has a partial index on "foo" where "foo IS NOT NULL".

Currently the planner generates an index [only] scan so long as the number of
items in the IN expression is <= 100, but as soon as you add the 101st item
it reverts to seq scan. If we add the explicit null check like:
    SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
then we go back to the desired index scan.

This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
constant array arguments into OR trees when the array size is > 100. The rest
of the predicate proving code then becomes unable to infer that foo is not null
and therefore the planner cannot prove that the partial index is correct to
use.

(Please pardon technical details in the below background that may be way off;
I don't have a lot of experience with the Postgres codebase yet, and am still
trying to build a mental model of things.)

At first I was imagining having the parse keep track of whether an array const
expr contained any nulls and perhaps adding generated quals (in an equivalence
class?) to allow the planner to easily prove the index was correct. I'd been
going down this track because in my mind the issue was because the planner
needed to verify whether all of the array elements were not null.

But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
I realized that at least in many normal op cases we can safely infer that foo
is not null when "foo <op> <array>" is true even if the array contains null
elements.

This is such a simple change that it seems like I must be missing a case where
the above doesn't hold true, but I can't immediately think of any, and indeed
with the attached patch all existing tests pass (including some additional
ones I added for predtest to play around with it).

Am I missing something obvious? Is this a valid approach?


Other outstanding questions:

Should I add additional tests for predtest? It already seems to cover some null
test cases with scalar array ops, but I'd be happy to add more if desired.

Should I add a test case for the resulting plan with "foo IN (...)" with an
array with more than 100 elements?

Thanks,
James Coleman
Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
I've become more confident that this approach is correct after
discussions with others on my team and have added the patch to the
open commitfest.

I'm attaching v2 of the patch here with a regression test (that fails
on current master, but is green both with my patch and with current
master if you remove items from the test array to make the array <=
100 items) as well as a comment detailing the reasoning in predtest.c.


On Sat, Nov 10, 2018 at 4:33 PM James Coleman <jtc331@gmail.com> wrote:
>
> I've recently been investigating improving our plans for queries like:
>     SELECT * FROM t WHERE t.foo IN (1, 2..1000);
> where the table "t" has a partial index on "foo" where "foo IS NOT NULL".
>
> Currently the planner generates an index [only] scan so long as the number of
> items in the IN expression is <= 100, but as soon as you add the 101st item
> it reverts to seq scan. If we add the explicit null check like:
>     SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
> then we go back to the desired index scan.
>
> This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
> constant array arguments into OR trees when the array size is > 100. The rest
> of the predicate proving code then becomes unable to infer that foo is not null
> and therefore the planner cannot prove that the partial index is correct to
> use.
>
> (Please pardon technical details in the below background that may be way off;
> I don't have a lot of experience with the Postgres codebase yet, and am still
> trying to build a mental model of things.)
>
> At first I was imagining having the parse keep track of whether an array const
> expr contained any nulls and perhaps adding generated quals (in an equivalence
> class?) to allow the planner to easily prove the index was correct. I'd been
> going down this track because in my mind the issue was because the planner
> needed to verify whether all of the array elements were not null.
>
> But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
> I realized that at least in many normal op cases we can safely infer that foo
> is not null when "foo <op> <array>" is true even if the array contains null
> elements.
>
> This is such a simple change that it seems like I must be missing a case where
> the above doesn't hold true, but I can't immediately think of any, and indeed
> with the attached patch all existing tests pass (including some additional
> ones I added for predtest to play around with it).
>
> Am I missing something obvious? Is this a valid approach?
>
>
> Other outstanding questions:
>
> Should I add additional tests for predtest? It already seems to cover some null
> test cases with scalar array ops, but I'd be happy to add more if desired.
>
> Should I add a test case for the resulting plan with "foo IN (...)" with an
> array with more than 100 elements?
>
> Thanks,
> James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
Also, my apologies for top posting; I forgot to remove the old email
before clicking send.


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
Is there anything I can do to solicit reviewers for the current CF?

The patch is quite small and should be fairly simple to review.

- James


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> Is there anything I can do to solicit reviewers for the current CF?

There is no active CF, which is why nothing is happening.  We'll
start looking at the 2019-01 items in January.  Right now, people
are either on vacation or working on their own patches.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
David Rowley
Дата:
On Sun, 11 Nov 2018 at 10:33, James Coleman <jtc331@gmail.com> wrote:
> At first I was imagining having the parse keep track of whether an array const
> expr contained any nulls and perhaps adding generated quals (in an equivalence
> class?) to allow the planner to easily prove the index was correct. I'd been
> going down this track because in my mind the issue was because the planner
> needed to verify whether all of the array elements were not null.
>
> But as I started to dig into the predtest.c NOT NULL proofs and add test cases,
> I realized that at least in many normal op cases we can safely infer that foo
> is not null when "foo <op> <array>" is true even if the array contains null
> elements.
>
> This is such a simple change that it seems like I must be missing a case where
> the above doesn't hold true, but I can't immediately think of any, and indeed
> with the attached patch all existing tests pass (including some additional
> ones I added for predtest to play around with it).
>
> Am I missing something obvious? Is this a valid approach?

I started looking at this patch and I see that there's no check to
ensure that the IS NOT NULL pred matches the left operand in the IN()
clause.

Here's a demo of why this is bad:

create table t (a int, b int);
create index t_a_is_not_null on t(a) where a is not null;
insert into t values(null,1); -- not indexed
set enable_seqscan=0; -- force index to be used whenever possible
select * from t where b

in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- row missing!
 a | b
---+---
(0 rows)

explain select * from t where b

in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98
,99,100,101); -- should not use t_a_is_not_null index.


                    QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
 Bitmap Heap Scan on t  (cost=4.42..320.84 rows=1141 width=8)
   Recheck Cond: (a IS NOT NULL)
   Filter: (b = ANY

('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101}'::integer[]))
   ->  Bitmap Index Scan on t_a_is_not_null  (cost=0.00..4.13 rows=2249 width=0)
(4 rows)

set enable_seqscan=1;
SET
select * from t where b

in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);
-- Correct!
 a | b
---+---
   | 1
(1 row)

Basically, the planner assumes that the WHERE a IS NOT NULL index
implies WHERE b IN(1,...,101), which is definitely not the case.

Probably your new code needs to be expanded to become:

if (IsA(clause, ScalarArrayOpExpr))
{
    ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
    if (op_strict(saop->opno) &&
        clause_is_strict_for((Node *) linitial(saop->args), subexpr))
        return true;
}


> Other outstanding questions:
>
> Should I add additional tests for predtest? It already seems to cover some null
> test cases with scalar array ops, but I'd be happy to add more if desired.
>
> Should I add a test case for the resulting plan with "foo IN (...)" with an
> array with more than 100 elements?

I've not looked in detail, but perhaps the place to put the tests are
in src/test/modules/test_predtest.  This module adds a function named
test_predtest() that you can likely use more easily than trying to
EXPLAIN plans and hoping the planner's behaviour helps to exhibit the
behaviour of the changed code.

I'll mark this as waiting on author.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
David Rowley
Дата:
On Sun, 13 Jan 2019 at 14:49, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've not looked in detail, but perhaps the place to put the tests are
> in src/test/modules/test_predtest.  This module adds a function named
> test_predtest() that you can likely use more easily than trying to
> EXPLAIN plans and hoping the planner's behaviour helps to exhibit the
> behaviour of the changed code.

I just noticed that I accidentally reviewed v1 instead of v2.  I see
you found that module.  I'll keep this as waiting on author until the
other bug is fixed.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Sat, Jan 12, 2019 at 8:52 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> Basically, the planner assumes that the WHERE a IS NOT NULL index
> implies WHERE b IN(1,...,101), which is definitely not the case.
>
> Probably your new code needs to be expanded to become:
>
> if (IsA(clause, ScalarArrayOpExpr))
> {
>     ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
>     if (op_strict(saop->opno) &&
>         clause_is_strict_for((Node *) linitial(saop->args), subexpr))
>         return true;
> }

I've updated the code as well as included the sad path test in
addition to my original happy path test.

Patch v3 attached.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> [ saop_is_not_null-v3.patch ]

There's still a logical problem here, which is that in order to
prove that the ScalarArrayOpExpr yields NULL when its LHS does,
you also have to prove that the RHS is not an empty array.
Otherwise you're up against the fact that the OR of zero boolean
expressions is false not null:

regression=# select null::int = any(array[]::int[]);
 ?column? 
----------
 f
(1 row)

While this is doable when the RHS is an array constant or ArrayExpr, I'm
afraid it's likely to be a bit tedious.  (Hm, actually predicate_classify
already has logic to extract the number of elements in these cases ...
I wonder if there's any use in trying to share code?)

Minor stylistic quibble: I don't like where you inserted the new code in 
clause_is_strict_for, because it renders the comment a few lines above
that unrelatable to the code.  I'd put it at the bottom, after the
is_funcclause stanza.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> There's still a logical problem here, which is that in order to
> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
> you also have to prove that the RHS is not an empty array.
>
> Otherwise you're up against the fact that the OR of zero boolean
> expressions is false not null:
>
> regression=# select null::int = any(array[]::int[]);
>  ?column?
> ----------
>  f
> (1 row)

I've constructed a test case running the test_predtest function on the
expression
"select x is null, x = any(array[]::int[]) from integers" which I
think gets at the logic bug you're noting. In doing this I noticed
something I don't fully understand: on master this shows that "x =
any(array[]::int[])" proves "x is null" (while my patch shows the
opposite). Is this because the second expression being false means
there's can be no valid value for x? It also claims it refutes the
same, which is even more confusing.

> While this is doable when the RHS is an array constant or ArrayExpr, I'm
> afraid it's likely to be a bit tedious.  (Hm, actually predicate_classify
> already has logic to extract the number of elements in these cases ...
> I wonder if there's any use in trying to share code?)

It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.

Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items. My new code also has this "problem", but I figured it
was worth getting this logically correct before attempting to optimize
that especially since it already exists in one place.

> Minor stylistic quibble: I don't like where you inserted the new code in
> clause_is_strict_for, because it renders the comment a few lines above
> that unrelatable to the code.  I'd put it at the bottom, after the
> is_funcclause stanza.

Fixed in this revision.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's still a logical problem here, which is that in order to
>> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
>> you also have to prove that the RHS is not an empty array.

> I've constructed a test case running the test_predtest function on the
> expression
> "select x is null, x = any(array[]::int[]) from integers" which I
> think gets at the logic bug you're noting. In doing this I noticed
> something I don't fully understand: on master this shows that "x =
> any(array[]::int[])" proves "x is null" (while my patch shows the
> opposite). Is this because the second expression being false means
> there's can be no valid value for x? It also claims it refutes the
> same, which is even more confusing.

That sounds like a bug, but I think it's actually OK because a vacuous
OR is necessarily false, and a false predicate "implies" any conclusion
at all according to the customary definition of implication.  If there
are any real optimization bugs in this area, it's probably a result of
calling code believing more than it should about the meaning of a proof.

I think these test cases don't actually prove much about the behavior
of your patch.  Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

> It seems to me that this would mean that an IS NOT NULL index would
> still not be proven to be usable for a non-constant array (e.g., from
> a subquery calculated array), and in fact I've included a (failing)
> test demonstrating that fact in the attached patch. This feeds into my
> question above about there possibly being another corollary/proof that
> could be added for this case.

Hm.  That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all.  The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Per above, if the ScalarArrayOp is necessarily false, then we can
claim that the implication holds.  However, we need to work through
all four proof rules (strong vs. weak, implication vs refutation)
and see which ones this applies to --- I'm not sure offhand that
the logic works for all four.  (ENOCAFFEINE...)  In any case it
requires thinking a bit differently about what clause_is_strict_for()
is really doing.

> Semi-related: I noticed that predicate_classify handles an ArrayExpr
> by using list_length; this seems unnecessary if we know we can fail
> fast. I realize it's not a common problem, but I have seen extremely
> long arrays before, and maybe we should fall out of the count once we
> hit max+1 items.

Huh?  list_length() is constant-time.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> There's still a logical problem here, which is that in order to
> >> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
> >> you also have to prove that the RHS is not an empty array.
>
> > I've constructed a test case running the test_predtest function on the
> > expression
> > "select x is null, x = any(array[]::int[]) from integers" which I
> > think gets at the logic bug you're noting. In doing this I noticed
> > something I don't fully understand: on master this shows that "x =
> > any(array[]::int[])" proves "x is null" (while my patch shows the
> > opposite). Is this because the second expression being false means
> > there's can be no valid value for x? It also claims it refutes the
> > same, which is even more confusing.
>
> That sounds like a bug, but I think it's actually OK because a vacuous
> OR is necessarily false, and a false predicate "implies" any conclusion
> at all according to the customary definition of implication.  If there
> are any real optimization bugs in this area, it's probably a result of
> calling code believing more than it should about the meaning of a proof.
>
> I think these test cases don't actually prove much about the behavior
> of your patch.  Wouldn't they be handled by the generic OR-conversion
> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

Which ones are you looking at in particular? The "inline" (non-cte)
happy and sad path cases have 102 total array elements (as does the
happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
tests are about the empty array case so much have 0 items (and were
the ones that showed different behavior between v3 and v4).

> > It seems to me that this would mean that an IS NOT NULL index would
> > still not be proven to be usable for a non-constant array (e.g., from
> > a subquery calculated array), and in fact I've included a (failing)
> > test demonstrating that fact in the attached patch. This feeds into my
> > question above about there possibly being another corollary/proof that
> > could be added for this case.
>
> Hm.  That would be annoying, but on reflection I think maybe we don't
> actually need to worry about emptiness of the array after all.  The
> thing that we need to prove, at least for the implication case, is
> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
> Per above, if the ScalarArrayOp is necessarily false, then we can
> claim that the implication holds.  However, we need to work through
> all four proof rules (strong vs. weak, implication vs refutation)
> and see which ones this applies to --- I'm not sure offhand that
> the logic works for all four.  (ENOCAFFEINE...)  In any case it
> requires thinking a bit differently about what clause_is_strict_for()
> is really doing.

Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?

> > Semi-related: I noticed that predicate_classify handles an ArrayExpr
> > by using list_length; this seems unnecessary if we know we can fail
> > fast. I realize it's not a common problem, but I have seen extremely
> > long arrays before, and maybe we should fall out of the count once we
> > hit max+1 items.
>
> Huh?  list_length() is constant-time.

Facepalm. I'd somehow had it stuck in my head that we actually
iterated the list to count.

James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think these test cases don't actually prove much about the behavior
>> of your patch.  Wouldn't they be handled by the generic OR-conversion
>> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

> Which ones are you looking at in particular? The "inline" (non-cte)
> happy and sad path cases have 102 total array elements (as does the
> happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
> tests are about the empty array case so much have 0 items (and were
> the ones that showed different behavior between v3 and v4).

I was looking at the empty-array ones.  I don't see how that reaches
your patch; we should not be doing predicate_implied_by_simple_clause
on the ScalarArrayOp itself, because predicate_classify should've
decided to treat it as an OR clause.

>> Hm.  That would be annoying, but on reflection I think maybe we don't
>> actually need to worry about emptiness of the array after all.  The
>> thing that we need to prove, at least for the implication case, is
>> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".

> Are you thinking that implies clause_is_strict_for might not be the
> right/only place? Or just that the code in that function needs to
> consider if it should return different results in some cases to handle
> all 4 proof rules properly?

The latter is what I was envisioning, but I haven't worked out details.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > On Mon, Jan 14, 2019 at 11:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think these test cases don't actually prove much about the behavior
> >> of your patch.  Wouldn't they be handled by the generic OR-conversion
> >> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?
>
> > Which ones are you looking at in particular? The "inline" (non-cte)
> > happy and sad path cases have 102 total array elements (as does the
> > happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
> > tests are about the empty array case so much have 0 items (and were
> > the ones that showed different behavior between v3 and v4).
>
> I was looking at the empty-array ones.  I don't see how that reaches
> your patch; we should not be doing predicate_implied_by_simple_clause
> on the ScalarArrayOp itself, because predicate_classify should've
> decided to treat it as an OR clause.

At one point I was getting all 'f' instead of all 't' for those, but
now I can't reproduce it going from either master to my patch or to my
patch without the element count checks, so I'm not sure how/when I was
getting that anymore.

Are those invariants we want to keep (and recognize as regression
tests)? If so, I can confirm that they aren't duplicative of the rest
of the file, and if not I can remove them.

James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> Are those invariants we want to keep (and recognize as regression
> tests)? If so, I can confirm that they aren't duplicative of the rest
> of the file, and if not I can remove them.

I can't get terribly excited about them ... if we were changing their
behavior, or if there were a bug discovered here, then I'd think
differently.  But I'm not a fan of adding regression test cycles
without a reason.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Hm.  That would be annoying, but on reflection I think maybe we don't
> >> actually need to worry about emptiness of the array after all.  The
> >> thing that we need to prove, at least for the implication case, is
> >> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
>
> > Are you thinking that implies clause_is_strict_for might not be the
> > right/only place? Or just that the code in that function needs to
> > consider if it should return different results in some cases to handle
> > all 4 proof rules properly?
>
> The latter is what I was envisioning, but I haven't worked out details.

The more that I look at this I'm wondering if there aren't two things
here: it seems that the existing patch (with the code that excludes
empty arrays) is likely useful on its own for cases I hadn't
originally envisioned (or tested). Specifically it seems to allow us
to conclude the result will be null if a field is known to be null. I
think I'll try to add a test specific to this, but I'm not immediately
sure I know a case where that matters. If anyone has an idea on what
direction to head with this, I'd be happy to code it up.

For the original goal of "truth of the ScalarArrayOp implies truth of
the IS NOT NULL clause", it seems to me the proper place is
predicate_implied_by_simple_clause where we already have a place to
specifically work with IS NOT NULL expressions. That will allow this
to be more targeted, work for empty arrays as well, and not
potentially introduce an optimizer bug for strictness with empty
arrays.

I'm attaching an updated patch that does that. I've also added the
"parallel" case in predicate_refuted_by_simple_clause, but I haven't
added a test for that yet. I also would like to add a test for the sad
path to parallel the happy path computed array/cte test.

While I add that though I wanted to get this updated version out to
get feedback on the approach.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
David Rowley
Дата:
On Tue, 15 Jan 2019 at 12:24, James Coleman <jtc331@gmail.com> wrote:
> While I add that though I wanted to get this updated version out to
> get feedback on the approach.

I had a look at this and there's a couple of things I see:

1. In:

+ if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+ if (op_strict(saop->opno) &&
+ clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+ return true;
+ }
+
  /* foo IS NOT NULL refutes foo IS NULL */
  if (clause && IsA(clause, NullTest) &&

Your IsA(clause, ScalarArrayOpExpr) condition should also be checking
that "clause" can't be NULL.  The NullTest condition one below does
this.

2. I was also staring predicate_implied_by_simple_clause() a bit at
the use of clause_is_strict_for() to ensure that the IS NOT NULL's
operand matches the ScalarArrayOpExpr left operand.  Since
clause_is_strict_for() = "Can we prove that "clause" returns NULL if
"subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's
left operand and subexpr is the IS NOT NULL's operand.  This means
that a partial index with "WHERE a IS NOT NULL" should also be fine to
use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a)
must be NULL if a is NULL. Also also works for WHERE a+a
IN(1,2,...,101);   I wonder if it's worth adding a test for that, or
even just modify one of the existing tests to ensure you get the same
result from it. Perhaps it's worth an additional test to ensure that x
IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS
NULL does not refute x IN(1,2,...,101), as a strict function is free
to return NULL even if it's input are not NULL.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Tue, Jan 15, 2019 at 12:47 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> 1. In:
>
> + if (IsA(clause, ScalarArrayOpExpr))
> + {
> + ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
> + Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
> + if (op_strict(saop->opno) &&
> + clause_is_strict_for((Node *) linitial(saop->args), subexpr))
> + return true;
> + }
> +
>   /* foo IS NOT NULL refutes foo IS NULL */
>   if (clause && IsA(clause, NullTest) &&
>
> Your IsA(clause, ScalarArrayOpExpr) condition should also be checking
> that "clause" can't be NULL.  The NullTest condition one below does
> this.

Fixed in my local copy. I also did the same in
predicate_implied_by_simple_clause since it seems to have the same
potential issue.

After sleeping on it and looking at it again this morning, I also
realized I need to exclude weak implication in
predicate_refuted_by_simple_clause.

> 2. I was also staring predicate_implied_by_simple_clause() a bit at
> the use of clause_is_strict_for() to ensure that the IS NOT NULL's
> operand matches the ScalarArrayOpExpr left operand.  Since
> clause_is_strict_for() = "Can we prove that "clause" returns NULL if
> "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's
> left operand and subexpr is the IS NOT NULL's operand.  This means
> that a partial index with "WHERE a IS NOT NULL" should also be fine to
> use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a)
> must be NULL if a is NULL. Also also works for WHERE a+a
> IN(1,2,...,101);   I wonder if it's worth adding a test for that, or
> even just modify one of the existing tests to ensure you get the same
> result from it. Perhaps it's worth an additional test to ensure that x
> IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS
> NULL does not refute x IN(1,2,...,101), as a strict function is free
> to return NULL even if it's input are not NULL.

Are you suggesting a different test than clause_is_strict_for to
verify the saop LHS is the same as the null test's arg? I suppose we
could use "equal()" instead?

I've added several tests, and noticed a few things:

1. The current code doesn't result in "strongly_implied_by = t" for
the "(x + x) is not null" case, but it does result in "s_i_holds = t".
This doesn't change if I switch to using "equal()" as mentioned above.

2. The tests I have for refuting "x is null" show that w_r_holds as
well as s_r_holds. I'd only expected the latter, since only
"strongly_refuted_by = t".

3. The tests I have for refuting "(x + x) is null" show that both
s_r_holds and w_r_holds. I'd expected these to be false.

I'm attaching the current version of the patch with the new tests, but
I'm not sure I understand the *_holds results mentioned above. Are
they an artifact of the data under test? Or do they suggest a
remaining bug? I'm a bit fuzzy on what to expect for *_holds though I
understand the requirements for strongly/weakly_implied/refuted_by.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> I'm attaching the current version of the patch with the new tests, but
> I'm not sure I understand the *_holds results mentioned above. Are
> they an artifact of the data under test? Or do they suggest a
> remaining bug? I'm a bit fuzzy on what to expect for *_holds though I
> understand the requirements for strongly/weakly_implied/refuted_by.

IIRC, the "foo_holds" outputs mean "the foo relationship appears
to hold for these expressions across all tested inputs", for instance
s_i_holds means "there were no tested cases where A is true and B is not
true".  The implied_by/refuted_by outputs mean "predtest.c claims it can
prove this".  Obviously, a claimed proof for a relationship that fails
experimentally would be big trouble.  The other way around just means
that either the proof rules are inadequate to prove this particular case,
or the set of test values for the expressions don't expose the fact that
the relationship doesn't hold.

Now, if we *expected* that predtest.c should be able to prove something,
failure to do so would be a bug, but it's not a bug if we know it's
incapable of making that proof.  The second explanation might represent
a bug in the test case.

I added the foo_holds columns just as a sort of cross-check on the
test cases themselves, they don't test anything about the predtest
code.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> [ saop_is_not_null-v6.patch ]

I quite dislike taking the responsibility out of clause_is_strict_for
and putting it in the callers.  Aside from requiring duplicative code,
it means that this fails to prove anything for recursive situations
(i.e., where the ScalarArrayOp appears one or more levels down in a
clause examined by clause_is_strict_for).

If the behavior needs to vary depending on proof rule, which it
looks like it does, the way to handle that is to add flag argument(s)
to clause_is_strict_for.

I'm also not happy about the lack of comments justifying the proof
rules -- eg, it's far from obvious why you need to restrict one
case to !weak but not the other.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
David Rowley
Дата:
On Wed, 16 Jan 2019 at 03:33, James Coleman <jtc331@gmail.com> wrote:
> > 2. I was also staring predicate_implied_by_simple_clause() a bit at
> > the use of clause_is_strict_for() to ensure that the IS NOT NULL's
> > operand matches the ScalarArrayOpExpr left operand.  Since
> > clause_is_strict_for() = "Can we prove that "clause" returns NULL if
> > "subexpr" does?", in this case, your clause is the ScalarArrayOpExpr's
> > left operand and subexpr is the IS NOT NULL's operand.  This means
> > that a partial index with "WHERE a IS NOT NULL" should also be fine to
> > use for WHERE strict_func(a) IN (1,2,..., 101); since strict_func(a)
> > must be NULL if a is NULL. Also also works for WHERE a+a
> > IN(1,2,...,101);   I wonder if it's worth adding a test for that, or
> > even just modify one of the existing tests to ensure you get the same
> > result from it. Perhaps it's worth an additional test to ensure that x
> > IN(1,2,...,101) does not imply x+x IS NOT NULL and maybe that x+x IS
> > NULL does not refute x IN(1,2,...,101), as a strict function is free
> > to return NULL even if it's input are not NULL.
>
> Are you suggesting a different test than clause_is_strict_for to
> verify the saop LHS is the same as the null test's arg? I suppose we
> could use "equal()" instead?

I wasn't suggesting any code changes.  I just thought the code was
sufficiently hard to understand to warrant some additional tests that
ensure we don't assume that if the int4 column x is not null that also
x+x is not null. Only the reverse can be implied since int4pl is
strict.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Tue, Jan 15, 2019 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > [ saop_is_not_null-v6.patch ]
>
> I quite dislike taking the responsibility out of clause_is_strict_for
> and putting it in the callers.  Aside from requiring duplicative code,
> it means that this fails to prove anything for recursive situations
> (i.e., where the ScalarArrayOp appears one or more levels down in a
> clause examined by clause_is_strict_for).
>
> If the behavior needs to vary depending on proof rule, which it
> looks like it does, the way to handle that is to add flag argument(s)
> to clause_is_strict_for.

The reason I moved it was that we're no longer just proving
strictness, so it seemed odd to put it in a function specifically
named to be about proving strictness.

If you'd prefer an argument like "bool allow_false" or "bool
check_saop_implies_is_null" I'm happy to make that change, but it
seems generally unhelpful to change the function implementation to
contradict the name (the empty array case that returns false even then
the lhs is null).

> I'm also not happy about the lack of comments justifying the proof
> rules -- eg, it's far from obvious why you need to restrict one
> case to !weak but not the other.

Both implication and refutation have the !weak check; it's just that
the check in the implication function was already present.

I'll add more comments inline for the proof rules, but from what I can
tell only strong implication holds for implication of IS NOT NULL. A
sample case disproving weak implication is "select null is not null,
null::int = any(array[null]::int[])" which results in "false, null",
so non-falsity does not imply non-falsity.

However I'm currently unable to come up with a case for refutation
such that truth of a ScalarArrayOp refutes IS NULL, so I'll update
code and tests for that. That also takes care of my point:

James Coleman <jtc331@gmail.com> writes:
> 2. The tests I have for refuting "x is null" show that w_r_holds as
> well as s_r_holds. I'd only expected the latter, since only
> "strongly_refuted_by = t".

But it still leaves my questions (1) and (3).

James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Tue, Jan 15, 2019 at 4:36 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I wasn't suggesting any code changes.  I just thought the code was
> sufficiently hard to understand to warrant some additional tests that
> ensure we don't assume that if the int4 column x is not null that also
> x+x is not null. Only the reverse can be implied since int4pl is
> strict.

At the risk of missing something obvious, I'm not sure I see a case
where "x is not null" does not imply "(x + x) is not null", at least
for integers. Since an integer + an integer results in an integer,
then it must imply the addition of itself is not null also?

This is the root of the questions I had:

James Coleman <jtc331@gmail.com> writes:
> 1. The current code doesn't result in "strongly_implied_by = t" for
> the "(x + x) is not null" case, but it does result in "s_i_holds = t".
> This doesn't change if I switch to using "equal()" as mentioned above.

> 3. The tests I have for refuting "(x + x) is null" show that both
> s_r_holds and w_r_holds. I'd expected these to be false.

James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
David Rowley
Дата:
On Wed, 16 Jan 2019 at 14:05, James Coleman <jtc331@gmail.com> wrote:
> At the risk of missing something obvious, I'm not sure I see a case
> where "x is not null" does not imply "(x + x) is not null", at least
> for integers. Since an integer + an integer results in an integer,
> then it must imply the addition of itself is not null also?

A strict function guarantees it will return NULL on any NULL input.
This does not mean it can't return NULL on a non-NULL input.

While int4pl might do what you want, some other strict function might
not. A simple example would be a strict function that decided to
return NULL when the two ints combined overflowed int.

The docs [1] define STRICT as:

 "RETURNS NULL ON NULL INPUT or STRICT indicates that the function
always returns null whenever any of its arguments are null. If this
parameter is specified, the function is not executed when there are
null arguments; instead a null result is assumed automatically."

[1] https://www.postgresql.org/docs/devel/sql-createfunction.html

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Tue, Jan 15, 2019 at 8:14 PM David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> On Wed, 16 Jan 2019 at 14:05, James Coleman <jtc331@gmail.com> wrote:
> > At the risk of missing something obvious, I'm not sure I see a case
> > where "x is not null" does not imply "(x + x) is not null", at least
> > for integers. Since an integer + an integer results in an integer,
> > then it must imply the addition of itself is not null also?
>
> A strict function guarantees it will return NULL on any NULL input.
> This does not mean it can't return NULL on a non-NULL input.
>
> While int4pl might do what you want, some other strict function might
> not. A simple example would be a strict function that decided to
> return NULL when the two ints combined overflowed int.

Yes, the claim about not all strict functions guarantee this (like
int4pl) made sense.

Is your preference in this kind of case to comment the code and/or
tests but stick with int4pl even though it doesn't demonstrate the
"problem", or try to engineer a different test case such that the
*_holds results in the tests don't seem to imply we could prove more
things than we do?

James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
David Rowley
Дата:
On Wed, 16 Jan 2019 at 14:29, James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Jan 15, 2019 at 8:14 PM David Rowley
> > While int4pl might do what you want, some other strict function might
> > not. A simple example would be a strict function that decided to
> > return NULL when the two ints combined overflowed int.
>
> Yes, the claim about not all strict functions guarantee this (like
> int4pl) made sense.
>
> Is your preference in this kind of case to comment the code and/or
> tests but stick with int4pl even though it doesn't demonstrate the
> "problem", or try to engineer a different test case such that the
> *_holds results in the tests don't seem to imply we could prove more
> things than we do?

I think using x+x or whatever is fine. I doubt there's a need to
invent some new function that returns NULL on non-NULL input.  The
code you're adding has no idea about the difference between the two.
It has no way to know that anyway.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> On Tue, Jan 15, 2019 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I quite dislike taking the responsibility out of clause_is_strict_for
>> and putting it in the callers.

> The reason I moved it was that we're no longer just proving
> strictness, so it seemed odd to put it in a function specifically
> named to be about proving strictness.

Well, as I said upthread, it seems like we need to think a bit more
carefully about what it is that clause_is_strict_for is testing ---
and if that ends up finding that some other name is more apposite,
I'd not have any hesitation about renaming it.  But we're really
missing a bet if the ScalarArrayOp-specific check isn't inside that
recursive check of an "atomic" expression's properties.  The
routines above there only recurse through things that act like
AND or OR, but clause_is_strict_for is capable of descending
through other stuff as long as it's strict in some sense.  What
we need to be clear about here is exactly what that sense is.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Wed, 16 Jan 2019 at 14:29, James Coleman <jtc331@gmail.com> wrote:
>> Is your preference in this kind of case to comment the code and/or
>> tests but stick with int4pl even though it doesn't demonstrate the
>> "problem", or try to engineer a different test case such that the
>> *_holds results in the tests don't seem to imply we could prove more
>> things than we do?

> I think using x+x or whatever is fine. I doubt there's a need to
> invent some new function that returns NULL on non-NULL input.  The
> code you're adding has no idea about the difference between the two.
> It has no way to know that anyway.

Yeah, I never intended that the *_holds results would be "exact" in
every test case.  They're mostly there to catch egregious errors in
test construction.  Since the code we're actually testing doesn't get
to see the input or output values of the test cases, it's not really
useful to insist that the test cases exercise every possible combination
of input/output values for the given expressions.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Tue, Jan 15, 2019 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, as I said upthread, it seems like we need to think a bit more
> carefully about what it is that clause_is_strict_for is testing ---
> and if that ends up finding that some other name is more apposite,
> I'd not have any hesitation about renaming it.  But we're really
> missing a bet if the ScalarArrayOp-specific check isn't inside that
> recursive check of an "atomic" expression's properties.  The
> routines above there only recurse through things that act like
> AND or OR, but clause_is_strict_for is capable of descending
> through other stuff as long as it's strict in some sense.  What
> we need to be clear about here is exactly what that sense is.

All right, I'll look over all of the other callers and see what makes
sense as far as naming (or perhaps consider a new parallel function;
unsure at this point.)

I might also try to see if we can edit the tests slightly to require
the recursion case to be exercised.

One other question on testing: do you think the "calculated array"
tests are good enough by themselves (i.e., remove the ones with array
constants of > 100 values)? I dislike that it's not as obvious what's
going on, but given that the code shouldn't care about array size
anyway...so if there's an inclination to fewer tests that's the first
place I'd look at cutting them.

James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> One other question on testing: do you think the "calculated array"
> tests are good enough by themselves (i.e., remove the ones with array
> constants of > 100 values)? I dislike that it's not as obvious what's
> going on, but given that the code shouldn't care about array size
> anyway...so if there's an inclination to fewer tests that's the first
> place I'd look at cutting them.

I don't have a strong opinion about that at this point.  It might be
clearer once the patch is finished; for now, there's no harm in erring
towards the more-tests direction.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Wed, Jan 16, 2019 at 8:49 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Tue, Jan 15, 2019 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Well, as I said upthread, it seems like we need to think a bit more
> > carefully about what it is that clause_is_strict_for is testing ---
> > and if that ends up finding that some other name is more apposite,
> > I'd not have any hesitation about renaming it.  But we're really
> > missing a bet if the ScalarArrayOp-specific check isn't inside that
> > recursive check of an "atomic" expression's properties.  The
> > routines above there only recurse through things that act like
> > AND or OR, but clause_is_strict_for is capable of descending
> > through other stuff as long as it's strict in some sense.  What
> > we need to be clear about here is exactly what that sense is.
>
> All right, I'll look over all of the other callers and see what makes
> sense as far as naming (or perhaps consider a new parallel function;
> unsure at this point.)

I investigated all of the callers of clause_is_strict_for and
confirmed they fall into two buckets:
- Recursive case inside the function itself.
- Callers proving a variant of IS [NOT] NULL.

Given all cases appear to be safe to extend to scalar array ops, I've
tentatively renamed the function clause_proved_for_null_test and moved
the code back into that function rather than be in the caller. This
also guarantees that beyond proving implication of IS NOT NULL and
refutation of IS NULL we also get proof of weak refutation of strict
predicates (since IS NOT NULL refutes them and we strongly imply IS
NOT NULL); I've added tests for this case.

I added many more comments explaining the proofs here, but it boils
down to the fact that non-empty array ops are really just a strict
clause, and the empty array case isn't strict, but when not returning
NULL returns false, and therefore we can still make the same proofs.

Since the empty array case is really the only interesting one, I
brought back the empty array tests, but modified them to use
calculated/non-constant arrays so that they actually hit the new code
paths. I also verified that the proofs they make are a subset of the
ones we get from existing code for constant empty arrays (it makes
sense we'd be able to prove less about possibly empty arrays than
known empty arrays.)

> I might also try to see if we can edit the tests slightly to require
> the recursion case to be exercised.

I didn't tackle this, but if you think it would add real value, then I
can look into it further.

I also updated the commit message to better match the more extended
implications of this change over just the IS NOT NULL case I was
originally pursuing.

One other thing: for known non-empty arrays we could further extend
this to prove that an IS NULL clause weakly implies the ScalarArrayOp.
I'm not sure this is worth is it; if someone things otherwise let me
know.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Alvaro Herrera
Дата:
Hello, I gave this patch a very quick scan.  I didn't check the actual
logic behind it.

This comment seems wrong:

+ * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
+ * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.

"non-falsity does not imply non-falsity"?  I suppose one of those
negations should be different ...

I think the name clause_proved_for_null_test() is a bit weird, being in
the past tense.  I'd maybe change "proved" to "proves".

s/exppresions/expresions/ in the test files.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Tue, Jan 22, 2019 at 4:26 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Hello, I gave this patch a very quick scan.  I didn't check the actual
> logic behind it.
>
> This comment seems wrong:
>
> + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
> + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
>
> "non-falsity does not imply non-falsity"?  I suppose one of those
> negations should be different ...

Earlier in the file weak implication (comments above
predicate_implied_by) is defined as "non-falsity of A implies
non-falsity of B". In this example we have NULL for A (non-false) but
false for B, so that definition doesn't hold. So I think the comment
is accurate, but I can reword if you have an idea of what you'd like
to see (I've tweaked a bit in the attached patch to start).

> I think the name clause_proved_for_null_test() is a bit weird, being in
> the past tense.  I'd maybe change "proved" to "proves".

Changed.

> s/exppresions/expresions/ in the test files.

Fixed.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
> > This comment seems wrong:
> >
> > + * However weak implication fails: e.g., "NULL IS NOT NULL" is false, but
> > + * "NULL = ANY(ARRAY[NULL])" is NULL, so non-falsity does not imply non-falsity.
> >
> > "non-falsity does not imply non-falsity"?  I suppose one of those
> > negations should be different ...
>
> Earlier in the file weak implication (comments above
> predicate_implied_by) is defined as "non-falsity of A implies
> non-falsity of B". In this example we have NULL for A (non-false) but
> false for B, so that definition doesn't hold. So I think the comment
> is accurate, but I can reword if you have an idea of what you'd like
> to see (I've tweaked a bit in the attached patch to start).

I forgot to update in v8 so attaching v9.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> I forgot to update in v8 so attaching v9.

So ... this is actively wrong.

This case shows that you can't ignore the empty-array possibility
for a ScalarArrayOpExpr with useOr = false, because
"SELECT null::int = all(array[]::int[])" yields TRUE:

contrib_regression=# select * from test_predtest($$
select x is not null, x = all(array(select i from generate_series(1,0) i))
from integers
$$);
WARNING:  strong_implied_by result is incorrect
-[ RECORD 1 ]-----+--
strong_implied_by | t
weak_implied_by   | f
strong_refuted_by | f
weak_refuted_by   | f
s_i_holds         | f
w_i_holds         | f
s_r_holds         | f
w_r_holds         | f

This case shows that you can't ignore the distinction between null
and false results once you've descended through strict function(s):

contrib_regression=# select * from test_predtest($$
select x is not null, strictf(true, x = any(array(select i from generate_series(1,0) i)))
from integers
$$);
WARNING:  strong_implied_by result is incorrect
-[ RECORD 1 ]-----+--
strong_implied_by | t
weak_implied_by   | f
strong_refuted_by | f
weak_refuted_by   | f
s_i_holds         | f
w_i_holds         | f
s_r_holds         | f
w_r_holds         | f

(In this usage, the strictf() function from the test_predtest.sql
will simply return the NOT of its second input; so it gives different
answers depending on whether the SAOP returned false or null.)

I think you could probably repair the second problem by adding
an additional argument to clause_is_strict_for() indicating whether
it has to prove the clause to be NULL or merely non-TRUE; when
recursing you'd always have to ask for the former.

The first problem could be resolved by insisting on being able to
prove the array non-empty when !useOr, but I'm not sure if it's
really worth the trouble, as opposed to just not trying to prove
anything for !useOr cases.  I'm not sure that "x op ALL(ARRAY)"
occurs often enough in the wild to justify expending code on.

The other case where being able to prove the array nonempty might
be worth something is when you've recursed and hence need to be
able to prove the SAOP's result to be NULL not just not-TRUE.
Again though, I don't know how often that occurs in practice,
so even the combination of those two motivations might not be
worth having code to check the array argument.

It'd also be worth spending some brain cells on what happens
when the SAOP's array argument is null overall, which causes
its result to be null (not false).  Maybe the logic goes
through without needing any explicit test for that case
(and indeed I couldn't break it in testing that), but it'd
likely be worth a comment.

I don't especially care for the proposed function name
"clause_proves_for_null_test".  The only thing that brings to my
mind is the question "proves WHAT, exactly?" --- and as these
examples show, being crystal clear on what it proves is essential.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
BTW, I just pushed a fix (e04a3905e) that adds a little more code
to clause_is_strict_for().  This shouldn't cause more than minor
rebasing pain for you, I hope.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Mon, Feb 18, 2019 at 4:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So ... this is actively wrong.
>
> This case shows that you can't ignore the empty-array possibility
> for a ScalarArrayOpExpr with useOr = false, because
> "SELECT null::int = all(array[]::int[])" yields TRUE:
>
> contrib_regression=# select * from test_predtest($$
> select x is not null, x = all(array(select i from generate_series(1,0) i))
> from integers
> $$);
> WARNING:  strong_implied_by result is incorrect
> -[ RECORD 1 ]-----+--
> strong_implied_by | t
> weak_implied_by   | f
> strong_refuted_by | f
> weak_refuted_by   | f
> s_i_holds         | f
> w_i_holds         | f
> s_r_holds         | f
> w_r_holds         | f
>
> This case shows that you can't ignore the distinction between null
> and false results once you've descended through strict function(s):
>
> contrib_regression=# select * from test_predtest($$
> select x is not null, strictf(true, x = any(array(select i from generate_series(1,0) i)))
> from integers
> $$);
> WARNING:  strong_implied_by result is incorrect
> -[ RECORD 1 ]-----+--
> strong_implied_by | t
> weak_implied_by   | f
> strong_refuted_by | f
> weak_refuted_by   | f
> s_i_holds         | f
> w_i_holds         | f
> s_r_holds         | f
> w_r_holds         | f
>
> (In this usage, the strictf() function from the test_predtest.sql
> will simply return the NOT of its second input; so it gives different
> answers depending on whether the SAOP returned false or null.)
>
> I think you could probably repair the second problem by adding
> an additional argument to clause_is_strict_for() indicating whether
> it has to prove the clause to be NULL or merely non-TRUE; when
> recursing you'd always have to ask for the former.

I've implemented this fix as suggested. The added argument I've
initially called "allow_false"; I don't love the name, but it is
directly what it does. I'd appreciate other suggestions (if you prefer
it change).

> The first problem could be resolved by insisting on being able to
> prove the array non-empty when !useOr, but I'm not sure if it's
> really worth the trouble, as opposed to just not trying to prove
> anything for !useOr cases.  I'm not sure that "x op ALL(ARRAY)"
> occurs often enough in the wild to justify expending code on.
>
> The other case where being able to prove the array nonempty might
> be worth something is when you've recursed and hence need to be
> able to prove the SAOP's result to be NULL not just not-TRUE.
> Again though, I don't know how often that occurs in practice,
> so even the combination of those two motivations might not be
> worth having code to check the array argument.

I've implemented this also; the thing that puts it over the edge for
me on the question of "is it worth it in the wild" is cases like "x
!op ALL(ARRAY)", since it seems plausible to me that that's (unlike
the non-negated case) a reasonably common operation. At the very least
it seems likely to be a more logically interesting operation, which
I'm taking as a proxy for common, I suppose.

> It'd also be worth spending some brain cells on what happens
> when the SAOP's array argument is null overall, which causes
> its result to be null (not false).  Maybe the logic goes
> through without needing any explicit test for that case
> (and indeed I couldn't break it in testing that), but it'd
> likely be worth a comment.

Since all SAOPs (both ALL and ANY) return NULL when the array is NULL
regardless of the LHS argument, then this case qualifies as strict.
I've included this in the above fix since both NULL constant arrays
and non-empty arrays can both be proven strict with strict operators.

Ideally I think this case would be handled elsewhere in the optimizer
by directly replacing the saop and a constant NULL array with NULL,
but this isn't the patch to do that, and at any rate I'd have to do a
bit more investigation to understand how to do such (if it's easily
possible).

> I don't especially care for the proposed function name
> "clause_proves_for_null_test".  The only thing that brings to my
> mind is the question "proves WHAT, exactly?" --- and as these
> examples show, being crystal clear on what it proves is essential.

Given the new argument, I've reverted the name change.

I also updated the tests with a new helper function "opaque_array" to
make it easy to test cases where the planner can't inspect the array.
That way we don't need to rely on CTEs as an optimization fence --
thus improving both maintainability and readability. I also noticed
there were also some tests where the array was already opaque yet I
was still using CTEs as an optimization fence; I've cleaned those up.

I've updated the comments significantly to reflect the above changes.

I've attached an updated (and rebased) patch.

James Coleman

Вложения

Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> [ saop_is_not_null-v10.patch ]

I went through this again, and this time (after some more rewriting
of the comments) I satisfied myself that the logic is correct.
Hence, pushed.

I stripped down the regression test cases somewhat.  Those were
good for development, but they were about doubling the runtime of
test_predtest.sql, which seemed excessive for testing one feature.

I also tweaked it to recognize the case where we can prove the
array, rather than the scalar, to be null.  I'm not sure how useful
that is in practice, but it seemed weird that we'd exploit that
only if we can also prove the scalar to be null.

> I've implemented this fix as suggested. The added argument I've
> initially called "allow_false"; I don't love the name, but it is
> directly what it does. I'd appreciate other suggestions (if you prefer
> it change).

I was tempted to rename it to "weak", but decided that that might be
overloading that term too much in this module.  Left it as-is.

> Ideally I think this case would be handled elsewhere in the optimizer
> by directly replacing the saop and a constant NULL array with NULL,
> but this isn't the patch to do that, and at any rate I'd have to do a
> bit more investigation to understand how to do such (if it's easily
> possible).

Take a look at the ScalarArrayOp case in eval_const_expressions.
Right now it's only smart about the all-inputs-constant case.
I'm not really convinced it's worth spending cycles on the constant-
null-array case, but that'd be where to do it if we want to do it
in a general way.  (I think that what I added to clause_is_strict_for
is far cheaper, because while it's the same test, it's in a place
that we won't hit during most queries.)

> I also updated the tests with a new helper function "opaque_array" to
> make it easy to test cases where the planner can't inspect the array.

Yeah, that's a win.  I used that in most of the tests that I left in
place, but I kept a couple with long arrays just so we'd have code
coverage of the parts of clause_is_strict_for that need to detect
array size.

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Fri, Mar 1, 2019 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > [ saop_is_not_null-v10.patch ]
>
> I went through this again, and this time (after some more rewriting
> of the comments) I satisfied myself that the logic is correct.
> Hence, pushed.

Thanks!

> I also tweaked it to recognize the case where we can prove the
> array, rather than the scalar, to be null.  I'm not sure how useful
> that is in practice, but it seemed weird that we'd exploit that
> only if we can also prove the scalar to be null.

Just for my own understanding: I thought the "if
(arrayconst->constisnull)" took care of the array constant being null?
I don't see a check on the scalar node / lhs. I do see you added a
check for the entire clause being null, but I'm not sure I understand
when that would occur (unless it's only in the recursive case?)

> Take a look at the ScalarArrayOp case in eval_const_expressions.
> Right now it's only smart about the all-inputs-constant case.
> I'm not really convinced it's worth spending cycles on the constant-
> null-array case, but that'd be where to do it if we want to do it
> in a general way.  (I think that what I added to clause_is_strict_for
> is far cheaper, because while it's the same test, it's in a place
> that we won't hit during most queries.)

Thanks for the pointer; I'll take a look if for no other reason than curiosity.

Thanks,
James Coleman


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
Tom Lane
Дата:
James Coleman <jtc331@gmail.com> writes:
> On Fri, Mar 1, 2019 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I also tweaked it to recognize the case where we can prove the
>> array, rather than the scalar, to be null.  I'm not sure how useful
>> that is in practice, but it seemed weird that we'd exploit that
>> only if we can also prove the scalar to be null.

> Just for my own understanding: I thought the "if
> (arrayconst->constisnull)" took care of the array constant being null?

Yeah, but only after we've already matched the subexpr to the LHS,
otherwise we'd not be bothering to determine the array's size.

On the other hand, if for some reason we know that the array side
is NULL, we can conclude that the ScalarArrayOp returns NULL
independently of what the LHS is.

> I don't see a check on the scalar node / lhs. I do see you added a
> check for the entire clause being null, but I'm not sure I understand
> when that would occur (unless it's only in the recursive case?)

Yeah, it's to handle the case where we run into a constant NULL
below a strict node.  Typically, that doesn't happen because
eval_const_expressions would have const-folded the upper node, but
it's not impossible given all the cases that clause_is_strict_for
can recurse through now.  (The coverage bot does show that code being
reached, although perhaps that only occurs for null-below-ScalarArrayOp,
in which case it might be dead if we were to make eval_const_expressions
smarter.)

            regards, tom lane


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

От
James Coleman
Дата:
On Sat, Mar 2, 2019 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> James Coleman <jtc331@gmail.com> writes:
> > On Fri, Mar 1, 2019 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I also tweaked it to recognize the case where we can prove the
> >> array, rather than the scalar, to be null.  I'm not sure how useful
> >> that is in practice, but it seemed weird that we'd exploit that
> >> only if we can also prove the scalar to be null.
>
> > Just for my own understanding: I thought the "if
> > (arrayconst->constisnull)" took care of the array constant being null?
>
> Yeah, but only after we've already matched the subexpr to the LHS,
> otherwise we'd not be bothering to determine the array's size.
>
> On the other hand, if for some reason we know that the array side
> is NULL, we can conclude that the ScalarArrayOp returns NULL
> independently of what the LHS is.

Thanks for the detailed explanation.

> > I don't see a check on the scalar node / lhs. I do see you added a
> > check for the entire clause being null, but I'm not sure I understand
> > when that would occur (unless it's only in the recursive case?)
>
> Yeah, it's to handle the case where we run into a constant NULL
> below a strict node.  Typically, that doesn't happen because
> eval_const_expressions would have const-folded the upper node, but
> it's not impossible given all the cases that clause_is_strict_for
> can recurse through now.  (The coverage bot does show that code being
> reached, although perhaps that only occurs for null-below-ScalarArrayOp,
> in which case it might be dead if we were to make eval_const_expressions
> smarter.)

Ah, that makes sense.

James Coleman