Обсуждение: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

Поиск
Список
Период
Сортировка

Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
David Kimura
Дата:
Hi Hackers,

Is it fair to assume that, given the same data, a partitioned table should
return the same results as a non-partitioned table?  If that's true, then I
think I may have stumbled across a case of wrong results on boolean partitioned
tables.

In following example, I think we incorrectly skip the default partition scan:

CREATE TABLE boolpart (a bool) PARTITION BY LIST (a);
CREATE TABLE boolpart_default PARTITION OF boolpart default;
CREATE TABLE boolpart_t PARTITION OF boolpart FOR VALUES IN ('true');
CREATE TABLE boolpart_f PARTITION OF boolpart FOR VALUES IN ('false');
INSERT INTO boolpart VALUES (true), (false), (null);

EXPLAIN SELECT * FROM boolpart WHERE a IS NOT true;
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on boolpart_f boolpart  (cost=0.00..38.10 rows=1405 width=1)
   Filter: (a IS NOT TRUE)
(2 rows)

SELECT * FROM boolpart WHERE a IS NOT true;
 a
---
 f
(1 row)

Compare that to the result of a non-partitioned table:

CREATE TABLE booltab (a bool);
INSERT INTO booltab VALUES (true), (false), (null);

EXPLAIN SELECT * FROM booltab WHERE a IS NOT true;
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on booltab  (cost=0.00..38.10 rows=1405 width=1)
   Filter: (a IS NOT TRUE)
(2 rows)

SELECT * FROM booltab WHERE a IS NOT true;
 a
---
 f

(2 rows)

I think the issue has to do with assumptions made about boolean test IS NOT
inequality logic which is different from inequality of other operators.
Specifically, "true IS NOT NULL" is not the same as "true<>NULL".

In partition pruning, match_boolean_partition_clause() tries to match partkey
with clause and outputs PARTCLAUSE_MATCH_CLAUSE and an outconst TRUE for
(IS_TRUE or IS_NOT_FALSE) and inversely FALSE for (IS_FALSE or IS_NOT_TRUE).
However, I don't think this gradularity is sufficient for "IS NOT" logic when a
NULL value partition is present.

One idea is to use the negation operator for IS_NOT_(true|false) (i.e.
BooleanNotEqualOperator instead of BooleanEqualOperator). But besides
presumably being a more expensive operation, not equal is not part of the btree
opfamily for bool_ops. So, seems like that won't really fit into the current
partition pruning framework.

Then I realized that the issue is just about adding the default or null
partition in these very particular scenarios. And struct PartitionBoundInfoData
already holds that information. So if we can identify these scenarios and pass
that information into get_matching_partitions() then we can add the necessary
partitions. Attached is a very rough sketch of that idea.

Thoughts? Does this seem like a legit issue? And if so, do either of the
proposed solutions seem reasonable?

Thanks,
David

Вложения

Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
David Rowley
Дата:
On Wed, 12 Apr 2023 at 22:13, David Kimura <david.g.kimura@gmail.com> wrote:
> Is it fair to assume that, given the same data, a partitioned table should
> return the same results as a non-partitioned table?

Yes, and also the same as when enable_partition_pruning is set to off.

> CREATE TABLE boolpart (a bool) PARTITION BY LIST (a);
> CREATE TABLE boolpart_default PARTITION OF boolpart default;
> CREATE TABLE boolpart_t PARTITION OF boolpart FOR VALUES IN ('true');
> CREATE TABLE boolpart_f PARTITION OF boolpart FOR VALUES IN ('false');
> INSERT INTO boolpart VALUES (true), (false), (null);
>
> EXPLAIN SELECT * FROM boolpart WHERE a IS NOT true;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Seq Scan on boolpart_f boolpart  (cost=0.00..38.10 rows=1405 width=1)
>    Filter: (a IS NOT TRUE)
> (2 rows)
>
> SELECT * FROM boolpart WHERE a IS NOT true;
>  a
> ---
>  f
> (1 row)
>
> Compare that to the result of a non-partitioned table:
>
> CREATE TABLE booltab (a bool);
> INSERT INTO booltab VALUES (true), (false), (null);
>
> EXPLAIN SELECT * FROM booltab WHERE a IS NOT true;
>                         QUERY PLAN
> -----------------------------------------------------------
>  Seq Scan on booltab  (cost=0.00..38.10 rows=1405 width=1)
>    Filter: (a IS NOT TRUE)
> (2 rows)
>
> SELECT * FROM booltab WHERE a IS NOT true;
>  a
> ---
>  f

Ouch.  That's certainly not correct.

> I think the issue has to do with assumptions made about boolean test IS NOT
> inequality logic which is different from inequality of other operators.
> Specifically, "true IS NOT NULL" is not the same as "true<>NULL".

Yeah, that's wrong.

> One idea is to use the negation operator for IS_NOT_(true|false) (i.e.
> BooleanNotEqualOperator instead of BooleanEqualOperator). But besides
> presumably being a more expensive operation, not equal is not part of the btree
> opfamily for bool_ops. So, seems like that won't really fit into the current
> partition pruning framework.

There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.

Effectively, I think that's the attached patch.

There seems to be a bunch of tests checking this already, all of them
assuming the incorrect plans.

David

Вложения

Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
David Kimura
Дата:
On Wed, Apr 12, 2023 at 4:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 12 Apr 2023 at 22:13, David Kimura <david.g.kimura@gmail.com> wrote:
> > Is it fair to assume that, given the same data, a partitioned table should
> > return the same results as a non-partitioned table?
>
> Yes, and also the same as when enable_partition_pruning is set to off.

Thanks for making me aware of that GUC.

> > One idea is to use the negation operator for IS_NOT_(true|false) (i.e.
> > BooleanNotEqualOperator instead of BooleanEqualOperator). But besides
> > presumably being a more expensive operation, not equal is not part of the btree
> > opfamily for bool_ops. So, seems like that won't really fit into the current
> > partition pruning framework.
>
> There's already code to effectively handle <> operators. Just the
> PartClauseInfo.op_is_ne needs to be set to true.
> get_matching_list_bounds() then handles that by taking the inverse of
> the partitions matching the equality operator.

Ah, I missed that when I first tried to implement that approach.  Indeed, this
seems cleaner. Also, the domain space for boolean partitions is very small, so
any added cost for searching not equal seems negligible.

Thanks,
David



Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
Richard Guo
Дата:

On Wed, Apr 12, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.

Effectively, I think that's the attached patch.

I think there is a thinko here.

+           switch (btest->booltesttype)
+           {
+               case IS_NOT_TRUE:
+                   *noteq = true;
+                   /* fall through */
+               case IS_TRUE:
+                   *outconst = (Expr *) makeBoolConst(true, false);
+                   break;
+               case IS_NOT_FALSE:
+                   *noteq = true;
+                   /* fall through */
+               case IS_FALSE:
+                   *outconst = (Expr *) makeBoolConst(false, false);
+                   break;
+               default:
+                   Assert(false); /* hmm? */
+                   return PARTCLAUSE_UNSUPPORTED;
+           }

The *outconst should be set to true in case IS_NOT_FALSE and set to
false in case IS_NOT_TRUE, something like:

            switch (btest->booltesttype)
            {
-               case IS_NOT_TRUE:
+               case IS_NOT_FALSE:
                    *noteq = true;
                    /* fall through */
                case IS_TRUE:
                    *outconst = (Expr *) makeBoolConst(true, false);
                    break;
-               case IS_NOT_FALSE:
+               case IS_NOT_TRUE:
                    *noteq = true;
                    /* fall through */
                case IS_FALSE:

Thanks
Richard

Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
Richard Guo
Дата:

On Thu, Apr 13, 2023 at 10:39 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Apr 12, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.

Effectively, I think that's the attached patch.

I think there is a thinko here.

Sorry.  It's my thinko.  In cases IS_NOT_TRUE and IS_NOT_FALSE the
op_is_ne is set to true.  So the logic in origin patch is right.

BTW, I wonder if we should elog an Error here.

                default:
-                   Assert(false); /* hmm? */
-                   return PARTCLAUSE_UNSUPPORTED;
+                   elog(ERROR, "unrecognized booltesttype: %d",
+                        (int) btest->booltesttype);
+                   break;

Otherwise the patch looks good to me.

Thanks
Richard

Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
David Rowley
Дата:
On Thu, 13 Apr 2023 at 15:30, Richard Guo <guofenglinux@gmail.com> wrote:
> BTW, I wonder if we should elog an Error here.
>
>                 default:
> -                   Assert(false); /* hmm? */
> -                   return PARTCLAUSE_UNSUPPORTED;
> +                   elog(ERROR, "unrecognized booltesttype: %d",
> +                        (int) btest->booltesttype);
> +                   break;

I wondered about that, hence my not-so-commitable comment left in there.

My last thoughts were that maybe we should just move the IS_UNKNOWN
and IS_NOT_UNKNOWN down into the switch and let -Wall let us know if
something is missing.

It hardly seems worth keeping the slightly earlier exit for those two
cases. That just amounts to the RelabelType check and is this the
partition key.  I doubt IS[_NOT]_UNKNOWN is common enough for us to
warrant contorting the code to make it a few dozen nanoseconds faster.
Having smaller code is probably more of a win, which we'd get if we
didn't add the ERROR you propose.

> Otherwise the patch looks good to me.

Thanks for having a look.

David



Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
David Kimura
Дата:
On Wed, Apr 12, 2023 at 4:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> There seems to be a bunch of tests checking this already, all of them
> assuming the incorrect plans.

Given that the plan alone wasn't sufficient to catch this error previously,
would it be worthwhile to add some data to the tests to make it abundantly
obvious?

I had noticed that the default partition seems to be an edge case in the code.
Perhaps it's overkill, but would it be worth adding a test where the NULL
partition is not the default?

Thanks,
David



Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition

От
David Rowley
Дата:
On Thu, 13 Apr 2023 at 15:45, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 13 Apr 2023 at 15:30, Richard Guo <guofenglinux@gmail.com> wrote:
> > BTW, I wonder if we should elog an Error here.
> >
> >                 default:
> > -                   Assert(false); /* hmm? */
> > -                   return PARTCLAUSE_UNSUPPORTED;
> > +                   elog(ERROR, "unrecognized booltesttype: %d",
> > +                        (int) btest->booltesttype);
> > +                   break;
>
> I wondered about that, hence my not-so-commitable comment left in there.
>
> My last thoughts were that maybe we should just move the IS_UNKNOWN
> and IS_NOT_UNKNOWN down into the switch and let -Wall let us know if
> something is missing.
>
> It hardly seems worth keeping the slightly earlier exit for those two
> cases. That just amounts to the RelabelType check and is this the
> partition key.  I doubt IS[_NOT]_UNKNOWN is common enough for us to
> warrant contorting the code to make it a few dozen nanoseconds faster.
> Having smaller code is probably more of a win, which we'd get if we
> didn't add the ERROR you propose.

After having looked at the code in more detail, I don't think it's a
good idea to move the IS_UNKNOWN and IS_NOT_UNKNOWN down into the
switch.  Having them tested early means we can return
PARTCLAUSE_UNSUPPORTED even when the clause does not match the current
partition key.  If we moved those into the switch statement, then if
the qual didn't match to the partition key, then we'd return
PARTCLAUSE_NOMATCH and we'd maybe waste further effort later trying to
match the same qual to some other partition key.

All I ended up doing was removing the Assert().  I don't really see
the need to add an ERROR.  It's not like any other value would cause
the code to misbehave.  We'll just return PARTCLAUSE_UNSUPPORTED and
no pruning would get done for that qual. I also struggle to imagine
what possible other values we could ever add to BoolTestType.

After looking a bit deeper and testing a bit more, I found another bug
in match_boolean_partition_clause() around the
equal(negate_clause((Node *) leftop), partkey). The code there just
always set *outconst to a false Const regardless of is_not_clause. I
see the code coverage tool shows that line as untested, so I fixed the
bug and wrote some tests to exercise the code.

As David Kimura suggested, I also added some data to the tables in
question and repeated the same queries again without the EXPLAIN.  I
generated the expected output with enable_partition_pruning = off then
put it back on again and saw that the same results are shown.  I
considered writing a plpgsql function that we can pass a table name
and a query and it goes and makes a temp table, populates it with the
query with enable_partition_pruning = off then tries again with
pruning on and verifies the results are the same as what's stored in
the temp table. I'll maybe go and do that for master only, it's just a
bit more than what I wanted to do in the back branches.

I've pushed the fix now.

Thanks for the report about this, David, and thank you both for the reviews.

David


David