Discussion: Extending constraint exclusion for implied constraints/conditions


Extending constraint exclusion for implied constraints/conditions

From: Ashutosh Bapat
Date:
Hi,
Here's a table I have
postgres=# \d+ tab1
                         Table "public.tab1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 val    | integer |           | plain   |              |
 val2   | integer |           | plain   |              |
Check constraints:
    "tab1_val2_check" CHECK (val2 < 30)
    "tab1_val_check" CHECK (val > 30)

and I run the following query on it:
postgres=# set constraint_exclusion to on;
SET
postgres=# explain verbose select * from tab1 where val = val2;
                         QUERY PLAN                         
-------------------------------------------------------------
 Seq Scan on public.tab1  (cost=0.00..36.75 rows=11 width=8)
   Output: val, val2
   Filter: (tab1.val = tab1.val2)
 
Given the constraints and the condition in the WHERE clause, it can be inferred that the query will not return any rows. However, PostgreSQL still scans the table, as seen in the plan.

Right now, the constraint exclusion code looks only at the provided conditions. If we want to avoid the table scan based on the constraints in the above example, it will need to look at the implied conditions as well, e.g. val2 < 30 AND val = val2 => val < 30. Constraint exclusion could then see that val < 30 AND val > 30 are contradictory and infer that the result is going to be empty. We would need to add information about the transitive inferences between operators. Can we do that in PostgreSQL? Will the benefits be worth the code that needs to be added?
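
As a point of comparison, if I spell out the implied condition by hand, today's constraint exclusion already proves the result empty. A minimal sketch against the same tab1 (the exact plan text may vary by version):

-- The hand-added qual val < 30 is refuted by the CHECK constraint
-- val > 30, so the planner should produce a dummy plan, typically a
-- Result node with "One-Time Filter: false".
set constraint_exclusion to on;
explain select * from tab1 where val = val2 and val < 30;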

I can see some more benefits. We can use it to eliminate joins where the constraints on the joining relations may cause the join to be empty, e.g.
postgres=# \d+ tab2
                         Table "public.tab2"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 val    | integer |           | plain   |              |
Check constraints:
    "tab2_val_check" CHECK (val < 30)

postgres=# \d+ tab1
                         Table "public.tab1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 val    | integer |           | plain   |              |
Check constraints:
    "tab1_val_check" CHECK (val > 30)

and a query with a join is: select * from tab1, tab2 where tab1.val = tab2.val, or select * from tab1, tab2 where tab1.val < tab2.val.
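
Again for illustration, writing the implied condition out by hand already gives the behaviour I would like the planner to derive on its own (a sketch; the dummy plan shape may vary):

-- Today this plans a real join even though no rows can match:
explain select * from tab1, tab2 where tab1.val = tab2.val;

-- With the implied qual spelled out, tab1 is excluded by its CHECK
-- constraint (val > 30), the inner join gets a dummy input, and the
-- whole join should collapse to an empty result:
explain select * from tab1, tab2
  where tab1.val = tab2.val and tab1.val < 30;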
 
This can be extended further to the case of a join between two parent tables, where we would be able to eliminate the joins between some pairs of children. There are limitations in this case though:
1. The benefits of this optimization will be visible in the case of nested loop joins; hash and merge joins implicitly eliminate the non-joining children.
2. We need a way to push a join down through an append path, which PostgreSQL doesn't do today.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Extending constraint exclusion for implied constraints/conditions

From: Tom Lane
Date:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> Right now, the constraint exclusion code looks only at the provided
> conditions. If we want to avoid the table scan based on the constraints in
> the above example, it will need to look at the implied conditions as well,
> e.g. val2 < 30 AND val = val2 => val < 30. Constraint exclusion could then
> see that val < 30 AND val > 30 are contradictory and infer that the result
> is going to be empty. We would need to add information about the transitive
> inferences between operators. Can we do that in PostgreSQL? Will the
> benefits be worth the code that needs to be added?

I doubt it.  The extra code isn't the problem so much, it's the extra
planning cycles spent trying to make proofs that will usually fail.
What you propose will create a combinatorial explosion in the number
of proof paths to be considered.

> I can see some more benefits. We can use it to eliminate joins where the
> constraints on joining relations may cause the join to be empty e.g.

... and applying constraint exclusion to join relations would make that
problem even worse.
        regards, tom lane



Re: Extending constraint exclusion for implied constraints/conditions

From: Greg Stark
Date:
On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I doubt it.  The extra code isn't the problem so much, it's the extra
> planning cycles spent trying to make proofs that will usually fail.
> What you propose will create a combinatorial explosion in the number
> of proof paths to be considered.

Well, not necessarily. You only need to consider constraints on
matching columns, and only when there's a join clause on those columns.
So you could imagine, for example, sorting all the constraints by the
eclass of the non-const side of their expression, then going through
them linearly to see where you have multiple constraints on the same
eclass.

For what it's worth, I think there is a case where this would be a common
optimization: when you have multiple tables partitioned the same way. Then
you ideally want to be able to turn a join of multiple appends into an
append of multiple joins (see the sketch below). This is even more
important when it comes to a parallelizing executor, since it lets you do
the joins in parallel.
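
Roughly, assuming two hypothetical parents p and q partitioned identically on val (children p_1/q_1 with CHECK (val < 30), p_2/q_2 with CHECK (val >= 30)), the rewrite I mean is:

-- join of appends: what the planner builds today
select * from p join q on p.val = q.val;

-- append of joins: the cross-partition pairs p_1/q_2 and p_2/q_1 are
-- provably empty under the CHECK constraints and drop out entirely
select * from p_1 join q_1 on p_1.val = q_1.val
union all
select * from p_2 join q_2 on p_2.val = q_2.val;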

However, to get from here to there I guess you would need to turn the
join of the appends into NxM joins of every pair of subscans and then
figure out which ones to exclude. That would be pretty nuts. To do it
reasonably we probably need the partitioning infrastructure we've been
talking about, where Postgres would know what the partitioning key is
and the order and range of the partitions, so it can directly generate
the matching subjoins in less than n^2 time.

-- 
greg



Re: Extending constraint exclusion for implied constraints/conditions

From: Ashutosh Bapat
Date:



On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>> Right now, the constraint exclusion code looks only at the provided
>> conditions. If we want to avoid the table scan based on the constraints in
>> the above example, it will need to look at the implied conditions as well,
>> e.g. val2 < 30 AND val = val2 => val < 30. Constraint exclusion could then
>> see that val < 30 AND val > 30 are contradictory and infer that the result
>> is going to be empty. We would need to add information about the transitive
>> inferences between operators. Can we do that in PostgreSQL? Will the
>> benefits be worth the code that needs to be added?
>
> I doubt it.  The extra code isn't the problem so much, it's the extra
> planning cycles spent trying to make proofs that will usually fail.
> What you propose will create a combinatorial explosion in the number
> of proof paths to be considered.

I can understand that it will create a combinatorial explosion in the number of predicates that need to be examined by constraint exclusion. I do not understand how paths get involved here, though. Constraint exclusion kicks in before paths are created:

static void
set_rel_size(PlannerInfo *root, RelOptInfo *rel,
             Index rti, RangeTblEntry *rte)
{
    if (rel->reloptkind == RELOPT_BASEREL &&
        relation_excluded_by_constraints(root, rel, rte))
    {
        /*
         * We proved we don't need to scan the rel via constraint exclusion,
         * so set up a single dummy path for it.  Here we only check this for
         * regular baserels; if it's an otherrel, CE was already checked in
         * set_append_rel_pathlist().
         *
         * In this case, we go ahead and set up the relation's path right away
         * instead of leaving it for set_rel_pathlist to do.  This is because
         * we don't have a convention for marking a rel as dummy except by
         * assigning a dummy path to it.
         */
        set_dummy_rel_pathlist(rel);
    }

and it does not create paths for relations excluded by constraints:

/*
 * set_rel_pathlist
 *    Build access paths for a base relation
 */
static void
set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                 Index rti, RangeTblEntry *rte)
{
    if (IS_DUMMY_REL(rel))
    {
        /* We already proved the relation empty, so nothing more to do */
    }

The same happens for a join, in make_join_rel():

     * JOIN_INNER.
     */
    switch (sjinfo->jointype)
    {
        case JOIN_INNER:
            if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
                restriction_is_constant_false(restrictlist, false))
            {
                mark_dummy_rel(joinrel);
                break;
            }
            add_paths_to_joinrel(root, joinrel, rel1, rel2,
                                 JOIN_INNER, sjinfo,
                                 restrictlist);
            add_paths_to_joinrel(root, joinrel, rel2, rel1,
                                 JOIN_INNER, sjinfo,
                                 restrictlist);
            break;

BTW, on a side note, I noticed that we use mark_dummy_rel() for joinrels and set_dummy_rel_pathlist() for baserels. Shouldn't we be using the same function everywhere for the same functionality (i.e. adding an append path with no child paths)?

For larger relations, the time spent in constraint exclusion might be less than the time taken by an actual table/index scan, especially when such a scan is not going to produce any rows. So we might want to apply the technique only when the estimated number of rows/pages is above a certain threshold, and perhaps only when the GUC is on (as we do today).


>> I can see some more benefits. We can use it to eliminate joins where the
>> constraints on joining relations may cause the join to be empty e.g.
>
> ... and applying constraint exclusion to join relations would make that
> problem even worse.

The same argument applies to joins, where the cross product of the two
tables is potentially huge.

>                         regards, tom lane



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Extending constraint exclusion for implied constraints/conditions

From: Ashutosh Bapat
Date:



On Mon, Jul 7, 2014 at 11:46 PM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Jul 7, 2014 at 3:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I doubt it.  The extra code isn't the problem so much, it's the extra
>> planning cycles spent trying to make proofs that will usually fail.
>> What you propose will create a combinatorial explosion in the number
>> of proof paths to be considered.
>
> Well, not necessarily. You only need to consider constraints on
> matching columns, and only when there's a join clause on those columns.
> So you could imagine, for example, sorting all the constraints by the
> eclass of the non-const side of their expression, then going through
> them linearly to see where you have multiple constraints on the same
> eclass.
>
> For what it's worth, I think there is a case where this would be a common
> optimization: when you have multiple tables partitioned the same way.
> Then you ideally want to be able to turn a join of multiple appends into
> an append of multiple joins. This is even more important when it comes to
> a parallelizing executor, since it lets you do the joins in parallel.

Ah, right. Also, if foreign tables come under the inheritance hierarchy, we will want to push such joins down to the foreign servers.

> However, to get from here to there I guess you would need to turn the
> join of the appends into NxM joins of every pair of subscans and then
> figure out which ones to exclude. That would be pretty nuts. To do it
> reasonably we probably need the partitioning infrastructure we've been
> talking about, where Postgres would know what the partitioning key is
> and the order and range of the partitions, so it can directly generate
> the matching subjoins in less than n^2 time.
>
> --
> greg



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Extending constraint exclusion for implied constraints/conditions

From: Tom Lane
Date:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I doubt it.  The extra code isn't the problem so much, it's the extra
>> planning cycles spent trying to make proofs that will usually fail.
>> What you propose will create a combinatorial explosion in the number
>> of proof paths to be considered.

> I can understand that it will create a combinatorial explosion in the
> number of predicates that need to be examined by constraint exclusion. I
> do not understand how paths get involved here, though. Constraint
> exclusion kicks in before paths are created.

Perhaps I should not have used the term "path" here, because I wasn't
referring to what the planner calls Paths.  I just meant that there will
be many more ways to perhaps prove a constraint-exclusion result, and the
planner will have to go through them all.  (Usually to no avail, because
how often do people write queries that are guaranteed to produce no rows?)

An example of what I meant by combinatorial explosion is that if a clause
mentions K variables, and each of those variables has been equated to M
other variables, there are (M+1)^K possible derived clauses, and it looks
to me like we'd have to consider them all to catch the sort of situation
you're talking about.
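
For instance, a clause mentioning just two variables (K = 2), each of which has been equated to two other variables (M = 2), already yields (2+1)^2 = 9 clause variants to check.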

I think this is going to require a *lot* of additional planner cycles,
and TBH you've not provided any very compelling example of why it's
worthwhile.  Yeah, if you can exclude a large partition it's a win,
but how often is that going to happen in real-world queries?  The one
example you gave seemed pretty artificial to me, ie unrelated to typical
partitioning constraints.
        regards, tom lane



Re: Extending constraint exclusion for implied constraints/conditions

From: Ashutosh Bapat
Date:
On Tue, Jul 8, 2014 at 7:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>> On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I doubt it.  The extra code isn't the problem so much, it's the extra
>>> planning cycles spent trying to make proofs that will usually fail.
>>> What you propose will create a combinatorial explosion in the number
>>> of proof paths to be considered.
>
>> I can understand that it will create a combinatorial explosion in the
>> number of predicates that need to be examined by constraint exclusion. I
>> do not understand how paths get involved here, though. Constraint
>> exclusion kicks in before paths are created.
>
> Perhaps I should not have used the term "path" here, because I wasn't
> referring to what the planner calls Paths.  I just meant that there will
> be many more ways to perhaps prove a constraint-exclusion result, and the
> planner will have to go through them all.  (Usually to no avail, because
> how often do people write queries that are guaranteed to produce no rows?)
>
> An example of what I meant by combinatorial explosion is that if a clause
> mentions K variables, and each of those variables has been equated to M
> other variables, there are (M+1)^K possible derived clauses, and it looks
> to me like we'd have to consider them all to catch the sort of situation
> you're talking about.

I agree.
 

> I think this is going to require a *lot* of additional planner cycles,
> and TBH you've not provided any very compelling example of why it's
> worthwhile.  Yeah, if you can exclude a large partition it's a win,
> but how often is that going to happen in real-world queries?  The one
> example you gave seemed pretty artificial to me, ie unrelated to typical
> partitioning constraints.


Yes, the example is a "cooked" one to show the problem, but the case can easily be encountered with partitioned tables. A partitioned table (having constraints on each partition), possibly with a few table-level constraints, can receive queries whose WHERE clauses yield empty results for one or more partitions. Saving those table scans could be worth the time spent deriving and examining the (M+1)^K clauses. Greg's example of parallel joins, or joins between partitioned foreign tables, would benefit even more from this optimization. But I think I will wait until parallel joins or partitioned foreign tables are a reality in PostgreSQL.
 
>                         regards, tom lane



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company