Обсуждение: Allow use of stable functions with constraint exclusion

Поиск
Список
Период
Сортировка

Allow use of stable functions with constraint exclusion

От
"Marshall, Steve"
Дата:
I have developed a small patch to optimizer/util/plancat.c that
eliminates one of hte caveats associated with constraint exclusions,
namely the inability to avoid searching tables based on the results of
stable functions.   The new code expands non-volatile functions into
constant values so they can be used in the plan to determine which
tables to search.  If volatile functions are used in the query
constraints, they are correctly ignored in the planning process.

This is particularly useful for tables partitioned on a time column
where queries relative to the current system time are desirable.

I'll attach the patch file and a small SQL test case that exercises the
code.  The patch is relative to the 8.2.4 release; the SQL uses
backslash commands, so its really only appropriate for use with the psql
utility.  However, it does create a test database called test_ce_db and
run all the tests with a single command, e.g. psql -f test_ce_patch.sql

I'd appreciate any feedback anyone has, particularly on any corner cases
I may have missed.

Thanks,
Steve Marshall
*** plancat.c.orig    2007-04-30 12:17:55.785145396 -0400
--- plancat.c    2007-04-30 14:15:48.093639183 -0400
***************
*** 473,478 ****
--- 473,479 ----
      List       *constraint_pred;
      List       *safe_constraints;
      ListCell   *lc;
+     bool        found_unsafe_restrictions;

      /* Skip the test if constraint exclusion is disabled */
      if (!constraint_exclusion)
***************
*** 486,491 ****
--- 487,493 ----
       * Note: strip off RestrictInfo because predicate_refuted_by() isn't
       * expecting to see any in its predicate argument.
       */
+     found_unsafe_restrictions = false;
      safe_restrictions = NIL;
      foreach(lc, rel->baserestrictinfo)
      {
***************
*** 493,498 ****
--- 495,502 ----

          if (!contain_mutable_functions((Node *) rinfo->clause))
              safe_restrictions = lappend(safe_restrictions, rinfo->clause);
+         else
+             found_unsafe_restrictions = true;
      }

      if (predicate_refuted_by(safe_restrictions, safe_restrictions))
***************
*** 522,527 ****
--- 526,559 ----
      }

      /*
+      *  Check the restrictions against the relation constraints.
+      *  If we found mutable functions in the restrictions, try to simplify
+      *  them prior to checking.  Effectively, this folds stable and immutable
+      *  functions into constant values.
+      */
+     if (found_unsafe_restrictions)
+     {
+         List * simp_restrictions = NIL;
+         foreach(lc, rel->baserestrictinfo)
+         {
+             RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+             Node * baseNode = (Node *) rinfo->clause;
+
+             if (IsA(baseNode, FuncExpr) || IsA(baseNode, OpExpr))
+             {
+                 Node * simpNode = estimate_expression_value(baseNode);
+                 simp_restrictions = lappend(simp_restrictions, simpNode);
+             }
+             else
+             {
+                 simp_restrictions = lappend(simp_restrictions, baseNode);
+             }
+         }
+
+         if (predicate_refuted_by(safe_constraints, simp_restrictions))
+             return true;
+     }
+     /*
       * The constraints are effectively ANDed together, so we can just try to
       * refute the entire collection at once.  This may allow us to make proofs
       * that would fail if we took them individually.
***************
*** 531,538 ****
       * have volatile and nonvolatile subclauses, and it's OK to make
       * deductions with the nonvolatile parts.
       */
!     if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo))
          return true;

      return false;
  }
--- 563,572 ----
       * have volatile and nonvolatile subclauses, and it's OK to make
       * deductions with the nonvolatile parts.
       */
!     else if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo))
!     {
          return true;
+     }

      return false;
  }
--
--  Create and connect to the test database
--
CREATE DATABASE test_ce_db;
ALTER DATABASE test_ce_db OWNER TO postgres;

\connect test_ce_db;

--
--  Make a parent table and three child tables partitioned by time using created_at column.
--
CREATE TABLE test_bulletins (
    created_at timestamp with time zone NOT NULL,
    data text NOT NULL
);

CREATE TABLE test_bulletins_20060908 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-08
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-09 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060909 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-09
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-10 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060910 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-10
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-11 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

--
--  Setup environment for queries.
--
SET constraint_exclusion = on;
\pset footer off;

--
--  Do test case queries.
--
SELECT 'This query should avoid use of table test_bulletins_20060908, even with OLD CE code' as "Test case 1";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz;

SELECT 'This query should avoid use of table test_bulletins_20060908, ONLY with NEW CE code' as "Test case 2";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz + '0 days'::interval;

SELECT 'This query uses current timestamp; it should avoid use of table test_bulletins_20060908, ONLY with NEW CE code'
as"Test case 3"; 
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > now() - (now() - '2006-09-09 05:00:00+00'::timestamptz);

SELECT 'This query uses a random value; it should NOT avoid use of table test_bulletins_20060908' as "Test case 4";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz - random() * '1
hour'::interval;  


Re: Allow use of stable functions with constraint exclusion

От
Tom Lane
Дата:
"Marshall, Steve" <smarshall@wsi.com> writes:
> I have developed a small patch to optimizer/util/plancat.c that 
> eliminates one of hte caveats associated with constraint exclusions, 
> namely the inability to avoid searching tables based on the results of 
> stable functions.

Do you not understand why this is completely unsafe?
        regards, tom lane


Re: Allow use of stable functions with constraint exclusion

От
ITAGAKI Takahiro
Дата:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Marshall, Steve" <smarshall@wsi.com> writes:
> > I have developed a small patch to optimizer/util/plancat.c that 
> > eliminates one of hte caveats associated with constraint exclusions, 
> > namely the inability to avoid searching tables based on the results of 
> > stable functions.
> 
> Do you not understand why this is completely unsafe?

I think the proposal itself is very useful, because time-based
partitioning is commonly used and functions like now() or
CURRENT_TIMESTAMP are marked as stable.

I'm not clear why the optimization is unsafe. I'm confused to read the
definition of stable functions in our documentation. Which is required
for stable functions 'stable in a single table scan' or 'stable in a
SQL statements' ? If the latter definition is true, can we use them
in constraint exclusions?

| STABLE indicates that the function cannot modify the database, and
| that within a single table scan it will consistently return the same
| result for the same argument values, but that its result could change
| across SQL statements.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Allow use of immutable functions operating on constants with constraint exclusion

От
"Marshall, Steve"
Дата:
ITAGAKI Takahiro wrote:

>Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>
>
>>"Marshall, Steve" <smarshall@wsi.com> writes:
>>
>>
>>>I have developed a small patch to optimizer/util/plancat.c that
>>>eliminates one of hte caveats associated with constraint exclusions,
>>>namely the inability to avoid searching tables based on the results of
>>>stable functions.
>>>
>>>
>>Do you not understand why this is completely unsafe?
>>
>>
>
>I think the proposal itself is very useful, because time-based
>partitioning is commonly used and functions like now() or
>CURRENT_TIMESTAMP are marked as stable.
>
>I'm not clear why the optimization is unsafe. I'm confused to read the
>definition of stable functions in our documentation. Which is required
>for stable functions 'stable in a single table scan' or 'stable in a
>SQL statements' ? If the latter definition is true, can we use them
>in constraint exclusions?
>
>| STABLE indicates that the function cannot modify the database, and
>| that within a single table scan it will consistently return the same
>| result for the same argument values, but that its result could change
>| across SQL statements
>
>
The lack of safety comes from prepared statements.  If the above
optimization was used, the value of the stable function would be used
when the statement was prepared, and the query plan would then be set
using the stable function value as though it were a constant.  For
partitioned tables, this could result in a failure to scan tables needed
to meet the query constraints.

I think the optimization could work if you could exclude prepared
statements.  However, I looked at the planning code and found no clear
way to distinguish between a statement being prepared for later
execution, and a statement being planned only for immediate execution.
As a result, I don't think stable functions can (or should) be expanded
to help optimize queries using constraint exclusion.

However, I think it would be possible to expand immutable functions
operating on constants to optimize constraint exclusion.  Immutable
functions will always return the same result given the same inputs, so
this would be safe.  Currently, immutable functions are not expanded
during planning such that the first query would be optimized using
constraint exclusion, while the second query would not:

SELECT * FROM test_bulletins WHERE created_at > '2006-09-09
05:00:00+00'::timestamptz;
SELECT * FROM test_bulletins WHERE created_at > '2006-09-09
05:00:00+00'::timestamptz + '0 days'::interval;

See the attached SQL file for table creation and other SQL examples.

The real question here is if the optimization is worth the effort.
Personally, I commonly use queries of this sort, and so would be in
favor of this expansion of immutable functions operating on constrats as
an optimization.  I find it convenient to use the database to do the
time manipulation (e.g. adding intervals to timestamps).  However, the
logic to manipulate times can be pushed into application code if need
be.   I've found I have to do a lot of explaining to developers as to
why two queries that look so similar perform very differently.



--
--  Create and connect to the test database
--
CREATE DATABASE test_ce_db;
ALTER DATABASE test_ce_db OWNER TO postgres;

\connect test_ce_db;

--
--  Make a parent table and three child tables partitioned by time using created_at column.
--
CREATE TABLE test_bulletins (
    created_at timestamp with time zone NOT NULL,
    data text NOT NULL
);

CREATE TABLE test_bulletins_20060908 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-08
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-09 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060909 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-09
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-10 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060910 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-10
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-11 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

--
--  Setup environment for queries.
--
SET constraint_exclusion = on;
\pset footer off;

--
--  Do test case queries.
--
SELECT 'This query should avoid use of table test_bulletins_20060908, even with OLD CE code' as "Test case 1";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz;

SELECT 'This query should avoid use of table test_bulletins_20060908, ONLY with NEW CE code' as "Test case 2";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz + '0 days'::interval;

SELECT 'This query uses a stable function; it should NOT be optimized' as "Test case 3";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > now() - (now() - '2006-09-09 05:00:00+00'::timestamptz);

SELECT 'This query uses a random value; it should NOT avoid use of table test_bulletins_20060908' as "Test case 4";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz - random() * '1
hour'::interval;  

--
SELECT 'Create a new tables for 20060907 and 20060911 reexecute test case 2' as "Test case 5";
CREATE TABLE test_bulletins_20060907 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-07
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-08 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);
CREATE TABLE test_bulletins_20060911 (CONSTRAINT time_partition_limits CHECK (((created_at >= '2006-09-11
00:00:00+00'::timestampwith time zone) AND (created_at < '2006-09-12 00:00:00+00'::timestamp with time zone))) 
)
INHERITS (test_bulletins);

EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz + '0 days'::interval;

Re: Allow use of immutable functions operating on constants with constraint exclusion

От
Martijn van Oosterhout
Дата:
On Tue, May 08, 2007 at 08:08:28AM -0400, Marshall, Steve wrote:
> However, I think it would be possible to expand immutable functions
> operating on constants to optimize constraint exclusion.  Immutable
> functions will always return the same result given the same inputs, so
> this would be safe.  Currently, immutable functions are not expanded
> during planning such that the first query would be optimized using
> constraint exclusion, while the second query would not:

The problem with this is that at planning time you don't necessarily
have an active transaction snapshot. Prepared statements are the
obvious example, but I think even in one-off statements there's no
snapshot until after the planner has completed. This is also one of the
problems with type input/output functions looking up stuff in tables.

There was discussion about the handling type input/output functins and
casts as a sort off InitExpr that is executed once, then inserted into
the tree. However, that would still be too late to affect the planning.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Allow use of immutable functions operating on constants with constraint exclusion

От
Tom Lane
Дата:
"Marshall, Steve" <smarshall@wsi.com> writes:
> Currently, immutable functions are not expanded 
> during planning

Incorrect, see eval_const_expressions().
        regards, tom lane


Re: Allow use of immutable functions operating on constants with constraint exclusion

От
ITAGAKI Takahiro
Дата:
"Marshall, Steve" <smarshall@wsi.com> wrote:

> the first query would be optimized using 
> constraint exclusion, while the second query would not:
> 
> SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 
> 05:00:00+00'::timestamptz;
> SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 
> 05:00:00+00'::timestamptz + '0 days'::interval;

Hmmm... CE seems to be still not enough to optimize complex expressions.
If I added the wrapper function, it worked.

CREATE FUNCTION timeadd(timestamptz, interval) RETURNS timestamptz   AS $$ SELECT $1 + $2; $$ LANGUAGE sql IMMUTABLE;

SELECT * FROM test_bulletins WHERE created_at >   timeadd('2006-09-09 05:00:00+00', '0 days');


I noticed that we should be careful about CE with prepared statements
and functions. Seamless partitioning requires more works.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Allow use of immutable functions operating onconstants with constraint exclusion

От
"Simon Riggs"
Дата:
On Tue, 2007-05-08 at 08:08 -0400, Marshall, Steve wrote:

> I think the optimization could work if you could exclude prepared 
> statements.  However, I looked at the planning code and found no clear
> way to distinguish between a statement being prepared for later 
> execution, and a statement being planned only for immediate execution.
> As a result, I don't think stable functions can (or should) be
> expanded to help optimize queries using constraint exclusion. 

I think it should be possible to tell the difference between planned and
immediate-execution queries. Anything coming thru exec_simple_query can
be tagged as immediate-execution, so should be allowed to make the
evaulation of stable functions in the planner as being accurate for the
rest of the statement execution also.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com