Обсуждение: Partitions not Working as Expected

Поиск
Список
Период
Сортировка

Partitions not Working as Expected

От
Shaun Thomas
Дата:
Hey guys,

I suspect I'll get an answer equivalent to "the planner treats that like
a variable," but I really hope not because it renders partitions
essentially useless to us. This is as recent as 9.1.9 and constraint
exclusion is enabled.

What I have is this test case:

CREATE TABLE part_test (
   fake INT,
   part_col TIMESTAMP WITHOUT TIME ZONE
);

CREATE TABLE part_test_1 (
   CHECK (part_col >= '2013-05-01' AND
          part_col < '2013-06-01')
) INHERITS (part_test);

CREATE TABLE part_test_2 (
   CHECK (part_col >= '2013-04-01' AND
          part_col < '2013-05-01')
) INHERITS (part_test);

And this query performs a sequence scan across all partitions:

EXPLAIN ANALYZE
SELECT * FROM part_test
  WHERE part_col > CURRENT_DATE;

The CURRENT_DATE value is clearly more recent than any of the
partitions, yet it checks them anyway. The only way to get it to
properly constrain partitions is to use a static value:

EXPLAIN ANALYZE
SELECT * FROM part_test
  WHERE part_col > '2013-06-27';

But developers never do this. Nor should they. I feel like an idiot even
asking this, because it seems so wrong, and I can't seem to come up with
a workaround other than, "Ok devs, hard code dates into all of your
queries from now on."

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Partitions not Working as Expected

От
Igor Neyman
Дата:
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org [mailto:pgsql-
> performance-owner@postgresql.org] On Behalf Of Shaun Thomas
> Sent: Thursday, June 27, 2013 12:16 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Partitions not Working as Expected
>
> Hey guys,
>
> I suspect I'll get an answer equivalent to "the planner treats that like a
> variable," but I really hope not because it renders partitions essentially
> useless to us. This is as recent as 9.1.9 and constraint exclusion is enabled.
>
> What I have is this test case:
>
> CREATE TABLE part_test (
>    fake INT,
>    part_col TIMESTAMP WITHOUT TIME ZONE
> );
>
> CREATE TABLE part_test_1 (
>    CHECK (part_col >= '2013-05-01' AND
>           part_col < '2013-06-01')
> ) INHERITS (part_test);
>
> CREATE TABLE part_test_2 (
>    CHECK (part_col >= '2013-04-01' AND
>           part_col < '2013-05-01')
> ) INHERITS (part_test);
>
> And this query performs a sequence scan across all partitions:
>
> EXPLAIN ANALYZE
> SELECT * FROM part_test
>   WHERE part_col > CURRENT_DATE;
>
> The CURRENT_DATE value is clearly more recent than any of the partitions,
> yet it checks them anyway. The only way to get it to properly constrain
> partitions is to use a static value:
>
> EXPLAIN ANALYZE
> SELECT * FROM part_test
>   WHERE part_col > '2013-06-27';
>
> But developers never do this. Nor should they. I feel like an idiot even asking
> this, because it seems so wrong, and I can't seem to come up with a
> workaround other than, "Ok devs, hard code dates into all of your queries
> from now on."
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-676-8870
> sthomas@optionshouse.com
>

Doesn't have to be hardcoded.
If executed as dynamic sql, it will be re-planned properly, e.g.:

lQueryString := 'SELECT MAX(cycle_date_time) AS MaxDT
                     FROM gp_cycle_' || partition_extension::varchar ||
                     ' WHERE cell_id = ' || i_n_Cell_id::varchar ||
                     ' AND part_type_id = ' || i_n_PartType_id::varchar ||
                     ' AND cycle_date_time <= TIMESTAMP ' || quote_literal(cast(i_t_EndDate AS VARCHAR));
       IF (lQueryString IS NOT NULL) THEN
           EXECUTE lQueryString INTO lEndDate;


Regards,
Igor Neyman


Re: Partitions not Working as Expected

От
Shaun Thomas
Дата:
On 06/27/2013 12:08 PM, Igor Neyman wrote:

> Doesn't have to be hardcoded.
> If executed as dynamic sql, it will be re-planned properly, e.g.:

Well yeah. That's not really the point, though. Aside from existing
code, hard-coding is generally frowned upon. Our devs have been using
CURRENT_DATE and its ilk for over six years now.

So now I get to tell our devs to refactor six years of JAVA code and
find any place they use CURRENT_DATE, and replace it with an ORM
variable for the current date instead.

At this point I wonder why CURRENT_DATE even exists, if using it is
apparently detrimental to query execution.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Partitions not Working as Expected

От
bricklen
Дата:
On Thu, Jun 27, 2013 at 10:17 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:

Well yeah. That's not really the point, though. Aside from existing code, hard-coding is generally frowned upon. Our devs have been using CURRENT_DATE and its ilk for over six years now.

Would it help to put the current_date call in a wrapper function and coerce it as IMMUTABLE? A quick test shows that constraint exclusion seems to kick in, but I can't speak intelligently about whether that is wise or not.

Re: Partitions not Working as Expected

От
Dave Johansen
Дата:
On Thu, Jun 27, 2013 at 10:34 AM, bricklen <bricklen@gmail.com> wrote:
On Thu, Jun 27, 2013 at 10:17 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:

Well yeah. That's not really the point, though. Aside from existing code, hard-coding is generally frowned upon. Our devs have been using CURRENT_DATE and its ilk for over six years now.

Would it help to put the current_date call in a wrapper function and coerce it as IMMUTABLE? A quick test shows that constraint exclusion seems to kick in, but I can't speak intelligently about whether that is wise or not.


Or what about something like DATE_TRUNC("DAY", now())? Or would that run into the same optimization/planner problems as CURRENT_DATE?

Re: Partitions not Working as Expected

От
Shaun Thomas
Дата:
On 06/27/2013 12:42 PM, Dave Johansen wrote:

> Or what about something like DATE_TRUNC("DAY", now())? Or would that run
> into the same optimization/planner problems as CURRENT_DATE?

Same issue. This seems to work, though I'm not entirely sure of the
implications:

UPDATE pg_proc
    SET provolatile = 'i'
  WHERE proname = 'date_in';

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Partitions not Working as Expected

От
Tom Lane
Дата:
Shaun Thomas <sthomas@optionshouse.com> writes:
> On 06/27/2013 12:42 PM, Dave Johansen wrote:
>> Or what about something like DATE_TRUNC("DAY", now())? Or would that run
>> into the same optimization/planner problems as CURRENT_DATE?

> Same issue. This seems to work, though I'm not entirely sure of the
> implications:

> UPDATE pg_proc
>     SET provolatile = 'i'
>   WHERE proname = 'date_in';

That will break things: CURRENT_DATE will then be equivalent to just
writing today's date as a literal.

It's conceivable that it wouldn't break any scenario that you personally
care about, if you never use CURRENT_DATE in any view, rule, column
default expression, or cached plan; but it seems mighty risky from here.


I don't see any very good solution to your problem within the current
approach to partitioning, which is basically theorem-proving.  That
proof engine has no concept of time passing, let alone the sort of
detailed knowledge of the semantics of this particular function that
would allow it to conclude "if CURRENT_DATE > '2013-06-20' is true now,
it will always be so in the future as well".

I think most hackers agree that the way forward on partitioning involves
building hard-wired logic that selects the correct partition(s) at
run-time, so that it wouldn't particularly matter where we got the
comparison value from or whether it was a constant.  So I'm not feeling
motivated to try to hack some solution for this case into the theorem
prover.

Unfortunately, it's likely to be awhile before that next-generation
partitioning code shows up.  But major extensions to the proof engine
wouldn't be a weekend project, either...

            regards, tom lane


Re: Partitions not Working as Expected

От
"Albin, Lloyd P"
Дата:
We have also run into this with our production databases. We worked around the issue by adding an index to each child
tableso that it scans all the child index's instead of the child table's. For us this made a large performance
improvement.

CREATE INDEX part_test_1_idx ON part_test_1
  USING btree (part_col);

CREATE INDEX part_test_2_idx ON part_test_2
  USING btree (part_col);

Lloyd Albin
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Vaccine and Infectious Disease Division (VIDD)
Fred Hutchinson Cancer Research Center (FHCRC)

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Shaun Thomas
Sent: Thursday, June 27, 2013 11:16 AM
To: Dave Johansen
Cc: bricklen; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitions not Working as Expected

On 06/27/2013 12:42 PM, Dave Johansen wrote:

> Or what about something like DATE_TRUNC("DAY", now())? Or would that
> run into the same optimization/planner problems as CURRENT_DATE?

Same issue. This seems to work, though I'm not entirely sure of the
implications:

UPDATE pg_proc
    SET provolatile = 'i'
  WHERE proname = 'date_in';

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Partitions not Working as Expected

От
Shaun Thomas
Дата:
On 06/27/2013 01:42 PM, Tom Lane wrote:

> That will break things: CURRENT_DATE will then be equivalent to just
> writing today's date as a literal.

Interesting. I tested it by creating a view and a table with a default,
and it always seems to get translated to:

('now'::text)::date

But I'll take your explanation at face value, since that doesn't imply
what the output would be. What's interesting is that EnterpriseDB has
their own pg_catalog.current_date function that gets called by the
CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the
current_date function as immutable without affecting a lot of other
internals.

On EDB, this actually works:

UPDATE pg_proc
    SET provolatile = 'i'
  WHERE proname = 'current_date';

Then the plan gets pared down as desired. But again, if the date were to
roll over, I'm not sure what would happen. I wish I could test that
without fiddling with machine times.

> I don't see any very good solution to your problem within the current
> approach to partitioning, which is basically theorem-proving.  That
> proof engine has no concept of time passing, let alone the sort of
> detailed knowledge of the semantics of this particular function that
> would allow it to conclude "if CURRENT_DATE > '2013-06-20' is true now,
> it will always be so in the future as well".

I get it. From the context of two months ago, CURRENT_DATE >
'2013-06-20' would return a different answer than it would today, which
isn't really good for proofs.

The only way for it to work as "expected" would be to add a first pass
to resolve any immediate variables, which would effectively throw away
plan caches. I'd actually be OK with that.

> I think most hackers agree that the way forward on partitioning
> involves building hard-wired logic that selects the correct
> partition(s) at run-time, so that it wouldn't particularly matter
> where we got the comparison value from or whether it was a constant.

Fair enough. I'll stop telling devs to use current_date instead of ORM
injections, then. Hopefully we can track down and tweak the affected
queries on the tables we're partitioning without too much work and QA.

Thanks, Tom!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Partitions not Working as Expected

От
Shaun Thomas
Дата:
On 06/27/2013 01:45 PM, Albin, Lloyd P wrote:

> We have also run into this with our production databases. We worked
> around the issue by adding an index to each child table so that it
> scans all the child index's instead of the child table's. For us
> this made a large performance improvement.

Haha. Yeah, that's assumed. I'd never use a partition set without the
constraint column in at least one index. The proof of concept was just
to illustrate that the planner doesn't even get that far in ignoring
"empty" partitions. Sure, scanning the inapplicable child tables has a
low cost, but it's not zero. With about a dozen of them, query times
increase from 0.130ms to 0.280ms for my test case. Not a lot in the long
run, but in a OLTP system, it can be fairly noticeable.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Partitions not Working as Expected

От
Andrew Dunstan
Дата:
On 06/27/2013 03:14 PM, Shaun Thomas wrote:
> On 06/27/2013 01:42 PM, Tom Lane wrote:
>
>> That will break things: CURRENT_DATE will then be equivalent to just
>> writing today's date as a literal.
>
> Interesting. I tested it by creating a view and a table with a
> default, and it always seems to get translated to:
>
> ('now'::text)::date
>
> But I'll take your explanation at face value, since that doesn't imply
> what the output would be. What's interesting is that EnterpriseDB has
> their own pg_catalog.current_date function that gets called by the
> CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the
> current_date function as immutable without affecting a lot of other
> internals.
>
> On EDB, this actually works:
>
> UPDATE pg_proc
>    SET provolatile = 'i'
>  WHERE proname = 'current_date';


But that's a lie, surely. If it breaks you have nobody to blame but
yourself. There's a reason EDB haven't marked their function immutable -
it's not.

cheers

andrew



Re: Partitions not Working as Expected

От
Shaun Thomas
Дата:
On 06/27/2013 02:49 PM, Andrew Dunstan wrote:

> But that's a lie, surely. If it breaks you have nobody to blame but
> yourself. There's a reason EDB haven't marked their function
> immutable - it's not.

Well, yeah. That's why I'm testing it in a dev system. :)

None of this will probably pan out, but I need to see the limits of how
badly I can abuse the database.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Partitions not Working as Expected

От
Andres Freund
Дата:
On 2013-06-27 14:42:26 -0400, Tom Lane wrote:
> Shaun Thomas <sthomas@optionshouse.com> writes:
> > On 06/27/2013 12:42 PM, Dave Johansen wrote:
> >> Or what about something like DATE_TRUNC("DAY", now())? Or would that run
> >> into the same optimization/planner problems as CURRENT_DATE?
>
> > Same issue. This seems to work, though I'm not entirely sure of the
> > implications:
>
> > UPDATE pg_proc
> >     SET provolatile = 'i'
> >   WHERE proname = 'date_in';
>
> That will break things: CURRENT_DATE will then be equivalent to just
> writing today's date as a literal.
>
> It's conceivable that it wouldn't break any scenario that you personally
> care about, if you never use CURRENT_DATE in any view, rule, column
> default expression, or cached plan; but it seems mighty risky from here.

> I don't see any very good solution to your problem within the current
> approach to partitioning, which is basically theorem-proving.  That
> proof engine has no concept of time passing, let alone the sort of
> detailed knowledge of the semantics of this particular function that
> would allow it to conclude "if CURRENT_DATE > '2013-06-20' is true now,
> it will always be so in the future as well".

Couldn't we at least significantly improve on the status quo by
detecting we're currently planning a query that's only going to be
executed once (because it's directly executed or because were planning a
onetime plan for specific parameters) and inline stable functions before
doing the theorem proving?

Maybe I am missing something here?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Partitions not Working as Expected

От
Josh Berkus
Дата:
> At this point I wonder why CURRENT_DATE even exists, if using it is
> apparently detrimental to query execution.

It's good for inserts.  ;-)


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Partitions not Working as Expected

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> Couldn't we at least significantly improve on the status quo by
> detecting we're currently planning a query that's only going to be
> executed once (because it's directly executed or because were planning a
> onetime plan for specific parameters) and inline stable functions before
> doing the theorem proving?

I think Haas went down that rabbit hole before you.  The current
definition of stable functions is not strong enough to guarantee that a
plan-time evaluation would give the same result as a run-time
evaluation, not even in one-shot-plan cases.  The obvious reason why not
is that the planner isn't using the same snapshot that the executor will
use (which is not that easy to change, see his failed patch from a year
or so back).  But even if we rejiggered things enough so the query did
use the same snapshot that'd been used for planning, I'm not very
convinced that such an assumption would be valid.  The assumptions for
stable functions are pretty weak really.

            regards, tom lane


Re: Partitions not Working as Expected

От
Tom Lane
Дата:
Shaun Thomas <sthomas@optionshouse.com> writes:
> On 06/27/2013 01:42 PM, Tom Lane wrote:
>> That will break things: CURRENT_DATE will then be equivalent to just
>> writing today's date as a literal.

> Interesting. I tested it by creating a view and a table with a default,
> and it always seems to get translated to:
> ('now'::text)::date

Yeah, that is what the parser does with it.  The way to read that is
"a constant of type text, containing the string 'now', to which is
applied a run-time coercion to type date".  The run-time coercion is
equivalent to (and implemented by) calling text_out then date_in.
If date_in is marked immutable, then the planner will correctly conclude
that it can fold the whole thing to a date constant on sight.  Now you
have a plan with a hard-wired value for the current date, which will
begin to give wrong answers after midnight passes.  If your usage
pattern is such that no query plan survives across a day boundary,
you might not notice ... but it's still wrong.

> ... What's interesting is that EnterpriseDB has
> their own pg_catalog.current_date function that gets called by the
> CURRENT_DATE keyword.

Yeah, we really ought to do likewise in the community code.  But that
doesn't affect the fundamental semantic issue here, which is that you
can't mark the expression immutable without creating incorrect cached
plans.

            regards, tom lane