Обсуждение: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

Поиск
Список
Период
Сортировка

Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
Hi everyone,

I've been trialling different inheritance schemes for partitioning to a large number of tables.  I am looking at ~1e9
records,totaling ~200GB. 

I've found that a k-ary table inheritance tree works quite well to reduce the O(n) CHECK constraint overhead [1] in the
queryplanner when enabling partition constraint exclusion. 

I've played with binary (k=2) trees, and have found that query planning time is shorter for shallow trees where k>>2.
(Itappears that "more work" spent checking CHECK constraints is faster than to recur down the inheritance tree.  Is
thisbecause fewer table locks are involved?) 

A given tree structure (e.g. k=16) has a good query-plan time for SELECT queries in my case.  The query-plan times,
however,for UPDATE and DELETE are unfortunately quite quite bad.  (I was surprised that query-planning time was not
similaracross all three queries?) 

My machine swaps wildly when PostgreSQL plans an UPDATE or DELETE.  It does not swap for the SELECT query planning at
all. There is no noticeable memory growth by the postgres process for the SELECT plans.  There is huge memory usage
growthwhen running a query-plan for UPDATE or DELETE.  The difference is something like going from 50MB to over 10GB of
theprocess' virtual memory. 

I'm trialling PostgreSQL on a MacBook Pro having 8GB physical RAM.


Here's an example, where the DDL for the inheritance tree [2] is generated by a Python script [3].

1. Query planning time for a SELECT query

> $ echo "explain select * from ptest where id = 34324234; \q" | time -p psql ptest
>                                  QUERY PLAN
> -------------------------------------------------------------------------------
> Result  (cost=0.00..160.00 rows=48 width=4)
>   ->  Append  (cost=0.00..160.00 rows=48 width=4)
>         ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
>         ->  Seq Scan on ptest_0 ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
>         ->  Seq Scan on ptest_0_4 ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
>         ->  Seq Scan on ptest_0_4_1 ptest  (cost=0.00..40.00 rows=12 width=4)
>               Filter: (id = 34324234)
> (10 rows)
>
> real         0.99
> user         0.00
> sys          0.00
> $

2. Query planning time for a DELETE query

> $ echo "explain delete from ptest where id = 34324234; \q" | time -p psql ptest
>                               QUERY PLAN
> -------------------------------------------------------------------------
> Delete  (cost=0.00..160.00 rows=48 width=6)
>   ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4_1 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
> (9 rows)
>
> real       317.14
> user         0.00
> sys          0.00
> $

3. Query planning time for an UPDATE query

> $ echo "explain update ptest set id = 34324235 where id = 34324234;
> \q" | time -p psql ptest
>                               QUERY PLAN
> -------------------------------------------------------------------------
> Update  (cost=0.00..160.00 rows=48 width=6)
>   ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
>   ->  Seq Scan on ptest_0_4_1 ptest  (cost=0.00..40.00 rows=12 width=6)
>         Filter: (id = 34324234)
> (9 rows)
>
> real       331.72
> user         0.00
> sys          0.00
> $


Query planning on the leaf nodes works properly for all query-types:

> $ echo "explain delete from ptest_0_4_1 where id = 34324234; \q" | time -p psql ptest
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Delete  (cost=0.00..40.00 rows=12 width=6)
>    ->  Seq Scan on ptest_0_4_1  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
> (3 rows)
>
> real         0.01
> user         0.00
> sys          0.00
>
> $ echo "explain update ptest_0_4_1 set id = 34324235 where id = 34324234; \q" | time -p psql ptest
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Update  (cost=0.00..40.00 rows=12 width=6)
>    ->  Seq Scan on ptest_0_4_1  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
> (3 rows)
>
> real         0.01
> user         0.00
> sys          0.00
> $


With SELECT constraint exclusion working, I can define plpgsql functions to UPDATE or DELETE the leaf tables directly,
butusing such an interface isn't terribly elegant. 

I therefore tried writing the plpgsql functions for UPDATE and DELETE anyway, with the idea of linking to a TRIGGER on
theparent ptest table.  This didn't work as expected either, unless I polluted my application's SQL queries with the
"ONLY"keyword to make sure the trigger fired [4]. 


Is the query-planning times and memory use as demonstrated above normal?  I am hoping this is just a defect in the
query-plannerthat we might be able to fix so that PostgreSQL can manage my large data set with more ease. 

Any advice appreciated,

John


[1] http://wiki.postgresql.org/wiki/Table_partitioning#SELECT.2C_UPDATE.2C_DELETE
[2] http://jpap.org/files/partition-test.txt
[3] http://jpap.org/files/partition-test.py
[4] http://archives.postgresql.org/pgsql-hackers/2008-11/msg01883.php


Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Tom Lane
Дата:
John Papandriopoulos <dr.jpap@gmail.com> writes:
> I've found that a k-ary table inheritance tree works quite well to
> reduce the O(n) CHECK constraint overhead [1] in the query planner
> when enabling partition constraint exclusion.

Um ... you mean you're creating intermediate child tables for no reason
except to reduce the number of direct descendants of any one table?
That's an utter waste of time, because the first thing the planner will
do with an inheritance tree is flatten it.  Just create *one* parent
table and make all the leaf tables direct children of it.

> My machine swaps wildly when PostgreSQL plans an UPDATE or DELETE.

This is a strong hint that you've got way too many child tables.

            regards, tom lane

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/3/10 10:20 PM, Tom Lane wrote:
> John Papandriopoulos<dr.jpap@gmail.com>  writes:
>> I've found that a k-ary table inheritance tree works quite well to
>> reduce the O(n) CHECK constraint overhead [1] in the query planner
>> when enabling partition constraint exclusion.
>
> Um ... you mean you're creating intermediate child tables for no reason
> except to reduce the number of direct descendants of any one table?
> That's an utter waste of time, because the first thing the planner will
> do with an inheritance tree is flatten it.  Just create *one* parent
> table and make all the leaf tables direct children of it.
>
>> My machine swaps wildly when PostgreSQL plans an UPDATE or DELETE.
>
> This is a strong hint that you've got way too many child tables.

Thanks for your advice, Tom.

I've recreated the same example with just one parent table, and 4096 child tables.

SELECT query planning is lightning fast as before; DELETE and UPDATE cause my machine to swap.

What's different about DELETE and UPDATE here?  If I've way too many child tables, why isn't the SELECT query plan
causingthe same large memory usage? 

Kindest,
John


Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Tom Lane
Дата:
John Papandriopoulos <dr.jpap@gmail.com> writes:
> I've recreated the same example with just one parent table, and 4096 child tables.

> SELECT query planning is lightning fast as before; DELETE and UPDATE cause my machine to swap.

> What's different about DELETE and UPDATE here?

Hmm.  Rules?  Triggers?  You seem to be assuming the problem is at the
planner stage but I'm not sure you've proven that.

            regards, tom lane

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/4/10 8:42 AM, Tom Lane wrote:
> John Papandriopoulos<dr.jpap@gmail.com>  writes:
>> I've recreated the same example with just one parent table, and 4096 child tables.
>
>> SELECT query planning is lightning fast as before; DELETE and UPDATE cause my machine to swap.
>
>> What's different about DELETE and UPDATE here?
>
> Hmm.  Rules?  Triggers?  You seem to be assuming the problem is at the
> planner stage but I'm not sure you've proven that.


My example starts off with a new database (e.g. createdb ptest).

I set up my schema using a machine generated SQL file [1] that simply
creates a table

   create table ptest ( id integer );

and N = 0..4095 inherited children

   create table ptest_N (
      check ( (id >= N_min) and (id <= N_max) )
   ) inherits (ptest);

that split the desired id::integer range into N buckets, one for each of
the N partitions.

I then immediately run a query-plan using EXPLAIN that exhibits the
described behavior: super-fast plan for a SELECT statement, without
swapping, and memory intensive (swapping) plans for DELETE and UPDATE.

There are no triggers, no rules, no plpgsql functions, no indexes and no
inserted data.


Is there a more simple example that might help me convince you that
we're exercising just the planner stage?

Kindest,
John

[1] http://jpap.org/files/partition-test-flat.txt


Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Mladen Gogala
Дата:
Tom Lane wrote:
> Hmm.  Rules?  Triggers?  You seem to be assuming the problem is at the
> planner stage but I'm not sure you've proven that.
>
>             regards, tom lane
>
>
Hmmm, I vaguely recollect a similar thread, started by me, although with
fewer partitions. In my experience, planner doesn't do a very good job
with partitions, especially with things like "min" or "max" which should
not be resolved by a full table scan, if there are indexes on partitions.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Tom Lane
Дата:
John Papandriopoulos <dr.jpap@gmail.com> writes:
> I set up my schema using a machine generated SQL file [1] that simply
> creates a table
>    create table ptest ( id integer );
> and N = 0..4095 inherited children
>    create table ptest_N (
>       check ( (id >= N_min) and (id <= N_max) )
>    ) inherits (ptest);

> that split the desired id::integer range into N buckets, one for each of
> the N partitions.

> I then immediately run a query-plan using EXPLAIN that exhibits the
> described behavior: super-fast plan for a SELECT statement, without
> swapping, and memory intensive (swapping) plans for DELETE and UPDATE.

[ pokes at that for a bit ... ]  Ah, I had forgotten that UPDATE/DELETE
go through inheritance_planner() while SELECT doesn't.  And
inheritance_planner() makes a copy of the querytree, including the
already-expanded range table, for each target relation.  So the memory
usage is O(N^2) in the number of child tables.

It's difficult to do much better than that in the general case where the
children might have different rowtypes from the parent: you need a
distinct targetlist for each target relation.  I expect that we can be a
lot smarter when we have true partitioning support (which among other
things is going to have to enforce that all the children have identical
column sets).  But the inheritance mechanism was never intended to scale
to anything like this number of children.

I remain of the opinion that you're using far too many child tables.
Please note the statement at the bottom of
http://www.postgresql.org/docs/9.0/static/ddl-partitioning.html:

    Partitioning using these techniques will work well with up to
    perhaps a hundred partitions; don't try to use many thousands of
    partitions.

            regards, tom lane

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Mladen Gogala
Дата:
Tom Lane wrote:
>     Partitioning using these techniques will work well with up to
>     perhaps a hundred partitions; don't try to use many thousands of
>     partitions.
>
>             regards, tom lane
>
Hmmm, what happens if I need 10 years of data, in monthly partitions? It
would be 120 partitions. Can you please elaborate on that limitation?
Any plans on lifting that restriction?

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Jochen Erwied
Дата:
Sunday, December 5, 2010, 12:19:29 AM you wrote:

> Hmmm, what happens if I need 10 years of data, in monthly partitions? It
> would be 120 partitions. Can you please elaborate on that limitation?
> Any plans on lifting that restriction?

I'm running a partitioning scheme using 256 tables with a maximum of 16
million rows (namely IPv4-addresses) and a current total of about 2.5
billion rows, there are no deletes though, but lots of updates.

Using triggers or rules on the main table in my case showed to be not very
effective, so I reverted to updating the inherited tables directly. This
way you still can use a SELECT on the main table letting the optimizer do
it's work, but do not run into the problem of oversized shared memory usage
when doing DELETEs or UPDATEs

IMHO if you are using large partitioning schemes, handle the logic of which
table to update or delete in your application. In most cases extending the
underlying application will be much less work and more flexible than trying
to write a dynamic rule/trigger to do the same job.

--
Jochen Erwied     |   home: jochen@erwied.eu     +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: joe@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erwied@vodafone.de       +49-173-5404164


Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/4/10 3:38 PM, Jochen Erwied wrote:
> Sunday, December 5, 2010, 12:19:29 AM you wrote:
>
>> Hmmm, what happens if I need 10 years of data, in monthly partitions? It
>> would be 120 partitions. Can you please elaborate on that limitation?
>> Any plans on lifting that restriction?
>
> I'm running a partitioning scheme using 256 tables with a maximum of 16
> million rows (namely IPv4-addresses) and a current total of about 2.5
> billion rows, there are no deletes though, but lots of updates.
>
> Using triggers or rules on the main table in my case showed to be not very
> effective, so I reverted to updating the inherited tables directly. This
> way you still can use a SELECT on the main table letting the optimizer do
> it's work, but do not run into the problem of oversized shared memory usage
> when doing DELETEs or UPDATEs
>
> IMHO if you are using large partitioning schemes, handle the logic of which
> table to update or delete in your application. In most cases extending the
> underlying application will be much less work and more flexible than trying
> to write a dynamic rule/trigger to do the same job.
>

Sounds like my experience exactly, however I am considering forgoing an update altogether, by just combining a DELETE
withan INSERT.  I'm not sure how that might affect indexing performance as compared to an UPDATE. 

I also had trouble with triggers; but found that if you use the "ONLY" keyword, they work again: see my original post
ofthis thread.  In that case, the application SQL still retrains some simplicity.  On this topic, I think there's quite
abit of confusion and updates to the documentation would help greatly. 

John

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/4/10 2:40 PM, Tom Lane wrote:
> [ pokes at that for a bit ... ]  Ah, I had forgotten that UPDATE/DELETE
> go through inheritance_planner() while SELECT doesn't.  And
> inheritance_planner() makes a copy of the querytree, including the
> already-expanded range table, for each target relation.  So the memory
> usage is O(N^2) in the number of child tables.

Thanks for the pointer to the code and explanation.

In inheritance_planner(...) I see the memcpy of the input query tree, but for my example constraint exclusion would
onlyresult in one child being included.  Or is the O(N^2) memory usage from elsewhere? 

> It's difficult to do much better than that in the general case where the
> children might have different rowtypes from the parent: you need a
> distinct targetlist for each target relation.  I expect that we can be a
> lot smarter when we have true partitioning support (which among other
> things is going to have to enforce that all the children have identical
> column sets).

Is this the same as saying that the inheritance_planner(...) can be avoided if it were known that the children have the
samerowtype as the parent?  Is it easy to check? 

> But the inheritance mechanism was never intended to scale to anything like
> this number of children.

Unfortunately so. :(

When I push the number of child tables up to 10k, the SELECT planning starts to slow down (~1 sec), though no swapping.

> I remain of the opinion that you're using far too many child tables.
> Please note the statement at the bottom of
> http://www.postgresql.org/docs/9.0/static/ddl-partitioning.html:
>
>     Partitioning using these techniques will work well with up to
>     perhaps a hundred partitions; don't try to use many thousands of
>     partitions.

Thanks for the reference---I'm well aware of it, but it was not clear to me why: the reason I was structuring my
partitioninheritance as a tree, because I thought it was simply a case of time-to-scan the CHECK constraints at any
levelin the inheritance hierarchy.  You've been a great help in helping my understanding PostgreSQL inheritance. 

Best,
John

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/4/10 3:19 PM, Mladen Gogala wrote:
> Tom Lane wrote:
>> Partitioning using these techniques will work well with up to
>> perhaps a hundred partitions; don't try to use many thousands of
>> partitions.
>> regards, tom lane
> Hmmm, what happens if I need 10 years of data, in monthly partitions? It
> would be 120 partitions. Can you please elaborate on that limitation?
> Any plans on lifting that restriction?
>

Even with 1k partitions, I don't have any issues any of the SELECT, UPDATE or DELETE queries and with 8GB RAM.

I suppose if you're using INSERT triggers, you'd want to make sure your plpgsql function is fast: I'm partitioning by
power-of-two,so can use right-shift n-bits to quickly compute the insertion table name, rather than using an if-else-if
chain.

John

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Tom Lane
Дата:
John Papandriopoulos <dr.jpap@gmail.com> writes:
> On 12/4/10 2:40 PM, Tom Lane wrote:
>> [ pokes at that for a bit ... ]  Ah, I had forgotten that UPDATE/DELETE
>> go through inheritance_planner() while SELECT doesn't.  And
>> inheritance_planner() makes a copy of the querytree, including the
>> already-expanded range table, for each target relation.  So the memory
>> usage is O(N^2) in the number of child tables.

> Thanks for the pointer to the code and explanation.

> In inheritance_planner(...) I see the memcpy of the input query tree, but for my example constraint exclusion would
onlyresult in one child being included.  Or is the O(N^2) memory usage from elsewhere? 

It's copying the whole range table, even though any one child query only
needs one of the child table entries.  There might be some way to
finesse that, but it's not clear how else to end up with a final plan
tree in which each child table has the correct RT index number.

You could get rid of the memory growth, at the cost of a lot of
tree-copying, by doing each child plan step in a discardable memory
context.  I'm not sure that'd be a win for normal sizes of inheritance
trees though --- you'd need to copy the querytree in and then copy the
resulting plantree out again, for each child.  (Hm, but we're doing the
front-end copy already ...)

            regards, tom lane

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Tom Lane
Дата:
I wrote:
> You could get rid of the memory growth, at the cost of a lot of
> tree-copying, by doing each child plan step in a discardable memory
> context.  I'm not sure that'd be a win for normal sizes of inheritance
> trees though --- you'd need to copy the querytree in and then copy the
> resulting plantree out again, for each child.  (Hm, but we're doing the
> front-end copy already ...)

That worked better than I thought it would --- see
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=d1001a78ce612a16ea622b558f5fc2b68c45ab4c
I'm not intending to back-patch this, but it ought to apply cleanly to
9.0.x if you want it badly enough to carry a local patch.

            regards, tom lane

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/5/10 12:14 PM, Tom Lane wrote:
> I wrote:
>> You could get rid of the memory growth, at the cost of a lot of
>> tree-copying, by doing each child plan step in a discardable memory
>> context.  I'm not sure that'd be a win for normal sizes of inheritance
>> trees though --- you'd need to copy the querytree in and then copy the
>> resulting plantree out again, for each child.  (Hm, but we're doing the
>> front-end copy already ...)
>
> That worked better than I thought it would --- see
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=d1001a78ce612a16ea622b558f5fc2b68c45ab4c
> I'm not intending to back-patch this, but it ought to apply cleanly to
> 9.0.x if you want it badly enough to carry a local patch.

Fantastic, Tom!  Thank you kindly for taking the time to create the patch.

The memory issue has indeed disappeared---there was no noticeable memory increase in the three queries below, with 4096
children. Inheritance planning overhead is around 20x for UPDATE/DELETE compared to SELECT; thankfully they are
requiredmuch less frequently in my case. 

I am still wondering whether the inheritance_planner(...) can be avoided if the rowtypes of children are the same as
theparent?  (I'm not yet sufficiently familiar with the source to determine on my own.)  If that's the case, is there a
simpletest (like cardinality of columns) that can be used to differentiate partitioning from general inheritance cases? 

Thanks again!

John


Simple partitioning test timing with 4096 children:

> $ echo "explain select * from ptest where id = 34324234; \q" | time -p psql ptest
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Result  (cost=0.00..80.00 rows=24 width=4)
>    ->  Append  (cost=0.00..80.00 rows=24 width=4)
>          ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=4)
>                Filter: (id = 34324234)
>          ->  Seq Scan on ptest_65 ptest  (cost=0.00..40.00 rows=12 width=4)
>                Filter: (id = 34324234)
> (6 rows)
>
> real         0.55
> user         0.00
> sys          0.00
> $ echo "explain delete from ptest where id = 34324234; \q" | time -p psql ptest
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Delete  (cost=0.00..80.00 rows=24 width=6)
>    ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
>    ->  Seq Scan on ptest_65 ptest  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
> (5 rows)
>
> real        10.47
> user         0.00
> sys          0.00
> $ echo "explain update ptest set id = 34324235 where id = 34324234; \q" | time -p psql ptest
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Update  (cost=0.00..80.00 rows=24 width=6)
>    ->  Seq Scan on ptest  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
>    ->  Seq Scan on ptest_65 ptest  (cost=0.00..40.00 rows=12 width=6)
>          Filter: (id = 34324234)
> (5 rows)
>
> real         9.53
> user         0.00
> sys          0.00
> $




Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
Tom Lane
Дата:
John Papandriopoulos <dr.jpap@gmail.com> writes:
> The memory issue has indeed disappeared---there was no noticeable memory increase in the three queries below, with
4096children.  Inheritance planning overhead is around 20x for UPDATE/DELETE compared to SELECT; thankfully they are
requiredmuch less frequently in my case. 

> I am still wondering whether the inheritance_planner(...) can be avoided if the rowtypes of children are the same as
theparent? 

Possibly, but it's far from a trivial change.  The difficulty is that
you'd need to generate a different plan tree structure.
inheritance_planner generates a separate subtree for each target table,
so that the ModifyTable node can execute each one separately and know
a priori which target table the rows coming out of a particular subplan
apply to.  If we expand inheritance "at the bottom" like SELECT does,
that table identifier would have to propagate up as part of the returned
rows.  It's doable but not simple.  Moreover, it's far from clear this
actually would save much, and it could easily slow things down at
execution time.

Have you done any profiling work to see where the extra time goes?
I had thought that the unreferenced RTE entries would simply be ignored
in each subplanning step, but maybe there's something that is examining
them.

            regards, tom lane

Re: Query-plan for partitioned UPDATE/DELETE slow and swaps vmem compared to SELECT

От
John Papandriopoulos
Дата:
On 12/6/10 10:03 AM, Tom Lane wrote:
> John Papandriopoulos<dr.jpap@gmail.com>  writes:
>> I am still wondering whether the inheritance_planner(...) can be avoided if the rowtypes of children are the same as
theparent? 
>
> Possibly, but it's far from a trivial change.  The difficulty is that
> you'd need to generate a different plan tree structure.
> inheritance_planner generates a separate subtree for each target table,
> so that the ModifyTable node can execute each one separately and know
> a priori which target table the rows coming out of a particular subplan
> apply to.  If we expand inheritance "at the bottom" like SELECT does,
> that table identifier would have to propagate up as part of the returned
> rows.  It's doable but not simple.  Moreover, it's far from clear this
> actually would save much, and it could easily slow things down at
> execution time.

Making more sense now... :-)

I guess the real time-saver, in the specific case of partitioning, might then come from avoiding generation of subplans
completely(rather than later dropping the dummies) by exploiting the disjointness of each partition. 

> Have you done any profiling work to see where the extra time goes?
> I had thought that the unreferenced RTE entries would simply be ignored
> in each subplanning step, but maybe there's something that is examining
> them.

I've run the following queries

   explain SELECT * FROM ptest where id = 121212;
   explain DELETE FROM ptest where id = 121212;

under the Google perftools sampling profiler with the same 4096 child inheritance tree.  Results below.


The DELETE query-planning spend a lot of time maintaining a query tree.  Might this be what you're referring to?

> Total: 11808 samples
>     1895  16.0%  16.0%     7316  62.0% _range_table_mutator
>     1426  12.1%  28.1%     1426  12.1% _lseek
>     1097   9.3%  37.4%     2854  24.2% _query_planner
>     1048   8.9%  46.3%     1577  13.4% _AllocSetAlloc
>      853   7.2%  53.5%      853   7.2% 0x00007fffffe008a5
>      762   6.5%  60.0%      762   6.5% _posix_madvise
>      696   5.9%  65.9%      696   5.9% _list_nth_cell
>      575   4.9%  70.7%      575   4.9% 0x00007fffffe00b8b
>      482   4.1%  74.8%      482   4.1% _AllocSetFreeIndex
>      271   2.3%  77.1%     1284  10.9% _new_tail_cell
>      181   1.5%  78.6%      181   1.5% 0x00007fffffe00ba7
>      173   1.5%  80.1%      173   1.5% 0x00007fffffe00bb2
>      160   1.4%  81.5%     1452  12.3% _lappend
>      159   1.3%  82.8%      159   1.3% 0x00007fffffe00b96
>      158   1.3%  84.1%      158   1.3% 0x00007fffffe00b9c
>      139   1.2%  85.3%      139   1.2% 0x00007fffffe007c1
>      136   1.2%  86.5%     1877  15.9% _MemoryContextAlloc
>      129   1.1%  87.6%      129   1.1% 0x00007fffffe00673
>      125   1.1%  88.6%      125   1.1% 0x00007fffffe008ab
>      118   1.0%  89.6%      118   1.0% 0x00007fffffe008a0
>      110   0.9%  90.6%     3055  25.9% ___inline_memcpy_chk
>      106   0.9%  91.5%      106   0.9% _strlen
>      105   0.9%  92.3%      105   0.9% 0x00007fffffe008b7
>       95   0.8%  93.1%       95   0.8% _get_tabstat_entry
>       85   0.7%  93.9%       93   0.8% _find_all_inheritors
>       75   0.6%  94.5%       75   0.6% 0x00007fffffe00b85
>       47   0.4%  94.9%       47   0.4% 0x00007fffffe008b1
>       46   0.4%  95.3%       46   0.4% 0x00007fffffe00695
>       42   0.4%  95.6%       42   0.4% ___memcpy_chk
>       30   0.3%  95.9%       30   0.3% _pqGetpwuid
>       29   0.2%  96.1%       29   0.2% 0x00007fffffe00b90
>       29   0.2%  96.4%       60   0.5% _set_base_rel_pathlists
>       28   0.2%  96.6%       28   0.2% 0x00007fffffe007bf
>       24   0.2%  96.8%       24   0.2% 0x00007fffffe007cb
>       23   0.2%  97.0%       23   0.2% 0x00007fffffe006ab
>       22   0.2%  97.2%       23   0.2% _generate_base_implied_equalities
>       20   0.2%  97.4%       20   0.2% _memcpy
>       14   0.1%  97.5%       14   0.1% 0x00007fffffe0080d
>       13   0.1%  97.6%       13   0.1% _open
>       12   0.1%  97.7%       12   0.1% 0x00007fffffe007f9
> [rest snipped]


The SELECT query-planning doesn't, where you can clearly see that a lot of time is spent amassing all children
(find_all_inheritors)that could be avoided with true partitioning support. 

> Total: 433 samples
>      111  25.6%  25.6%      111  25.6% _AllocSetAlloc
>       79  18.2%  43.9%      124  28.6% _find_all_inheritors
>       38   8.8%  52.7%       38   8.8% _lseek
>       24   5.5%  58.2%       24   5.5% _read
>       19   4.4%  62.6%       32   7.4% _new_list
>       17   3.9%  66.5%       18   4.2% _get_tabstat_entry
>       14   3.2%  69.7%       36   8.3% _MemoryContextAllocZeroAligned
>       11   2.5%  72.3%       28   6.5% _MemoryContextAllocZero
>       11   2.5%  74.8%       19   4.4% _systable_beginscan
>        8   1.8%  76.7%        8   1.8% 0x00007fffffe007c5
>        8   1.8%  78.5%        8   1.8% 0x00007fffffe00a2f
>        8   1.8%  80.4%       32   7.4% _hash_search_with_hash_value
>        7   1.6%  82.0%        7   1.6% _open
>        6   1.4%  83.4%        6   1.4% 0x00007fffffe008c8
> [rest snipped]

Kindest,
John