Re: [HACKERS] expanding inheritance in partition bound order

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: [HACKERS] expanding inheritance in partition bound order
Дата
Msg-id 4e073db7-c6a1-1819-f522-9907e2f12a39@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] expanding inheritance in partition bound order  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] expanding inheritance in partition bound order  (Beena Emerson <memissemerson@gmail.com>)
Re: [HACKERS] expanding inheritance in partition bound order  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 2017/08/05 2:25, Robert Haas wrote:
> Concretely, my proposal is:
> 
> 1. Before calling RelationGetPartitionDispatchInfo, the calling code
> should use find_all_inheritors to lock all the relevant relations (or
> the planner could use find_all_inheritors to get a list of relation
> OIDs, store it in the plan in order, and then at execution time we
> visit them in that order and lock them).
> 
> 2. RelationGetPartitionDispatchInfo assumes the relations are already locked.
> 
> 3. While we're optimizing, in the first loop inside of
> RelationGetPartitionDispatchInfo, don't call heap_open().  Instead,
> use get_rel_relkind() to see whether we've got a partitioned table; if
> so, open it.  If not, there's no need.
> 
> 4. For safety, add a function bool RelationLockHeldByMe(Oid) and add
> to this loop a check if (!RelationLockHeldByMe(lfirst_oid(lc1))
> elog(ERROR, ...).  Might be interesting to stuff that check into the
> relation_open(..., NoLock) path, too.
> 
> One objection to this line of attack is that there might be a good
> case for locking only the partitioned inheritors first and then going
> back and locking the leaf nodes in a second pass, or even only when
> required for a particular row.  However, that doesn't require putting
> everything in bound order - it only requires moving the partitioned
> children to the beginning of the list.  And I think rather than having
> new logic for that, we should teach find_inheritance_children() to do
> that directly.  I have a feeling Ashutosh is going to cringe at this
> suggestion, but my idea is to do this by denormalizing: add a column
> to pg_inherits indicating whether the child is of
> RELKIND_PARTITIONED_TABLE.  Then, when find_inheritance_children scans
> pg_inherits, it can pull that flag out for free along with the
> relation OID, and qsort() first by the flag and then by the OID.  It
> can also return the number of initial elements of its return value
> which have that flag set.
> 
> Then, in find_all_inheritors, we split rels_list into
> partitioned_rels_list and other_rels_list, and process
> partitioned_rels_list in its entirety before touching other_rels_list;
> they get concatenated at the end.
> 
> Now, find_all_inheritors and find_inheritance_children can also grow a
> flag bool only_partitioned_children; if set, then we skip the
> unpartitioned children entirely.
> 
> With all that in place, you can call find_all_inheritors(blah blah,
> false) to lock the whole hierarchy, or find_all_inheritors(blah blah,
> true) to lock just the partitioned tables in the hierarchy.  You get a
> consistent lock order either way, and if you start with only the
> partitioned tables and later want the leaf partitions too, you just go
> through the partitioned children in the order they were returned and
> find_inheritance_children(blah blah, false) on each one of them and
> the lock order is exactly consistent with what you would have gotten
> if you'd done find_all_inheritors(blah blah, false) originally.

I tried implementing this in the attached set of patches.

[PATCH 2/5] Teach pg_inherits.c a bit about partitioning

Both find_inheritance_children and find_all_inheritors now list
partitioned child tables before non-partitioned ones and return
the number of partitioned tables in an optional output argument

[PATCH 3/5] Relieve RelationGetPartitionDispatchInfo() of doing locking

Anyone who wants to call RelationGetPartitionDispatchInfo() must first
acquire locks using find_all_inheritors.

TODO: Add RelationLockHeldByMe() and put if (!RelationLockHeldByMe())
elog(ERROR, ...) check in RelationGetPartitionDispatchInfo()

[PATCH 4/5] Teach expand_inherited_rtentry to use partition bound order

After locking the child tables using find_all_inheritors, we discard
the list of child table OIDs that it generates and rebuild the same
using the information returned by RelationGetPartitionDispatchInfo.

[PATCH 5/5] Store in pg_inherits if a child is a partitioned table

Catalog changes so that is_partitioned property of child tables is now
stored in pg_inherits.  This avoids consulting syscache to get that
property as is currently implemented in patch 2/5.


I haven't yet done anything about changing the timing of opening and
locking leaf partitions, because it will require some more thinking about
the required planner changes.  But the above set of patches will get us
far enough to get leaf partition sub-plans appear in the partition bound
order (same order as what partition tuple-routing uses in the executor).

With the above patches, we get the desired order of child sub-plans in
Append and ModifyTable plans for partitioned tables:

create table p (a int) partition by range (a);
create table p4 partition of p for values from (30) to (40);
create table p3 partition of p for values from (20) to (30);
create table p2 partition of p for values from (10) to (20);
create table p1 partition of p for values from (1) to (10);
create table p0 partition of p for values from (minvalue) to (1) partition
by list (a);
create table p00 partition of p0 for values in (0);
create table p01 partition of p0 for values in (-1);
create table p02 partition of p0 for values in (-2);


explain select count(*) from p;
                            QUERY PLAN
-------------------------------------------------------------------
 Aggregate  (cost=293.12..293.13 rows=1 width=8)
   ->  Append  (cost=0.00..248.50 rows=17850 width=0)
         ->  Seq Scan on p1  (cost=0.00..35.50 rows=2550 width=0)
         ->  Seq Scan on p2  (cost=0.00..35.50 rows=2550 width=0)
         ->  Seq Scan on p3  (cost=0.00..35.50 rows=2550 width=0)
         ->  Seq Scan on p4  (cost=0.00..35.50 rows=2550 width=0)
         ->  Seq Scan on p02  (cost=0.00..35.50 rows=2550 width=0)
         ->  Seq Scan on p01  (cost=0.00..35.50 rows=2550 width=0)
         ->  Seq Scan on p00  (cost=0.00..35.50 rows=2550 width=0)

explain update p set a = a;
                          QUERY PLAN
--------------------------------------------------------------
 Update on p  (cost=0.00..248.50 rows=17850 width=10)
   Update on p1
   Update on p2
   Update on p3
   Update on p4
   Update on p02
   Update on p01
   Update on p00
   ->  Seq Scan on p1  (cost=0.00..35.50 rows=2550 width=10)
   ->  Seq Scan on p2  (cost=0.00..35.50 rows=2550 width=10)
   ->  Seq Scan on p3  (cost=0.00..35.50 rows=2550 width=10)
   ->  Seq Scan on p4  (cost=0.00..35.50 rows=2550 width=10)
   ->  Seq Scan on p02  (cost=0.00..35.50 rows=2550 width=10)
   ->  Seq Scan on p01  (cost=0.00..35.50 rows=2550 width=10)
   ->  Seq Scan on p00  (cost=0.00..35.50 rows=2550 width=10)
(15 rows)

> P.S. While I haven't reviewed 0002 in detail, I think the concept of
> minimizing what needs to be built in RelationGetPartitionDispatchInfo
> is a very good idea.

I put this patch ahead in the list and so it's now 0001.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] Causal reads take II
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] dubious error message from partition.c