Re: [HACKERS] expanding inheritance in partition bound order

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: [HACKERS] expanding inheritance in partition bound order
Дата
Msg-id f10fdc6f-d4d4-8040-734a-85216a0bb67c@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] expanding inheritance in partition bound order  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Список pgsql-hackers
On 2017/08/21 13:11, Ashutosh Bapat wrote:
> On Sat, Aug 19, 2017 at 1:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Aug 18, 2017 at 1:17 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> 0004 patch in partition-wise join patchset has code to expand
>>> partition hierarchy. That patch is expanding inheritance hierarchy in
>>> depth first manner. Robert commented that instead of depth first
>>> manner, it will be better if we expand it in partitioned tables first
>>> manner. With the latest changes in your patch-set I don't see the
>>> reason for expanding in partitioned tables first order. Can you please
>>> elaborate if we still need to expand in partitioned table first
>>> manner? May be we should just address the expansion issue in 0004
>>> instead of dividing it in two patches.
>>
>> Let me see if I can clarify.  I think there are three requirements here:
>>
>> A. Amit wants to be able to prune leaf partitions before opening and
>> locking those relations, so that pruning can be done earlier and,
>> therefore, more cheaply.
> 
> We could actually prune partitioned tables thus pruning whole
> partitioned tree. Do we want to then lock those partitioned tables but
> not the leaves in that tree?

I think it would be nice if we keep the current approach of expanding the
whole partition tree in expand_inherited_rtentry(), at least to know how
many more entries a given partitioned table will add to the query's range
table.  It would be nice, because that way, we don't have to worry *right
away* about modifying the planner to cope with some new behavior whereby
range table entries will get added at some later point.

Then, as you might already know, if we want to use the partition bound
order when expanding the whole partition tree, we will depend on the
relcache entries of the partitioned tables in that tree, which will
require us to take locks on them.

It does sound odd that we may end up locking a child *partitioned* table
that is potentially prune-able, but maybe there is some way to relinquish
that lock once we find out that it is pruned after all.

>> B. Partition-wise join wants to expand the inheritance hierarchy a
>> level at a time instead of all at once, ending up with rte->inh = true
>> entries for intermediate partitioned tables.
> 
> And create AppendRelInfos which pair children with their partitioned
> parent rather than the root.

There should be *some* way to preserve the parent-child RT index mapping
and to preserve the multi-level hierarchy, a way that doesn't map all the
child tables in a partition tree to the root table's RT index.

AppendRelInfo is one way of doing that mapping currently, but if we
continue to treat it as the only way (for the purpose of mapping), we will
be stuck with the way they are created and manipulated.  Especially, if we
are going to always depend on the fact that root->append_rel_list contains
all the required AppendRelInfos, then we will always have to fully expand
the inheritance in expand_inherited_rtentry() (by fully I mean, locking
and opening all the child tables, instead of just the partitioned tables).In a world where we don't want to open the
partitionchild tables in
 
expand_inherited_rtentry(), we cannot build the corresponding
AppendRelInfos there.  Note that this is not about completely dispelling
AppendRelInfos-for-partition-child-tables, but about doing without them
being present in root->append_rel_list.  We would still need them to be
able to use adjust_appendrel_attrs(), etc., but we can create them at a
different time and store them in a place that's not root->append_rel_list;For example, inside the RelOptInfo of the
childtable.  Or perhaps, we
 
can still add them to root->append_rel_list, but will need to be careful
about the places that depend on the timing of AppendRelInfos being present
there.

>> So in the end game I think
>> expand_inherited_rtentry looks approximately like this:
>>
>> 1. Calling find_all_inheritors with a new only-lock-the-partitions
>> flag.  This should result in locking all partitioned tables in the
>> inheritance hierarchy in breadth-first, low-OID-first order.  (When
>> the only-lock-the-partitions isn't specified, all partitioned tables
>> should still be locked before any unpartitioned tables, so that the
>> locking order in that case is consistent with what we do here.)
> 
> I am confused. When "only-lock-the-partitions" is true, do we expect
> intermediate partitioned tables to be locked? Why then "only" in the
> flag?

I guess Robert meant to say lock-only-"partitioned"-tables?

>> 2. Iterate over the partitioned tables identified in step 1 in the
>> order in which they were returned.  For each one:
>> - Decide which children can be pruned.
>> - Lock the unpruned, non-partitioned children in low-OID-first order.
>>
>> 3. Make another pass over the inheritance hierarchy, starting at the
>> root.  Traverse the whole hierarchy in breadth-first in *bound* order.
>> Add RTEs and AppendRelInfos as we go -- these will have rte->inh =
>> true for partitioned tables and rte->inh = false for leaf partitions.
> 
> These two seem to be based on the assumption that we have to lock all
> the partitioned tables even if they can be pruned.
> 
> For regular inheritance there is only a single parent, so traversing
> the list returned by find_all_inheritors suffices. For partitioned
> hierarchy, we need to know the parent of every child, which is not
> part of the find_all_inheritors() output list. Even if it returns only
> the partitioned children, they themselves may have a parent different
> from the root partition. So, we have to discard the output of
> find_all_inheritors() for partitioned hierarchy and traverse the
> children as per their orders in oids array in PartitionDesc. May be
> it's better to separate the guts of expand_inherited_rtentry(), which
> create AppendRelInfos, RTEs and rowmarks for the children into a
> separate routine. Use that routine in two different functions
> expand_inherited_rtentry() and expand_partitioned_rtentry() for
> regular inheritance and partitioned inheritance resp. The functions
> will use two different traversal methods appropriate for traversing
> the children in either case.

I just posted a patch [1] that implements something like this, but
implementation details might seem different.  It doesn't however implement
a solution to the problem that you pose that partitioned child tables that
are prune-able are locked.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/098b9c71-1915-1a2a-8d52-1a7a50ce79e8%40lab.ntt.co.jp




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Haribabu Kommi
Дата:
Сообщение: Re: [HACKERS] Pluggable storage
Следующее
От: Chris Travers
Дата:
Сообщение: Re: [HACKERS] [RFC] What would be difficult to make data modelspluggable for making PostgreSQL a multi-model database?