Re: [HACKERS] UPDATE of partition key

Поиск
Список
Период
Сортировка
От Amit Khandekar
Тема Re: [HACKERS] UPDATE of partition key
Дата
Msg-id CAJ3gD9dXSSG1PEYHLjB6wDfQHc2R3Nc2R=31fgde8SejoeHAAg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] UPDATE of partition key  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] UPDATE of partition key  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 1 June 2017 at 03:25, Robert Haas <robertmhaas@gmail.com> wrote:
> Greg/Amit's idea of using the CTID field rather than an infomask bit
> seems like a possibly promising approach.  Not everything that needs
> bit-space can use the CTID field, so using it is a little less likely
> to conflict with something else we want to do in the future than using
> a precious infomask bit.  However, I'm worried about this:
>
>     /* Make sure there is no forward chain link in t_ctid */
>     tp.t_data->t_ctid = tp.t_self;
>
> The comment does not say *why* we need to make sure that there is no
> forward chain link, but it implies that some code somewhere in the
> system does or at one time did depend on no forward link existing.
> Any such code that still exists will need to be updated.  Anybody know
> what code that might be, exactly?

I am going to have a look overall at this approach, and about code
somewhere else which might be assuming that t_ctid cannot be Invalid.

> Regarding the trigger issue, I can't claim to have a terribly strong
> opinion on this.  I think that practically anything we do here might
> upset somebody, but probably any halfway-reasonable thing we choose to
> do will be OK for most people.  However, there seems to be a
> discrepancy between the approach that got the most votes and the one
> that is implemented by the v8 patch, so that seems like something to
> fix.

Yes, I have started working on updating the patch to use that approach
(BR and AR update triggers on source and destination partition
respectively, instead of delete+insert) The approach taken by the
patch (BR update + delete+insert triggers) didn't require any changes
in the way ExecDelete() and ExecInsert() were called. Now we would
require to skip the delete/insert triggers, so some flags need to be
passed to these functions, or else have stripped down versions of
ExecDelete() and ExecInsert() which don't do other things like
RETURNING handling and firing triggers.

>
> For what it's worth, in the future, I imagine that we might allow
> adding a trigger to a partitioned table and having that cascade down
> to all descendant tables.  In that world, firing the BR UPDATE trigger
> for the old partition and the AR UPDATE trigger for the new partition
> will look a lot like the behavior the user would expect on an
> unpartitioned table, which could be viewed as a good thing.  On the
> other hand, it's still going to be a DELETE+INSERT under the hood for
> the foreseeable future, so firing the delete triggers and then the
> insert triggers is also defensible.

Ok, I was assuming that there won't be any plans to support triggers
on a partitioned table, but yes, I had imagined how the behaviour
would be in this world. Currently, users who want to have triggers on
a table that happens to be a partitioned table, have to install the
same trigger on each of the leaf partitions, since there is no other
choice. But we would never know whether a trigger on a leaf partition
was actually meant to be specifically on that individual partition or
it was actually meant to be a trigger on a root partitioned table.
Hence there is the difficulty of deciding the right behaviour in case
of triggers with row movement.

If we have an AR UPDATE trigger on root table, then during row
movement, it does not matter whether we fire the trigger on source or
destination, because it is the same single trigger cascaded on both
the partitions. If there is a trigger installed specifically on a leaf
partition, then we know that it should not be fired on other
partitions since it is specifically made for this one. And same
applies for delete and insert triggers: If installed on parent, don't
involve them in row-movement; only fire them if installed on leaf
partitions regardless of whether it was an internally generated
delete+insert due to row-movement). Similarly we can think about BR
triggers.

Of courses, DBAs should be aware of triggers that are already
installed in the table ancestors before installing a new one on a
child table.

Overall, it becomes much clearer what to do if we decide to allow
triggers on partitioned tables.

> Is there any big difference between these appraoches in terms
> of how much code is required to make this work?

You mean if we allow triggers on partitioned tables ? I think we would
have to keep some flag in the trigger data (or somewhere else) that
the trigger actually belongs to upper partitioned table, and so for
delete+insert, don't fire such trigger. Other than that, we don't have
to decide in any unique way which trigger to fire on which table.

>
> In terms of the approach taken by the patch itself, it seems
> surprising to me that the patch only calls
> ExecSetupPartitionTupleRouting when an update fails the partition
> constraint.  Note that in the insert case, we call that function at
> the start of execution;

> calling it in the middle seems to involve additional hazards;
> for example, is it really safe to add additional
> ResultRelInfos midway through the operation?

I thought since the additional ResultRelInfos go into
mtstate->mt_partitions which is independent of
estate->es_result_relations, that should be safe.

> Is it safe to take more locks midway through the operation?

I can imagine some rows already updated, when other tasks like ALTER
TABLE or CREATE INDEX happen on other partitions which are still
unlocked, and then for row movement we try to lock these other
partitions and wait for the DDL tasks to complete. But I didn't see
any particular issues with that. But correct me if you suspect a
possible issue. One issue can be if we were able to modify the table
attributes, but I believe we cannot do that for inherited columns.

> It seems like it might be a lot
> safer to decide at the beginning of the operation whether this is
> needed -- we can skip it if none of the columns involved in the
> partition key (or partition key expressions) are mentioned in the
> update.
> (There's also the issue of triggers,

The reason I thought it cannot be done at the start of the execution,
is because even if we know that update is not modifying the partition
key column, we are not certain that the final NEW row has its
partition key column unchanged, because of triggers. I understand it
might be weird for a user requiring to modify a partition key value,
but if a user does that, it will result in crash because we won't have
the partition routing setup, thinking that there is no partition key
column in the UPDATE.

And we also cannot unconditionally setup the partition routing on all
updates, for performance reasons.

> I'm not sure that it's sensible to allow a trigger on an
> individual partition to reroute an update to another partition
> what if we get an infinite loop?)

You mean, if the other table has another trigger that will again route
to the original partition ? But this infinite loop problem could occur
even for 2 normal tables ?

>
> +            if (concurrently_deleted)
> +                return NULL;
>
> I don't understand the motivation for this change, and there are no
> comments explaining it that I can see.

Yeah comments, I think, are missing. I thought in the ExecDelete()
they are there, but they are not.
If a concurrent delete already deleted the row, we should not bother
about moving the row, hence the above code.


> Perhaps the concurrency-related (i.e. EPQ) behavior here could be
> tested via the isolation tester.
WIll check.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: [HACKERS] Error while creating subscription when server isrunning in single user mode
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [HACKERS] Server ignores contents of SASLInitialResponse