Re: [HACKERS] UPDATE of partition key

Поиск
Список
Период
Сортировка
От Amit Khandekar
Тема Re: [HACKERS] UPDATE of partition key
Дата
Msg-id CAJ3gD9fOYdKwQ4ROZ4qiAJOEZn8h6VJ4DecM5bcrOGqPTU1OSA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] UPDATE of partition key  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] UPDATE of partition key  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [HACKERS] UPDATE of partition key  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 2 June 2017 at 01:17, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 1, 2017 at 7:41 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> Regarding the trigger issue, I can't claim to have a terribly strong
>>> opinion on this.  I think that practically anything we do here might
>>> upset somebody, but probably any halfway-reasonable thing we choose to
>>> do will be OK for most people.  However, there seems to be a
>>> discrepancy between the approach that got the most votes and the one
>>> that is implemented by the v8 patch, so that seems like something to
>>> fix.
>>
>> Yes, I have started working on updating the patch to use that approach
>> (BR and AR update triggers on source and destination partition
>> respectively, instead of delete+insert) The approach taken by the
>> patch (BR update + delete+insert triggers) didn't require any changes
>> in the way ExecDelete() and ExecInsert() were called. Now we would
>> require to skip the delete/insert triggers, so some flags need to be
>> passed to these functions, or else have stripped down versions of
>> ExecDelete() and ExecInsert() which don't do other things like
>> RETURNING handling and firing triggers.
>
> See, that strikes me as a pretty good argument for firing the
> DELETE+INSERT triggers...
>
> I'm not wedded to that approach, but "what makes the code simplest?"
> is not a bad tiebreak, other things being equal.

Yes, that sounds good to me. But I think we want to wait for other's
opinion because it is quite understandable that two triggers firing on
the same partition sounds odd.

>
>>> In terms of the approach taken by the patch itself, it seems
>>> surprising to me that the patch only calls
>>> ExecSetupPartitionTupleRouting when an update fails the partition
>>> constraint.  Note that in the insert case, we call that function at
>>> the start of execution;
>>
>>> calling it in the middle seems to involve additional hazards;
>>> for example, is it really safe to add additional
>>> ResultRelInfos midway through the operation?
>>
>> I thought since the additional ResultRelInfos go into
>> mtstate->mt_partitions which is independent of
>> estate->es_result_relations, that should be safe.
>
> I don't know.  That sounds scary to me, but it might be OK.  Probably
> needs more study.
>
>>> Is it safe to take more locks midway through the operation?
>>
>> I can imagine some rows already updated, when other tasks like ALTER
>> TABLE or CREATE INDEX happen on other partitions which are still
>> unlocked, and then for row movement we try to lock these other
>> partitions and wait for the DDL tasks to complete. But I didn't see
>> any particular issues with that. But correct me if you suspect a
>> possible issue. One issue can be if we were able to modify the table
>> attributes, but I believe we cannot do that for inherited columns.
>
> It's just that it's very unlike what we do anywhere else.  I don't
> have a real specific idea in mind about what might totally break, but
> at a minimum it could certainly cause behavior that can't happen
> today.  Today, if you run a query on some tables, it will block
> waiting for any locks at the beginning of the query, and the query
> won't begin executing until it has all of the locks.  With this
> approach, you might block midway through; you might even deadlock
> midway through.  Maybe that's not overtly broken, but it's at least
> got the possibility of being surprising.
>
> Now, I'd actually kind of like to have behavior like this for other
> cases, too.  If we're inserting one row, can't we just lock the one
> partition into which it needs to get inserted, rather than all of
> them?  But I'm wary of introducing such behavior incidentally in a
> patch whose main goal is to allow UPDATE row movement.  Figuring out
> what could go wrong and fixing it seems like a substantial project all
> of its own.

Yes, I agree it makes sense trying to avoid introducing something we
haven't tried before, in this patch, as far as possible.

>
>> The reason I thought it cannot be done at the start of the execution,
>> is because even if we know that update is not modifying the partition
>> key column, we are not certain that the final NEW row has its
>> partition key column unchanged, because of triggers. I understand it
>> might be weird for a user requiring to modify a partition key value,
>> but if a user does that, it will result in crash because we won't have
>> the partition routing setup, thinking that there is no partition key
>> column in the UPDATE.
>
> I think we could avoid that issue.  Suppose we select the target
> partition based only on the original NEW tuple.  If a trigger on that
> partition subsequently modifies the tuple so that it no longer
> satisfies the partition constraint for that partition, just let it
> ERROR out normally.

Ok, so you are saying, don't allow a partition trigger to initiate the
row movement. I think we should keep this as a documented restriction.
Actually it would be unfortunate that we would have to keep this
restriction only because of implementation issue.

So, according to that, below would be the logic :

Run partition constraint check on the original NEW row.
If it succeeds :
{   Fire BR UPDATE trigger on the original partition.   Run partition constraint check again with the modified NEW row
(may be do this only if the trigger modified the partition key)   If it fails,       abort.   Else       proceed with
theusual local update.
 
}
else
{   Fire BR UPDATE trigger on original partition.   Find the right partition for the modified NEW row.   If it is the
samepartition,       proceed with the usual local update.   else       do the row movement.
 
}


> Actually, it seems like that's probably the
> *easiest* behavior to implement.  Otherwise, you might fire triggers,
> discover that you need to re-route the tuple, and then ... fire
> triggers again on the new partition, which might reroute it again?

Why would update BR trigger fire on the new partition ? On the new
partition, only BR INSERT trigger would fire if at all we decide to
fire delete+insert triggers. And insert trigger would not again cause
the tuple to be re-routed because it's an insert.

>
>>> I'm not sure that it's sensible to allow a trigger on an
>>> individual partition to reroute an update to another partition
>>> what if we get an infinite loop?)
>>
>> You mean, if the other table has another trigger that will again route
>> to the original partition ? But this infinite loop problem could occur
>> even for 2 normal tables ?
>
> How?  For a normal trigger, nothing it does can change which table is targeted.

I thought you were considering the possibility that on the new
partition, the trigger function itself is running another update stmt,
which is also possible for normal tables .

But now I think you are saying, the row that is being inserted into
the new partition might get again modified by the INSERT trigger on
the new partition, which might in turn cause it to fail the new
partition constraint. But in that case, it will not cause another row
movement, because in the new partition, it's an INSERT, not an UPDATE,
so the operation would end there, aborted.

But correct me if I you were thinking of a different scenario that can
cause infinite loop.


-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitriy Sarafannikov
Дата:
Сообщение: Re: [HACKERS] Broken hint bits (freeze)
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: [HACKERS] Do we need the gcc feature "__builtin_expect" topromote the branches prediction?