Discussion: Concurrency bug in UPDATE of partition-key


Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
(Mail subject changed; original thread: [1])

On 8 March 2018 at 11:57, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 8 March 2018 at 09:15, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>
>> CREATE TABLE pa_target (key integer, val text)
>>     PARTITION BY LIST (key);
>> CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
>> CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
>> INSERT INTO pa_target VALUES (1, 'initial1');
>>
>> session1:
>> BEGIN;
>> UPDATE pa_target SET val = val || ' updated by update1' WHERE key = 1;
>> UPDATE 1
>> postgres=# SELECT * FROM pa_target ;
>>  key |             val
>> -----+-----------------------------
>>    1 | initial1 updated by update1
>> (1 row)
>>
>> session2:
>> UPDATE pa_target SET val = val || ' updated by update2', key = key + 1 WHERE
>> key = 1
>> <blocks>
>>
>> session1:
>> postgres=# COMMIT;
>> COMMIT
>>
>> <session1 unblocks and completes its UPDATE>
>>
>> postgres=# SELECT * FROM pa_target ;
>>  key |             val
>> -----+-----------------------------
>>    2 | initial1 updated by update2
>> (1 row)
>>
>> Ouch. The committed updates by session1 are overwritten by session2. This
>> clearly violates the rules that the rest of the system obeys and is not
>> acceptable IMHO.
>>
>> Clearly, ExecUpdate() while moving rows between partitions is missing out on
>> re-constructing the to-be-updated tuple, based on the latest tuple in the
>> update chain. Instead, it's simply deleting the latest tuple and inserting a
>> new tuple in the new partition based on the old tuple. That's simply wrong.
>
> You are right. This needs to be fixed. This is a different issue than
> the particular one that is being worked upon in this thread, and both
> these issues have different fixes.
>

As discussed above, there is a concurrency bug where during UPDATE of
a partition-key, another transaction T2 updates the same row.

> Like you said, the tuple needs to be reconstructed when ExecDelete()
> finds that the row has been updated by another transaction. We should
> send back this information from ExecDelete() (I think tupleid
> parameter gets updated in this case), and then in ExecUpdate() we
> should goto lreplace, so that the row is fetched back similar to
> how it happens when heap_update() knows that the tuple was updated.

The attached WIP patch is how the fix is turning out to be.
ExecDelete() passes back the epqslot returned by EvalPlanQual().
ExecUpdate() uses this re-fetched tuple to re-process the row, just
like it does when heap_update() returns HeapTupleUpdated.

I need to write an isolation test for this fix.
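For reference, once the re-fetch is in place, Pavan's reproducer above should presumably leave both sessions' changes intact instead of losing session1's update. A sketch of the expected outcome (my reading of the intended behaviour, not output from the attached patch):

```sql
-- Expected result once ExecUpdate() re-processes the row re-fetched by
-- EvalPlanQual(): session2's UPDATE is applied on top of the tuple as
-- committed by session1, and the row still moves to part2.
SELECT * FROM pa_target;
--  key |                      val
-- -----+------------------------------------------------
--    2 | initial1 updated by update1 updated by update2
-- (1 row)
```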

[1] : https://www.postgresql.org/message-id/CAJ3gD9d%3DwcbF-G00bYo4v5iWC69zm1UTSNxyRLpOgO9j4AtLWQ%40mail.gmail.com


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments

Re: Concurrency bug in UPDATE of partition-key

From: Amit Kapila
Date:
On Thu, Mar 8, 2018 at 4:55 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> (Mail subject changed; original thread : [1])
>
> On 8 March 2018 at 11:57, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> Clearly, ExecUpdate() while moving rows between partitions is missing out on
>>> re-constructing the to-be-updated tuple, based on the latest tuple in the
>>> update chain. Instead, it's simply deleting the latest tuple and inserting a
>>> new tuple in the new partition based on the old tuple. That's simply wrong.
>>
>> You are right. This needs to be fixed. This is a different issue than
>> the particular one that is being worked upon in this thread, and both
>> these issues have different fixes.
>>
>
> As discussed above, there is a concurrency bug where during UPDATE of
> a partition-key, another transaction T2 updates the same row.
>

We have one more concurrency-related behavior that would be
impacted by row movement.  Basically, SELECT ... FOR KEY SHARE
will now block an UPDATE that routes the tuple to a different partition
even if the update doesn't change a key (primary/unique key) column.
For example:

create table foo (a int2, b text) partition by list (a);
create table foo1 partition of foo for values IN (1);
create table foo2 partition of foo for values IN (2);
insert into foo values(1, 'Initial record');

Session-1
---------------
begin;
select * from foo where a=1 for key share;

Session-2
---------------
begin;
update foo set a=1, b= b|| '-> update 1' where a=1;
update foo set a=2, b= b|| '-> update 2' where a=1;  --this will block

You can see that when the update moves the row to a different partition,
it blocks because internally it performs a delete.  I think, as we have
already documented that such an update is internally a Delete+Insert,
this behavior is expected, but it is better if we update the
docs (Row-level locks section) as well.

In particular, I am talking about below text in Row-level locks section [1].
FOR KEY SHARE
Behaves similarly to FOR SHARE, except that the lock is weaker: SELECT
FOR UPDATE is blocked, but not SELECT FOR NO KEY UPDATE. A key-shared
lock blocks other transactions from performing DELETE or any UPDATE
that changes the key values, but not other UPDATE.

[1] - https://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
On 8 March 2018 at 16:55, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> The attached WIP patch is how the fix is turning out to be.
> ExecDelete() passes back the epqslot returned by EvalPlanQual().
> ExecUpdate() uses this re-fetched tuple to re-process the row, just
> like it does when heap_update() returns HeapTupleUpdated.
>
> I need to write an isolation test for this fix.

Attached patch now has the isolation test included. The only
differences from the earlier WIP patch are some cosmetic changes
and some more comments.

>
> [1] : https://www.postgresql.org/message-id/CAJ3gD9d%3DwcbF-G00bYo4v5iWC69zm1UTSNxyRLpOgO9j4AtLWQ%40mail.gmail.com
>
>
> --
> Thanks,
> -Amit Khandekar
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments

Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/

In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments

Re: Concurrency bug in UPDATE of partition-key

From: Amit Kapila
Date:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Attached is a rebased patch version. Also included it in the upcoming
> commitfest :
> https://commitfest.postgresql.org/18/1660/


Doesn't this belong to PostgreSQL 11 Open Items [1], or are you proposing it as a feature enhancement for the next version?
 
> In the rebased version, the new test cases are added in the existing
> isolation/specs/partition-key-update-1.spec test.


 	if (!tuple_deleted)
-		return NULL;
+	{
+		/*
+		 * epqslot will be typically NULL. But when ExecDelete() finds
+		 * that another transaction has concurrently updated the same
+		 * row, it re-fetches the row, skips the delete, and epqslot is
+		 * set to the re-fetched tuple slot. In that case, we need to
+		 * do all the checks again.
+		 */
+		if (TupIsNull(epqslot))
+			return NULL;
+		else
+		{
+			slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+			tuple = ExecMaterializeSlot(slot);
+			goto lreplace;
+		}
+	}

I think this will allow before row delete triggers to be fired more than once.  Normally, if the EvalPlanQual testing generates a new tuple, we don't fire the triggers again.  Now, one possibility could be that we don't skip the delete after a concurrent update and still allow the insert to use the new tuple generated by EvalPlanQual testing of the delete.  However, I think we need to recheck whether the modified tuple still needs to be moved to a new partition, right? Or do you have some other reason in mind?



--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
> wrote:
>>
>> Attached is a rebased patch version. Also included it in the upcoming
>> commitfest :
>> https://commitfest.postgresql.org/18/1660/
>>
>
> Doesn't this belong to PostgreSQL 11 Open Items [1] or are you proposing it
> as a feature enhancement for next version?

Yes, it needs to be in the Open Items. I will have it added to that list.

>
>>
>> In the rebased version, the new test cases are added in the existing
>> isolation/specs/partition-key-update-1.spec test.
>>
>
>   if (!tuple_deleted)
> - return NULL;
> + {
> + /*
> + * epqslot will be typically NULL. But when ExecDelete() finds
> + * that another transaction has concurrently updated the same
> + * row, it re-fetches the row, skips the delete, and epqslot is
> + * set to the re-fetched tuple slot. In that case, we need to
> + * do all the checks again.
> + */
> + if (TupIsNull(epqslot))
> + return NULL;
> + else
> + {
> + slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
> + tuple = ExecMaterializeSlot(slot);
> + goto lreplace;
> + }
> + }
>
> I think this will allow before row delete triggers to be fired more than
> once.  Normally, if the EvalPlanQual testing generates a new tuple, we don't
> fire the triggers again.

If there are BR delete triggers, the tuple will be locked using
GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
run, since the tuple is already locked due to triggers having run.

But that leads me to think: the same concurrency issue can occur in
GetTupleForTrigger() also. Say a concurrent session has already
locked the tuple; GetTupleForTrigger() would wait and then return
the updated tuple in its last parameter, newSlot. In that case, we need
to pass this slot back through ExecBRDeleteTriggers(), and further
through the epqslot parameter of ExecDelete(). But yes, in this case, we
should avoid calling this trigger function a second time.

If you agree on the above, I will send an updated patch.


> Now, one possibility could be that we don't skip
> the delete after a concurrent update and still allow inserts to use a new
> tuple generated by EvalPlanQual testing of delete.  However, I think we need
> to perform rechecks for update to check if the modified tuple still requires
> to be moved to new partition, right or do you have some other reason in
> mind?

Yes, that's the reason we need to start again from lreplace; i.e., to
redo not just the partition-constraint check but all the other checks
that were required earlier, because the tuple has been modified. It
may even end up moving to a different partition than the one chosen
earlier.
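A hypothetical example of that last point (table and values invented purely for illustration): if the concurrent transaction itself changes the key, the re-fetched tuple can route to a partition different from the one the original UPDATE would have chosen:

```sql
CREATE TABLE t (key int, val text) PARTITION BY LIST (key);
CREATE TABLE t1 PARTITION OF t FOR VALUES IN (1);
CREATE TABLE t2 PARTITION OF t FOR VALUES IN (2);
CREATE TABLE t3 PARTITION OF t FOR VALUES IN (3);
INSERT INTO t VALUES (1, 'row');

-- session1 (commits first): moves the row from t1 to t2
BEGIN;
UPDATE t SET key = 2 WHERE key = 1;
COMMIT;

-- session2 (was blocked behind session1): key = key + 1.
-- Applied to the original tuple this would have targeted t2, but applied
-- to the re-fetched tuple (key = 2) it must now route to t3 instead.
UPDATE t SET key = key + 1 WHERE key <= 2;
```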



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From: Dilip Kumar
Date:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Attached is a rebased patch version. Also included it in the upcoming
> commitfest :
> https://commitfest.postgresql.org/18/1660/
>
> In the rebased version, the new test cases are added in the existing
> isolation/specs/partition-key-update-1.spec test.

/*
+  * If this is part of an UPDATE of partition-key, the
+  * epq tuple will contain the changes from this
+  * transaction over and above the updates done by the
+  * other transaction. The caller should now use this
+  * tuple as its NEW tuple, rather than the earlier NEW
+  * tuple.
+  */
+ if (epqslot)
+ {
+ *epqslot = my_epqslot;
+ return NULL;
+ }

I think we need a similar fix if there is a BR DELETE trigger and ExecDelete() is blocked on heap_lock_tuple() because a concurrent transaction is updating the same row.  In that case it would have already got the final tuple, so heap_delete() will return HeapTupleMayBeUpdated.

Below test can reproduce the issue.

CREATE TABLE pa_target (key integer, val text) PARTITION BY LIST (key);
CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);

CREATE TABLE deleted_row (count int);
CREATE OR REPLACE FUNCTION br_delete() RETURNS trigger AS
$$BEGIN
insert into deleted_row values(OLD.key);
    RETURN OLD;
END;$$ LANGUAGE plpgsql;

CREATE TRIGGER test_br_trig BEFORE DELETE ON part1 FOR EACH ROW EXECUTE PROCEDURE br_delete();

INSERT INTO pa_target VALUES (1, 'initial1');

session1:
postgres=# BEGIN;
BEGIN
postgres=# UPDATE pa_target SET val = val || ' updated by update1' WHERE key = 1;
UPDATE 1

session2:
postgres=# UPDATE pa_target SET val = val || ' updated by update2', key = key + 1 WHERE key =1;
<blocks>

session1:
postgres=# commit;
COMMIT

session2:
UPDATE 1
postgres=# select * from pa_target ;
 key |             val             
-----+-----------------------------
   2 | initial1 updated by update2   --> session1's update is overwritten.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Concurrency bug in UPDATE of partition-key

From: Dilip Kumar
Date:
On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> Attached is a rebased patch version. Also included it in the upcoming
>> commitfest :
>> https://commitfest.postgresql.org/18/1660/
>>
>> In the rebased version, the new test cases are added in the existing
>> isolation/specs/partition-key-update-1.spec test.
>
> /*
> +  * If this is part of an UPDATE of partition-key, the
> +  * epq tuple will contain the changes from this
> +  * transaction over and above the updates done by the
> +  * other transaction. The caller should now use this
> +  * tuple as its NEW tuple, rather than the earlier NEW
> +  * tuple.
> +  */
> + if (epqslot)
> + {
> + *epqslot = my_epqslot;
> + return NULL;
> + }
>
> I think we need a similar fix if there is a BR DELETE trigger and ExecDelete() is blocked on heap_lock_tuple() because a concurrent transaction is updating the same row.  In that case it would have already got the final tuple, so heap_delete() will return HeapTupleMayBeUpdated.

Amit Kapila had pointed out (offlist) that you are already trying to fix this issue.  I have one more question: suppose that the first time we come to ExecDelete() we fire the BR DELETE trigger, and then realize that we need to retry because of a concurrent update.  Now, during the retry, we realize that because of the latest value we don't need to route the tuple to a different partition, so we will call ExecUpdate() and may fire the BR UPDATE triggers as well?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
On 8 June 2018 at 16:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
>> wrote:
>>>
>>> Attached is a rebased patch version. Also included it in the upcoming
>>> commitfest :
>>> https://commitfest.postgresql.org/18/1660/
>>>
>>> In the rebased version, the new test cases are added in the existing
>>> isolation/specs/partition-key-update-1.spec test.
>>
>>
>> /*
>> +  * If this is part of an UPDATE of partition-key, the
>> +  * epq tuple will contain the changes from this
>> +  * transaction over and above the updates done by the
>> +  * other transaction. The caller should now use this
>> +  * tuple as its NEW tuple, rather than the earlier NEW
>> +  * tuple.
>> +  */
>> + if (epqslot)
>> + {
>> + *epqslot = my_epqslot;
>> + return NULL;
>> + }
>>
>> I think we need a similar fix if there is a BR DELETE trigger and
>> ExecDelete is blocked on heap_lock_tuple because the concurrent transaction
>> is updating the same row.  Because in such case it would have already got
>> the final tuple, so heap_delete will return HeapTupleMayBeUpdated.
>
>
> Amit Kapila had pointed (offlist) that you are already trying to fix this
> issue.

Yes, his comment about triggers made me realize the trigger issue
which you mentioned, and which I also explained in my earlier reply.

> I have one more question,  Suppose the first time we come for
> ExecDelete and fire the BR delete trigger and then realize that we need to
> retry because of the concurrent update.  Now, during retry, we realize that
> because of the latest value we don't need to route the tuple to the
> different partition so now we will call ExecUpdate and may fire the BRUpdate
> triggers as well?

No, we don't fire the BR UPDATE trigger again. We go to the lreplace
label, and the BR trigger code is above that label. The reason we don't
need to run the trigger check again is that if we had already fired
it before, the row must have been locked, so the concurrency scenario
won't arise in the first place.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From: Dilip Kumar
Date:

On Mon, Jun 11, 2018 at 10:19 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 8 June 2018 at 16:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
>> wrote:
>>>
>>> Attached is a rebased patch version. Also included it in the upcoming
>>> commitfest :
>>> https://commitfest.postgresql.org/18/1660/
>>>
>>> In the rebased version, the new test cases are added in the existing
>>> isolation/specs/partition-key-update-1.spec test.
>>
>>
>> /*
>> +  * If this is part of an UPDATE of partition-key, the
>> +  * epq tuple will contain the changes from this
>> +  * transaction over and above the updates done by the
>> +  * other transaction. The caller should now use this
>> +  * tuple as its NEW tuple, rather than the earlier NEW
>> +  * tuple.
>> +  */
>> + if (epqslot)
>> + {
>> + *epqslot = my_epqslot;
>> + return NULL;
>> + }
>>
>> I think we need simmilar fix if there are BR Delete trigger and the
>> ExecDelete is blocked on heap_lock_tuple because the concurrent transaction
>> is updating the same row.  Because in such case it would have already got
>> the final tuple so the hep_delete will return MayBeUpdated.
>
>
> Amit Kapila had pointed (offlist) that you are already trying to fix this
> issue.

Yes, his comment about triggers made me realize the trigger issue
which you mentioned, about which I also explained in earlier reply.

> I have one more question,  Suppose the first time we come for
> ExecDelete and fire the BR delete trigger and then realize that we need to
> retry because of the concurrent update.  Now, during retry, we realize that
> because of the latest value we don't need to route the tuple to the
> different partition so now we will call ExecUpdate and may fire the BRUpdate
> triggers as well?

No, we don't fire the BR update trigger again. We go to lreplace
label, and BR trigger code is above that label. The reason we don't
need to run the trigger check again is because if we had already fired
it before, the row must have been locked, so concurrency scenario
won't arise in the first place.
 
Ok, that make sense. 

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Concurrency bug in UPDATE of partition-key

From: Amit Kapila
Date:
On Thu, Jun 7, 2018 at 1:53 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
>> wrote:
>>
>> I think this will allow before row delete triggers to be fired more than
>> once.  Normally, if the EvalPlanQual testing generates a new tuple, we don't
>> fire the triggers again.
>
> If there are BR delete triggers, the tuple will be locked using
> GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
> run, since the tuple is already locked due to triggers having run.
>
> But that leads me to think : The same concurrency issue can occur in
> GetTupleForTrigger() also. Say, a concurrent session has already
> locked the tuple, and GetTupleForTrigger() would wait and then return
> the updated tuple in its last parameter newSlot. In that case, we need
> to pass this slot back through ExecBRDeleteTriggers(), and further
> through epqslot parameter of ExecDelete(). But yes, in this case, we
> should avoid calling this trigger function the second time.
>
> If you agree on the above, I will send an updated patch.
>

Sounds reasonable to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From: Amit Kapila
Date:
On Mon, Jun 11, 2018 at 3:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jun 7, 2018 at 1:53 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
>>> wrote:
>>>
>>> I think this will allow before row delete triggers to be fired more than
>>> once.  Normally, if the EvalPlanQual testing generates a new tuple, we don't
>>> fire the triggers again.
>>
>> If there are BR delete triggers, the tuple will be locked using
>> GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
>> run, since the tuple is already locked due to triggers having run.
>>
>> But that leads me to think : The same concurrency issue can occur in
>> GetTupleForTrigger() also. Say, a concurrent session has already
>> locked the tuple, and GetTupleForTrigger() would wait and then return
>> the updated tuple in its last parameter newSlot. In that case, we need
>> to pass this slot back through ExecBRDeleteTriggers(), and further
>> through epqslot parameter of ExecDelete(). But yes, in this case, we
>> should avoid calling this trigger function the second time.
>>
>> If you agree on the above, I will send an updated patch.
>>
>
> Sounds reasonable to me.
>

Try to add a test case which covers the BR trigger code path that you
are planning to update.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
On 11 June 2018 at 15:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 11, 2018 at 3:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Jun 7, 2018 at 1:53 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
>>>> wrote:
>>>>
>>>> I think this will allow before row delete triggers to be fired more than
>>>> once.  Normally, if the EvalPlanQual testing generates a new tuple, we don't
>>>> fire the triggers again.
>>>
>>> If there are BR delete triggers, the tuple will be locked using
>>> GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
>>> run, since the tuple is already locked due to triggers having run.
>>>
>>> But that leads me to think : The same concurrency issue can occur in
>>> GetTupleForTrigger() also. Say, a concurrent session has already
>>> locked the tuple, and GetTupleForTrigger() would wait and then return
>>> the updated tuple in its last parameter newSlot. In that case, we need
>>> to pass this slot back through ExecBRDeleteTriggers(), and further
>>> through epqslot parameter of ExecDelete(). But yes, in this case, we
>>> should avoid calling this trigger function the second time.
>>>
>>> If you agree on the above, I will send an updated patch.
>>>
>>
>> Sounds reasonable to me.

Attached is the v2 version of the patch. It fixes the
trigger-related issue discussed above.

The updated tuple is passed back using the existing newslot parameter
of GetTupleForTrigger(). When ExecBRDeleteTriggers() is called with the
new epqslot parameter, it means the caller wants to skip the trigger
execution, because the updated tuple needs to be checked again for
constraints. I have documented this behaviour in the
ExecBRDeleteTriggers() function header.

Because the trigger execution is skipped the first time ExecDelete()
is called, I didn't have to explicitly add any changes to prevent
trigger execution the second time.

> Try to add a test case which covers BR trigger code path where you are
> planning to update.

Added the test cases. Also added cases where the concurrent session
causes the EvalPlanQual() test to fail, so the partition key update
does not happen.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments

Re: Concurrency bug in UPDATE of partition-key

From: Dilip Kumar
Date:
On Mon, Jun 18, 2018 at 10:21 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> Attached is v2 version of the patch. It contains the above
> trigger-related issue fixed.
>
> The updated tuple is passed back using the existing newslot parameter
> of GetTupleForTrigger(). When ExecBRDeleteTriggers() is called using a
> new epqslot parameter, it means caller wants to skip the trigger
> execution, because the updated tuple needs to be again checked for
> constraints. I have added comments of this behaviour in the
> ExecBRDeleteTriggers() function header.
>

Thanks for the updated patch.  I have verified the BR trigger
behaviour; it's working fine with the patch.

+  CREATE FUNCTION func_footrg() RETURNS TRIGGER AS $$
+  BEGIN
+     RETURN OLD;
+  END $$ LANGUAGE PLPGSQL;
+  CREATE TRIGGER footrg_ondel BEFORE DELETE ON footrg1
+   FOR EACH ROW EXECUTE PROCEDURE func_footrg();

Should we also create a test case where we can verify that
unnecessary or duplicate triggers are not executed?  For example, in
the above trigger function we can maintain a count in some table (e.g.
how many times the delete trigger got executed), and after the test is
over we can verify it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From: Amit Kapila
Date:
On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Mon, Jun 18, 2018 at 10:21 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> Attached is v2 version of the patch. It contains the above
>> trigger-related issue fixed.
>>
>> The updated tuple is passed back using the existing newslot parameter
>> of GetTupleForTrigger(). When ExecBRDeleteTriggers() is called using a
>> new epqslot parameter, it means caller wants to skip the trigger
>> execution, because the updated tuple needs to be again checked for
>> constraints. I have added comments of this behaviour in the
>> ExecBRDeleteTriggers() function header.
>>
>
> Thanks for the updated patch.  I have verified the BR trigger
> behaviour, its working fine with the patch.
>
> +  CREATE FUNCTION func_footrg() RETURNS TRIGGER AS $$
> +  BEGIN
> +     RETURN OLD;
> +  END $$ LANGUAGE PLPGSQL;
> +  CREATE TRIGGER footrg_ondel BEFORE DELETE ON footrg1
> +   FOR EACH ROW EXECUTE PROCEDURE func_footrg();
>
> Should we also create a test case where we can verify that some
> unnecessary or duplicate triggers are not executed?
>

I am not sure how much value we will add by having such a test.  In
general, it is good to have tests that cover various aspects of
functionality, but OTOH, we have to be careful to not overdo it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
On 18 June 2018 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> Should we also create a test case where we can verify that some
>> unnecessary or duplicate triggers are not executed?
>>
>
> I am not sure how much value we will add by having such a test.  In
> general, it is good to have tests that cover various aspects of
> functionality, but OTOH, we have to be careful to not overdo it.

Actually I am thinking it's not a big deal to add a RAISE statement
in the trigger function in the existing test cases. It will clearly show
how many times the trigger has executed. So I will go ahead and do that.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From: Dilip Kumar
Date:
On Mon, Jun 18, 2018 at 6:19 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 18 June 2018 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> Should we also create a test case where we can verify that some
>>> unnecessary or duplicate triggers are not executed?
>>>
>>
>> I am not sure how much value we will add by having such a test.  In
>> general, it is good to have tests that cover various aspects of
>> functionality, but OTOH, we have to be careful to not overdo it.
>
> Actually I am thinking it's not a big deal to add a RAISE statement
> in the trigger function in the existing test cases. It will clearly show
> how many times the trigger has executed. So I will go ahead and do that.

Ok, that makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From: Amit Khandekar
Date:
On 19 June 2018 at 13:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Mon, Jun 18, 2018 at 6:19 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 18 June 2018 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>>> Should we also create a test case where we can verify that some
>>>> unnecessary or duplicate triggers are not executed?
>>>>
>>>
>>> I am not sure how much value we will add by having such a test.  In
>>> general, it is good to have tests that cover various aspects of
>>> functionality, but OTOH, we have to be careful to not overdo it.
>>
>> Actually I am thinking it's not a big deal to add a RAISE statement
>> in the trigger function in the existing test cases. It will clearly show
>> how many times the trigger has executed. So I will go ahead and do that.
>
> Ok, that makes sense to me.

I could not add a RAISE statement, because the isolation tester does
not print those messages in the output. Instead, I have the trigger
function dump some rows into a new table, and do a SELECT from that
table. Attached is the v3 patch.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments

Re: Concurrency bug in UPDATE of partition-key

From: Amit Kapila
Date:
On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> Could not add RAISE statement, because the isolation test does not
> seem to print those statements in the output. Instead, I have dumped
> some rows in a new table through the trigger function, and did a
> select from that table. Attached is v3 patch.
>

Comments
-----------------
1.
+	/*
+	 * If the tuple was concurrently updated and the caller of this
+	 * function requested for the updated tuple, skip the trigger
+	 * execution.
+	 */
+	if (newSlot != NULL && epqslot != NULL)
+		return false;

There is a possibility of memory leak due to above change.  Basically,
GetTupleForTrigger returns a newly allocated tuple.  We should free
the triggertuple by calling heap_freetuple(trigtuple).

2.
ExecBRDeleteTriggers(..)
{
..
+	/* If requested, pass back the concurrently updated tuple, if any. */
+	if (epqslot != NULL)
+		*epqslot = newSlot;
+
 	if (trigtuple == NULL)
 		return false;
+
+	/*
+	 * If the tuple was concurrently updated and the caller of this
+	 * function requested for the updated tuple, skip the trigger
+	 * execution.
+	 */
+	if (newSlot != NULL && epqslot != NULL)
+		return false;
..
}

Can't we merge the first change into the second, something like:

if (newSlot != NULL && epqslot != NULL)
{
*epqslot = newSlot;
return false;
}

3.
ExecBRDeleteTriggers(..)
{
- TupleTableSlot *newSlot;
  int i;

  Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
  if (fdw_trigtuple == NULL)
  {
+ TupleTableSlot *newSlot;
+
..
}

This change is okay on its own, but I think it has nothing to do with
this patch. If you don't mind, can we remove it?

4.
+/* ----------
+ * ExecBRDeleteTriggers()
+ *
+ * Called to execute BEFORE ROW DELETE triggers.
+ *
+ * Returns false in following scenarios :
+ * 1. Trigger function returned NULL.
+ * 2. The tuple was concurrently deleted OR it was concurrently updated and the
+ * new tuple didn't pass EvalPlanQual() test.
+ * 3. The tuple was concurrently updated and the new tuple passed the
+ * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
+ * function skips the trigger function execution, because the caller has
+ * indicated that it wants to further process the updated tuple. The updated
+ * tuple slot is passed back through epqslot.
+ *
+ * In all other cases, returns true.
+ * ----------
+ */
 bool
 ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,

The above comments appear unrelated to this function; for example,
this function is not at all aware of concurrent updates.  I think if
you want to add comments to this function, we can add them as a
separate patch, or if you want to add them as part of this patch, at
least make them succinct.

5.
+ /*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */

I think we can simplify this comment as "If requested, pass back the
updated tuple if any.", something similar to what you have at some
other place in the patch.

6.
+# Session A is moving a row into another partition, but is waiting for
+# another session B that is updating the original row. The row that ends up
+# in the new partition should contain the changes made by session B.

You have named the sessions s1 and s2, but the description uses A and
B; it will be better to use s1 and s2 respectively.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Amit Khandekar
Date:
On 20 June 2018 at 18:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>
>> Could not add RAISE statement, because the isolation test does not
>> seem to print those statements in the output. Instead, I have dumped
>> some rows in a new table through the trigger function, and did a
>> select from that table. Attached is v3 patch.
>>
>
> Comments
> -----------------
> 1.
> + /*
> + * If the tuple was concurrently updated and the caller of this
> + * function requested for the updated tuple, skip the trigger
> + * execution.
> + */
> + if (newSlot != NULL && epqslot != NULL)
> + return false;
>
> There is a possibility of memory leak due to above change.  Basically,
> GetTupleForTrigger returns a newly allocated tuple.  We should free
> the triggertuple by calling heap_freetuple(trigtuple).

Right, done.

>
> 2.
> ExecBRDeleteTriggers(..)
> {
> ..
> + /* If requested, pass back the concurrently updated tuple, if any. */
> + if (epqslot != NULL)
> + *epqslot = newSlot;
> +
>   if (trigtuple == NULL)
>   return false;
> +
> + /*
> + * If the tuple was concurrently updated and the caller of this
> + * function requested for the updated tuple, skip the trigger
> + * execution.
> + */
> + if (newSlot != NULL && epqslot != NULL)
> + return false;
> ..
> }
>
> Can't we merge the first change into second, something like:
>
> if (newSlot != NULL && epqslot != NULL)
> {
> *epqslot = newSlot;
> return false;
> }

We want to update *epqslot with whatever value newSlot has, so if
newSlot is NULL, *epqslot should be NULL.  That's why we can't do the
assignment inside the "if (newSlot != NULL && epqslot != NULL)"
condition.

>
> 3.
> ExecBRDeleteTriggers(..)
> {
> - TupleTableSlot *newSlot;
>   int i;
>
>   Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
>   if (fdw_trigtuple == NULL)
>   {
> + TupleTableSlot *newSlot;
> +
> ..
> }
>
> This change is okay on its own, but I think it has nothing to do with
> this patch. If you don't mind, can we remove it?

Done.

>
> 4.
> +/* ----------
> + * ExecBRDeleteTriggers()
> + *
> + * Called to execute BEFORE ROW DELETE triggers.
> + *
> + * Returns false in following scenarios :
> + * 1. Trigger function returned NULL.
> + * 2. The tuple was concurrently deleted OR it was concurrently updated and the
> + * new tuple didn't pass EvalPlanQual() test.
> + * 3. The tuple was concurrently updated and the new tuple passed the
> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
> + * function skips the trigger function execution, because the caller has
> + * indicated that it wants to further process the updated tuple. The updated
> + * tuple slot is passed back through epqslot.
> + *
> + * In all other cases, returns true.
> + * ----------
> + */
>  bool
>  ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
>
> The above comments appear unrelated to this function, example, this
> function is not at all aware of concurrent updates.

But with the patch, this function does become aware of concurrent
updates, because it makes use of the epqslot returned by
GetTupleForTrigger.  This is similar to ExecBRUpdateTriggers() being
aware of concurrent updates.

The reason I added the return-value-related comments is that earlier
the rule was simple: if the trigger function returned NULL, this
function returns false.  But now that has changed, so it's better if
the change is noted in the function header.  And on HEAD there is no
function header, so we need to explain when exactly this function
returns true or false.

> I think if you want to add comments to this function, we can add them as a separate
> patch or if you want to add them as part of this patch at least make
> them succinct.

If it looks complicated, can you please suggest something to make it a
bit simpler.

>
> 5.
> + /*
> + * If this is part of an UPDATE of partition-key, the
> + * epq tuple will contain the changes from this
> + * transaction over and above the updates done by the
> + * other transaction. The caller should now use this
> + * tuple as its NEW tuple, rather than the earlier NEW
> + * tuple.
> + */
>
> I think we can simplify this comment as "If requested, pass back the
> updated tuple if any.", something similar to what you have at some
> other place in the patch.

OK, done. Also changed the subsequent comment to:
/* Normal DELETE: So delete the updated row. */


>
> 6.
> +# Session A is moving a row into another partition, but is waiting for
> +# another session B that is updating the original row. The row that ends up
> +# in the new partition should contain the changes made by session B.
>
> You have named sessions as s1 and s2, but description uses A and B, it
> will be better to use s1 and s2 respectively.

Done.

Attached is v4 version of the patch. Thanks !

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachments

Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Fri, Jun 22, 2018 at 10:25 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 20 June 2018 at 18:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>
>> 2.
>> ExecBRDeleteTriggers(..)
>> {
>> ..
>> + /* If requested, pass back the concurrently updated tuple, if any. */
>> + if (epqslot != NULL)
>> + *epqslot = newSlot;
>> +
>>   if (trigtuple == NULL)
>>   return false;
>> +
>> + /*
>> + * If the tuple was concurrently updated and the caller of this
>> + * function requested for the updated tuple, skip the trigger
>> + * execution.
>> + */
>> + if (newSlot != NULL && epqslot != NULL)
>> + return false;
>> ..
>> }
>>
>> Can't we merge the first change into second, something like:
>>
>> if (newSlot != NULL && epqslot != NULL)
>> {
>> *epqslot = newSlot;
>> return false;
>> }
>
> We want to update *epqslot with whatever the value of newSlot is. So
> if newSlot is NULL, epqslot should be NULL. So we can't do that in the
> "if (newSlot != NULL && epqslot != NULL)" condition.
>

Why do you need to update it when newSlot is NULL?  *epqslot is
already initialized to NULL in the caller (ExecUpdate).  I think we
only want to update it when trigtuple is not null.  Apart from looking
a bit awkward, I think it is error-prone as well, because the code of
GetTupleForTrigger is such that it can return NULL with a valid value
of newSlot, in which case the behavior will become undefined.  The
case I am worried about is when EvalPlanQual first returns some valid
value of epqslot, but the next execution after heap_lock_tuple returns
NULL.  In practice, it won't happen, because EvalPlanQual locks the
tuple, so after that heap_lock_tuple shouldn't fail again, but it
seems prudent not to rely on that unless we need to.

For now, I have removed it in the attached patch, but if you have any
valid reason, then we can add back.

>>
>> 4.
>> +/* ----------
>> + * ExecBRDeleteTriggers()
>> + *
>> + * Called to execute BEFORE ROW DELETE triggers.
>> + *
>> + * Returns false in following scenarios :
>> + * 1. Trigger function returned NULL.
>> + * 2. The tuple was concurrently deleted OR it was concurrently updated and the
>> + * new tuple didn't pass EvalPlanQual() test.
>> + * 3. The tuple was concurrently updated and the new tuple passed the
>> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
>> + * function skips the trigger function execution, because the caller has
>> + * indicated that it wants to further process the updated tuple. The updated
>> + * tuple slot is passed back through epqslot.
>> + *
>> + * In all other cases, returns true.
>> + * ----------
>> + */
>>  bool
>>  ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
>>
..
>
> If it looks complicated, can you please suggest something to make it a
> bit simpler.
>

See attached.

Apart from this, I have changed few comments and fixed indentation
issues.  Let me know what you think about attached?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Concurrency bug in UPDATE of partition-key

From
Alvaro Herrera
Date:
Would you wait a little bit before pushing this?  I'd like to give this
a read too.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Sat, Jun 23, 2018 at 8:25 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Would you wait a little bit before pushing this?
>

Sure.

>  I'd like to give this
> a read too.
>

That would be great.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Amit Khandekar
Date:
On 23 June 2018 at 15:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jun 22, 2018 at 10:25 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 20 June 2018 at 18:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>>
>>> 2.
>>> ExecBRDeleteTriggers(..)
>>> {
>>> ..
>>> + /* If requested, pass back the concurrently updated tuple, if any. */
>>> + if (epqslot != NULL)
>>> + *epqslot = newSlot;
>>> +
>>>   if (trigtuple == NULL)
>>>   return false;
>>> +
>>> + /*
>>> + * If the tuple was concurrently updated and the caller of this
>>> + * function requested for the updated tuple, skip the trigger
>>> + * execution.
>>> + */
>>> + if (newSlot != NULL && epqslot != NULL)
>>> + return false;
>>> ..
>>> }
>>>
>>> Can't we merge the first change into second, something like:
>>>
>>> if (newSlot != NULL && epqslot != NULL)
>>> {
>>> *epqslot = newSlot;
>>> return false;
>>> }
>>
>> We want to update *epqslot with whatever the value of newSlot is. So
>> if newSlot is NULL, epqslot should be NULL. So we can't do that in the
>> "if (newSlot != NULL && epqslot != NULL)" condition.
>>
>
> Why do you need to update when newslot is NULL?  Already *epqslot is
> initialized as NULL in the caller (ExecUpdate). I think we only want
> to update it when trigtuple is not null.

But GetTupleForTrigger() sets the epqslot to NULL even when it returns
NULL.  So, to be consistent with it, we want to do the same thing for
ExecBRDeleteTriggers(): update the epqslot even when
GetTupleForTrigger() returns NULL.

I think the reason we do "*newSlot = NULL" as the very first thing in
the GetTupleForTrigger() code is so that callers don't have to
initialise newSlot to NULL before calling GetTupleForTrigger; and
indeed, ExecBRUpdateTriggers() does not initialize newSlot to NULL.
We can follow the same approach while calling ExecDelete().

I understand that before calling ExecDelete() epqslot is initialized
to NULL, so it is again not required to do the same inside
GetTupleForTrigger(), but as per my above explanation, it is actually
not necessary to initialize to NULL before calling ExecDelete(). So I
can even remove that initialization.

> Apart from looking bit
> awkward, I think it is error-prone as well because the code of
> GetTupleForTrigger is such that it can return NULL with a valid value
> of newSlot in which case the behavior will become undefined. The case
> which I am worried is when first-time EvalPlanQual returns some valid
> value of epqslot, but in the next execution after heap_lock_tuple,
> returns NULL.  In practise, it won't happen because EvalPlanQual locks
> the tuple, so after that heap_lock_tuple shouldn't fail again, but it
> seems prudent to not rely on that unless we need to.

Yes, I had thought about exactly this.  I think the intention of the
code is to pass back a NULL epqslot when there is no concurrently
updated tuple.  I think the code in GetTupleForTrigger() is well aware
that EvalPlanQual() will never be called twice.  The only reason it
goes back to ltrmark is to re-fetch the locked tuple.  The comment
also says so:
/*
* EvalPlanQual already locked the tuple, but we
* re-call heap_lock_tuple anyway as an easy way of
* re-fetching the correct tuple.  Speed is hardly a
* criterion in this path anyhow.
*/

So newSlot is supposed to be updated only once.



>
> For now, I have removed it in the attached patch, but if you have any
> valid reason, then we can add back.
>
>>>
>>> 4.
>>> +/* ----------
>>> + * ExecBRDeleteTriggers()
>>> + *
>>> + * Called to execute BEFORE ROW DELETE triggers.
>>> + *
>>> + * Returns false in following scenarios :
>>> + * 1. Trigger function returned NULL.
>>> + * 2. The tuple was concurrently deleted OR it was concurrently updated and the
>>> + * new tuple didn't pass EvalPlanQual() test.
>>> + * 3. The tuple was concurrently updated and the new tuple passed the
>>> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
>>> + * function skips the trigger function execution, because the caller has
>>> + * indicated that it wants to further process the updated tuple. The updated
>>> + * tuple slot is passed back through epqslot.
>>> + *
>>> + * In all other cases, returns true.
>>> + * ----------
>>> + */
>>>  bool
>>>  ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
>>>
> ..
>>
>> If it looks complicated, can you please suggest something to make it a
>> bit simpler.
>>
>
> See attached.
>
> Apart from this, I have changed few comments and fixed indentation
> issues.  Let me know what you think about attached?

Yes, the changes look good, except for the above one point on which we
haven't concluded yet.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Mon, Jun 25, 2018 at 3:06 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 23 June 2018 at 15:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>
>> Why do you need to update when newslot is NULL?  Already *epqslot is
>> initialized as NULL in the caller (ExecUpdate). I think we only want
>> to update it when trigtuple is not null.
>
> But GetTupleForTrigger() updates the epqslot to NULL even when it
> returns NULL. So to be consistent with it, we want to do the same
> thing for ExecBRDeleteTriggers() : Update the epqslot even when
> GetTupleForTrigger() returns NULL.
>
> I think the reason we are doing "*newSlot = NULL" as the very first
> thing in the GetTupleForTrigger() code, is so that callers don't have
> to initialise newSlot to NULL before calling GetTupleForTrigger. And
> so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
> can follow the same approach while calling ExecDelete().
>

Yeah, we can do that if it is required.  I see your point, but I feel
that it makes the code a bit less readable.

> I understand that before calling ExecDelete() epqslot is initialized
> to NULL, so it is again not required to do the same inside
> GetTupleForTrigger(), but as per my above explanation, it is actually
> not necessary to initialize to NULL before calling ExecDelete(). So I
> can even remove that initialization.
>

If you remove that initialization, then won't you need to do something
in ExecDelete() as well, because there the patch assigns a value to
epqslot only if EvalPlanQual returns a non-null value?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Amit Khandekar
Date:
On 25 June 2018 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Jun 25, 2018 at 3:06 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 23 June 2018 at 15:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>
>>> Why do you need to update when newslot is NULL?  Already *epqslot is
>>> initialized as NULL in the caller (ExecUpdate). I think we only want
>>> to update it when trigtuple is not null.
>>
>> But GetTupleForTrigger() updates the epqslot to NULL even when it
>> returns NULL. So to be consistent with it, we want to do the same
>> thing for ExecBRDeleteTriggers() : Update the epqslot even when
>> GetTupleForTrigger() returns NULL.
>>
>> I think the reason we are doing "*newSlot = NULL" as the very first
>> thing in the GetTupleForTrigger() code, is so that callers don't have
>> to initialise newSlot to NULL before calling GetTupleForTrigger. And
>> so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
>> can follow the same approach while calling ExecDelete().
>>
>
> Yeah, we can do that if it is required.

It is not required as such, and there is no correctness issue.

> I see your point, but I feel that is making code bit less readable.

I did it that way only to be consistent with the existing trigger.c
code, namely ExecBRUpdateTriggers(), where it sets newSlot to NULL
immediately.  If you find some appropriate comments to make that
snippet more readable, that can work.

Or else, we can go ahead with the way you did it in your patch, and
additionally mention in the ExecBRDeleteTriggers() header comments
that the epqslot value is undefined if no concurrently updated tuple
was found.  To me, explaining this last part seems confusing.

>
>> I understand that before calling ExecDelete() epqslot is initialized
>> to NULL, so it is again not required to do the same inside
>> GetTupleForTrigger(), but as per my above explanation, it is actually
>> not necessary to initialize to NULL before calling ExecDelete(). So I
>> can even remove that initialization.
>>
>
> If you remove that initialization, then won't you need to do something
> in ExecDelete() as well because there the patch assigns a value to
> epqslot only if EvalPlanQual returns non-null value?

Ah, right. So skipping the initialization before calling ExecDelete()
is not a good idea then.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Tue, Jun 26, 2018 at 11:40 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 25 June 2018 at 17:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Mon, Jun 25, 2018 at 3:06 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> On 23 June 2018 at 15:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>>
>>>>
>>>> Why do you need to update when newslot is NULL?  Already *epqslot is
>>>> initialized as NULL in the caller (ExecUpdate). I think we only want
>>>> to update it when trigtuple is not null.
>>>
>>> But GetTupleForTrigger() updates the epqslot to NULL even when it
>>> returns NULL. So to be consistent with it, we want to do the same
>>> thing for ExecBRDeleteTriggers() : Update the epqslot even when
>>> GetTupleForTrigger() returns NULL.
>>>
>>> I think the reason we are doing "*newSlot = NULL" as the very first
>>> thing in the GetTupleForTrigger() code, is so that callers don't have
>>> to initialise newSlot to NULL before calling GetTupleForTrigger. And
>>> so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
>>> can follow the same approach while calling ExecDelete().
>>>
>>
>> Yeah, we can do that if it is required.
>
> It is not required as such, and there is no correctness issue.
>
>> I see your point, but I feel that is making code bit less readable.
>
> I did it that way only to be consistent with the existing trigger.c
> code, namely ExecBRUpdateTriggers() where it sets newSlot to NULL
> immediately. If you find some appropriate comments to make that
> snippet more readable, that can work.
>
> Or else, we can go ahead with the way you did it in your patch; and
> additionally mention in the ExecBRDeleteTriggers() header comments
> that epqslot value is undefined if there was no concurrently updated
> tuple passed. To me, explaining this last part seems confusing.
>

Yeah, so let's leave it as it is in the patch.  I think now we should
wait and see what Alvaro has to say about the overall patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Amit Khandekar
Date:
On 26 June 2018 at 17:53, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yeah, so let's leave it as it is in the patch.

Ok.

> I think now we should wait and see what Alvaro has to say about the overall patch.

Yeah, it's very good that Alvaro is also having a look at this.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From
Alvaro Herrera
Date:
I was a bit surprised by the new epqslot output argument being added,
and now I think I know why: we already have es_trig_tuple_slot, so
shouldn't we be using that here instead?  Seems like it'd all be simpler ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Fri, Jun 29, 2018 at 11:22 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> I was a bit surprised by the new epqslot output argument being added,
> and now I think I know why: we already have es_trig_tuple_slot, so
> shouldn't we be using that here instead?  Seems like it'd all be simpler ...
>

Hmm, maybe, but I am not sure it will be simpler.  The point is that
we don't need to always return the epqslot; it will only be returned
for the special case, so you might need an additional boolean variable
to indicate when to fill the epqslot, or some other way to indicate
the same via es_trig_tuple_slot.  I think even if we somehow do that,
we need to do something additional like taking the tuple from epqslot
and storing it in es_trig_tuple_slot, as I am not sure we can directly
assign the slot returned by EvalPlanQual to es_trig_tuple_slot.
OTOH, the approach used by Amit Khandekar seems somewhat better, as
you can directly return the slot returned by EvalPlanQual in the
output parameter.  IIUC, the same technique is already used by
GetTupleForTrigger, where it returns the epqslot in an additional
parameter.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Amit Khandekar
Date:
On 30 June 2018 at 19:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jun 29, 2018 at 11:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> I was a bit surprised by the new epqslot output argument being added,
>> and now I think I know why: we already have es_trig_tuple_slot, so
>> shouldn't we be using that here instead?  Seems like it'd all be simpler ...

es_trig_tuple_slot is already allocated in ExecInitModifyTable(), and
the slot returned by EvalPlanQual is a separately allocated tuple
slot.  I didn't get how we can assign the epqslot to
estate->es_trig_tuple_slot; that would mean throwing away the already
allocated es_trig_tuple_slot.  I believe the es_trig_tuple_slot
variable is not meant for assigning already-allocated slots to it.

>>
>
> Hmm, maybe, but not sure if it will be simpler.  The point is that we
> don't need to always return the epqslot, it will only be returned for
> the special case, so you might need to use an additional boolean
> variable to indicate when to fill the epqslot or someway indicate the
> same via es_trig_tuple_slot.  I think even if we somehow do that, we
> need to do something additional like taking tuple from epqslot and
> store it in es_trig_tuple_slot as I am not sure if we can directly
> assign the slot returned by EvalPlanQual to es_trig_tuple_slot.

Right, I think making use of es_trig_tuple_slot will cause redundancy
in our case, because epqslot is a separately allocated slot; so it
makes sense to pass it back separately.

> OTOH, the approach used by Amit Khandekar seems somewhat better as you
> can directly return the slot returned by EvalPlanQual in the output
> parameter.  IIUC, the same technique is already used by
> GetTupleForTrigger where it returns the epqslot in an additional
> parameter.
>



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Mon, Jul 2, 2018 at 5:06 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 30 June 2018 at 19:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Fri, Jun 29, 2018 at 11:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> I was a bit surprised by the new epqslot output argument being added,
>>> and now I think I know why: we already have es_trig_tuple_slot, so
>>> shouldn't we be using that here instead?  Seems like it'd all be simpler ...
>
> es_trig_tuple_slot is already allocated in ExecInitModifyTable(). And
> the slot returned by EvalPlanQual is a separately allocated tuple
> slot. I didn't get how we can assign the epqslot to
> estate->es_trig_tuple_slot. That would mean throwing away the already
> allocated es_trig_tuple_slot. I believe, the es_trig_tuple_slot
> variable is not used for assigning already allocated slots to it.
>
>>>
>>
>> Hmm, maybe, but not sure if it will be simpler.  The point is that we
>> don't need to always return the epqslot, it will only be returned for
>> the special case, so you might need to use an additional boolean
>> variable to indicate when to fill the epqslot or someway indicate the
>> same via es_trig_tuple_slot.  I think even if we somehow do that, we
>> need to do something additional like taking tuple from epqslot and
>> store it in es_trig_tuple_slot as I am not sure if we can directly
>> assign the slot returned by EvalPlanQual to es_trig_tuple_slot.
>
> Right, I think making use of es_trig_tuple_slot will cause redundancy
> in our case, because epqslot is a separately allocated slot; so it
> makes sense to pass it back separately.
>

Alvaro,

Can you please comment whether this addresses your concern?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Alvaro Herrera
Date:
On 2018-Jul-09, Amit Kapila wrote:

> Alvaro,
> 
> Can you please comment whether this addresses your concern?

I was thinking that it would be a matter of passing the tuple slot to
EvalPlanQual for it to fill, rather than requiring it to fill some other
slot obtained from who-knows-where, but I realize now that that's nigh
impossible.  Thanks for the explanation and patience.

What bothers me about this whole business is how ExecBRDeleteTriggers
and ExecDelete are now completely different from their sibling routines,
but maybe there's no helping that.

Please move the output arguments to the end of the argument lists;
also, it would be great if you add commentary about ExecDelete's other
undocumented arguments (tupleDeleted in particular) while you're in
the vicinity.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Wed, Jul 11, 2018 at 5:59 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On 2018-Jul-09, Amit Kapila wrote:
>
>> Alvaro,
>>
>> Can you please comment whether this addresses your concern?
>
> I was thinking that it would be a matter of passing the tuple slot to
> EvalPlanQual for it to fill, rather than requiring it to fill some other
> slot obtained from who-knows-where, but I realize now that that's nigh
> impossible.
>

Right, giving EvalPlanQual the responsibility to use the output slot
can easily turn into a huge amount of work without much benefit.

>  Thanks for the explanation and patience.
>
> What bothers me about this whole business is how ExecBRDeleteTriggers
> and ExecDelete are now completely different from their sibling routines,
> but maybe there's no helping that.
>

Yeah, I think the differences started appearing when we decided to
overload ExecDelete for the update of the partition key; however, the
alternative would have been to write a new equivalent function, which
could create a lot of code duplication.


> Please move the output arguments at the end of argument lists;

Makes sense.

> also, it
> would be great if you add commentary about ExecDelete other undocumented
> arguments (tupleDeleted in particular) while you're in the vicinity.
>

We already have some commentary in the caller of ExecDelete ("For some
reason if DELETE didn't happen ..."), but I think it will be clearer
if we add some comments atop the ExecDelete function.  I will send the
updated patch shortly.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Wed, Jul 11, 2018 at 8:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 11, 2018 at 5:59 AM, Alvaro Herrera
>
>
>> Please move the output arguments at the end of argument lists;
>
> make sense.
>
>> also, it
>> would be great if you add commentary about ExecDelete other undocumented
>> arguments (tupleDeleted in particular) while you're in the vicinity.
>>
>
> We already have some commentary in the caller of ExecDelete ("For some
> reason if DELETE didn't happen ..."), but I think it will be clear if
> we can add some comments atop function ExecDelete.  I will send the
> updated patch shortly.
>

Attached, please find an updated patch based on comments by Alvaro.
See if this looks okay to you guys.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Concurrency bug in UPDATE of partition-key

From
Amit Khandekar
Date:
On 11 July 2018 at 09:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 11, 2018 at 8:56 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Jul 11, 2018 at 5:59 AM, Alvaro Herrera
>>
>>
>>> Please move the output arguments at the end of argument lists;
>>
>> Makes sense.
>>
>>> also, it
>>> would be great if you add commentary about ExecDelete other undocumented
>>> arguments (tupleDeleted in particular) while you're in the vicinity.
>>>
>>
>> We already have some commentary in the caller of ExecDelete ("For some
>> reason if DELETE didn't happen ..."), but I think it will be clearer if
>> we can add some comments atop the ExecDelete function.  I will send the
>> updated patch shortly.
>>
>
> Attached, please find an updated patch based on comments by Alvaro.
> See if this looks okay to you guys.

Thanks for the patch. It looks good to me.

-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Concurrency bug in UPDATE of partition-key

From
Alvaro Herrera
Date:
On 2018-Jul-11, Amit Kapila wrote:

> Attached, please find an updated patch based on comments by Alvaro.
> See if this looks okay to you guys.

LGTM as far as my previous comments are concerned.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Concurrency bug in UPDATE of partition-key

From
Andres Freund
Date:
On 2018-07-11 09:16:33 -0400, Alvaro Herrera wrote:
> On 2018-Jul-11, Amit Kapila wrote:
> 
> > Attached, please find an updated patch based on comments by Alvaro.
> > See if this looks okay to you guys.
> 
> LGTM as far as my previous comments are concerned.

I see Amit pushed a patch here yesterday
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=40ca70ebcc9d0bc1c02937b27c44b2766617e790
, is there a need to keep the open item open, or is this the complete fix?

Greetings,

Andres Freund


Re: Concurrency bug in UPDATE of partition-key

From
Amit Kapila
Date:
On Thu, Jul 12, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-07-11 09:16:33 -0400, Alvaro Herrera wrote:
>> On 2018-Jul-11, Amit Kapila wrote:
>>
>> > Attached, please find an updated patch based on comments by Alvaro.
>> > See if this looks okay to you guys.
>>
>> LGTM as far as my previous comments are concerned.
>
> I see Amit pushed a patch here yesterday
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=40ca70ebcc9d0bc1c02937b27c44b2766617e790
> , is there a need to keep the open item open,
>

No, I have moved the item from the open issues list.  I was waiting
on the buildfarm; yesterday it showed a failure after this commit,
but that turned out to be an unrelated random failure.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com