Обсуждение: Minor bug affecting ON CONFLICT lock wait log messages

Поиск
Список
Период
Сортировка

Minor bug affecting ON CONFLICT lock wait log messages

От
Peter Geoghegan
Дата:
Attached patch fixes a bug reported privately by Stephen this morning.
He complained about deadlocking ON CONFLICT DO NOTHING statements.
There were no exclusion constraints involved, and yet they were
incorrectly indicated as being involved in log messages that related
to these deadlocks.

--
Peter Geoghegan

Вложения

Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Peter Geoghegan
Дата:
On Mon, Mar 7, 2016 at 1:46 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached patch fixes a bug reported privately by Stephen this morning.

Bump.

I would like to see this in the next point release. It shouldn't be
hard to review.

Thanks
-- 
Peter Geoghegan



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Julien Rouhaud
Дата:
On 15/03/2016 03:30, Peter Geoghegan wrote:
> On Mon, Mar 7, 2016 at 1:46 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Attached patch fixes a bug reported privately by Stephen this morning.
> 
> Bump.
> 
> I would like to see this in the next point release. It shouldn't be
> hard to review.
> 

+            reason_wait = indexInfo->ii_ExclusionOps ?
+                XLTW_RecheckExclusionConstr : XLTW_InsertIndex;

Shouldn't it be set to XLTW_InsertIndexUnique instead?


Otherwise the patch seems ok to me.

> Thanks
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Stephen Frost
Дата:
* Julien Rouhaud (julien.rouhaud@dalibo.com) wrote:
> On 15/03/2016 03:30, Peter Geoghegan wrote:
> > On Mon, Mar 7, 2016 at 1:46 PM, Peter Geoghegan <pg@heroku.com> wrote:
> >> Attached patch fixes a bug reported privately by Stephen this morning.
> >
> > Bump.
> >
> > I would like to see this in the next point release. It shouldn't be
> > hard to review.
> >
>
> +            reason_wait = indexInfo->ii_ExclusionOps ?
> +                XLTW_RecheckExclusionConstr : XLTW_InsertIndex;
>
> Shouldn't it be set to XLTW_InsertIndexUnique instead?

Actually, no, though I had the same question for Peter when I was first
reviewing this.

XLTW_InsertIndexUnique is used when building a unique index, but this is
just a check, and more to the point, it's actually a re-check of what
we're doing in nbinsert.c where we're already using XLTW_InsertIndex.

We wouldn't want to end up returning different error messages for the
same command under the same conditions just based, which is what we'd
potentially end up doing if we used XLTW_InsertIndexUnique here.

> Otherwise the patch seems ok to me.

Agreed.  I'm going to play with it a bit more but barring objections,
I'll commit and back-patch Peter's patch.

Thanks!

Stephen

Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Julien Rouhaud
Дата:
On 15/03/2016 14:18, Stephen Frost wrote:
> * Julien Rouhaud (julien.rouhaud@dalibo.com) wrote:
>> On 15/03/2016 03:30, Peter Geoghegan wrote:
>>> On Mon, Mar 7, 2016 at 1:46 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>>> Attached patch fixes a bug reported privately by Stephen this morning.
>>>
>>> Bump.
>>>
>>> I would like to see this in the next point release. It shouldn't be
>>> hard to review.
>>>
>>
>> +            reason_wait = indexInfo->ii_ExclusionOps ?
>> +                XLTW_RecheckExclusionConstr : XLTW_InsertIndex;
>>
>> Shouldn't it be set to XLTW_InsertIndexUnique instead?
> 
> Actually, no, though I had the same question for Peter when I was first
> reviewing this.
> 
> XLTW_InsertIndexUnique is used when building a unique index, but this is
> just a check, and more to the point, it's actually a re-check of what
> we're doing in nbinsert.c where we're already using XLTW_InsertIndex.
> 
> We wouldn't want to end up returning different error messages for the
> same command under the same conditions just based, which is what we'd
> potentially end up doing if we used XLTW_InsertIndexUnique here.
> 

Oh I see. Thanks for the explanation!

>> Otherwise the patch seems ok to me.
> 
> Agreed.  I'm going to play with it a bit more but barring objections,
> I'll commit and back-patch Peter's patch.
> 
> Thanks!
> 
> Stephen
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Alvaro Herrera
Дата:
Stephen Frost wrote:
> * Julien Rouhaud (julien.rouhaud@dalibo.com) wrote:

> XLTW_InsertIndexUnique is used when building a unique index, but this is
> just a check, and more to the point, it's actually a re-check of what
> we're doing in nbinsert.c where we're already using XLTW_InsertIndex.
> 
> We wouldn't want to end up returning different error messages for the
> same command under the same conditions just based, which is what we'd
> potentially end up doing if we used XLTW_InsertIndexUnique here.

Perhaps we need a new value in that enum, so that the context message is
specific to the situation at hand?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Stephen Frost
Дата:
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Julien Rouhaud (julien.rouhaud@dalibo.com) wrote:
>
> > XLTW_InsertIndexUnique is used when building a unique index, but this is
> > just a check, and more to the point, it's actually a re-check of what
> > we're doing in nbinsert.c where we're already using XLTW_InsertIndex.
> >
> > We wouldn't want to end up returning different error messages for the
> > same command under the same conditions just based, which is what we'd
> > potentially end up doing if we used XLTW_InsertIndexUnique here.
>
> Perhaps we need a new value in that enum, so that the context message is
> specific to the situation at hand?

Maybe, but that would require a new message and new translation and just
generally doesn't seem like something we'd want to back-patch for a
bugfix.  As such, if someone's interested, they could certainly propose
it to go into HEAD, but I'm not going to look at making those kinds of
changes for this bugfix.

Thanks!

Stephen

Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Peter Geoghegan
Дата:
On Tue, Mar 15, 2016 at 6:18 AM, Stephen Frost <sfrost@snowman.net> wrote:
> Agreed.  I'm going to play with it a bit more but barring objections,
> I'll commit and back-patch Peter's patch.

Thanks for taking care of this, Stephen.


-- 
Peter Geoghegan



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Peter Geoghegan
Дата:
On Tue, Mar 15, 2016 at 8:31 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> > We wouldn't want to end up returning different error messages for the
>> > same command under the same conditions just based, which is what we'd
>> > potentially end up doing if we used XLTW_InsertIndexUnique here.
>>
>> Perhaps we need a new value in that enum, so that the context message is
>> specific to the situation at hand?
>
> Maybe, but that would require a new message and new translation and just
> generally doesn't seem like something we'd want to back-patch for a
> bugfix.

Thinking about this again, I think we should use
XLTW_InsertIndexUnique after all. The resemblance of the
check_exclusion_or_unique_constraint() code to the nbtinsert.c code
seems only superficial on second thought. So, I propose fixing the fix
by changing XLTW_InsertIndex to XLTW_InsertIndexUnique.

Basically, unlike with the similar nbtinsert.c code, we're checking
someone else's tuple in the speculative insertion
check_exclusion_or_unique_constraint() case that was changed (or it's
an exclusion constraint, where even the check for our own tuple
happens only after insertion; no change there in any case). Whereas,
in the nbtinsert.c case that I incorrectly duplicated, we're
specifically indicating that we're waiting on *our own* already
physically inserted heap tuple, and say as much in the
XLTW_InsertIndex message that makes it into the log. So, the
fd658dbb300456b393536802d1145a9cea7b25d6 fix is wrong in that we now
indicate that the wait was on our own already-inserted tuple, and not
*someone else's* already-inserted tuple, as it should (we haven't
inserting anything in the first phase of speculative insertion, this
precheck). This code is not a do-over of the check in nbtinsert.c --
rather, the code in nbtinsert.c is a second phase do-over of this code
(where we've physically inserted a heap tuple + index tuple --
"speculative" though that is).

It seems fine to characterize a wait here as "checking the uniqueness
of [somebody else's] tuple", even though technically we're checking
the would-be uniqueness were we to (speculatively, or actually)
insert. However, it does not seem fine to claim ctid_wait as a tuple
we ourselves inserted.

Sorry about that. My confusion came from the fact that historically,
when check_exclusion_or_unique_constraint() was called
check_exclusion_constraint(), it (almost) was our own tuple that was
waited on.

-- 
Peter Geoghegan



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Stephen Frost
Дата:
Peter,

* Peter Geoghegan (pg@heroku.com) wrote:
> Thinking about this again, I think we should use
> XLTW_InsertIndexUnique after all. The resemblance of the
> check_exclusion_or_unique_constraint() code to the nbtinsert.c code
> seems only superficial on second thought. So, I propose fixing the fix
> by changing XLTW_InsertIndex to XLTW_InsertIndexUnique.

I'm not against changing it, but...

> Basically, unlike with the similar nbtinsert.c code, we're checking
> someone else's tuple in the speculative insertion
> check_exclusion_or_unique_constraint() case that was changed (or it's
> an exclusion constraint, where even the check for our own tuple
> happens only after insertion; no change there in any case). Whereas,
> in the nbtinsert.c case that I incorrectly duplicated, we're
> specifically indicating that we're waiting on *our own* already
> physically inserted heap tuple, and say as much in the
> XLTW_InsertIndex message that makes it into the log.

I don't quite follow how we're indicating that we're waiting on our own
already physically inserted heap tuple in the log?

> So, the
> fd658dbb300456b393536802d1145a9cea7b25d6 fix is wrong in that we now
> indicate that the wait was on our own already-inserted tuple, and not
> *someone else's* already-inserted tuple, as it should (we haven't
> inserting anything in the first phase of speculative insertion, this
> precheck). This code is not a do-over of the check in nbtinsert.c --
> rather, the code in nbtinsert.c is a second phase do-over of this code
> (where we've physically inserted a heap tuple + index tuple --
> "speculative" though that is).

One of the reasons I figured the XLTW_InsertIndex message made sense in
this case is that it'd provide a consistent result for users.
Specifically, I don't think it makes sense to produce different messages
based only on who got there first.  I don't mind having a different
message when there's something different happening (there's exclusion
constraints involved, or you're building an index, etc).

> It seems fine to characterize a wait here as "checking the uniqueness
> of [somebody else's] tuple", even though technically we're checking
> the would-be uniqueness were we to (speculatively, or actually)
> insert. However, it does not seem fine to claim ctid_wait as a tuple
> we ourselves inserted.

I don't think either message really fits here, unfortunately.  We're not
actually checking the uniqueness of someone else's tuple here either,
after all, we're waiting to see what happens with their tuple because
ours won't be unique if it goes in with that other tuple in place, if
I'm following along correctly.

One thing I can say is that the XLTW_InsertIndex at least matches the
*action* we're taking, which is that we're trying to INSERT.  While
having the ctid included in the message may be useful for debugging,
I've got doubts about including it in regular messages like this; we
certainly don't do that for the 'normal' INSERT case when we discover
a duplicate key, but that ship has sailed.

Ultimately, I guess I'm inclined to leave it as-committed.  If you
understand enough to realize what that pair of numbers after 'tuple'
means, you've probably found this thread and followed it enough to
understand what's happening.

I don't feel terribly strongly about that position and so if others
feel the XLTW_InsertIndexUnique message really would be better, I'd be
happy to commit the change.

Thanks!

Stephen

Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Peter Geoghegan
Дата:
On Mon, Mar 21, 2016 at 3:38 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> Basically, unlike with the similar nbtinsert.c code, we're checking
>> someone else's tuple in the speculative insertion
>> check_exclusion_or_unique_constraint() case that was changed (or it's
>> an exclusion constraint, where even the check for our own tuple
>> happens only after insertion; no change there in any case). Whereas,
>> in the nbtinsert.c case that I incorrectly duplicated, we're
>> specifically indicating that we're waiting on *our own* already
>> physically inserted heap tuple, and say as much in the
>> XLTW_InsertIndex message that makes it into the log.
>
> I don't quite follow how we're indicating that we're waiting on our own
> already physically inserted heap tuple in the log?

Because it's XLTW_InsertIndex in
check_exclusion_or_unique_constraint() now, this is the message:

case XLTW_InsertIndex:   cxt = gettext_noop("while inserting index tuple (%u,%u) in
relation \"%s\"");   break;

I guess what's at issue is whether or not it's okay that the (heap)
tuple TID shown (by this message) when waiting in
check_exclusion_or_unique_constraint() during ON CONFLICT is *not* our
own -- rather, it's the other guy's for this new use of
XLTW_InsertIndex. Does the message itself convey a contrary meaning,
or is it ambiguous in a way that supports either interpretation (i.e.
does that ambiguity allow us to leave things as they are)?

Certainly, this is unlike the nbtinsert.c XLTW_InsertIndex case (the
only other instance of XLTW_InsertIndex), where we must have
physically inserted already, and report specifically on our own
physically inserted TID when this message is shown. Does that
inconsistency alone make this wrong?

I think that XLTW_InsertIndex is intended to mean our own TID, based
on .1) the only precedent we have, and .2) based on the fact that
there is a distinct XLTW_InsertIndexUnique case at all.

BTW, just so the difference is clear, I point out that
XLTW_InsertIndexUnique (which I now propose to change
check_exclusion_or_unique_constraint() to use) says:

case XLTW_InsertIndexUnique:   cxt = gettext_noop("while checking uniqueness of tuple (%u,%u) in
relation \"%s\"");   break;

> One of the reasons I figured the XLTW_InsertIndex message made sense in
> this case is that it'd provide a consistent result for users.
> Specifically, I don't think it makes sense to produce different messages
> based only on who got there first.  I don't mind having a different
> message when there's something different happening (there's exclusion
> constraints involved, or you're building an index, etc).

I see your point. That's what I thought, initially.

>> It seems fine to characterize a wait here as "checking the uniqueness
>> of [somebody else's] tuple", even though technically we're checking
>> the would-be uniqueness were we to (speculatively, or actually)
>> insert. However, it does not seem fine to claim ctid_wait as a tuple
>> we ourselves inserted.
>
> I don't think either message really fits here, unfortunately.  We're not
> actually checking the uniqueness of someone else's tuple here either,
> after all, we're waiting to see what happens with their tuple because
> ours won't be unique if it goes in with that other tuple in place, if
> I'm following along correctly.

Well, we're determining if there was a conflict or not, in the first
phase of speculative insertion. That means that we need to see if an
update is required (for example). But that needs to behave just like
checking the uniqueness of an existing thing. It's just that our own
not-yet-physically-inserted "conceptual" tuple needs to be the thing
that makes this existing tuple (from some other xact) non-unique.

If that sounds weird, consider that in the standard, non-speculative
exclusion constraint case, the physical insertion actually has already
occurred, but even still we actually are waiting on the other guy's
tuple/xact (and report the other guy's TID, not our own, unlike with
nbtinsert.c). Notice that we make sure that it's another XID towards
the top of the loop within check_exclusion_or_unique_constraint()).

And so, its message says:

case XLTW_RecheckExclusionConstr:   cxt = gettext_noop("while checking exclusion constraint on tuple
(%u,%u) in relation \"%s\"");   break;

Of course, that this appeared for ON CONFLICT DO NOTHING with a B-Tree
index in 9.5.1 was wrong, but only because it said "exclusion
constraint" rather than "unique index". That's about what
XLTW_InsertIndexUnique says already, which is why I now think we
should just use XLTW_InsertIndexUnique.

> One thing I can say is that the XLTW_InsertIndex at least matches the
> *action* we're taking, which is that we're trying to INSERT.

Right, but I don't think that XLTW_InsertIndexUnique specifically
implies that we're not inserting, just as XLTW_RecheckExclusionConstr
does not specifically imply that we're not inserting (actually, we're
usually or always inserting with XLTW_RecheckExclusionConstr, so it
better not).

> I don't feel terribly strongly about that position and so if others
> feel the XLTW_InsertIndexUnique message really would be better, I'd be
> happy to commit the change.

I'd also like to hear other opinions, if any are to be had. Sorry that
I changed my mind, but it's a subtle issue, I'm sure you'll agree. I'm
not going to push on this, but I want to be sure that we're happy with
this.

To reiterate, I think it boils down to: Is it okay that this new
XLTW_InsertIndex case reports someone else's TID, while the only other
XLTW_InsertIndex case has always reported our own TID?

Discussing these sorts of "ontological" questions reminds me just how
painful UPSERT was as a project.  :-)

-- 
Peter Geoghegan



Re: Minor bug affecting ON CONFLICT lock wait log messages

От
Alvaro Herrera
Дата:
Stephen Frost wrote:

> I don't think either message really fits here, unfortunately.  We're not
> actually checking the uniqueness of someone else's tuple here either,
> after all, we're waiting to see what happens with their tuple because
> ours won't be unique if it goes in with that other tuple in place, if
> I'm following along correctly.

FWIW I think the translatability argument carries very little weight.
We add new messages to the catalog in minor releases every now and then.
We don't take it lightly of course, but avoiding so isn't a reason to
not fix a bug properly.

In this case, given the discussion, it seems to me that the right fix is
to create a new XLTW enum as I already mentioned, with a message
tailored to the specific case.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services