Обсуждение: SKIP LOCKED assert triggered

Поиск
Список
Период
Сортировка

SKIP LOCKED assert triggered

От
Simon Riggs
Дата:
The combination of these two statements in a transaction hits an
Assert in heapam.c at line 4770 on REL_14_STABLE

BEGIN;
SELECT * FROM queue LIMIT 1 FOR UPDATE SKIP LOCKED;
...
UPDATE queue SET status = 'UPDATED' WHERE id = :id;
COMMIT;

pgbench reliably finds this, running from inside a PL/pgSQL function.

Alvaro suggests we just ignore the Assert, which on testing appears to
be the right approach. Patch attached, which solves it for me.

There is no formal test that does lock then update, so I have
attempted to create one, but this has not successfully reproduced it
(attached anyway), but this code is different from the pgbench test.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Вложения

Re: SKIP LOCKED assert triggered

От
"Bossart, Nathan"
Дата:
On 11/12/21, 8:56 AM, "Simon Riggs" <simon.riggs@enterprisedb.com> wrote:
> The combination of these two statements in a transaction hits an
> Assert in heapam.c at line 4770 on REL_14_STABLE

I've been unable to reproduce this.  Do you have any tips for how to
do so?  Does there need to be some sort of concurrent workload?

Nathan


Re: SKIP LOCKED assert triggered

От
Simon Riggs
Дата:
On Wed, 1 Dec 2021 at 14:33, Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 11/12/21, 8:56 AM, "Simon Riggs" <simon.riggs@enterprisedb.com> wrote:
> > The combination of these two statements in a transaction hits an
> > Assert in heapam.c at line 4770 on REL_14_STABLE
>
> I've been unable to reproduce this.  Do you have any tips for how to
> do so?  Does there need to be some sort of concurrent workload?

That path is only ever taken when there are multiple sessions, and as
I said, pgbench finds this reliably. I guess I didn't say "use -c 2"

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: SKIP LOCKED assert triggered

От
Alvaro Herrera
Дата:
On 2021-Dec-01, Simon Riggs wrote:

> On Wed, 1 Dec 2021 at 14:33, Bossart, Nathan <bossartn@amazon.com> wrote:
> >
> > On 11/12/21, 8:56 AM, "Simon Riggs" <simon.riggs@enterprisedb.com> wrote:
> > > The combination of these two statements in a transaction hits an
> > > Assert in heapam.c at line 4770 on REL_14_STABLE
> >
> > I've been unable to reproduce this.  Do you have any tips for how to
> > do so?  Does there need to be some sort of concurrent workload?
> 
> That path is only ever taken when there are multiple sessions, and as
> I said, pgbench finds this reliably. I guess I didn't say "use -c 2"

Simon had sent me the pgbench scripts earlier, so I attach them here.
I don't actually get a crash with -c2 or -c3, but I do get almost
immediate crashes with -c4 and above.  If I run it under "rr", it
doesn't occur either.  I suspect the rr framework kills concurrency in
some way that hides the problem.  I didn't find a way to reproduce it
with isolationtester.  (If somebody wants to play with a debugger, I
find that it's much easier to reproduce by adding a short sleep after
the UpdateXmaxHintBits() call in line 4735; but that sleep occurs in a
session *other* than the one that dies.  And under rr I still don't see
a crash with a sleep there; in fact the sleep doesn't seem to occur at
all, which is weird.)

The patch does fix the crasher under pgbench, and I think it makes sense
that you can get WouldBlock and yet have the tuple marked with
XMAX_INVALID: if transaction A is writing the tuple, and transaction B
is acquiring the tuple lock, then transaction C also tries to acquire
the tuple lock but that returns nay (because of B), then transaction A
completes, then transaction B could set the XMAX_INVALID flag in time
for C to have a seizure in its way out.  So patching the assertion to
allow the case is correct.


What I don't understand is why hasn't this been reported already: this
bug is pretty old.  My only explanation is that nobody runs sufficiently-
concurrent load with SKIP LOCKED in assert-enabled builds.

[1]
https://www.postgresql.org/message-id/flat/CADLWmXUvd5Z%2BcFczi6Zj1WcTrXzipgP-wj0pZOWSaRUy%3DF0omQ%40mail.gmail.com

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/

Вложения

Re: SKIP LOCKED assert triggered

От
Alvaro Herrera
Дата:
On 2022-Jan-03, Alvaro Herrera wrote:

> What I don't understand is why hasn't this been reported already: this
> bug is pretty old.  My only explanation is that nobody runs sufficiently-
> concurrent load with SKIP LOCKED in assert-enabled builds.

Pushed, thanks Simon for reporting this problem.


> [1]
https://www.postgresql.org/message-id/flat/CADLWmXUvd5Z%2BcFczi6Zj1WcTrXzipgP-wj0pZOWSaRUy%3DF0omQ%40mail.gmail.com

Heh, I deleted a paragraph from my previous email and forgot to remove
the footnote that it referenced.

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)



Re: SKIP LOCKED assert triggered

От
Alvaro Herrera
Дата:
On 2022-Jan-04, Alvaro Herrera wrote:

> On 2022-Jan-03, Alvaro Herrera wrote:
> 
> > What I don't understand is why hasn't this been reported already: this
> > bug is pretty old.  My only explanation is that nobody runs sufficiently-
> > concurrent load with SKIP LOCKED in assert-enabled builds.
> 
> Pushed, thanks Simon for reporting this problem.

Wow, what an embarrassing problem this fix has.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)



Re: SKIP LOCKED assert triggered

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Pushed, thanks Simon for reporting this problem.

Umm ...

       Assert(TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));

AFAICS, this assertion condition is constant-true,
because TM_WouldBlock is a nonzero constant.  Perhaps you meant

       Assert(result == TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));

?

I'd be inclined to format it more like the adjacent Assert, too.

            regards, tom lane



Re: SKIP LOCKED assert triggered

От
Simon Riggs
Дата:
On Tue, 4 Jan 2022 at 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Pushed, thanks Simon for reporting this problem.

And causing another; my bad, apologies.

> Umm ...
>
>        Assert(TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));
>
> AFAICS, this assertion condition is constant-true,
> because TM_WouldBlock is a nonzero constant.  Perhaps you meant
>
>        Assert(result == TM_WouldBlock || !(tuple->t_data->t_infomask & HEAP_XMAX_INVALID));

Yes, I think that's what I meant.

-- 
Simon Riggs                http://www.EnterpriseDB.com/