Discussion: PANIC: wrong buffer passed to visibilitymap_clear


PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
Buildfarm members spurfowl[1] and thorntail[2] have each shown $SUBJECT
once in the past two days.  The circumstances are not quite the same;
spurfowl's failure is in autovacuum while thorntail's is in a manual
VACUUM command.  Still, it seems clear that there's a recently-introduced
bug here somewhere.  I don't see any obvious candidate for the culprit,
though.  Any ideas?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2021-04-08%2010%3A22%3A08
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-04-09%2021%3A28%3A10



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Fri, Apr 9, 2021 at 3:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Buildfarm members spurfowl[1] and thorntail[2] have each shown $SUBJECT
> once in the past two days.  The circumstances are not quite the same;
> spurfowl's failure is in autovacuum while thorntail's is in a manual
> VACUUM command.  Still, it seems clear that there's a recently-introduced
> bug here somewhere.  I don't see any obvious candidate for the culprit,
> though.  Any ideas?

They're both VACUUM ANALYZE. They must be, because the calls to
visibilitymap_clear PANIC (they don't ERROR) -- the failing
visibilitymap_clear() call must occur inside a critical section, and
all such calls are made within heapam.c (only VACUUM ANALYZE uses a
transaction and does writes). It cannot be the two calls to
visibilitymap_clear() inside vacuumlazy.c.

I suspect that you've figured this much already. Just pointing it out.

-- 
Peter Geoghegan



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Andres Freund
Date:
Hi,

On 2021-04-09 18:40:27 -0400, Tom Lane wrote:
> Buildfarm members spurfowl[1] and thorntail[2] have each shown $SUBJECT
> once in the past two days.  The circumstances are not quite the same;
> spurfowl's failure is in autovacuum while thorntail's is in a manual
> VACUUM command.  Still, it seems clear that there's a recently-introduced
> bug here somewhere.  I don't see any obvious candidate for the culprit,
> though.  Any ideas?

commit 7ab96cf6b312cfcd79cdc1a69c6bdb75de0ed30f
Author: Peter Geoghegan <pg@bowt.ie>
Date:   2021-04-06 07:49:39 -0700

    Refactor lazy_scan_heap() loop.

or some of the other changes in the vicinity could be related. There are
some changes to when pages are marked all-visible, when their free space
is tracked, etc.


Just looking at the code in heap_update: I'm a bit confused about
RelationGetBufferForTuple()'s vmbuffer and vmbuffer_other
arguments. It looks like it's not at all clear which of the two
arguments will have the vmbuffer for which of the pages?

        if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
            GetVisibilityMapPins(relation, buffer, otherBuffer,
                                 targetBlock, otherBlock, vmbuffer,
                                 vmbuffer_other);
        else
            GetVisibilityMapPins(relation, otherBuffer, buffer,
                                 otherBlock, targetBlock, vmbuffer_other,
                                 vmbuffer);

Which then would make any subsequent use of vmbuffer vs vmbuffer_new in
heap_update() bogus? Because clearly that code associates vmbuffer /
vmbuffer_new with the respective page?

    /* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
    if (PageIsAllVisible(BufferGetPage(buffer)))
    {
        all_visible_cleared = true;
        PageClearAllVisible(BufferGetPage(buffer));
        visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
                            vmbuffer, VISIBILITYMAP_VALID_BITS);
    }
    if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
    {
        all_visible_cleared_new = true;
        PageClearAllVisible(BufferGetPage(newbuf));
        visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
                            vmbuffer_new, VISIBILITYMAP_VALID_BITS);
    }

Greetings,

Andres Freund



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Andres Freund
Date:
Hi,

On 2021-04-09 16:27:12 -0700, Peter Geoghegan wrote:
> They're both VACUUM ANALYZE. They must be, because the calls to
> visibilitymap_clear PANIC (they don't ERROR) -- the failing
> visibilitymap_clear() call must occur inside a critical section, and
> all such calls are made within heapam.c (only VACUUM ANALYZE uses a
> transaction and does writes). It cannot be the two calls to
> visibilitymap_clear() inside vacuumlazy.c.

There's a stacktrace at the bottom of the spurfowl report:

======-=-====== stack trace: pgsql.build/src/test/regress/tmp_check/data/core ======-=-======
[New LWP 24172]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: autovacuum worker regression                                        '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f77a7967428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  0x00007f77a7967428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f77a796902a in __GI_abort () at abort.c:89
#2  0x000000000095cf8d in errfinish (filename=<optimized out>, filename@entry=0x9c3fa0 "visibilitymap.c", lineno=lineno@entry=155, funcname=funcname@entry=0x9c41c0 <__func__.13853> "visibilitymap_clear") at elog.c:680
#3  0x0000000000501498 in visibilitymap_clear (rel=rel@entry=0x7f77a96d2d28, heapBlk=<optimized out>, buf=buf@entry=0, flags=flags@entry=3 '\003') at visibilitymap.c:155
#4  0x00000000004e6380 in heap_update (relation=relation@entry=0x7f77a96d2d28, otid=otid@entry=0x2c0394c, newtup=newtup@entry=0x2c03948, cid=0, crosscheck=crosscheck@entry=0x0, wait=wait@entry=true, tmfd=0x7ffe119d2c20, lockmode=0x7ffe119d2c1c) at heapam.c:3993
#5  0x00000000004e7d70 in simple_heap_update (relation=relation@entry=0x7f77a96d2d28, otid=otid@entry=0x2c0394c, tup=tup@entry=0x2c03948) at heapam.c:4211
#6  0x00000000005811a9 in CatalogTupleUpdate (heapRel=0x7f77a96d2d28, otid=0x2c0394c, tup=0x2c03948) at indexing.c:309
#7  0x00000000005efc32 in update_attstats (relid=16928, inh=inh@entry=false, natts=natts@entry=1, vacattrstats=vacattrstats@entry=0x2b3c030) at analyze.c:1746
#8  0x00000000005f264a in update_attstats (vacattrstats=0x2b3c030, natts=1, inh=false, relid=<optimized out>) at analyze.c:589
#9  do_analyze_rel (onerel=onerel@entry=0x7f77a95c1070, params=params@entry=0x2aba36c, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=33, inh=inh@entry=false, in_outer_xact=false, elevel=13) at analyze.c:589
#10 0x00000000005f2d8d in analyze_rel (relid=<optimized out>, relation=<optimized out>, params=params@entry=0x2aba36c, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:261
#11 0x0000000000671721 in vacuum (relations=0x2b492b8, params=params@entry=0x2aba36c, bstrategy=<optimized out>, bstrategy@entry=0x2aba4e8, isTopLevel=isTopLevel@entry=true) at vacuum.c:478
#12 0x000000000048f02d in autovacuum_do_vac_analyze (bstrategy=0x2aba4e8, tab=0x2aba368) at autovacuum.c:3316
#13 do_autovacuum () at autovacuum.c:2537
#14 0x0000000000779d76 in AutoVacWorkerMain (argv=0x0, argc=0) at autovacuum.c:1715
#15 0x0000000000779e79 in StartAutoVacWorker () at autovacuum.c:1500
#16 0x0000000000788324 in StartAutovacuumWorker () at postmaster.c:5539
#17 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5243
#18 <signal handler called>
#19 0x00007f77a7a2f5b3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#20 0x0000000000788668 in ServerLoop () at postmaster.c:1701
#21 0x000000000078a187 in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x2a408c0) at postmaster.c:1409
#22 0x0000000000490e48 in main (argc=8, argv=0x2a408c0) at main.c:209
$1 = {si_signo = 6, si_errno = 0, si_code = -6, _sifields = {_pad = {24172, 1001, 0 <repeats 26 times>}, _kill = {si_pid = 24172, si_uid = 1001}, _timer = {si_tid = 24172, si_overrun = 1001, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 24172, si_uid = 1001, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 24172, si_uid = 1001, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x3e900005e6c, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 4299262287468, si_fd = 0}}}

Greetings,

Andres Freund



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Andres Freund
Date:
On 2021-04-09 16:27:39 -0700, Andres Freund wrote:
> Just looking at the code in heap_update: I'm a bit confused about
> RelationGetBufferForTuple()'s vmbuffer and vmbuffer_other
> arguments. It looks like it's not at all clear which of the two
> arguments will have the vmbuffer for which of the pages?
> 
>         if (otherBuffer == InvalidBuffer || targetBlock <= otherBlock)
>             GetVisibilityMapPins(relation, buffer, otherBuffer,
>                                  targetBlock, otherBlock, vmbuffer,
>                                  vmbuffer_other);
>         else
>             GetVisibilityMapPins(relation, otherBuffer, buffer,
>                                  otherBlock, targetBlock, vmbuffer_other,
>                                  vmbuffer);

Oh, I missed that the arguments to GetVisibilityMapPins are
appropriately swapped too.

Greetings,

Andres Freund



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
I've managed to reproduce this locally, by dint of running the
src/bin/scripts tests over and over and tweaking the timing by
trying different "taskset" parameters to vary the number of CPUs
available.  I find that I duplicated the report from spurfowl,
particularly

(gdb) bt
#0  0x00007f67bb6807d5 in raise () from /lib64/libc.so.6
#1  0x00007f67bb669895 in abort () from /lib64/libc.so.6
#2  0x000000000094ce37 in errfinish (filename=<optimized out>,
    lineno=<optimized out>,
    funcname=0x9ac120 <__func__.1> "visibilitymap_clear") at elog.c:680
#3  0x0000000000488b8c in visibilitymap_clear (rel=rel@entry=0x7f67b2837330,
    heapBlk=<optimized out>, buf=buf@entry=0, flags=flags@entry=3 '\003')
                             ^^^^^^^^^^^^^^^
    at visibilitymap.c:155
#4  0x000000000055cd87 in heap_update (relation=0x7f67b2837330,
    otid=0x7f67b274744c, newtup=0x7f67b2747448, cid=<optimized out>,
    crosscheck=<optimized out>, wait=<optimized out>, tmfd=0x7ffecf4d5700,
    lockmode=0x7ffecf4d56fc) at heapam.c:3993
#5  0x000000000055dd61 in simple_heap_update (
    relation=relation@entry=0x7f67b2837330, otid=otid@entry=0x7f67b274744c,
    tup=tup@entry=0x7f67b2747448) at heapam.c:4211
#6  0x00000000005e531c in CatalogTupleUpdate (heapRel=0x7f67b2837330,
    otid=0x7f67b274744c, tup=0x7f67b2747448) at indexing.c:309
#7  0x00000000006420f9 in update_attstats (relid=1255, inh=false,
    natts=natts@entry=30, vacattrstats=vacattrstats@entry=0x19c9fc0)
    at analyze.c:1758
#8  0x00000000006430dd in update_attstats (vacattrstats=0x19c9fc0, natts=30,
    inh=false, relid=<optimized out>) at analyze.c:1646
#9  do_analyze_rel (onerel=<optimized out>, params=0x7ffecf4d5e50,
    va_cols=0x0, acquirefunc=<optimized out>, relpages=86,
    inh=<optimized out>, in_outer_xact=false, elevel=13) at analyze.c:589
#10 0x00000000006447a1 in analyze_rel (relid=<optimized out>,
    relation=<optimized out>, params=params@entry=0x7ffecf4d5e50, va_cols=0x0,
    in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:261
#11 0x00000000006a5718 in vacuum (relations=0x19c8158, params=0x7ffecf4d5e50,
    bstrategy=<optimized out>, isTopLevel=<optimized out>) at vacuum.c:478
#12 0x00000000006a5c94 in ExecVacuum (pstate=pstate@entry=0x1915970,
    vacstmt=vacstmt@entry=0x18ed5c8, isTopLevel=isTopLevel@entry=true)
    at vacuum.c:254
#13 0x000000000083c32c in standard_ProcessUtility (pstmt=0x18ed918,
    queryString=0x18eca20 "ANALYZE pg_catalog.pg_proc;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x18eda08, qc=0x7ffecf4d61c0) at utility.c:826

I'd not paid much attention to that point before, but now it
seems there is no question that heap_update is reaching line 3993

        visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
                            vmbuffer, VISIBILITYMAP_VALID_BITS);

without having changed "vmbuffer" from its initial value of
InvalidBuffer.  It looks that way both at frame 3 and frame 4:

(gdb) f 4
#4  0x000000000055cd87 in heap_update (relation=0x7f67b2837330,
    otid=0x7f67b274744c, newtup=0x7f67b2747448, cid=<optimized out>,
    crosscheck=<optimized out>, wait=<optimized out>, tmfd=0x7ffecf4d5700,
    lockmode=0x7ffecf4d56fc) at heapam.c:3993
3993                    visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
(gdb) i locals
...
vmbuffer = 0
vmbuffer_new = 0
...

It is also hard to doubt that somebody broke this in the last-minute
commit blizzard.  There are no reports of this PANIC in the buildfarm for
the last month, but we're now up to four (last I checked) since Thursday.

While the first thing that comes to mind is a logic bug in heap_update
itself, that code doesn't seem to have changed much in the last few days.
Moreover, why is it that this only seems to be happening within
do_analyze_rel -> update_attstats?  (We only have two stack traces
positively showing that, but all the buildfarm reports look like the
failure is happening within manual or auto analyze of a system catalog.
Fishy as heck.)

Just eyeing the evidence on hand, I'm wondering if something has decided
it can start setting the page-all-visible bit without adequate lock,
perhaps only in system catalogs.  heap_update is clearly assuming that
that flag won't change underneath it, and if it did, it's clear how this
symptom would ensue.

Too tired to take it further tonight ... discuss among yourselves.

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Sat, Apr 10, 2021 at 10:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Just eyeing the evidence on hand, I'm wondering if something has decided
> it can start setting the page-all-visible bit without adequate lock,
> perhaps only in system catalogs.  heap_update is clearly assuming that
> that flag won't change underneath it, and if it did, it's clear how this
> symptom would ensue.

Does this patch seem to fix the problem?

-- 
Peter Geoghegan

Attachments

Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sat, Apr 10, 2021 at 10:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just eyeing the evidence on hand, I'm wondering if something has decided
>> it can start setting the page-all-visible bit without adequate lock,
>> perhaps only in system catalogs.  heap_update is clearly assuming that
>> that flag won't change underneath it, and if it did, it's clear how this
>> symptom would ensue.

> Does this patch seem to fix the problem?

Hmm ... that looks pretty suspicious, I agree, but why wouldn't an
exclusive buffer lock be enough to prevent concurrency with heap_update?

(I have zero faith in being able to show that this patch fixes the
problem by testing, given how hard it is to reproduce.  We need to
convince ourselves that this is a fix by logic.)

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Sun, Apr 11, 2021 at 8:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Does this patch seem to fix the problem?
>
> Hmm ... that looks pretty suspicious, I agree, but why wouldn't an
> exclusive buffer lock be enough to prevent concurrency with heap_update?

I don't have any reason to believe that using a super-exclusive lock
during heap page vacuuming is necessary. My guess is that returning to
doing it that way might make the buildfarm green again. That would at
least confirm my suspicion that this code is relevant. The
super-exclusive lock might have been masking the problem for a long
time.

How about temporarily committing this patch, just to review how it
affects the buildfarm?

-- 
Peter Geoghegan



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Sun, Apr 11, 2021 at 9:10 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I don't have any reason to believe that using a super-exclusive lock
> during heap page vacuuming is necessary. My guess is that returning to
> doing it that way might make the buildfarm green again. That would at
> least confirm my suspicion that this code is relevant. The
> super-exclusive lock might have been masking the problem for a long
> time.

This isn't just any super-exclusive lock, either -- we were calling
ConditionalLockBufferForCleanup() at this point.

I now think that there is a good chance that we are seeing these
symptoms because the "conditional-ness" went away -- we accidentally
relied on that.

-- 
Peter Geoghegan



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> This isn't just any super-exclusive lock, either -- we were calling
> ConditionalLockBufferForCleanup() at this point.

> I now think that there is a good chance that we are seeing these
> symptoms because the "conditional-ness" went away -- we accidentally
> relied on that.

Ah, I see it.  In the fragment of heap_update where we have to do some
TOAST work (starting at line 3815) we transiently *release our lock*
on the old tuple's page.  Unlike the earlier fragments that did that,
this code path has no provision for rechecking whether the page has
become all-visible, so if that does happen while we're without the
lock then we lose.  (It does look like RelationGetBufferForTuple
knows about updating vmbuffer, but there's one code path through the
if-nest at 3850ff that doesn't call that.)

So the previous coding in vacuumlazy didn't tickle this because it would
only set the all-visible bit on a page it had superexclusive lock on;
that is, continuing to hold the pin was sufficient.  Nonetheless, if
four out of five paths through heap_update take care of this matter,
I'd say it's heap_update's bug not vacuumlazy's bug that the fifth path
doesn't.

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
I wrote:
> (It does look like RelationGetBufferForTuple
> knows about updating vmbuffer, but there's one code path through the
> if-nest at 3850ff that doesn't call that.)

Although ... isn't RelationGetBufferForTuple dropping the ball on this
point too, in the code path at the end where it has to extend the relation?

I'm now inclined to think that we should toss every single line of that
code, take RelationGetBufferForTuple out of the equation, and have just
*one* place that rechecks for PageAllVisible having just become set.
It's a rare enough case that optimizing it is completely not worth the
code complexity and risk (er, reality) of hard-to-locate bugs.

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
I wrote:
> I'm now inclined to think that we should toss every single line of that
> code, take RelationGetBufferForTuple out of the equation, and have just
> *one* place that rechecks for PageAllVisible having just become set.
> It's a rare enough case that optimizing it is completely not worth the
> code complexity and risk (er, reality) of hard-to-locate bugs.

Alternatively, we could do what you suggested and redefine things
so that one is only allowed to set the all-visible bit while holding
superexclusive lock; which again would allow an enormous simplification
in heap_update and cohorts.  Either way, it's hard to argue that
heap_update hasn't crossed the complexity threshold where it's
impossible to maintain safely.  We need to simplify it.

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Sun, Apr 11, 2021 at 10:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alternatively, we could do what you suggested and redefine things
> so that one is only allowed to set the all-visible bit while holding
> superexclusive lock; which again would allow an enormous simplification
> in heap_update and cohorts.

Great detective work.

I would rather not go back to requiring a superexclusive lock in
vacuumlazy.c (outside of pruning), actually -- I was only pointing out
that that had changed, and was likely to be relevant. It wasn't a real
proposal.

I think that it would be hard to justify requiring a super-exclusive
lock just to call PageSetAllVisible(). PD_ALL_VISIBLE is fundamentally
redundant information, so somehow it feels like the wrong design.

> Either way, it's hard to argue that
> heap_update hasn't crossed the complexity threshold where it's
> impossible to maintain safely.  We need to simplify it.

It is way too complicated. I don't think that I quite understand your
first proposal right now, so I'll need to go think about it.

-- 
Peter Geoghegan



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, Apr 11, 2021 at 10:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Either way, it's hard to argue that
>> heap_update hasn't crossed the complexity threshold where it's
>> impossible to maintain safely.  We need to simplify it.

> It is way too complicated. I don't think that I quite understand your
> first proposal right now, so I'll need to go think about it.

It wasn't very clear, because I hadn't thought it through very much;
but what I'm imagining is that we discard most of the thrashing around
all-visible rechecks and have just one such test somewhere very late
in heap_update, after we've successfully acquired a target buffer for
the update and are no longer going to possibly need to release any
buffer lock.  If at that one point we see the page is all-visible
and we don't have the vmbuffer, then we have to release all our locks
and go back to "l2".  Which is less efficient than some of the existing
code paths, but given how hard this problem is to reproduce, it seems
clear that optimizing for the occurrence is just not worth it.

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Sun, Apr 11, 2021 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It wasn't very clear, because I hadn't thought it through very much;
> but what I'm imagining is that we discard most of the thrashing around
> all-visible rechecks and have just one such test somewhere very late
> in heap_update, after we've successfully acquired a target buffer for
> the update and are no longer going to possibly need to release any
> buffer lock.  If at that one point we see the page is all-visible
> and we don't have the vmbuffer, then we have to release all our locks
> and go back to "l2".  Which is less efficient than some of the existing
> code paths, but given how hard this problem is to reproduce, it seems
> clear that optimizing for the occurrence is just not worth it.

Oh! That sounds way better.

This reminds me of the tupgone case that I exorcised from vacuumlazy.c
(in the same commit that stopped using a superexclusive lock). It was
also described as an optimization that wasn't quite worth it. But I
don't quite buy that. ISTM that there is a better explanation: it
evolved the appearance of being an optimization that might make sense.
Which was just camouflage.

-- 
Peter Geoghegan



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, Apr 11, 2021 at 11:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It wasn't very clear, because I hadn't thought it through very much;
>> but what I'm imagining is that we discard most of the thrashing around
>> all-visible rechecks and have just one such test somewhere very late
>> in heap_update, after we've successfully acquired a target buffer for
>> the update and are no longer going to possibly need to release any
>> buffer lock.  If at that one point we see the page is all-visible
>> and we don't have the vmbuffer, then we have to release all our locks
>> and go back to "l2".  Which is less efficient than some of the existing
>> code paths, but given how hard this problem is to reproduce, it seems
>> clear that optimizing for the occurrence is just not worth it.

> Oh! That sounds way better.

After poking at this for awhile, it seems like it won't work very nicely.
The problem is that once we've invoked the toaster, we really don't want
to just abandon that work; we'd leak any toasted out-of-line data that
was created.

So I think we have to stick with the current basic design, and just
tighten things up to make sure that visibility pins are accounted for
in the places that are missing it.

Hence, I propose the attached.  It passes check-world, but that proves
absolutely nothing of course :-(.  I wonder if there is any way to
exercise these code paths deterministically.

(I have realized BTW that I was exceedingly fortunate to reproduce
the buildfarm report here --- I have run hundreds of additional
cycles of the same test scenario without getting a second failure.)

            regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 03d4abc938..265b31e981 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3784,7 +3784,7 @@ l2:
          * overhead would be unchanged, that doesn't seem necessarily
          * worthwhile.
          */
-        if (PageIsAllVisible(BufferGetPage(buffer)) &&
+        if (PageIsAllVisible(page) &&
             visibilitymap_clear(relation, block, vmbuffer,
                                 VISIBILITYMAP_ALL_FROZEN))
             cleared_all_frozen = true;
@@ -3846,36 +3846,46 @@ l2:
          * first".  To implement this, we must do RelationGetBufferForTuple
          * while not holding the lock on the old page, and we must rely on it
          * to get the locks on both pages in the correct order.
+         *
+         * Another consideration is that we need visibility map page pin(s) if
+         * we will have to clear the all-visible flag on either page.  If we
+         * call RelationGetBufferForTuple, we rely on it to acquire any such
+         * pins; but if we don't, we have to handle that here.  Hence we need
+         * a loop.
          */
-        if (newtupsize > pagefree)
-        {
-            /* Assume there's no chance to put heaptup on same page. */
-            newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
-                                               buffer, 0, NULL,
-                                               &vmbuffer_new, &vmbuffer);
-        }
-        else
+        for (;;)
         {
+            if (newtupsize > pagefree)
+            {
+                /* It doesn't fit, must use RelationGetBufferForTuple. */
+                newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
+                                                   buffer, 0, NULL,
+                                                   &vmbuffer_new, &vmbuffer);
+                /* We're all done. */
+                break;
+            }
+            /* Acquire VM page pin if needed and we don't have it. */
+            if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
+                visibilitymap_pin(relation, block, &vmbuffer);
             /* Re-acquire the lock on the old tuple's page. */
             LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
             /* Re-check using the up-to-date free space */
             pagefree = PageGetHeapFreeSpace(page);
-            if (newtupsize > pagefree)
+            if (newtupsize > pagefree ||
+                (vmbuffer == InvalidBuffer && PageIsAllVisible(page)))
             {
                 /*
-                 * Rats, it doesn't fit anymore.  We must now unlock and
-                 * relock to avoid deadlock.  Fortunately, this path should
-                 * seldom be taken.
+                 * Rats, it doesn't fit anymore, or somebody just now set the
+                 * all-visible flag.  We must now unlock and loop to avoid
+                 * deadlock.  Fortunately, this path should seldom be taken.
                  */
                 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-                newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
-                                                   buffer, 0, NULL,
-                                                   &vmbuffer_new, &vmbuffer);
             }
             else
             {
-                /* OK, it fits here, so we're done. */
+                /* We're all done. */
                 newbuf = buffer;
+                break;
             }
         }
     }
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 37a1be4114..ffc89685bf 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -293,9 +293,13 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
  *    happen if space is freed in that page after heap_update finds there's not
  *    enough there).  In that case, the page will be pinned and locked only once.
  *
- *    For the vmbuffer and vmbuffer_other arguments, we avoid deadlock by
- *    locking them only after locking the corresponding heap page, and taking
- *    no further lwlocks while they are locked.
+ *    We also handle the possibility that the all-visible flag will need to be
+ *    cleared on one or both pages.  If so, pin on the associated visibility map
+ *    page must be acquired before acquiring buffer lock(s), to avoid possibly
+ *    doing I/O while holding buffer locks.  The pins are passed back to the
+ *    caller using the input-output arguments vmbuffer and vmbuffer_other.
+ *    Note that in some cases the caller might have already acquired such pins,
+ *    which is indicated by these arguments not being InvalidBuffer on entry.
  *
  *    We normally use FSM to help us find free space.  However,
  *    if HEAP_INSERT_SKIP_FSM is specified, we just append a new empty page to
@@ -666,6 +670,8 @@ loop:
     if (otherBuffer != InvalidBuffer)
     {
         Assert(otherBuffer != buffer);
+        targetBlock = BufferGetBlockNumber(buffer);
+        Assert(targetBlock > otherBlock);

         if (unlikely(!ConditionalLockBuffer(otherBuffer)))
         {
@@ -674,10 +680,16 @@ loop:
             LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

             /*
-             * Because the buffer was unlocked for a while, it's possible,
-             * although unlikely, that the page was filled. If so, just retry
-             * from start.
+             * Because the buffers were unlocked for a while, it's possible,
+             * although unlikely, that an all-visible flag became set or that
+             * somebody used up the available space in the new page.  We can
+             * use GetVisibilityMapPins to deal with the first case.  In the
+             * second case, just retry from start.
              */
+            GetVisibilityMapPins(relation, otherBuffer, buffer,
+                                 otherBlock, targetBlock, vmbuffer_other,
+                                 vmbuffer);
+
             if (len > PageGetHeapFreeSpace(page))
             {
                 LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);

Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Mon, Apr 12, 2021 at 9:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I think we have to stick with the current basic design, and just
> tighten things up to make sure that visibility pins are accounted for
> in the places that are missing it.
>
> Hence, I propose the attached.  It passes check-world, but that proves
> absolutely nothing of course :-(.  I wonder if there is any way to
> exercise these code paths deterministically.

This approach seems reasonable to me. At least you've managed to
structure the visibility map page pin check as concomitant with the
existing space recheck.

> (I have realized BTW that I was exceedingly fortunate to reproduce
> the buildfarm report here --- I have run hundreds of additional
> cycles of the same test scenario without getting a second failure.)

In the past I've had luck with RR's chaos mode (most notably with the
Jepsen SSI bug). That didn't work for me here, though I might just
have not persisted with it for long enough. I should probably come up
with a shell script that runs the same thing hundreds of times or more
in chaos mode, while making sure that useless recordings don't
accumulate.
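
Sketched out, such a driver might look like the following. This is only a
sketch: the rr flags ("rr record --chaos") are real, but the trace-directory
handling (a latest-trace symlink under ~/.local/share/rr, overridable via
_RR_TRACE_DIR) and the regress-test command are assumptions to adapt
locally:

```shell
#!/bin/sh
# Hypothetical stress driver: repeat a flaky test under "rr record --chaos"
# until it fails once, deleting the recording of every run that passed so
# that useless traces don't pile up.

# stress_until_failure MAX_RUNS CMD [ARGS...]
# Returns 0 as soon as one run fails (its recording is kept for "rr replay"),
# 1 if all MAX_RUNS runs passed.
stress_until_failure() {
    max=$1; shift
    trace_dir=${_RR_TRACE_DIR:-"$HOME/.local/share/rr"}
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if "$@"; then
            # The run passed, so its recording is useless: discard it.
            last=$(readlink -f "$trace_dir/latest-trace" 2>/dev/null) || last=""
            if [ -n "$last" ]; then
                rm -rf "$last"
            fi
        else
            echo "run $i failed; recording kept for 'rr replay'"
            return 0
        fi
    done
    return 1
}

# Intended usage (requires rr; the regress target is just an example):
#   stress_until_failure 500 \
#       rr record --chaos make -C src/test/regress installcheck
```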

The feature is described here:

https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mode.html

You only have to be lucky once. Once that happens, you're left with a
recording to review and re-review at your leisure. This includes all
Postgres backends, maybe even pg_regress and other scaffolding (if
that's what you're after).

But that's for debugging, not testing. The only way that we'll ever be
able to test stuff like this is with something like Alexander
Korotkov's stop events patch [1]. That infrastructure should be added
sooner rather than later.

[1] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp+Z++i4BGQoffKip6JDWngEA+g7Z-XmQ@mail.gmail.com
--
Peter Geoghegan



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Andres Freund
Date:
Hi,

On 2021-04-11 13:55:30 -0400, Tom Lane wrote:
> Either way, it's hard to argue that heap_update hasn't crossed the
> complexity threshold where it's impossible to maintain safely.  We
> need to simplify it.

Yea, I think we're well beyond that point. I can see a few possible
steps to wrangle the existing complexity into an easier-to-understand
shape:

- Rename the heapam.c goto labels; as they stand, they're useless for
  understanding what is happening.

- Move the HeapTupleSatisfiesUpdate() call and the related branches
  after it into their own function.

- Move the "temporarily mark it locked" branch into its own function.
  It's a minimal implementation of tuple locking, so it seems more than
  distinct enough to stand alone.

- Move the "store the new tuple" part into its own function (pretty much
  the critical section).

- Probably worth unifying the exit paths - there's a fair bit of
  duplication by now...

Half related:

- I think we might also need to do something about the proliferation of
  bitmapsets in heap_update(). We now separately allocate 5 bitmapsets -
  that strikes me as fairly insane.


However, these would not really address the complexity itself, just
make it easier to manage.

ISTM that a lot of the complexity is related to needing to retry (and
avoiding doing so unnecessarily), which in turn is related to avoiding
deadlocks. We actually know how to avoid needing that to the same
degree - the (need_toast || newtupsize > pagefree) path locks the tuple
and afterwards has a lot more freedom. We obviously can't just always
do that, due to the WAL logging overhead.

I wonder if we could make that path avoid the WAL logging overhead. We
don't actually need a full-blown tuple lock, potentially even one with
its own multixact, here.

The relevant comment (in heap_lock_tuple()) says:
    /*
     * XLOG stuff.  You might think that we don't need an XLOG record because
     * there is no state change worth restoring after a crash.  You would be
     * wrong however: we have just written either a TransactionId or a
     * MultiXactId that may never have been seen on disk before, and we need
     * to make sure that there are XLOG entries covering those ID numbers.
     * Else the same IDs might be re-used after a crash, which would be
     * disastrous if this page made it to disk before the crash.  Essentially
     * we have to enforce the WAL log-before-data rule even in this case.
     * (Also, in a PITR log-shipping or 2PC environment, we have to have XLOG
     * entries for everything anyway.)
     */

But I don't really think that full-blown tuple-locking WAL logging is
the right solution here.

- The "next xid" concerns are at least as easily solved by WAL logging
  a distinct "highest xid assigned" record when necessary: either by
  having a shared-memory variable saying "latestLoggedXid" or such, or
  by having end-of-recovery advance nextXid beyond what ExtendCLOG()
  extended to. That reduces the overhead to at most once per
  transaction (and commonly less) or to nothing, respectively.

- While there's obviously a good bit of simplicity in ensuring that a
  replica is exactly the same ("Also, in a PITR log-shipping or 2PC
  environment ..."), we don't actually enforce that strictly anyway -
  so I am not sure why it's necessary to pay the price here.

But maybe I'm all wet here; I certainly haven't had enough coffee.

Greetings,

Andres Freund



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Apr 12, 2021 at 9:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hence, I propose the attached.  It passes check-world, but that proves
>> absolutely nothing of course :-(.  I wonder if there is any way to
>> exercise these code paths deterministically.

> This approach seems reasonable to me. At least you've managed to
> structure the visibility map page pin check as concomitant with the
> existing space recheck.

Thanks for looking it over.  Do you have an opinion on whether or not
to back-patch?  As far as we know, these bugs aren't exposed in the
back branches for lack of code that would set the all-visible flag
without superexclusive lock.  But I'd still say that heap_update
is failing to honor its API contract in these edge cases, and that
seems like something that could bite us after future back-patches.
Or there might be third-party code that can set all-visible flags.
So I'm kind of tempted to back-patch, but it's a judgment call.

            regards, tom lane



Re: PANIC: wrong buffer passed to visibilitymap_clear

From
Peter Geoghegan
Date:
On Mon, Apr 12, 2021 at 6:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thanks for looking it over.  Do you have an opinion on whether or not
> to back-patch?  As far as we know, these bugs aren't exposed in the
> back branches for lack of code that would set the all-visible flag
> without superexclusive lock.  But I'd still say that heap_update
> is failing to honor its API contract in these edge cases, and that
> seems like something that could bite us after future back-patches.

If we assume that a scenario like the one we've been debugging can
never happen in the backbranches, then we must also assume that your
fix has negligible risk in the backbranches, because of how it is
structured. And so I agree -- I lean towards backpatching.

-- 
Peter Geoghegan