Discussion: Poorly thought out code in vacuum

Poorly thought out code in vacuum

From
Tom Lane
Date:
Lately I have noticed buildfarm member jaguar occasionally failing
regression tests with
ERROR:  invalid memory alloc request size 1073741824
during a VACUUM, as for example at
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2012-01-04%2023%3A05%3A02

Naturally I supposed it to be a consequence of the CLOBBER_CACHE_ALWAYS
option, ie, somebody accessing an already-freed cache entry.  I managed
to duplicate and debug it here after an afternoon of trying, and it's
not actually related to that at all, except perhaps to the extent that
CLOBBER_CACHE_ALWAYS slows down some things enormously more than others.
The failure comes when a ResourceOwner tries to enlarge its
pinned-buffers array to more than 1GB, and that's not a typo: there are
umpteen entries for the same buffer, whose PrivateRefCount is 134217727
and trying to go higher.  A stack trace soon revealed the culprit, which
is this change in lazy_vacuum_heap():

@@ -932,7 +965,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
         tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
         buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
                                  vac_strategy);
-        LockBufferForCleanup(buf);
+        if (!ConditionalLockBufferForCleanup(buf))
+            continue;
         tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats);

         /* Now that we've compacted the page, record its available space */

introduced in
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=bbb6e559c4ea0fb4c346beda76736451dc24eb4e

The "continue" just results in another iteration of the containing tight
loop, and of course that immediately re-pins the same buffer without
having un-pinned it.  So if someone else sits on their pin of the buffer
for long enough, kaboom.
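
For context, the containing loop looks roughly like this (a condensed
sketch of lazy_vacuum_heap()'s second pass, not the verbatim source):

    tupindex = 0;
    while (tupindex < vacrelstats->num_dead_tuples)
    {
        BlockNumber tblk;
        Buffer      buf;

        tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
        buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
                                 vac_strategy);
        if (!ConditionalLockBufferForCleanup(buf))
            continue;           /* buf stays pinned; the next iteration
                                 * pins the very same block again */
        tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats);
        /* ... record free space, then unlock and release buf ... */
    }

Since tupindex only advances inside lazy_vacuum_page(), every failed
lock attempt leaves one more pin behind, which is exactly the
PrivateRefCount growth described above.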

We could fix the direct symptom by inserting UnlockReleaseBuffer()
in front of the continue, but AFAICS that just makes this into a
busy-waiting equivalent of waiting unconditionally, so I don't really
see why we should not revert the above fragment altogether.  However,
I suppose Robert had something more intelligent in mind than a tight
loop when the buffer can't be exclusively locked, so maybe there is
some other change that should be made here instead.
        regards, tom lane


Re: Poorly thought out code in vacuum

From
Simon Riggs
Date:
On Fri, Jan 6, 2012 at 12:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> We could fix the direct symptom by inserting UnlockReleaseBuffer()
> in front of the continue, but AFAICS that just makes this into a
> busy-waiting equivalent of waiting unconditionally, so I don't really
> see why we should not revert the above fragment altogether.  However,
> I suppose Robert had something more intelligent in mind than a tight
> loop when the buffer can't be exclusively locked, so maybe there is
> some other change that should be made here instead.

Skipping the block completely isn't feasible. The only options are to
wait or to do out of order cleaning.

The conditional behaviour in vacuum_scan_heap() is much more important
than it is here, so just waiting for the lock wouldn't be too bad. Out
of order cleaning could be very expensive in terms of I/O and could
make that less robust. So I'd say let's just wait normally for the
lock.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Poorly thought out code in vacuum

From
Robert Haas
Date:
On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I suppose Robert had something more intelligent in mind than a tight
> loop when the buffer can't be exclusively locked, so maybe there is
> some other change that should be made here instead.

My intention was to skip the tuple, but I failed to notice the unusual
way in which this loop iterates.  How about something like the
attached?
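
(The attachment itself isn't reproduced in this archive; the shape of
the fix being described would be something like the following sketch,
which releases the pin and advances past the tuple before continuing:)

    if (!ConditionalLockBufferForCleanup(buf))
    {
        ReleaseBuffer(buf);     /* drop the pin taken by ReadBufferExtended */
        ++tupindex;             /* skip this tuple instead of retrying it */
        continue;
    }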

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: Poorly thought out code in vacuum

From
Simon Riggs
Date:
On Fri, Jan 6, 2012 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I suppose Robert had something more intelligent in mind than a tight
>> loop when the buffer can't be exclusively locked, so maybe there is
>> some other change that should be made here instead.
>
> My intention was to skip the tuple, but I failed to notice the unusual
> way in which this loop iterates.  How about something like the
> attached?

It solves the waiting issue, but leaves unused tuples in the heap that
previously would have been removed.

I don't think that is a solution.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Poorly thought out code in vacuum

From
Robert Haas
Date:
On Fri, Jan 6, 2012 at 9:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 6, 2012 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I suppose Robert had something more intelligent in mind than a tight
>>> loop when the buffer can't be exclusively locked, so maybe there is
>>> some other change that should be made here instead.
>>
>> My intention was to skip the tuple, but I failed to notice the unusual
>> way in which this loop iterates.  How about something like the
>> attached?
>
> It solves the waiting issue, but leaves unused tuples in the heap that
> previously would have been removed.
>
> I don't think that is a solution.

Uh, we discussed this before the patch was committed, and you agreed
it made sense to do that in the second heap scan just as we do it in
the first heap scan.  If you now see a problem with that, that's fine,
but please explain what the problem is, rather than just saying it's
not acceptable for the patch to do the thing that the patch was
designed to do.

Actually, on further review, I do see another problem: we need to pass
the scan_all flag down to lazy_vacuum_heap, and only do this
conditionally if that flag is not set.  Otherwise we might skip a page
but still advance relfrozenxid, which would definitely be bad.  But
that looks easy to fix, and I don't see any other reason why this
would be either unsafe or undesirable.
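
In code form, that would be something along these lines (a hypothetical
sketch, assuming a scan_all parameter threaded down into
lazy_vacuum_heap(); the actual change may differ):

    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (scan_all)
            LockBufferForCleanup(buf);  /* must not skip the page here */
        else
        {
            ReleaseBuffer(buf);
            ++tupindex;
            continue;
        }
    }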

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Poorly thought out code in vacuum

From
Simon Riggs
Date:
On Fri, Jan 6, 2012 at 3:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 6, 2012 at 9:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 6, 2012 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Thu, Jan 5, 2012 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I suppose Robert had something more intelligent in mind than a tight
>>>> loop when the buffer can't be exclusively locked, so maybe there is
>>>> some other change that should be made here instead.
>>>
>>> My intention was to skip the tuple, but I failed to notice the unusual
>>> way in which this loop iterates.  How about something like the
>>> attached?
>>
>> It solves the waiting issue, but leaves unused tuples in the heap that
>> previously would have been removed.
>>
>> I don't think that is a solution.
>
> Uh, we discussed this before the patch was committed, and you agreed
> it made sense to do that in the second heap scan just as we do it in
> the first heap scan.  If you now see a problem with that, that's fine,
> but please explain what the problem is, rather than just saying it's
> not acceptable for the patch to do the thing that the patch was
> designed to do.

My name is on that patch, so of course I accept responsibility for the
earlier decision making.

I *have* explained what is wrong. It "leaves unused tuples in the heap
that would previously have been removed".

More simply: lazy_vacuum_page() does some work and we can't skip that
work just because we don't get the lock. Elsewhere in the patch we
skipped getting the lock when there was no work to be done. In
lazy_vacuum_heap() we only visit pages that need further work, hence
every page is important.
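
Concretely, the per-page work that would be skipped is roughly this
(condensed from lazy_vacuum_page(); not the verbatim source):

    for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
    {
        ItemPointer itemptr = &vacrelstats->dead_tuples[tupindex];
        ItemId      itemid;

        if (ItemPointerGetBlockNumber(itemptr) != blkno)
            break;              /* past this page's dead tuples */
        itemid = PageGetItemId(page, ItemPointerGetOffsetNumber(itemptr));
        ItemIdSetUnused(itemid);        /* reclaim the dead line pointer */
    }
    PageRepairFragmentation(page);      /* compact the freed space */
    /* ... then MarkBufferDirty() and a WAL record ... */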

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Poorly thought out code in vacuum

From
Robert Haas
Date:
On Fri, Jan 6, 2012 at 10:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I *have* explained what is wrong. It "leaves unused tuples in the heap
> that would previously have been removed".
>
> More simply: lazy_vacuum_page() does some work and we can't skip that
> work just because we don't get the lock. Elsewhere in the patch we
> skipped getting the lock when there was no work to be done.

That is not true.  We skip vacuumable pages in the first heap pass
even if they contain dead tuples, unless scan_all is set or we can get
the buffer lock without waiting.

> In
> lazy_vacuum_heap() we only visit pages that need further work, hence
> every page is important.

If the system were to crash after the first heap pass and the index
vacuuming, but before the second heap pass, nothing bad would happen.
The next vacuum would collect those same dead tuples, scan the indexes
for them (and not find them, since we already removed them), and then
remove them from the heap.

We already made a decision that it's a good idea to skip vacuuming a
page on which we can't immediately obtain a cleanup lock, because
waiting for the cleanup lock can cause the autovacuum worker to get
stuck indefinitely, starving the table and in some cases the entire
cluster of adequate vacuuming.  That logic is just as valid in the
second heap pass as it is in the first one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Poorly thought out code in vacuum

From
Tom Lane
Date:
I started to wonder how likely it would be that some other process would
sit on a buffer pin for so long as to allow 134217727 iterations of
ReadBufferExtended, even given the slowdowns associated with
CLOBBER_CACHE_ALWAYS.  This led to some fruitless searching for possible
deadlock conditions, but eventually I realized that there's a much
simpler explanation: if PrivateRefCount > 1 then
ConditionalLockBufferForCleanup always fails.  This means that once
ConditionalLockBufferForCleanup has failed once, the currently committed
code in lazy_vacuum_heap is guaranteed to loop until it gets a failure
in enlarging the ResourceOwner buffer-reference array.  Which in turn
means that neither of you ever actually exercised the skip case, or
you'd have noticed the problem.  Nor are the current regression tests
well designed to exercise the case.  (There might well be failures of
this type happening from time to time in autovacuum, but we'd not see
any reported failure in the buildfarm, unless we went trawling in
postmaster logs.)
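
The check in question, abridged from ConditionalLockBufferForCleanup()
in bufmgr.c:

    /* There should be exactly one local pin */
    if (PrivateRefCount[buffer - 1] != 1)
        return false;

so the extra pins the loop itself accumulates are, by themselves,
enough to make every later attempt fail.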

So at this point I've got serious doubts as to the quality of testing of
that whole patch, not just this part.
        regards, tom lane


Re: Poorly thought out code in vacuum

From
Robert Haas
Date:
On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I started to wonder how likely it would be that some other process would
> sit on a buffer pin for so long as to allow 134217727 iterations of
> ReadBufferExtended, even given the slowdowns associated with
> CLOBBER_CACHE_ALWAYS.  This led to some fruitless searching for possible
> deadlock conditions, but eventually I realized that there's a much
> simpler explanation: if PrivateRefCount > 1 then
> ConditionalLockBufferForCleanup always fails.  This means that once
> ConditionalLockBufferForCleanup has failed once, the currently committed
> code in lazy_vacuum_heap is guaranteed to loop until it gets a failure
> in enlarging the ResourceOwner buffer-reference array.  Which in turn
> means that neither of you ever actually exercised the skip case, or
> you'd have noticed the problem.  Nor are the current regression tests
> well designed to exercise the case.  (There might well be failures of
> this type happening from time to time in autovacuum, but we'd not see
> any reported failure in the buildfarm, unless we went trawling in
> postmaster logs.)
>
> So at this point I've got serious doubts as to the quality of testing of
> that whole patch, not just this part.

I tested the case where we skip a block during the first pass, but I
admit that I punted on testing the case where we skip a block during
the second pass, because I couldn't think of a good way to exercise
it.  Any suggestions?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Poorly thought out code in vacuum

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So at this point I've got serious doubts as to the quality of testing of
>> that whole patch, not just this part.

> I tested the case where we skip a block during the first pass, but I
> admit that I punted on testing the case where we skip a block during
> the second pass, because I couldn't think of a good way to exercise
> it.  Any suggestions?

Hack ConditionalLockBufferForCleanup to have a 50% probability of
failure regardless of anything else, for instance via
    static int ctr = 0;

    if ((++ctr) % 2)
        return false;
        regards, tom lane


Re: Poorly thought out code in vacuum

From
Robert Haas
Date:
On Fri, Jan 6, 2012 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jan 6, 2012 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So at this point I've got serious doubts as to the quality of testing of
>>> that whole patch, not just this part.
>
>> I tested the case where we skip a block during the first pass, but I
>> admit that I punted on testing the case where we skip a block during
>> the second pass, because I couldn't think of a good way to exercise
>> it.  Any suggestions?
>
> Hack ConditionalLockBufferForCleanup to have a 50% probability of
> failure regardless of anything else, for instance via
>
>        static int ctr = 0;
>
>        if ((++ctr) % 2)
>                return false;

Oh, that's brilliant.  OK, I'll go try that.

Note to self: Try to remember to take that hack back out before committing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Poorly thought out code in vacuum

From
Robert Haas
Date:
On Fri, Jan 6, 2012 at 12:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Oh, that's brilliant.  OK, I'll go try that.

All right, that test does in fact reveal the broken-ness of the
current code, and the patch I posted upthread does seem to fix it, so
I've committed that.

After further reflection, I'm thinking that skipping the *second* heap
pass when a cleanup lock is not immediately available is safe even
during an anti-wraparound vacuum (i.e. scan_all = true), because the
only thing the second pass is doing is changing dead line pointers to
unused, and failing to do that doesn't prevent relfrozenxid
advancement: the dead line pointer doesn't contain an XID.  The next
vacuum will clean it up: it'll see that there's a dead line pointer,
add that to the list of TIDs to be killed if seen during the index
vacuum (it won't be, since we got that far the first time, but vacuum
doesn't know or care), and then try again during the second heap pass
to mark that dead line pointer unused.  This might be worth a code
comment, since it's not entirely obvious.  At any rate, it seems that
the worst thing that happens from skipping a block during the second
pass is that we postpone reclaiming a line pointer whose tuple storage
has already been recovered, which is pretty far down in the weeds.
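
The suggested comment might read something like this sketch:

    /*
     * If we can't immediately get a cleanup lock, just skip this page.
     * That is safe even when scan_all is true: all we leave behind is a
     * dead line pointer, which contains no XID and so cannot hold back
     * relfrozenxid.  A future vacuum will find the dead line pointer
     * again and mark it unused then.
     */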

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company