Обсуждение: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

Поиск
Список
Период
Сортировка

Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Tom Lane
Дата:
While the CLOBBER_FREED_MEMORY hack does a fairly good job of catching
stale pointers to already-freed memory, commit fd496129d160950e exhibits
a case that is not caught at all: RelationBuildRowSecurity was copying
*pointers into disk buffers* into backend-local relcaches.  This would
of course work just as long as the relevant system catalog pages stayed
in shared buffers ... which is probably long enough that you'd never
notice it in typical developer testing.

I wonder if there's anything we can do to catch such cases more
mechanically?

One brute-force answer is to run the regression tests with a very small
shared_buffers setting; but it's not clear what is small enough, nor
what might be so small as to cause failures.

Another idea is to teach Valgrind that whenever a backend reduces its
pin count on a shared buffer to zero, that buffer should become undefined
memory.  But I don't know if that will help --- if the buffer is then
re-accessed, is Valgrind able to distinguish freshly-computed pointers
into it from stale ones?

Ideas?
        regards, tom lane



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Peter Geoghegan
Дата:
On Sat, Jan 24, 2015 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another idea is to teach Valgrind that whenever a backend reduces its
> pin count on a shared buffer to zero, that buffer should become undefined
> memory.

That should be fairly straightforward to implement.

> But I don't know if that will help --- if the buffer is then
> re-accessed, is Valgrind able to distinguish freshly-computed pointers
> into it from stale ones?

I don't think so. However, I think that
VALGRIND_CHECK_VALUE_IS_DEFINED() might be used. I believe you could
have Valgrind builds deference a pointer, and make sure that it
pointed into defined memory. But what would the generally useful choke
points for such a check be?

-- 
Peter Geoghegan



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Tom Lane
Дата:
Peter Geoghegan <pg@heroku.com> writes:
> On Sat, Jan 24, 2015 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another idea is to teach Valgrind that whenever a backend reduces its
>> pin count on a shared buffer to zero, that buffer should become undefined
>> memory.

> That should be fairly straightforward to implement.

>> But I don't know if that will help --- if the buffer is then
>> re-accessed, is Valgrind able to distinguish freshly-computed pointers
>> into it from stale ones?

> I don't think so. However, I think that
> VALGRIND_CHECK_VALUE_IS_DEFINED() might be used. I believe you could
> have Valgrind builds deference a pointer, and make sure that it
> pointed into defined memory. But what would the generally useful choke
> points for such a check be?

Not sure.  There are wide swaths of the system where it would be perfectly
valid to see a pointer into buffer storage, so long as you still had a pin
on that page.

However, after further consideration it seems like even without solving
the buffer-reaccess problem, a Valgrind tweak such as above would have
caught this bug, and probably most other similar bugs.  Running with a
large shared_buffers value actually works in our favor for this: you're
unlikely to get aliasing between different pages occupying the same
buffer.  And most queries don't (intentionally) re-access the same page,
so while detection of a stale pointer wouldn't be certain it'd be fairly
probable.
        regards, tom lane



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Noah Misch
Дата:
On Sat, Jan 24, 2015 at 01:48:43PM -0800, Peter Geoghegan wrote:
> On Sat, Jan 24, 2015 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Another idea is to teach Valgrind that whenever a backend reduces its
> > pin count on a shared buffer to zero, that buffer should become undefined
> > memory.
> 
> That should be fairly straightforward to implement.

+1



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Jim Nasby
Дата:
On 1/24/15 3:31 PM, Tom Lane wrote:
> Another idea is to teach Valgrind that whenever a backend reduces its
> pin count on a shared buffer to zero, that buffer should become undefined
> memory.

<paranoia>

Shouldn't this technically tie in with ResourceOwners? If a pointer takes the pin count from 1 to 2, then that pointer
shouldbe invalid by the time the pin count goes from 2 to 1...
 

I'm worried that a simple test when pin count is 0 could miss some cases of pointers just happening to be cleared by a
secondpart of the code even though the pin count has already dropped.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> On 1/24/15 3:31 PM, Tom Lane wrote:
>> Another idea is to teach Valgrind that whenever a backend reduces its
>> pin count on a shared buffer to zero, that buffer should become undefined
>> memory.

> <paranoia>

> Shouldn't this technically tie in with ResourceOwners?

No.  ResourceOwner is just a mechanism to ensure that we remember to call
UnpinBuffer, it has no impact on what the semantics of the pin count are.
The *instant* the pin count goes to zero, another backend is entitled to
recycle that buffer for some other purpose.
        regards, tom lane



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Jim Nasby
Дата:
On 1/26/15 4:51 PM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>> On 1/24/15 3:31 PM, Tom Lane wrote:
>>> Another idea is to teach Valgrind that whenever a backend reduces its
>>> pin count on a shared buffer to zero, that buffer should become undefined
>>> memory.
>
>> <paranoia>
>
>> Shouldn't this technically tie in with ResourceOwners?
>
> No.  ResourceOwner is just a mechanism to ensure that we remember to call
> UnpinBuffer, it has no impact on what the semantics of the pin count are.
> The *instant* the pin count goes to zero, another backend is entitled to
> recycle that buffer for some other purpose.

But one backend can effectively "pin" a buffer more than once, no? If so, then ISTM there's some risk that code path A
pinsand forgets to unpin, but path B accidentally unpins for A.
 

But as you say, this is all academic until the pin count hits 0, so it's probably not worth worrying about.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Greg Stark
Дата:

On Tue, Jan 27, 2015 at 12:03 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
But one backend can effectively "pin" a buffer more than once, no? If so, then ISTM there's some risk that code path A pins and forgets to unpin, but path B accidentally unpins for A.

The danger is that there's a codepath that refers to memory it doesn't have a pin on but that there is no actual test in our regression suite that doesn't actually have a second pin on the same buffer. If there is in fact no reachable code path at all without the second pin then there's no active bug, only a bad coding practice. But if there are code paths that we just aren't testing then that's a real bug.

IIRC CLOBBER_FREED_MEMORY only affects palloc'd blocks. Do we not have a mode that automatically removes any buffer as soon as it's not pinned? That seems like it would be a valuable addition.

Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I suppose the same might not be true for rarely called codepaths or in cases where the buffers are usually already pinned.


--
greg

Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Jim Nasby
Дата:
On 1/26/15 6:11 PM, Greg Stark wrote:
>
> On Tue, Jan 27, 2015 at 12:03 AM, Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>> wrote:
>
>     But one backend can effectively "pin" a buffer more than once, no? If so, then ISTM there's some risk that code
pathA pins and forgets to unpin, but path B accidentally unpins for A.
 
>
>
> The danger is that there's a codepath that refers to memory it doesn't have a pin on but that there is no actual test
inour regression suite that doesn't actually have a second pin on the same buffer. If there is in fact no reachable
codepath at all without the second pin then there's no active bug, only a bad coding practice. But if there are code
pathsthat we just aren't testing then that's a real bug.
 
>
> IIRC CLOBBER_FREED_MEMORY only affects palloc'd blocks. Do we not have a mode that automatically removes any buffer
assoon as it's not pinned? That seems like it would be a valuable addition.
 

By setting BufferDesc.tag to 0?

On a related note... I'm confused by this part of UnpinBuffer. How is refcount ending up > 0??
Assert(ref->refcount > 0);ref->refcount--;if (ref->refcount == 0){    /* I'd better not still hold any locks on the
buffer*/    Assert(!LWLockHeldByMe(buf->content_lock));    Assert(!LWLockHeldByMe(buf->io_in_progress_lock));
 
    LockBufHdr(buf);
    /* Decrement the shared reference count */    Assert(buf->refcount > 0);    buf->refcount--;


BTW, I certainly see no evidence of CLOBBER_FREED_MEMORY coming into play here.

> Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I
supposethe same might not be true for rarely called codepaths or in cases where the buffers are usually already
pinned.

Yeah, that's what I was thinking. If there's some easy way to correctly associate pins with specific code paths
(owners?)then maybe it's worth doing so; but I don't think it's worth much effort.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Tom Lane
Дата:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 1/26/15 6:11 PM, Greg Stark wrote:
>> Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I
supposethe same might not be true for rarely called codepaths or in cases where the buffers are usually already
pinned.

> Yeah, that's what I was thinking. If there's some easy way to correctly associate pins with specific code paths
(owners?)then maybe it's worth doing so; but I don't think it's worth much effort.
 

If you have a working set larger than shared_buffers, then yeah it's
likely that reference-after-unpin bugs would manifest pretty quickly.
This does not necessarily translate into something reproducible or
debuggable, however; and besides that you'd really rather that such
bugs not get into the field in the first place.

The point of my Valgrind proposal was to provide a mechanism that would
have a chance of catching such bugs in a *development* context, and
provide some hint of where in the codebase the fault is, too.
        regards, tom lane



Re: Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers

От
Jim Nasby
Дата:
On 1/27/15 5:16 PM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> On 1/26/15 6:11 PM, Greg Stark wrote:
>>> Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I
supposethe same might not be true for rarely called codepaths or in cases where the buffers are usually already
pinned.
>
>> Yeah, that's what I was thinking. If there's some easy way to correctly associate pins with specific code paths
(owners?)then maybe it's worth doing so; but I don't think it's worth much effort.
 
>
> If you have a working set larger than shared_buffers, then yeah it's
> likely that reference-after-unpin bugs would manifest pretty quickly.
> This does not necessarily translate into something reproducible or
> debuggable, however; and besides that you'd really rather that such
> bugs not get into the field in the first place.
>
> The point of my Valgrind proposal was to provide a mechanism that would
> have a chance of catching such bugs in a *development* context, and
> provide some hint of where in the codebase the fault is, too.

That's what I was looking for two; I was just wondering if there was an easy way to also cover the case of one path
forgettingto Unpin and a second path accidentally Unpinning (with neither dropping the refcount to 0). It sounds like
it'sjust not worth worrying about that though.
 

Do you think there's merit to having bufmgr.c do something special when refcount hits 0 in a CLOBBER_FREED_MEMORY
build?It seems like it's a lot easier to enable that than to setup valgrind (though I've never tried the latter).
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com