Обсуждение: missing RelationCloseSmgr in FreeFakeRelcacheEntry?
Hi, I've started a valgrind run earlier when trying to run the regression tests with valgrind --error-exitcode=122 (to cause the regression tests to fail visibly) but it crashed frequently... One of them was: ==2184== Invalid write of size 8 ==2184== at 0x76787F: smgrclose (smgr.c:284) ==2184== by 0x4ED717: xact_redo_commit_internal (xact.c:4676) ==2184== by 0x4ED7FF: xact_redo_commit (xact.c:4712) ==2184== by 0x4EDB0D: xact_redo (xact.c:4838) ==2184== by 0x50355D: StartupXLOG (xlog.c:6809) ==2184== by 0x70AA1E: StartupProcessMain (startup.c:224) ==2184== by 0x512B38: AuxiliaryProcessMain (bootstrap.c:429) ==2184== by 0x709C43: StartChildProcess (postmaster.c:5063) ==2184== by 0x7086EA: PostmasterStateMachine (postmaster.c:3544) ==2184== by 0x7072F1: reaper (postmaster.c:2801) ==2184== by 0x57B325F: ??? (in /lib/x86_64-linux-gnu/libc-2.17.so) ==2184== by 0x585F822: __select_nocancel (syscall-template.S:81) ==2184== Address 0x5f63410 is 5,584 bytes inside a recently re-allocated block of size 8,192 alloc'd ==2184== at 0x402ACEC: malloc (vg_replace_malloc.c:270) ==2184== by 0x8B3F8E: AllocSetAlloc (aset.c:854) ==2184== by 0x8B623B: MemoryContextAlloc (mcxt.c:581) ==2184== by 0x8B5F93: MemoryContextCreate (mcxt.c:522) ==2184== by 0x8B33C4: AllocSetContextCreate (aset.c:444) ==2184== by 0x8B55DD: MemoryContextInit (mcxt.c:110) ==2184== by 0x703B17: PostmasterMain (po Which seems to indicate a dangling ->rd_smgr pointer, confusing the heck out of me because I couldn't see how that'd happen. Looking a bit closer it seems to me that the fake relcache infrastructure seems to neglect the chance that something used the fake entry to read something which will have done a RelationOpenSmgr(). Which in turn will have set rel->rd_smgr->smgr_owner to rel. When we then pfree() the fake relation in FreeFakeRelcacheEntry we have a dangling pointer. It confuses me a bit that this doesn't cause issues during recovery more frequently... The code seems to have been that way since a213f1ee6c5a1bbe1f074ca201975e76ad2ed50c which introduced fake relcache entries. It looks like XLogCloseRelationCache() indirectly has done a RelationCloseSmgr(). diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 5429d5e..ee7c892 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -433,6 +433,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode)voidFreeFakeRelcacheEntry(Relation fakerel){ + RelationCloseSmgr(fakerel); pfree(fakerel);} Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi Heikki, All, On 2013-10-29 02:16:23 +0100, Andres Freund wrote: > Looking a bit closer it seems to me that the fake relcache > infrastructure seems to neglect the chance that something used the fake > entry to read something which will have done a RelationOpenSmgr(). Which > in turn will have set rel->rd_smgr->smgr_owner to rel. > When we then pfree() the fake relation in FreeFakeRelcacheEntry we have > a dangling pointer. > > It confuses me a bit that this doesn't cause issues during recovery more > frequently... The code seems to have been that way since > a213f1ee6c5a1bbe1f074ca201975e76ad2ed50c which introduced fake relcache > entries. It looks like XLogCloseRelationCache() indirectly has done a > RelationCloseSmgr(). This looks like it was caused by a213f1ee6c5a1bbe1f074ca201975e76ad2ed50c which you committed, any chance you could commit the trivial fix? > diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c > index 5429d5e..ee7c892 100644 > --- a/src/backend/access/transam/xlogutils.c > +++ b/src/backend/access/transam/xlogutils.c > @@ -433,6 +433,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode) > void > FreeFakeRelcacheEntry(Relation fakerel) > { > + RelationCloseSmgr(fakerel); > pfree(fakerel); > } Greetings, Andres Freund
On 29.10.2013 03:16, Andres Freund wrote: > Hi, > > I've started a valgrind run earlier when trying to run the regression > tests with valgrind --error-exitcode=122 (to cause the regression tests > to fail visibly) but it crashed frequently... > > One of them was: > > ==2184== Invalid write of size 8 > ==2184== at 0x76787F: smgrclose (smgr.c:284) > ==2184== by 0x4ED717: xact_redo_commit_internal (xact.c:4676) > ==2184== by 0x4ED7FF: xact_redo_commit (xact.c:4712) > ==2184== by 0x4EDB0D: xact_redo (xact.c:4838) > ==2184== by 0x50355D: StartupXLOG (xlog.c:6809) > ==2184== by 0x70AA1E: StartupProcessMain (startup.c:224) > ==2184== by 0x512B38: AuxiliaryProcessMain (bootstrap.c:429) > ==2184== by 0x709C43: StartChildProcess (postmaster.c:5063) > ==2184== by 0x7086EA: PostmasterStateMachine (postmaster.c:3544) > ==2184== by 0x7072F1: reaper (postmaster.c:2801) > ==2184== by 0x57B325F: ??? (in /lib/x86_64-linux-gnu/libc-2.17.so) > ==2184== by 0x585F822: __select_nocancel (syscall-template.S:81) > ==2184== Address 0x5f63410 is 5,584 bytes inside a recently re-allocated block of size 8,192 alloc'd > ==2184== at 0x402ACEC: malloc (vg_replace_malloc.c:270) > ==2184== by 0x8B3F8E: AllocSetAlloc (aset.c:854) > ==2184== by 0x8B623B: MemoryContextAlloc (mcxt.c:581) > ==2184== by 0x8B5F93: MemoryContextCreate (mcxt.c:522) > ==2184== by 0x8B33C4: AllocSetContextCreate (aset.c:444) > ==2184== by 0x8B55DD: MemoryContextInit (mcxt.c:110) > ==2184== by 0x703B17: PostmasterMain (po > > Which seems to indicate a dangling ->rd_smgr pointer, confusing the heck > out of me because I couldn't see how that'd happen. > > Looking a bit closer it seems to me that the fake relcache > infrastructure seems to neglect the chance that something used the fake > entry to read something which will have done a RelationOpenSmgr(). Which > in turn will have set rel->rd_smgr->smgr_owner to rel. > When we then pfree() the fake relation in FreeFakeRelcacheEntry we have > a dangling pointer. Yeah, that's clearly a bug. > diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c > index 5429d5e..ee7c892 100644 > --- a/src/backend/access/transam/xlogutils.c > +++ b/src/backend/access/transam/xlogutils.c > @@ -433,6 +433,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode) > void > FreeFakeRelcacheEntry(Relation fakerel) > { > + RelationCloseSmgr(fakerel); > pfree(fakerel); > } Hmm, I don't think that's a very good solution. Firstly, it will force the underlying files to be closed, hurting performance. Fake relcache entries are used in heapam when clearing visibility map bits, which might happen frequently enough for that to matter. Secondly, it will fail if you create two fake relcache entries for the same relfilenode. Freeing the first will close the smgr entry, and freeing the second will try to close the same smgr entry again. We never do that in the current code, but it seems like a possible source of bugs in the future. How about the attached instead? - Heikki
Вложения
On 2013-11-04 09:38:27 +0200, Heikki Linnakangas wrote: > On 29.10.2013 03:16, Andres Freund wrote: > >Hi, > > > >I've started a valgrind run earlier when trying to run the regression > >tests with valgrind --error-exitcode=122 (to cause the regression tests > >to fail visibly) but it crashed frequently... > > > >One of them was: > >... > >Which seems to indicate a dangling ->rd_smgr pointer, confusing the heck > >out of me because I couldn't see how that'd happen. > > > >Looking a bit closer it seems to me that the fake relcache > >infrastructure seems to neglect the chance that something used the fake > >entry to read something which will have done a RelationOpenSmgr(). Which > >in turn will have set rel->rd_smgr->smgr_owner to rel. > >When we then pfree() the fake relation in FreeFakeRelcacheEntry we have > >a dangling pointer. > > Yeah, that's clearly a bug. > > >diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c > >index 5429d5e..ee7c892 100644 > >--- a/src/backend/access/transam/xlogutils.c > >+++ b/src/backend/access/transam/xlogutils.c > >@@ -433,6 +433,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode) > > void > > FreeFakeRelcacheEntry(Relation fakerel) > > { > >+ RelationCloseSmgr(fakerel); > > pfree(fakerel); > > } > > Hmm, I don't think that's a very good solution. Firstly, it will force the > underlying files to be closed, hurting performance. Fake relcache entries > are used in heapam when clearing visibility map bits, which might happen > frequently enough for that to matter. Well, it will only close them when they actually were smgropen()ed which will not always be the case. Although it probably is in the visibility map case. Feels like premature optimization to me. > Secondly, it will fail if you create > two fake relcache entries for the same relfilenode. Freeing the first will > close the smgr entry, and freeing the second will try to close the same smgr > entry again. We never do that in the current code, but it seems like a > possible source of bugs in the future. I think if we go there we'll need refcounted FakeRelcacheEntry's anyway. Otherwise we'll end up with a relation smgropen()ed multiple times in the same backend and such which doesn't seem like a promising course to me since the smgr itself isn't refcounted. > How about the attached instead? > diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c > index 5429d5e..f732e71 100644 > --- a/src/backend/access/transam/xlogutils.c > +++ b/src/backend/access/transam/xlogutils.c > @@ -422,7 +422,11 @@ CreateFakeRelcacheEntry(RelFileNode rnode) > rel->rd_lockInfo.lockRelId.dbId = rnode.dbNode; > rel->rd_lockInfo.lockRelId.relId = rnode.relNode; > > - rel->rd_smgr = NULL; > + /* > + * Open the underlying storage, but don't set rd_owner. We want the > + * SmgrRelation to persist after the fake relcache entry is freed. > + */ > + rel->rd_smgr = smgropen(rnode, InvalidBackendId); > > return rel; > } I don't really like that - that will mean we'll leak open smgr handles for every relation touched via smgr during recovery since they will never be closed unless the relation is dropped or such. And in some databases there can be huge amounts of relations. Since recovery is long lived that doesn't seem like a good idea, besides the memory usage it will also bloat smgr's internal hash which actually seems just as likely to hurt performance. I think intentionally not using the owner mechanism also is dangerous - what if we have longer lived fake relcache entries and we've just processed sinval messages? Then we'll have a ->rd_smgr pointing into nowhere. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 04.11.2013 11:35, Andres Freund wrote: > On 2013-11-04 09:38:27 +0200, Heikki Linnakangas wrote: >> Secondly, it will fail if you create >> two fake relcache entries for the same relfilenode. Freeing the first will >> close the smgr entry, and freeing the second will try to close the same smgr >> entry again. We never do that in the current code, but it seems like a >> possible source of bugs in the future. > > I think if we go there we'll need refcounted FakeRelcacheEntry's > anyway. Otherwise we'll end up with a relation smgropen()ed multiple > times in the same backend and such which doesn't seem like a promising > course to me since the smgr itself isn't refcounted. > >> How about the attached instead? > >> diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c >> index 5429d5e..f732e71 100644 >> --- a/src/backend/access/transam/xlogutils.c >> +++ b/src/backend/access/transam/xlogutils.c >> @@ -422,7 +422,11 @@ CreateFakeRelcacheEntry(RelFileNode rnode) >> rel->rd_lockInfo.lockRelId.dbId = rnode.dbNode; >> rel->rd_lockInfo.lockRelId.relId = rnode.relNode; >> >> - rel->rd_smgr = NULL; >> + /* >> + * Open the underlying storage, but don't set rd_owner. We want the >> + * SmgrRelation to persist after the fake relcache entry is freed. >> + */ >> + rel->rd_smgr = smgropen(rnode, InvalidBackendId); >> >> return rel; >> } > > I don't really like that - that will mean we'll leak open smgr handles > for every relation touched via smgr during recovery since they will > never be closed unless the relation is dropped or such. And in some > databases there can be huge amounts of relations. > Since recovery is long lived that doesn't seem like a good idea, besides > the memory usage it will also bloat smgr's internal hash which actually > seems just as likely to hurt performance. That's the way SmgrRelations are supposed to be used. Opened on first use, and kept open forever. We never smgrclose() during normal operation either, unless the relation is dropped or something like that. And see how XLogReadBuffer() also calls smgropen() with no corresponding smgrclose(). > I think intentionally not using the owner mechanism also is dangerous - > what if we have longer lived fake relcache entries and we've just > processed sinval messages? Then we'll have a ->rd_smgr pointing into > nowhere. Hmm, the startup process doesn't participate in sinval messaging at all, does it? - Heikki
On 2013-11-04 14:37:53 +0200, Heikki Linnakangas wrote: > On 04.11.2013 11:35, Andres Freund wrote: > >On 2013-11-04 09:38:27 +0200, Heikki Linnakangas wrote: > >>diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c > >>index 5429d5e..f732e71 100644 > >>--- a/src/backend/access/transam/xlogutils.c > >>+++ b/src/backend/access/transam/xlogutils.c > >>@@ -422,7 +422,11 @@ CreateFakeRelcacheEntry(RelFileNode rnode) > >> rel->rd_lockInfo.lockRelId.dbId = rnode.dbNode; > >> rel->rd_lockInfo.lockRelId.relId = rnode.relNode; > >> > >>- rel->rd_smgr = NULL; > >>+ /* > >>+ * Open the underlying storage, but don't set rd_owner. We want the > >>+ * SmgrRelation to persist after the fake relcache entry is freed. > >>+ */ > >>+ rel->rd_smgr = smgropen(rnode, InvalidBackendId); > >> > >> return rel; > >> } > > > >I don't really like that - that will mean we'll leak open smgr handles > >for every relation touched via smgr during recovery since they will > >never be closed unless the relation is dropped or such. And in some > >databases there can be huge amounts of relations. > >Since recovery is long lived that doesn't seem like a good idea, besides > >the memory usage it will also bloat smgr's internal hash which actually > >seems just as likely to hurt performance. > > That's the way SmgrRelations are supposed to be used. Opened on first use, > and kept open forever. We never smgrclose() during normal operation either, > unless the relation is dropped or something like that. And see how > XLogReadBuffer() also calls smgropen() with no corresponding smgrclose(). Ok, that's quite the fair argument although I'd say normal backends won't open many relations in comparison to the startup process when doing replication. But that certainly isn't sufficient argument for changing logic in the back branches or even HEAD. > >I think intentionally not using the owner mechanism also is dangerous - > >what if we have longer lived fake relcache entries and we've just > >processed sinval messages? Then we'll have a ->rd_smgr pointing into > >nowhere. > Hmm, the startup process doesn't participate in sinval messaging at all, > does it? Well, not sinval but inval, in hot standby via commit messages. What about just unowning the smgr entry with if (rel->rd_smgr != NULL) smgrsetowner(NULL, rel->rd_smgr) when closing the fake relcache entry? That shouldn't require any further changes than changing the comment in smgrsetowner() that this isn't something expected to frequently happen. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-04 13:48:32 +0100, Andres Freund wrote: > > Hmm, the startup process doesn't participate in sinval messaging at all, > > does it? > > Well, not sinval but inval, in hot standby via commit messages. Err, that's bullshit, sorry for that. We send the messages via sinval, but never (probably at least?) process sinval messages. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-04 13:48:32 +0100, Andres Freund wrote: > What about just unowning the smgr entry with > if (rel->rd_smgr != NULL) > smgrsetowner(NULL, rel->rd_smgr) > when closing the fake relcache entry? > > That shouldn't require any further changes than changing the comment in > smgrsetowner() that this isn't something expected to frequently happen. Attached is a patch doing it like that, it required slightmy more invasive changes than that. With the patch applied we survive replay of a primary's make check run under valgrind without warnings. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Tue, Nov 5, 2013 at 08:36:32PM +0100, Andres Freund wrote: > On 2013-11-04 13:48:32 +0100, Andres Freund wrote: > > What about just unowning the smgr entry with > > if (rel->rd_smgr != NULL) > > smgrsetowner(NULL, rel->rd_smgr) > > when closing the fake relcache entry? > > > > That shouldn't require any further changes than changing the comment in > > smgrsetowner() that this isn't something expected to frequently happen. > > Attached is a patch doing it like that, it required slightmy more > invasive changes than that. With the patch applied we survive replay of > a primary's make check run under valgrind without warnings. Where are we on this patch? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 03/06/2014 02:18 AM, Bruce Momjian wrote: > On Tue, Nov 5, 2013 at 08:36:32PM +0100, Andres Freund wrote: >> On 2013-11-04 13:48:32 +0100, Andres Freund wrote: >>> What about just unowning the smgr entry with >>> if (rel->rd_smgr != NULL) >>> smgrsetowner(NULL, rel->rd_smgr) >>> when closing the fake relcache entry? >>> >>> That shouldn't require any further changes than changing the comment in >>> smgrsetowner() that this isn't something expected to frequently happen. >> >> Attached is a patch doing it like that, it required slightmy more >> invasive changes than that. With the patch applied we survive replay of >> a primary's make check run under valgrind without warnings. > > Where are we on this patch? Committed now, with small changes. I made the new smgrclearowner function to check that the SmgrRelation object is indeed owned by the owner we expect, so that it doesn't unown it if it's actually owned by someone else. That shouldn't happen, but it seemed prudent to check. Thanks Andres. I tried to reproduce the valgrind message you reported, but couldn't. How did you do it? Did this commit fix it? And thanks for the nudge, Bruce. - Heikki
On 2014-03-07 13:50:27 +0200, Heikki Linnakangas wrote: > Thanks Andres. I tried to reproduce the valgrind message you reported, but > couldn't. How did you do it? Did this commit fix it? I previously could reproduce the issue by either forcing a server into recovery or using a replica. That seems to be fine now. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services