Discussion: CREATE INDEX CONCURRENTLY does not index prepared xact's data
Hi hackers!

$subj. Steps to reproduce:

create extension if not exists amcheck;
create table if not exists t1(i int);
begin;
insert into t1 values(1);
prepare transaction 'x';
create index concurrently i1 on t1(i);
commit prepared 'x';
select bt_index_check('i1', true);

I observe:
NOTICE: heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"

I expect: awaiting 'x' commit before index is created, correct index after.

This happens because WaitForLockersMultiple() does not take prepared xacts into account. Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only when transaction commit is visible. This issue affects pg_repack and similar machinery based on CIC.

PFA draft of a fix.

Best regards, Andrey Borodin.
Attachments
> Steps to reproduce:
> create extension if not exists amcheck;
> create table if not exists t1(i int);
> begin;
> insert into t1 values(1);
> prepare transaction 'x';
> create index concurrently i1 on t1(i);
> commit prepared 'x';
> select bt_index_check('i1', true);
> I observe:
> NOTICE: heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
> I expect: awaiting 'x' commit before index is created, correct index after.
CREATE INDEX CONCURRENTLY isn't supposed to be run inside a transaction?..
> On 19 Dec 2020, at 22:22, Victor Yegorov <vyegorov@gmail.com> wrote:
>
> CREATE INDEX CONCURRENTLY isn't supposed to be run inside a transaction?..

CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

Best regards, Andrey Borodin.
> I observe:
> NOTICE: heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
> I expect: awaiting 'x' commit before index is created, correct index after.
I agree, that behaviour is unexpected. But getting a notice that requires me
to re-create the index some time later is not better (from DBA perspective).
Maybe it'd be better to wait on prepared xacts like on other open ordinary transactions?
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> This happens because WaitForLockersMultiple() does not take prepared
> xacts into account.

Ugh, clearly an oversight.

> Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only
> when transaction commit is visible.

Don't follow your point here --- I'm pretty sure that prepared xacts
continue to hold their locks.

> PFA draft of a fix.

Haven't you completely broken VirtualXactLock()? Certainly, whether the
target is a normal or prepared transaction shouldn't alter the meaning
of the "wait" flag.

In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
need to gain an additional parameter indicating whether to consider
prepared xacts. It's not clear to me that their current behavior is wrong
for all possible uses.

			regards, tom lane
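As a rough illustration of the kind of knob being discussed here, the declarations below sketch one possible shape; the parameter name and its placement are assumptions made for the sake of the example, not the signatures that were eventually committed.

/* Illustrative sketch only: let callers say whether prepared xacts matter. */
extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
                                              LOCKMODE lockmode,
                                              int *countp,
                                              bool includePrepared);

extern void WaitForLockersMultiple(List *locktags, LOCKMODE lockmode,
                                   bool progress, bool includePrepared);

Under such a scheme, hot-standby conflict resolution would keep passing false (prepared transactions cannot be cancelled there anyway), while the CREATE INDEX CONCURRENTLY / REINDEX CONCURRENTLY paths would pass true.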
> On 19 Dec 2020, at 22:48, Victor Yegorov <vyegorov@gmail.com> wrote:
>
> On Sat, 19 Dec 2020 at 18:13, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I observe:
> NOTICE: heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
> I expect: awaiting 'x' commit before index is created, correct index after.
>
> I agree, that behaviour is unexpected. But getting a notice that requires me
> to re-create the index some time later is not better (from DBA perspective).
>
> Maybe it'd be better to wait on prepared xacts like on other open ordinary transactions?

This is not a real NOTICE. I just used a bit altered amcheck to diagnose the problem.
It's an incorrect index. It lacks some tuples. It will not find existing data, failing to provide "read committed" consistency guarantees.
The fix waits for prepared xacts just like any other transaction.

> On 19 Dec 2020, at 22:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only
>> when transaction commit is visible.
>
> Don't follow your point here --- I'm pretty sure that prepared xacts
> continue to hold their locks.

Uhmm, yes, locks are there. The relation is locked with RowExclusiveLock, but this lock is ignored by WaitForLockers(heaplocktag, ShareLock, true). Locking the relation with ShareLock works as expected.

>> PFA draft of a fix.
>
> Haven't you completely broken VirtualXactLock()? Certainly, whether the
> target is a normal or prepared transaction shouldn't alter the meaning
> of the "wait" flag.

You are right, the patch has a bug here.

> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
> need to gain an additional parameter indicating whether to consider
> prepared xacts. It's not clear to me that their current behavior is wrong
> for all possible uses.

I don't see usages besides the indexing stuff. But maybe it's worth analysing each case...

BTW do we need a test for this? Will an isolation test be good at checking this?

Thanks!

Best regards, Andrey Borodin.
> On 19 Dec 2020, at 23:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> BTW do we need a test for this? Will an isolation test be good at checking this?

PFA patch set with an isolation test for the $subj and a fix for the VirtualXactLock() bug.
I think I'll register the thread on the January CF.

Thanks!

Best regards, Andrey Borodin.
Attachments
On Sat, Dec 19, 2020 at 12:57:41PM -0500, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> This happens because WaitForLockersMultiple() does not take prepared
>> xacts into account.
>
> Ugh, clearly an oversight.

This looks to be the case since 295e639 where virtual XIDs have been
introduced. So this is an old bug.

> Don't follow your point here --- I'm pretty sure that prepared xacts
> continue to hold their locks.

Yes, that's what I recall as well.

> Haven't you completely broken VirtualXactLock()? Certainly, whether the
> target is a normal or prepared transaction shouldn't alter the meaning
> of the "wait" flag.

Yep.

> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
> need to gain an additional parameter indicating whether to consider
> prepared xacts. It's not clear to me that their current behavior is wrong
> for all possible uses.

WaitForLockers is used only by REINDEX and CREATE/DROP CONCURRENTLY,
where it seems to me we need to care about all the cases related to
concurrent build, validation and index drop. The other caller of
GetLockConflicts() is for conflict resolution in standbys where it is
fine to ignore 2PC transactions as these cannot be cancelled. So I
agree that we are going to need more control with a new option
argument to be able to control if 2PC transactions are ignored or not.

Hmm. The approach taken by the patch looks to be back-patchable.
Based on the lack of complaints on the matter, we could consider
instead putting an error in WaitForLockersMultiple() if there is at
least one numPrepXact, which would at least avoid inconsistent data.
But I don't think what's proposed here is bad either.

VirtualTransactionIdIsValidOrPreparedXact() is confusing IMO, knowing
that VirtualTransactionIdIsPreparedXact() combined with
LocalTransactionIdIsValid() would be enough to do the job.

- Assert(VirtualTransactionIdIsValid(vxid));
+ Assert(VirtualTransactionIdIsValidOrPreparedXact(vxid));
+
+ if (VirtualTransactionIdIsPreparedXact(vxid))
[...]
#define VirtualTransactionIdIsPreparedXact(vxid) \
	((vxid).backendId == InvalidBackendId)

This would allow the case where backendId and localTransactionId are
both invalid. So it would be better to also check in
VirtualTransactionIdIsPreparedXact() that the XID is not invalid, no?
--
Michael
Attachments
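Spelled out, the stricter test Michael is asking for could look roughly like the following. This is a sketch based on the macro quoted above, not necessarily the final form of the patch.

/*
 * Sketch of the stricter check: treat a VXID as a prepared transaction only
 * when the backend slot is invalid *and* the local transaction id (which the
 * draft patch reuses to carry the prepared XID) is actually set.
 */
#define VirtualTransactionIdIsPreparedXact(vxid) \
	((vxid).backendId == InvalidBackendId && \
	 LocalTransactionIdIsValid((vxid).localTransactionId))

With the extra LocalTransactionIdIsValid() check, an all-zeroes VXID (the array terminator) is no longer mistaken for a prepared transaction.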
> On 21 Dec 2020, at 10:40, Michael Paquier <michael@paquier.xyz> wrote:
>
>> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
>> need to gain an additional parameter indicating whether to consider
>> prepared xacts. It's not clear to me that their current behavior is wrong
>> for all possible uses.
>
> WaitForLockers is used only by REINDEX and CREATE/DROP CONCURRENTLY,
> where it seems to me we need to care about all the cases related to
> concurrent build, validation and index drop. The other caller of
> GetLockConflicts() is for conflict resolution in standbys where it is
> fine to ignore 2PC transactions as these cannot be cancelled.

I don't think the fact that we cannot cancel a transaction is sufficient here to ignore prepared transactions. I think there should not exist any prepared transaction that we need to cancel in standby conflict resolution. And if it exists - it's a sign of corruption, and we could emit a warning or something like that.
But I'm really not an expert here, just common sense: a prepared transaction is just like a regular transaction that survives a crash. If we wait for any transaction - probably we should wait for prepared ones too.
I'm not insisting on anything though.

> So I
> agree that we are going to need more control with a new option
> argument to be able to control if 2PC transactions are ignored or
> not.
>
> Hmm. The approach taken by the patch looks to be back-patchable.
> Based on the lack of complaints on the matter, we could consider
> instead putting an error in WaitForLockersMultiple() if there is at
> least one numPrepXact which would at least avoid inconsistent data.
> But I don't think what's proposed here is bad either.
>
> VirtualTransactionIdIsValidOrPreparedXact() is confusing IMO, knowing
> that VirtualTransactionIdIsPreparedXact() combined with
> LocalTransactionIdIsValid() would be enough to do the job.
>
> - Assert(VirtualTransactionIdIsValid(vxid));
> + Assert(VirtualTransactionIdIsValidOrPreparedXact(vxid));
> +
> + if (VirtualTransactionIdIsPreparedXact(vxid))
> [...]
> #define VirtualTransactionIdIsPreparedXact(vxid) \
> ((vxid).backendId == InvalidBackendId)
> This would allow the case where backendId and localTransactionId are
> both invalid. So it would be better to also check in
> VirtualTransactionIdIsPreparedXact() that the XID is not invalid, no?

Seems valid. Removed VirtualTransactionIdIsValidOrPreparedXact() from the patch.

Thanks!

Best regards, Andrey Borodin.
Attachments
> On 21 Dec 2020, at 12:24, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Seems valid. Removed VirtualTransactionIdIsValidOrPreparedXact() from the patch.

Sorry for the noise, the removal was not complete.

Best regards, Andrey Borodin.
Attachments
> On 21 Dec 2020, at 10:40, Michael Paquier <michael@paquier.xyz> wrote:
>
> Hmm. The approach taken by the patch looks to be back-patchable.

I was checking that the patch\test works in supported branches, and everything seems to be fine down to REL_10_STABLE.
But my machines (Ubuntu 18 and macOS) report several fails in isolation tests on REL9_6_20\REL9_6_STABLE, particularly eval-plan-qual-trigger, drop-index-concurrently-1, and async-notify. I do not observe problems with regular isolation tests though.
Do I understand correctly that check-world on the buildfarm tests 'make check-prepared-txns' and the problem is somewhere in my machines? Or is something actually broken\outdated?

Thanks!

Best regards, Andrey Borodin.
On Wed, Dec 23, 2020 at 12:23:28PM +0500, Andrey Borodin wrote:
> Do I understand correctly that check-world tests on buildfarm 'make
> check-prepared-txns' and the problem is somewhere in my machines? Or
> something is actually broken\outdated?

The buildfarm code has no trace of check-prepared-txns. And, FWIW, I
see no failures at the top of REL9_6_STABLE. Do you mean that this
happens only with your patch? Or do you mean that you see failures
using the stable branch of upstream? I have not tested the former,
but the latter works fine on my end.
--
Michael
Attachments
> On 23 Dec 2020, at 12:52, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 23, 2020 at 12:23:28PM +0500, Andrey Borodin wrote:
>> Do I understand correctly that check-world tests on buildfarm 'make
>> check-prepared-txns' and the problem is somewhere in my machines? Or
>> something is actually broken\outdated?
>
> FWIW, I
> see no failures at the top of REL9_6_STABLE.

Thanks, Michael! The problem was indeed within my machines. maintainer-cleanup is not enough for make check-prepared-txns; a fresh real installation is necessary.

I've checked that the test works down to REL9_5_STABLE with the patch.

Thanks!

Best regards, Andrey Borodin.
Thanks for looking into this!

> On 17 Jan 2021, at 12:24, Noah Misch <noah@leadboat.com> wrote:
>
>> --- a/src/backend/storage/lmgr/lock.c
>> +++ b/src/backend/storage/lmgr/lock.c
>> @@ -2931,15 +2929,17 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
>>
>> 	/*
>> 	 * Allocate memory to store results, and fill with InvalidVXID. We only
>> -	 * need enough space for MaxBackends + a terminator, since prepared xacts
>> -	 * don't count. InHotStandby allocate once in TopMemoryContext.
>> +	 * need enough space for MaxBackends + max_prepared_xacts + a
>> +	 * terminator. InHotStandby allocate once in TopMemoryContext.
>> 	 */
>> 	if (InHotStandby)
>> 	{
>> 		if (vxids == NULL)
>> 			vxids = (VirtualTransactionId *)
>> 				MemoryContextAlloc(TopMemoryContext,
>> -								   sizeof(VirtualTransactionId) * (MaxBackends + 1));
>> +								   sizeof(VirtualTransactionId) * (MaxBackends
>> +								   + max_prepared_xacts
>> +								   + 1));
>
> PostgreSQL typically puts the operator before the newline. Also, please note
> the whitespace error that "git diff --check origin/master" reports.

Fixed.

>> 	}
>> 	else
>> 		vxids = (VirtualTransactionId *)
>
> This is updating only the InHotStandby branch. Both branches need the change.

Fixed.

>> @@ -4461,9 +4462,21 @@ bool
>> VirtualXactLock(VirtualTransactionId vxid, bool wait)
>> {
>> 	LOCKTAG		tag;
>> -	PGPROC	   *proc;
>> +	PGPROC	   *proc = NULL;
>>
>> -	Assert(VirtualTransactionIdIsValid(vxid));
>> +	Assert(VirtualTransactionIdIsValid(vxid) ||
>> +		   VirtualTransactionIdIsPreparedXact(vxid));
>> +
>> +	if (VirtualTransactionIdIsPreparedXact(vxid))
>> +	{
>> +		LockAcquireResult lar;
>> +		/* If it's prepared xact - just wait for it */
>> +		SET_LOCKTAG_TRANSACTION(tag, vxid.localTransactionId);
>> +		lar = LockAcquire(&tag, ShareLock, false, !wait);
>> +		if (lar == LOCKACQUIRE_OK)
>
> This should instead test "!= LOCKACQUIRE_NOT_AVAIL", lest
> LOCKACQUIRE_ALREADY_HELD happen. (It perhaps can't happen, but skipping the
> LockRelease() would be wrong if it did.)

I think that code that successfully acquired a lock should release it. The other way we risk releasing someone else's lock held for a reason. We only acquire the lock to release it instantly anyway.

>> +			LockRelease(&tag, ShareLock, false);
>> +		return lar != LOCKACQUIRE_NOT_AVAIL;
>> +	}
>>
>> 	SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid);
>>
>> diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
>> index 1c3e9c1999..cedb9d6d2a 100644
>> --- a/src/include/storage/lock.h
>> +++ b/src/include/storage/lock.h
>> @@ -70,6 +70,8 @@ typedef struct
>> #define VirtualTransactionIdIsValid(vxid) \
>> 	(((vxid).backendId != InvalidBackendId) && \
>> 	 LocalTransactionIdIsValid((vxid).localTransactionId))
>> +#define VirtualTransactionIdIsPreparedXact(vxid) \
>> +	((vxid).backendId == InvalidBackendId)
>
> Your patch is introducing VirtualTransactionId values that represent prepared
> xacts, and it has VirtualTransactionIdIsValid() return false for those values.
> Let's make VirtualTransactionIdIsValid() return true for them, since they're
> as meaningful as any other value. The GetLockConflicts() header comment says
> "The result array is palloc'd and is terminated with an invalid VXID." Patch
> v4 falsifies that comment. The array can contain many of these new "invalid"
> VXIDs, and an all-zeroes VXID terminates the array. Rather than change the
> comment, let's change VirtualTransactionIdIsValid() to render the comment true
> again. Most (perhaps all) VirtualTransactionIdIsValid() callers won't need to
> distinguish the prepared-transaction case.

Makes sense, fixed. I was afraid that there's something I'm not aware of. I've iterated over VirtualTransactionIdIsValid() calls and did not find suspicious cases.

> An alternative to redefining VXID this way would be to have some new type,
> each instance of which holds either a valid VXID or a valid
> prepared-transaction XID. That alternative feels inferior to me, though.
> What do you think?

I think we should not call valid vxids "invalid".

By the way, maybe rename "check-prepared-txns" to "check-prepared-xacts" for consistency?

Thanks!

Best regards, Andrey Borodin.
Attachments
> On 22 Jan 2021, at 10:44, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v5-0001-Wait-for-prepared-xacts-in-CREATE-INDEX-CONCURREN.patch><v5-0002-Add-test-for-CIC-with-prepared-xacts.patch>

Uh, vscode did not save the files and I've sent an incorrect version. Disregard v5. Sorry for the noise.

Best regards, Andrey Borodin.
Attachments
> On 24 Jan 2021, at 07:27, Noah Misch <noah@leadboat.com> wrote:
>
> I changed that, updated comments, and fixed pgindent damage. I plan to push
> the attached version.

I see that the patch was pushed. I'll flip the status of the CF entry. Many thanks!

Best regards, Andrey Borodin.
> On 30 Jan 2021, at 21:06, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>> On 24 Jan 2021, at 07:27, Noah Misch <noah@leadboat.com> wrote:
>>
>> I changed that, updated comments, and fixed pgindent damage. I plan to push
>> the attached version.
>
> I see that the patch was pushed. I'll flip the status of the CF entry. Many thanks!

FWIW I have 2 new reported cases on 12.6. I've double-checked that at the moment of corruption the installation ran a version with the patch.
To date I could not reproduce the problem myself, but I'll continue working on this.

Thanks!

Best regards, Andrey Borodin.
> On 1 May 2021, at 17:42, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> FWIW I have 2 new reported cases on 12.6.

Sorry to say that, but $subj persists. Here's a simple reproduction. To get a corrupted index you need 3 psql sessions A, B and C. By default a command is executed in A.

create extension amcheck;
create table t1(i int);
create index on t1(i);
begin;
insert into t1 values(0);
-- session C: reindex table concurrently t1;
prepare transaction 'a';
begin;
insert into t1 values(0);
-- session B: commit prepared 'a';
prepare transaction 'b';
begin;
insert into t1 values(0);
-- session B: commit prepared 'b';
prepare transaction 'c';
begin;
insert into t1 values(0);
-- session B: commit prepared 'c';
prepare transaction 'd';
commit prepared 'd';

-- session C:
postgres=# select bt_index_check('i1',true);
ERROR: heap tuple (0,2) from table "t1" lacks matching index tuple within index "i1"
HINT: Retrying verification using the function bt_index_parent_check() might provide a more specific error.

The problem is that WaitForLockersMultiple() gathers running vxids and 2pc xids. Then it waits, but if a vxid is converted to 2pc it fails to wait.

I could not compose an isolation test for this, because we need to do "prepare transaction 'a';" only when "reindex table concurrently t1;" has already been working for some time.

To fix it we can return locking xids along with vxids from GetLockConflicts() like in the attached diff. But this approach seems ugly.

Best regards, Andrey Borodin.
Attachments
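A way to picture the window described above, as an editorial illustration of the reported behaviour (not code from any patch; the session letters follow the reproduction script):

/*
 * session A (writer)                     session C (REINDEX CONCURRENTLY)
 * ------------------                     --------------------------------
 * begin; insert into t1 values(0);
 *                                        GetLockConflicts() sees A only as a
 *                                        VXID, because A has not prepared yet
 * prepare transaction 'a';
 *   -- the lock migrates to the dummy
 *   -- PGPROC of the prepared xact
 *                                        VirtualXactLock(A's VXID) no longer
 *                                        finds a backend advertising that
 *                                        VXID and returns immediately, so
 *                                        validation starts before
 *                                        commit prepared 'a' has happened
 */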
> On 18 Jul 2021, at 01:12, Noah Misch <noah@leadboat.com> wrote:
>
> Suppose some transaction has a vxid but no xid. Shortly after
> GetLockConflicts(), it acquires an xid, modifies the table, and issues PREPARE
> TRANSACTION. Could that cause a corrupt index even with this diff?

Firstly I've tried to stress things out.
This little go program [0] easily reproduces corruption on the patched code. Meanwhile the vxid->xid->2px program does not [1] (both patched and unpatched).

I think CIC does not care much about VXIDs at all. It's only important when the real XID started: before GetLockConflicts() or after.

Thanks!

Best regards, Andrey Borodin.

[0] https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc#file-stress-xid-2px
[1] https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc#file-stress-vxid-xid-2px
> On 19 Jul 2021, at 05:30, Noah Misch <noah@leadboat.com> wrote:
>
> To fix $SUBJECT, it sounds like we need a way to identify a transaction,
> usable as early as the transaction's first catalog access and remaining valid
> until COMMIT PREPARED finishes. We may initially see a transaction as having
> a VXID and no XID, then later need to wait for that transaction when it has
> entered prepared state, having an XID and no VXID. How might we achieve that?

PFA a draft with vxid->xid mapping and a subsequent wait for it. The patch, obviously, lacks a ton of comments explaining what is going on.
We write the actual VXID into the dummy proc entries of the prepared xact.
When we wait for a vxid we try to convert it to an xid through the real proc entry. If we cannot do so - we look it up in the shared 2pc state. If the vxid is not there - it means it is already gone and there's nothing to wait for.

Thanks!

Best regards, Andrey Borodin.
Attachments
> On 19 Jul 2021, at 23:10, Noah Misch <noah@leadboat.com> wrote:
>
> On Mon, Jul 19, 2021 at 12:10:52PM +0500, Andrey Borodin wrote:
>>>> On 19 Jul 2021, at 05:30, Noah Misch <noah@leadboat.com> wrote:
>>>
>>> To fix $SUBJECT, it sounds like we need a way to identify a transaction,
>>> usable as early as the transaction's first catalog access and remaining valid
>>> until COMMIT PREPARED finishes. We may initially see a transaction as having
>>> a VXID and no XID, then later need to wait for that transaction when it has
>>> entered prepared state, having an XID and no VXID. How might we achieve that?
>>
>> PFA a draft with vxid->xid mapping and a subsequent wait for it. The patch, obviously, lacks a ton of comments explaining what is going on.
>> We write the actual VXID into the dummy proc entries of the prepared xact.
>> When we wait for a vxid we try to convert it to an xid through the real proc entry. If we cannot do so - we look it up in the shared 2pc state. If the vxid is not there - it means it is already gone and there's nothing to wait for.
>
> When the system reuses BackendId values, it reuses VXID values. In the
> general case, two prepared transactions could exist simultaneously with the
> same BackendId+LocalTransactionId. Hmm. It could be okay to have a small
> probability that CIC waits on more transactions than necessary. Suppose we
> have three PGPROC entries with the same VXID, two prepared transactions and
> one regular transaction. Waiting for all three could be tolerable, though
> avoiding that would be nice. Should we follow transactions differently to
> avoid that?

We don't have to wait for the regular Xid in this case at all, because it would be finished with its VXID. But I think that we have to wait for all 2PCs with the same VXID.
We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
Other 2PCs with the same VXID may be older or newer than the target 2PC.
Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.

But this logic seems to me overly complicated. Or isn't it?

Best regards, Andrey Borodin.
> On 19 Jul 2021, at 23:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
> Other 2PCs with the same VXID may be older or newer than the target 2PC.
> Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.
>
> But this logic seems to me overly complicated. Or isn't it?

Here's the PoC to assess the complexity of this solution.

Best regards, Andrey Borodin.
Attachments
> On 19 Jul 2021, at 23:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
> Other 2PCs with the same VXID may be older or newer than the target 2PC.
> Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.
>
> But this logic seems to me overly complicated. Or isn't it?

> Other 2PCs with the same VXID may be older or newer than the target 2PC. Older 2PCs must be with an XID in the conflicts array.

Unfortunately, this is just wrong. Older 2PCs with the same VXID don't have to be in the conflicts array. They might be some other unrelated 2PCs working with different relations.

Sorry for the noise.

Best regards, Andrey Borodin.
> On 21 Jul 2021, at 02:49, Noah Misch <noah@leadboat.com> wrote:
>
> On Wed, Jul 21, 2021 at 12:38:25AM +0500, Andrey Borodin wrote:
>>> On 19 Jul 2021, at 23:41, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>>> On 19 Jul 2021, at 23:10, Noah Misch <noah@leadboat.com> wrote:
>>>> Suppose we
>>>> have three PGPROC entries with the same VXID, two prepared transactions and
>>>> one regular transaction. Waiting for all three could be tolerable, though
>>>> avoiding that would be nice. Should we follow transactions differently to
>>>> avoid that?
>>>
>>> We don't have to wait for the regular Xid in this case at all. Because it would be finished with its VXID.
>
> I don't understand those two sentences. Could you write more?

Suppose we have a VXID conflicting with the reindexed relation's lock, and have a PGPROC with a regular Xid (not 2PC) for this VXID. We do not need to return this xid from TwoPhaseGetXidByVXid() for an extra wait. This situation is covered by the normal vxid handling in VirtualXactLock().

+ /* Save the xid to test if transaction converted to 2pc later */
+ xid = proc->xid;

>>> We are looking for a transaction that was only a VXID during GetLockConflicts(). In the conflicts array we may have each VXID only once.
>>> Other 2PCs with the same VXID may be older or newer than the target 2PC.
>>> Older 2PCs must be with an XID in the conflicts array. So we might wait for all 2PCs with known XIDs. Then for each ambiguous VXID->XID mapping choose the oldest XID.
>
>> Unfortunately, this is just wrong. Older 2PCs with the same VXID don't have to be in the conflicts array. They might be some other unrelated 2PCs working with different relations.
>
> I agree. Would it work to do the following sequence in WaitForLockers()?
>
> 1. In GetLockConflicts(), record a list of conflicting XIDs. Also record a
> list of conflicting LXIDs having no XID.
> 2. Wait on all LXIDs recorded in (1). They have either ended or converted to
> prepared transactions.
> 3. Inner-join the present list of prepared transactions with the list of
> LXIDs from (1). Record the XIDs of matched entries.
> 4. Wait on all XIDs recorded in (1) and (3).
>
> While that may wait on some prepared XIDs needlessly, it can't degrade into
> persistent starvation. We could reduce the chance of a needless XID wait by
> remembering the list of old prepared XIDs before (1) and not adding any of
> those remembered, old XIDs in (3). That last bit probably would be
> unjustified complexity, but maybe not.

I think this protocol is equivalent to waiting on all Xids with the VXid. I consider this protocol safe.

PFA the implementation. Patch 0001 is an intact version of the previous patch. There are two additions:
1. Prefer xids to vxids in GetLockConflicts()
2. Wait on all 2PCs with a given VXid.

Thanks!

Best regards, Andrey Borodin.
Attachments
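The agreed protocol, restated as a commented sketch. The names here are illustrative only; the real logic is spread across WaitForLockersMultiple(), VirtualXactLock() and the draft's TwoPhaseGetXidByVXid(). Lockers that already had a permanent XID at GetLockConflicts() time are simply waited on by XID and are omitted from the sketch.

/* Sketch of the wait protocol; illustrative names, not the patch itself. */
static void
WaitForConflictingLockersSketch(VirtualTransactionId *vxids, int nvxids)
{
	for (int i = 0; i < nvxids; i++)
	{
		/*
		 * Steps 1-2: wait on the VXID.  When this returns, the transaction
		 * has either ended or turned into a prepared transaction that now
		 * lives in a dummy PGPROC.
		 */
		VirtualXactLock(vxids[i], true);

		/*
		 * Steps 3-4: look the VXID up among prepared transactions and wait
		 * on every matching XID as well.  Several prepared xacts can match
		 * one VXID because BackendId/LocalTransactionId pairs are reused,
		 * so this may wait on a few transactions needlessly, but it cannot
		 * starve and it cannot miss the one that matters.
		 */
	}
}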
> On 21 Jul 2021, at 22:55, Noah Misch <noah@leadboat.com> wrote:
>
> These drafts use reasonable concepts. Would you develop them into a
> ready-to-commit patch? You may be able to translate your probabilistic test
> procedures from https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc
> into something like the src/bin/pgbench/t/001_pgbench_with_server.pl test of
> pg_enum_oid_index.

I'm working on it. Is it OK to use amcheck in pgbench tests? Or is it better to add the test to amcheck?

Best regards, Andrey Borodin.
On Fri, Jul 23, 2021 at 3:30 PM Noah Misch <noah@leadboat.com> wrote:
> It could be okay, but I think it's better to add the test under amcheck. You
> could still use pgbench in the test.

+1. Writing the test that way seems more robust. If somebody writes a patch that reintroduces a similar though distinct bug, then it seems more likely to be caught. In other words, there is less of a danger of the test over-specifying what incorrect means if it uses amcheck directly.

--
Peter Geoghegan
> On 24 Jul 2021, at 03:30, Noah Misch <noah@leadboat.com> wrote:
>
> It could be okay, but I think it's better to add the test under amcheck. You
> could still use pgbench in the test.

Currently it's still WIP. I've added two tests: a deterministic one with psql and a probabilistic one with pgbench.
And I really do not like the pgbench test:
1. It does not seem stable enough, it can turn the buildfarm red as a good watermelon.
2. Names for 2PCs are chosen at random and have a probability of collision.
3. It still breaks the fix and I have no idea why.

Can you please take a look at the added TAP test? I'm probably doing a lot of things wrong, it's the very first program in Perl written by me...
background_pgbench is a local invention. sub pgbench is a copy from a nearby test. Should I refactor it somewhere?

Thanks!

Best regards, Andrey Borodin.
Attachments
> On 27 Jul 2021, at 17:50, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> 3. It still breaks the fix and I have no idea why.

I'm still coping with the bug. I have a stable reproduction, but to date I have been unable to identify the roots of the problem.

Here's the sample trace:

2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl LOG: statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl WARNING: Phase 1
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: Phase 2
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3493, vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3490, vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3492
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3492, vxid -1/3492
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING: Phase 3
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING: Phase 4
2021-07-29 17:14:29.000 +05 [61148] 002_cic_2pc.pl WARNING: Phase 5
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3512
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3512, vxid -1/3512
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3513, vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: Phase 6
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3516, vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3515, vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3517, vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: Backend is doing something else
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 0, vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: Sucessfully found xid by vxid 0
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: Phase Final
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl LOG: statement: SELECT bt_index_check('idx',true);
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl ERROR: heap tuple (18,74) from table "tbl" lacks matching index tuple within index "idx" xmin 3504 xmax 0
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl STATEMENT: SELECT bt_index_check('idx',true);

The rogue tuple was committed by xid 3504, which was never returned by GetLockConflicts(). I'm attaching the patch set just for the reference of the trace, not expecting code review now.

I've fixed an unrelated bug in the previous patchset:

- SET_LOCKTAG_TRANSACTION(tag, xid);
+ SET_LOCKTAG_TRANSACTION(tag, xidlist.xid);
  lar = LockAcquire(&tag, ShareLock, false, !wait);
  if (lar != LOCKACQUIRE_NOT_AVAIL)
  LockRelease(&tag, ShareLock, false);
- return lar != LOCKACQUIRE_NOT_AVAIL;
+ if (lar == LOCKACQUIRE_NOT_AVAIL)
+ return false;

But it does not help. Maybe I've broken something by my fix... Searching further.

Thanks for reading! I would be happy to hear any ideas why the xid was not locked by GetLockConflicts() but committed a tuple which was not indexed.

Best regards, Andrey Borodin.
Attachments
> On 29 Jul 2021, at 17:30, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Searching further.

Currently the CIC test breaks indexes even without 2PC.
How is it supposed to work if a vxid stays a vxid through the GetLockConflicts()\WaitForLockersMultiple() barrier and then suddenly converts to an xid and commits before the index is validated?

Best regards, Andrey Borodin.
Attachments
> On 30 Jul 2021, at 07:25, Noah Misch <noah@leadboat.com> wrote:
> What alternative fix designs should we consider?

I observe that the provided patch fixes CIC under normal transactions, but the test with 2PC still fails similarly.
An unindexed tuple was committed somewhere at the end of Phase 3 or 4.

2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl LOG: statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: Phase 1
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: Phase 2
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6735
2021-07-30 15:35:31.807 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6736
2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING: Phase 3
2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6750
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: Phase 4
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: Phase 5
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6762
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6763
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: Phase 6
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid 6/2166
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6767
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6764
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6765
2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl WARNING: Phase Final
2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl LOG: statement: SELECT bt_index_check('idx',true);
2021-07-30 15:35:31.813 +05 [25987] 002_cic_2pc.pl ERROR: heap tuple (46,16) from table "tbl" lacks matching index tuple within index "idx" xmin 6751 xmax 0

Attaching the debug logging patch; amcheck is modified to return xmin. The trace is gathered by "grep -e ERROR -e REINDEX -e WARN -e SELECT tmp_check/log/002_cic_2pc_CIC_2PC_test.log".

How deep does the rabbit hole go?

Thanks!

Best regards, Andrey Borodin.
Attachments
> On 30 Jul 2021, at 23:41, Noah Misch <noah@leadboat.com> wrote:
>
> On Fri, Jul 30, 2021 at 03:42:10PM +0500, Andrey Borodin wrote:
>>> On 30 Jul 2021, at 07:25, Noah Misch <noah@leadboat.com> wrote:
>>> What alternative fix designs should we consider?
>>
>> I observe that the provided patch fixes CIC under normal transactions, but the test with 2PC still fails similarly.
>> An unindexed tuple was committed somewhere at the end of Phase 3 or 4.
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl LOG: statement: REINDEX INDEX CONCURRENTLY idx;
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: Phase 1
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: Phase 2
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6735
>> 2021-07-30 15:35:31.807 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6736
>> 2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING: Phase 3
>> 2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6750
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: Phase 4
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: Phase 5
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6762
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6763
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: Phase 6
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid 6/2166
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6767
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6764
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6765
>> 2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl WARNING: Phase Final
>> 2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl LOG: statement: SELECT bt_index_check('idx',true);
>> 2021-07-30 15:35:31.813 +05 [25987] 002_cic_2pc.pl ERROR: heap tuple (46,16) from table "tbl" lacks matching index tuple within index "idx" xmin 6751 xmax 0
>
> I see a failure, too. Once again, "i:" lines are events within the INSERT
> backend, and "r:" lines are events within the REINDEX CONCURRENTLY backend:
>
> r: Phase 2 begins.
> i: INSERT. Start PREPARE.
> r: Phase 2 commits indisready=t for idx_ccnew.
> r: Start waiting for the INSERT to finish.
> i: PREPARE finishes.
> r: Wake up and start validate_index(). This is a problem. It needed to wait
> for COMMIT PREPARED to finish.

I'll investigate this scenario. I've tried to sprinkle some more WaitForLockersMultiple(), yet without success.

> This may have a different explanation than the failure you saw, because my
> INSERT transaction already had a permanent XID before the start of phase 3. I
> won't have time to study this further in the next several days. Can you find
> out where things go wrong?

I'll try. This bug is #1 priority for me. We repack ~1 PB of indexes each weekend (only bloated ones, many in fact are bloated), and it seems like they are all endangered.

> The next thing I would study is VirtualXactLock(),
> specifically what happens if the lock holder is a normal backend (with or
> without an XID) when VirtualXactLock() starts but becomes a prepared
> transaction (w/ different PGPROC) before VirtualXactLock() ends.

PreparedXactLock() will do the trick. If we have an xid - we always take a lock on the xid.
If we have a vxid - we try to convert it to an xid and look in all PGPROCs for 2PCs. And then again - wait for the xid.

At this point I'm certain that if any transaction is reported by GetLockConflicts() it will get awaited by VirtualXactLock().
The problem is that the rogue transaction was never reported by GetLockConflicts().

Thanks!

Best regards, Andrey Borodin.
On Sat, Jul 31, 2021 at 11:37 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I'll try. This bug is #1 priority for me. We repack ~1 PB of indexes each weekend (only bloated ones, many in fact are bloated), and it seems like they are all endangered.

I have a tip, which might make life a little easier: maybe use rr for this? See:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework

Having a stable recording with fixed logical times *vastly* simplifies debugging in practice. I found this very useful when debugging the SSI Jepsen bug, which was a complex race condition that could only be reproduced through a complex procedure. Which is also what we have here now, more or less.

Your personal notes on the bug can sensibly track specific pointer values, logical times, etc. You can even "pack" a recording, making it into a self-contained thing that doesn't rely on having the original binaries. This makes the recording stable enough to keep for weeks or months, share with other people, etc.

--
Peter Geoghegan
> On 31 Jul 2021, at 13:51, Peter Geoghegan <pg@bowt.ie> wrote:
>
> maybe use rr for this?

Thanks, Peter! I'll give it a try.

Currently I observe that during PrepareTransaction() there might be a moment when the transaction is not reported by GetLockConflicts() with the relevant lock.
I observe that PrepareTransaction() for the violating xid is somewhere between CallXactCallbacks(XACT_EVENT_PREPARE); and PostPrepare_PgStat();
And GetLockConflicts() fails to detect this transaction.

Thanks!

Best regards, Andrey Borodin.
> On 31 Jul 2021, at 13:51, Peter Geoghegan <pg@bowt.ie> wrote:
>
> maybe use rr for this?

rr helps to explain why something happened. But how do I find out why something did not happen?
During GetLockConflicts() the proclocks SHMQueue does not include the locks of a concurrently running xid that is within PrepareTransaction() [it seems like it's within PostPrepare_Locks(), but I'm not sure].
The problem is reproducible within 1 second by the script, leading to amcheck-detected corruption.

I'd appreciate one more hint...

Thanks!

Best regards, Andrey Borodin.
> On 6 Aug 2021, at 23:03, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> But how do I find out why something did not happen?
> During GetLockConflicts() the proclocks SHMQueue does not include the locks of a concurrently running xid that is within PrepareTransaction() [it seems like it's within PostPrepare_Locks(), but I'm not sure].
> The problem is reproducible within 1 second by the script, leading to amcheck-detected corruption.

Ok, finally I've figured out this charade.

AtPrepare_Locks() transfers fast-path locks to normal locks pointing to the PGPROC with the xid that is in the process of conversion to 2PC.
ProcArrayClearTransaction(MyProc) resets the xid in the PGPROC.
GetLockConflicts() sees empty xid and vxid.
PostPrepare_Locks(xid) hides things behind the new PGPROC for 2PC.

PFA a PoC fix. But I'm entirely not sure it's OK to reorder things in PrepareTransaction() this way.

Thanks!

Best regards, Andrey Borodin.
Attachments
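A condensed timeline of this race, as an editorial reading of the analysis above (comment only; the PoC patch itself is attached to the message):

/*
 * backend running PREPARE TRANSACTION      CIC backend
 * -----------------------------------      -----------
 * AtPrepare_Locks()
 *   -- fast-path locks become regular
 *   -- locks hanging off the ordinary PGPROC
 * ProcArrayClearTransaction(MyProc)
 *   -- that PGPROC now advertises neither
 *   -- xid nor vxid
 *                                           GetLockConflicts() finds the lock,
 *                                           but its holder has no usable
 *                                           transaction identifier, so the
 *                                           about-to-be-prepared xact is not
 *                                           reported and never waited on
 * PostPrepare_Locks(xid)
 *   -- locks re-attached to the dummy
 *   -- 2PC PGPROC, too late for CIC
 *
 * The PoC fix moves ProcArrayClearTransaction() after PostPrepare_Locks(),
 * so there is no window in which the lock's owner is anonymous.
 */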
> On 7 Aug 2021, at 20:33, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> <v9-0001-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch>
> <v9-0004-PoC-fix-clear-xid.patch>
> <v9-0003-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxid.patch>
> <v9-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcach.patch>

Here's v10. Changes:
1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably, need a better data structure.
2. Editorialised step 3 (vxid->xid lookup), fixing typos and some small bugs.

Tomorrow I'll try to clean up step 1 (TAP tests).

Thanks!

Best regards, Andrey Borodin.
Attachments
On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> Changes:
> 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably, need a better data structure.

I don't know either, hence that quick data structure to delay the question.
debug_discard_caches=3 may help answer the question. RelationBuildDesc()
reads pg_constraint, which is !rd_isnailed. Hence, I expect one can at least
get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").

> Tomorrow I'll try to clean up step 1 (TAP tests).

For the final commits, I would like to have two commits. The first will fix
the non-2PC bug and add the test for that fix. The second will fix the 2PC
bug and add the additional test. Feel free to structure your changes into
more patch files, so long as it's straightforward to squash them into two
commits that way.

A few things I noticed earlier:

> --- a/contrib/amcheck/Makefile
> +++ b/contrib/amcheck/Makefile
> @@ -12,7 +12,7 @@ PGFILEDESC = "amcheck - function for verifying relation integrity"
>
>  REGRESS = check check_btree check_heap
>
> -TAP_TESTS = 1
> +TAP_TESTS = 2

TAP_TESTS is a Boolean flag, not a count. Leave it set to 1.

> --- /dev/null
> +++ b/contrib/amcheck/t/002_cic_2pc.pl

> +$node->append_conf('postgresql.conf', 'lock_timeout = 5000');

> +my $main_timer = IPC::Run::timeout(5);

> +my $reindex_timer = IPC::Run::timeout(5);

Values this short cause spurious buildfarm failures. Generally, use 180s for
timeouts that do not expire in a successful run.

> +# Run background pgbench with CIC. We cannot mix-in this script into single pgbench:
> +# CIC will deadlock with itself occasionally.

Consider fixing that by taking an advisory lock before running CIC. However,
if use of separate pgbench executions works better, feel free to keep that.

> +pgbench(
> +	'--no-vacuum --client=5 --time=1',

Consider using a transaction count instead of --time. That way, slow machines
still catch the bug, and fast machines waste less time. For the "concurrent
OID generation" test, I tuned the transaction count so the test failed about
half the time[1] on a buggy build.

> +		\set txid random(1, 1000000000)
> +		BEGIN;
> +		INSERT INTO tbl VALUES(0);
> +		PREPARE TRANSACTION 'tx:txid';

Try "PREPARE TRANSACTION 'c:client_id'" (or c:client_id:txid) to eliminate the
chance of collision.

[1] https://postgr.es/m/20160215171129.GA347322@tornado.leadboat.com
> On 8 Aug 2021, at 03:19, Noah Misch <noah@leadboat.com> wrote:
>
> On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
>> Changes:
>> 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably, need a better data structure.
>
> I don't know either, hence that quick data structure to delay the question.
> debug_discard_caches=3 may help answer the question. RelationBuildDesc()
> reads pg_constraint, which is !rd_isnailed. Hence, I expect one can at least
> get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").

I've toyed around with $node->append_conf('postgresql.conf', 'debug_invalidate_system_caches_always = 3'); in the test.
I had failures only at most with Assert(in_progress_offset < 4);
See [0] for the stack trace. But I do not think that it proves that deeper calls are impossible with other DB schemas.

>> Tomorrow I'll try to clean up step 1 (TAP tests).
>
> For the final commits, I would like to have two commits. The first will fix
> the non-2PC bug and add the test for that fix. The second will fix the 2PC
> bug and add the additional test. Feel free to structure your changes into
> more patch files, so long as it's straightforward to squash them into two
> commits that way.

Done so.
Step 1. Test for CIC with regular transactions.
Step 2. Fix
Step 3. Test for CIC with 2PC
Step 4. Part of the fix that I'm sure about
Step 5. Dubious part of the fix...

I had to refactor subs pgbench and pgbench_background to be reused among tests. Though I did not refactor the pgbench tests to use these functions: they have two different versions of sub pgbench.

> A few things I noticed earlier:
>
>> --- a/contrib/amcheck/Makefile
>> +++ b/contrib/amcheck/Makefile
>> @@ -12,7 +12,7 @@ PGFILEDESC = "amcheck - function for verifying relation integrity"
>>
>>  REGRESS = check check_btree check_heap
>>
>> -TAP_TESTS = 1
>> +TAP_TESTS = 2
>
> TAP_TESTS is a Boolean flag, not a count. Leave it set to 1.

Fixed.

>> --- /dev/null
>> +++ b/contrib/amcheck/t/002_cic_2pc.pl
>
>> +$node->append_conf('postgresql.conf', 'lock_timeout = 5000');
>
>> +my $main_timer = IPC::Run::timeout(5);
>
>> +my $reindex_timer = IPC::Run::timeout(5);
>
> Values this short cause spurious buildfarm failures. Generally, use 180s for
> timeouts that do not expire in a successful run.

Fixed.

>> +# Run background pgbench with CIC. We cannot mix-in this script into single pgbench:
>> +# CIC will deadlock with itself occasionally.
>
> Consider fixing that by taking an advisory lock before running CIC. However,
> if use of separate pgbench executions works better, feel free to keep that.

I've tried this approach. The advisory lock is owned by a vxid, thus deadlocking with CIC.

>> +pgbench(
>> +	'--no-vacuum --client=5 --time=1',
>
> Consider using a transaction count instead of --time. That way, slow machines
> still catch the bug, and fast machines waste less time. For the "concurrent
> OID generation" test, I tuned the transaction count so the test failed about
> half the time[1] on a buggy build.

I've tuned the tests on my laptop to observe the probabilistic test pass sometimes.

>> +		\set txid random(1, 1000000000)
>> +		BEGIN;
>> +		INSERT INTO tbl VALUES(0);
>> +		PREPARE TRANSACTION 'tx:txid';
>
> Try "PREPARE TRANSACTION 'c:client_id'" (or c:client_id:txid) to eliminate the
> chance of collision.

Fixed.

What do you think, do we have a chance to fix things before the next release on August 12th?

Thanks!

Best regards, Andrey Borodin.
Attachments
On Sun, Aug 08, 2021 at 04:31:07PM +0500, Andrey Borodin wrote:
> > On 8 Aug 2021, at 03:19, Noah Misch <noah@leadboat.com> wrote:
> > On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> >> Changes:
> >> 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably, need a better data structure.
> >
> > I don't know either, hence that quick data structure to delay the question.
> > debug_discard_caches=3 may help answer the question. RelationBuildDesc()
> > reads pg_constraint, which is !rd_isnailed. Hence, I expect one can at least
> > get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").
> I've toyed around with
> $node->append_conf('postgresql.conf', 'debug_invalidate_system_caches_always = 3'); in the test.
> I had failures only at most with
> Assert(in_progress_offset < 4);
> See [0] for the stack trace. But I do not think that it proves that deeper calls are impossible with other DB schemas.

I didn't find the [0] link in your message; can you send it again?

I suspect no rel can appear more than once in the stack, and all but one rel
will be a system catalog. That said, dynamically growing the array is
reasonable, even if a small maximum depth theoretically exists.

> Step 1. Test for CIC with regular transactions.
> Step 2. Fix
> Step 3. Test for CIC with 2PC
> Step 4. Part of the fix that I'm sure about
> Step 5. Dubious part of the fix...

> What do you think, do we have a chance to fix things before the next release on August 12th?

No. At a minimum, we still need to convince ourselves that step 5 is correct.
Even if that were already done, commits to released branches get more cautious
in the few days before the wrap date (tomorrow). Once it's committed, you
could contact pgsql-release@postgresql.org to propose an out-of-cycle release.
On Sun, Aug 08, 2021 at 09:37:52AM -0700, Noah Misch wrote:
> On Sun, Aug 08, 2021 at 04:31:07PM +0500, Andrey Borodin wrote:
> > Step 1. Test for CIC with regular transactions.
> > Step 2. Fix
> > Step 3. Test for CIC with 2PC
> > Step 4. Part of the fix that I'm sure about
> > Step 5. Dubious part of the fix...

When I applied all five to commit b33259e^ (shortly before the PostgresNode
compatibility break) and ran your tests in a loop, I got more 2PC "lacks
matching index tuple" within one minute. Here's how I ran the tests:

make -j20 && while make -C contrib/amcheck check REGRESS= PROVE_FLAGS=--timer PROVE_TESTS='t/002_cic.pl t/003_cic_2pc.pl' NO_TEMP_INSTALL=1; do date; done

Do you see failures with that loop? If so, can you diagnose them?
(So far, I've not seen a failure from the 1PC test.)

On Thu, Jul 29, 2021 at 10:25:48PM -0400, Noah Misch wrote:
> I haven't checked whether other inval message types have the same hazard.

I will look into this next. (It probably doesn't explain 2PC failures.)
On Sat, Aug 14, 2021 at 03:13:28PM -0700, Noah Misch wrote:
> On Sun, Aug 08, 2021 at 09:37:52AM -0700, Noah Misch wrote:
> > On Sun, Aug 08, 2021 at 04:31:07PM +0500, Andrey Borodin wrote:
> > > Step 1. Test for CIC with regular transactions.
> > > Step 2. Fix
> > > Step 3. Test for CIC with 2PC
> > > Step 4. Part of the fix that I'm sure about
> > > Step 5. Dubious part of the fix...
>
> When I applied all five to commit b33259e^ (shortly before the PostgresNode
> compatibility break) and ran your tests in a loop, I got more 2PC "lacks
> matching index tuple" within one minute. Here's how I ran the tests:
>
> make -j20 && while make -C contrib/amcheck check REGRESS= PROVE_FLAGS=--timer PROVE_TESTS='t/002_cic.pl t/003_cic_2pc.pl' NO_TEMP_INSTALL=1; do date; done
>
> Do you see failures with that loop? If so, can you diagnose them?

See below.

> (So far, I've not seen a failure from the 1PC test.)

I eventually did see one. Just one 1PC failure in six hours of 1PC test
runtime, though.

> On Thu, Jul 29, 2021 at 10:25:48PM -0400, Noah Misch wrote:
> > I haven't checked whether other inval message types have the same hazard.
>
> I will look into this next. (It probably doesn't explain 2PC failures.)

Now that I've looked, other inval hazards might indeed explain the remaining
2PC and 1PC failures. I'll plug those holes and see what failures, if any,
continue to happen.
> On 15 Aug 2021, at 13:45, Noah Misch <noah@leadboat.com> wrote:
>>
>> Do you see failures with that loop? If so, can you diagnose them?
>
> See below.

I do not observe the failure on my laptop, though I reproduced it on a Linux server.
I've fixed one bug in TwoPhaseGetXidByVXid(). Also rebased on current master.

>> (So far, I've not seen a failure from the 1PC test.)
>
> I eventually did see one. Just one 1PC failure in six hours of 1PC test
> runtime, though.

I've attached a patch that reproduces the problem in 30sec on my server.

Thanks!

Best regards, Andrey Borodin.
Attachments
- v12-0006-Do-CIC-test-time-based-to-ensure-bug-repro.patch
- v12-0005-PoC-fix-clear-xid.patch
- v12-0004-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxi.patch
- v12-0003-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch
- v12-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch
- v12-0001-Introduce-TAP-test-for-CIC.patch
Hi,

On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote:
> From 929736512ebf8eb9ac6ddaaf49b9e6148d72cfbb Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Fri, 30 Jul 2021 14:40:16 +0500
> Subject: [PATCH v12 2/6] PoC fix for race in RelationBuildDesc() and relcache
>  invalidation
>
> ---
>  src/backend/utils/cache/relcache.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> index 13d9994af3..7eec7b1f41 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -997,9 +997,16 @@ equalRSDesc(RowSecurityDesc *rsdesc1, RowSecurityDesc *rsdesc2)
>   * (suggesting we are trying to access a just-deleted relation).
>   * Any other error is reported via elog.
>   */
> +typedef struct InProgressRels {
> +	Oid relid;
> +	bool invalidated;
> +} InProgressRels;
> +static InProgressRels inProgress[100];
> +
>  static Relation
>  RelationBuildDesc(Oid targetRelId, bool insertIt)
>  {
> +	int in_progress_offset;
>  	Relation relation;
>  	Oid relid;
>  	HeapTuple pg_class_tuple;
> @@ -1033,6 +1040,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>  	}
>  #endif
>
> +	for (in_progress_offset = 0;
> +		 OidIsValid(inProgress[in_progress_offset].relid);
> +		 in_progress_offset++)
> +		;
> +	inProgress[in_progress_offset].relid = targetRelId;
> +retry:
> +	inProgress[in_progress_offset].invalidated = false;
> +
>  	/*
>  	 * find the tuple in pg_class corresponding to the given relation id
>  	 */
> @@ -1213,6 +1228,12 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>  	 */
>  	heap_freetuple(pg_class_tuple);
>
> +	if (inProgress[in_progress_offset].invalidated)
> +		goto retry;				/* TODO free old one */
> +	/* inProgress is in fact the stack, we can safely remove it's top */
> +	inProgress[in_progress_offset].relid = InvalidOid;
> +	Assert(inProgress[in_progress_offset + 1].relid == InvalidOid);
> +
>  	/*
>  	 * Insert newly created relation into relcache hash table, if requested.
>  	 *
> @@ -2805,6 +2826,13 @@ RelationCacheInvalidateEntry(Oid relationId)
>  		relcacheInvalsReceived++;
>  		RelationFlushRelation(relation);
>  	}
> +	else
> +	{
> +		int i;
> +		for (i = 0; OidIsValid(inProgress[i].relid); i++)
> +			if (inProgress[i].relid == relationId)
> +				inProgress[i].invalidated = true;
> +	}
>  }

Desperately needs comments. Without a commit message and without comments
it's hard to review this without re-reading the entire thread - which
approximately nobody will do.

> From 7e47dae2828d88ddb2161fda0c3b08a158c6cf37 Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Sat, 7 Aug 2021 20:27:14 +0500
> Subject: [PATCH v12 5/6] PoC fix clear xid
>
> ---
>  src/backend/access/transam/xact.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
> index 387f80419a..9b19d939eb 100644
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -2500,7 +2500,6 @@ PrepareTransaction(void)
>  	 * done *after* the prepared transaction has been marked valid, else
>  	 * someone may think it is unlocked and recyclable.
>  	 */
> -	ProcArrayClearTransaction(MyProc);
>
>  	/*
>  	 * In normal commit-processing, this is all non-critical post-transaction
> @@ -2535,6 +2534,8 @@ PrepareTransaction(void)
>  	PostPrepare_MultiXact(xid);
>
>  	PostPrepare_Locks(xid);
> +
> +	ProcArrayClearTransaction(MyProc);
>  	PostPrepare_PredicateLocks(xid);

The comment above ProcArrayClearTransaction would need to be moved and
updated...

> From 6db9cafd146db1a645bb6885157b0e1f032765e0 Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Mon, 19 Jul 2021 11:50:02 +0500
> Subject: [PATCH v12 4/6] Fix CREATE INDEX CONCURRENTLY in precence of vxids
>  converted to 2pc

...

> +/*
> + * TwoPhaseGetXidByVXid
> + *		Try to lookup for vxid among prepared xacts
> + */
> +XidListEntry
> +TwoPhaseGetXidByVXid(VirtualTransactionId vxid)
> +{
> +	int i;
> +	XidListEntry result;
> +	result.next = NULL;
> +	result.xid = InvalidTransactionId;
> +
> +	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
> +
> +	for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +	{
> +		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
> +		PGPROC *proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +		VirtualTransactionId proc_vxid;
> +		GET_VXID_FROM_PGPROC(proc_vxid, *proc);
> +
> +		if (VirtualTransactionIdEquals(vxid, proc_vxid))
> +		{
> +			if (result.xid != InvalidTransactionId)
> +			{
> +				/* Already has candidate - need to alloc some space */
> +				XidListEntry *copy = palloc(sizeof(XidListEntry));
> +				copy->next = result.next;
> +				copy->xid = result.xid;
> +				result.next = copy;
> +			}
> +			result.xid = gxact->xid;
> +		}
> +	}
> +
> +	LWLockRelease(TwoPhaseStateLock);
> +
> +	return result;
> +}

Dynamic memory allocations while holding a fairly central lock - one which
is now going to be more contended - doesn't seem great.

Is the memory context this is called in guaranteed to be of a proper
duration? Including being reset in case there's an error at some point
before the memory is freed?

> +/*
> + * WaitXact
> + *
> + * Wait for xid completition if have xid. Otherwise try to find xid among
> + * two-phase procarray entries.
> + */
> +static bool WaitXact(VirtualTransactionId vxid, TransactionId xid, bool wait)
> +{
> +	LockAcquireResult lar;
> +	LOCKTAG tag;
> +	XidListEntry xidlist;
> +	XidListEntry *xidlist_ptr = NULL; /* pointer to TwoPhaseGetXidByVXid()s pallocs */
> +	bool result;
> +
> +	if (TransactionIdIsValid(xid))
> +	{
> +		/* We know exact xid - no need to search in 2PC state */
> +		xidlist.xid = xid;
> +		xidlist.next = NULL;
> +	}
> +	else
> +	{
> +		/* You cant have vxid among 2PCs if you have no 2PCs */
> +		if (max_prepared_xacts == 0)
> +			return true;
> +
> +		/*
> +		 * We must search for vxids in 2pc state
> +		 * XXX: O(N*N) complexity where N is number of prepared xacts
> +		 */
> +		xidlist = TwoPhaseGetXidByVXid(vxid);
> +		/* Return if transaction is gone entirely */
> +		if (!TransactionIdIsValid(xidlist.xid))
> +			return true;
> +		xidlist_ptr = xidlist.next;
> +	}

Perhaps I missed this - but won't we constantly enter this path for
non-2pc transactions? E.g.

> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>  	 */
>  	proc = BackendIdGetProc(vxid.backendId);
>  	if (proc == NULL)
> -		return true;
> +		return WaitXact(vxid, InvalidTransactionId, wait);
>
>  	/*
>  	 * We must acquire this lock before checking the backendId and lxid
> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>  		|| proc->fpLocalTransactionId != vxid.localTransactionId)
>  	{
>  		LWLockRelease(&proc->fpInfoLock);
> -		return true;
> +		return WaitXact(vxid, InvalidTransactionId, wait);
>  	}

It seems like it's going to add a substantial amount of work even when no
2PC xacts are involved?

> diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
> index e27e1a8fe8..a5f28d3a80 100644
> --- a/src/include/access/twophase.h
> +++ b/src/include/access/twophase.h
> @@ -25,6 +25,17 @@
>   */
>  typedef struct GlobalTransactionData *GlobalTransaction;
>
> +/*
> + * XidListEntry is expected to be used as list very rarely. Under normal
> + * circumstances TwoPhaseGetXidByVXid() returns only one xid.
> + * But under certain conditions can return many xids or nothing.
> + */
> +typedef struct XidListEntry
> +{
> +	TransactionId xid;
> +	struct XidListEntry* next;
> +} XidListEntry;

I don't think we should invent additional ad-hoc list types.

Greetings,

Andres Freund
On Sun, Aug 15, 2021 at 04:09:37PM +0500, Andrey Borodin wrote: > > On 15 Aug 2021, at 13:45, Noah Misch <noah@leadboat.com> wrote: > >> Do you see failures with that loop? If so, can you diagnose them? > I do not observe the failure on my laptop, though I reproduced it on a Linux server. > I've fixed one bug in TwoPhaseGetXidByVXid(). Also rebased on current master. > > Just one 1PC failure in six hours of 1PC test runtime, though. > I've attached a patch that reproduces the problem in 30 seconds on my server. Having studied the broader inval situation, I found just one additional gap that seemed potentially relevant. It didn't stop the failures under current tests, however. The attached patch replaces my last patch on this thread, so it should replace v12-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch in your series. (Like its predecessor, it's an unfinished proof-of-concept.) With v12, on my machine, the same loop took 2000s to get three failures, both of which were in the 1PC tests. I ran out of time to study the failure mechanism. Would you diagnose what happens when it fails on your server? Also see the larger review from Andres.
Attachments
Andres, Noah, thanks for the review. > On 16 Aug 2021, at 10:13, Noah Misch <noah@leadboat.com> wrote: > > With v12, on my machine, the same loop took 2000s to get three failures, both > of which were in the 1PC tests. I ran out of time to study the failure > mechanism. Would you diagnose what happens when it fails on your server? I'm still investigating. With your fix for RelationBuildDesc() invals my reproduction is also quite slow - I get an error within 10 minutes or so. I've observed a few times that the same Xid was WaitXact()'ed twice. Is it possible that taking a ShareLock on a running xid (taken from the PGPROC of the vxid) is not a good way to wait for transaction completion? Is it possible that the xid is in PGPROC before the backend acquires its own lock on that xid? > Also see the larger review from Andres. Sure, I will address Andres's notes once the patches finally work without errors. I agree with the idea of not inventing a dynamic list, and with the other points raised. Best regards, Andrey Borodin.
> On 22 Aug 2021, at 17:30, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > I've observed a few times that the same Xid was WaitXact()'ed twice. Is it possible that taking a ShareLock on a running xid (taken from the PGPROC of the vxid) is not a good way to wait for transaction completion? > Is it possible that the xid is in PGPROC before the backend acquires its own lock on that xid? Oh, that's it. We first publish the xid in PGPROC and only then take a lock on it in the lock manager. Ok, I know how to fix this. Currently, when testing the combination of all fixes, I observe things like 'error running SQL: 'psql:<stdin>:1: ERROR: prepared transaction with identifier "a" is busy'' Looks like some kind of race condition in the tests. Thanks! Best regards, Andrey Borodin.
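[Editorial note: for illustration only, a tiny standalone model of the ordering problem described above. The struct and helper names are invented stand-ins for PGPROC and the xid lock, not PostgreSQL code; the point is simply why taking the lock before publishing the xid closes the window.]

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy stand-in for the shared PGPROC entry of one backend. */
    struct ToyProc
    {
        unsigned int xid;           /* 0 means "no xid published yet" */
        bool        xid_lock_held;  /* has the owner locked its own xid? */
    };

    static struct ToyProc proc;

    /*
     * Buggy order: the xid becomes visible before the owner holds the lock on
     * it, so a concurrent waiter that reads the xid and tries to wait on the
     * lock can acquire it immediately and conclude the transaction is gone.
     */
    static void
    publish_then_lock(unsigned int xid)
    {
        proc.xid = xid;             /* window: a waiter can see the xid ... */
        proc.xid_lock_held = true;  /* ... before the lock exists */
    }

    /* Fixed order: take the lock first, then let the xid become visible. */
    static void
    lock_then_publish(unsigned int xid)
    {
        proc.xid_lock_held = true;
        proc.xid = xid;
    }

    /* A waiter would actually block only if the published xid is locked. */
    static bool
    waiter_would_block(void)
    {
        return proc.xid != 0 && proc.xid_lock_held;
    }

    int
    main(void)
    {
        publish_then_lock(1);       /* buggy ordering, shown for contrast */
        lock_then_publish(42);      /* the fix: lock before publishing */
        printf("waiter would block: %s\n", waiter_would_block() ? "yes" : "no");
        return 0;
    }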
> On 22 Aug 2021, at 22:42, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > Currently, when testing the combination of all fixes, I observe things like > 'error running SQL: 'psql:<stdin>:1: ERROR: prepared transaction with identifier "a" is busy'' > Looks like some kind of race condition in the tests. There was a race condition in deterministic 2PC test. Fixed with synchronisation points. PFA patches that work on my machines. I'm going to start fixing review notes if it will not break until tomorrow. BTW, are subtransactions anything special wrt CIC? Is it worth sprinkling some SAVEPOINTs here and there, just to be sure? Thanks! Best regards, Andrey Borodin.
Attachments
- v12-0001-Introduce-TAP-test-for-CIC.patch
- v12-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch
- v12-0003-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch
- v12-0004-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxi.patch
- v12-0005-PoC-fix-clear-xid.patch
- v12-0006-Do-CIC-test-time-based-to-ensure-bug-repro.patch
On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote: > > On 22 Aug 2021, at 22:42, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > Currently, when testing the combination of all fixes, I observe things like > > 'error running SQL: 'psql:<stdin>:1: ERROR: prepared transaction with identifier "a" is busy'' > > Looks like some kind of race condition in the tests. > > There was a race condition in deterministic 2PC test. Fixed with synchronisation points. > PFA patches that work on my machines. > I'm going to start fixing review notes if it will not break until tomorrow. That is great news. > BTW, are subtransactions anything special wrt CIC? Is it worth sprinkling some SAVEPOINTs here and there, just to be sure? Not especially. The AssignTransactionId() "Ensure parent(s) have XIDs" behavior may be important to CIC. In the test pgbench runs for scripts 002_pgbench_concurrent_2pc and 002_pgbench_concurrent_transaction, it likely wouldn't hurt to pass two equal-weight test scripts, one of which uses savepoints.
On Mon, Aug 23, 2021 at 10:37 PM Noah Misch <noah@leadboat.com> wrote: > > There was a race condition in deterministic 2PC test. Fixed with synchronisation points. > > PFA patches that work on my machines. > > I'm going to start fixing review notes if it will not break until tomorrow. > > That is great news. +1. Great detective work. -- Peter Geoghegan
Hi! > 15 авг. 2021 г., в 18:45, Andres Freund <andres@anarazel.de> написал(а): > > Hi, > > On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote: >> From 929736512ebf8eb9ac6ddaaf49b9e6148d72cfbb Mon Sep 17 00:00:00 2001 >> From: Andrey Borodin <amborodin@acm.org> >> Date: Fri, 30 Jul 2021 14:40:16 +0500 >> Subject: [PATCH v12 2/6] PoC fix for race in RelationBuildDesc() and relcache >> invalidation >> >> --- >> src/backend/utils/cache/relcache.c | 28 ++++++++++++++++++++++++++++ >> 1 file changed, 28 insertions(+) >> >> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c >> index 13d9994af3..7eec7b1f41 100644 >> --- a/src/backend/utils/cache/relcache.c >> +++ b/src/backend/utils/cache/relcache.c >> @@ -997,9 +997,16 @@ equalRSDesc(RowSecurityDesc *rsdesc1, RowSecurityDesc *rsdesc2) >> * (suggesting we are trying to access a just-deleted relation). >> * Any other error is reported via elog. >> */ >> +typedef struct InProgressRels { >> + Oid relid; >> + bool invalidated; >> +} InProgressRels; >> +static InProgressRels inProgress[100]; >> + >> static Relation >> RelationBuildDesc(Oid targetRelId, bool insertIt) >> { >> + int in_progress_offset; >> Relation relation; >> Oid relid; >> HeapTuple pg_class_tuple; >> @@ -1033,6 +1040,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt) >> } >> #endif >> >> + for (in_progress_offset = 0; >> + OidIsValid(inProgress[in_progress_offset].relid); >> + in_progress_offset++) >> + ; >> + inProgress[in_progress_offset].relid = targetRelId; >> +retry: >> + inProgress[in_progress_offset].invalidated = false; >> + >> /* >> * find the tuple in pg_class corresponding to the given relation id >> */ >> @@ -1213,6 +1228,12 @@ RelationBuildDesc(Oid targetRelId, bool insertIt) >> */ >> heap_freetuple(pg_class_tuple); >> >> + if (inProgress[in_progress_offset].invalidated) >> + goto retry; /* TODO free old one */ >> + /* inProgress is in fact the stack, we can safely remove it's top */ >> + inProgress[in_progress_offset].relid = InvalidOid; >> + Assert(inProgress[in_progress_offset + 1].relid == InvalidOid); >> + >> /* >> * Insert newly created relation into relcache hash table, if requested. >> * >> @@ -2805,6 +2826,13 @@ RelationCacheInvalidateEntry(Oid relationId) >> relcacheInvalsReceived++; >> RelationFlushRelation(relation); >> } >> + else >> + { >> + int i; >> + for (i = 0; OidIsValid(inProgress[i].relid); i++) >> + if (inProgress[i].relid == relationId) >> + inProgress[i].invalidated = true; >> + } >> } > > Desperately needs comments. Without a commit message and without > comments it's hard to review this without re-reading the entire thread - > which approximately nobody will do. I've added some comments. But it seems we should use dynamic allocations instead of 100-based array. > > >> From 7e47dae2828d88ddb2161fda0c3b08a158c6cf37 Mon Sep 17 00:00:00 2001 >> From: Andrey Borodin <amborodin@acm.org> >> Date: Sat, 7 Aug 2021 20:27:14 +0500 >> Subject: [PATCH v12 5/6] PoC fix clear xid >> >> --- >> src/backend/access/transam/xact.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c >> index 387f80419a..9b19d939eb 100644 >> --- a/src/backend/access/transam/xact.c >> +++ b/src/backend/access/transam/xact.c >> @@ -2500,7 +2500,6 @@ PrepareTransaction(void) >> * done *after* the prepared transaction has been marked valid, else >> * someone may think it is unlocked and recyclable. 
>> */ >> - ProcArrayClearTransaction(MyProc); >> >> /* >> * In normal commit-processing, this is all non-critical post-transaction >> @@ -2535,6 +2534,8 @@ PrepareTransaction(void) >> PostPrepare_MultiXact(xid); >> >> PostPrepare_Locks(xid); >> + >> + ProcArrayClearTransaction(MyProc); >> PostPrepare_PredicateLocks(xid); > > The comment above ProcArrayClearTransaction would need to be moved and > updated... Fixed. > >> From 6db9cafd146db1a645bb6885157b0e1f032765e0 Mon Sep 17 00:00:00 2001 >> From: Andrey Borodin <amborodin@acm.org> >> Date: Mon, 19 Jul 2021 11:50:02 +0500 >> Subject: [PATCH v12 4/6] Fix CREATE INDEX CONCURRENTLY in precence of vxids >> converted to 2pc > ... > >> +/* >> + * TwoPhaseGetXidByVXid >> + * Try to lookup for vxid among prepared xacts >> + */ >> +XidListEntry >> +TwoPhaseGetXidByVXid(VirtualTransactionId vxid) >> +{ >> + int i; >> + XidListEntry result; >> + result.next = NULL; >> + result.xid = InvalidTransactionId; >> + >> + LWLockAcquire(TwoPhaseStateLock, LW_SHARED); >> + >> + for (i = 0; i < TwoPhaseState->numPrepXacts; i++) >> + { >> + GlobalTransaction gxact = TwoPhaseState->prepXacts[i]; >> + PGPROC *proc = &ProcGlobal->allProcs[gxact->pgprocno]; >> + VirtualTransactionId proc_vxid; >> + GET_VXID_FROM_PGPROC(proc_vxid, *proc); >> + >> + if (VirtualTransactionIdEquals(vxid, proc_vxid)) >> + { >> + if (result.xid != InvalidTransactionId) >> + { >> + /* Already has candidate - need to alloc some space */ >> + XidListEntry *copy = palloc(sizeof(XidListEntry)); >> + copy->next = result.next; >> + copy->xid = result.xid; >> + result.next = copy; >> + } >> + result.xid = gxact->xid; >> + } >> + } >> + >> + LWLockRelease(TwoPhaseStateLock); >> + >> + return result; >> +} > > Dynamic memory allocations while holding a fairly central lock - one > which is now going to be more contended - doesn't seem great. > > Is the memory context this is called in guaranteed to be of a proper > duration? Including being reset in case there's an error at some point > before the memory is freed? I've removed custom list and all memory allocations. If there are multiple 2PCs with same vxid - we just wait for one, thenretry. >> +/* >> + * WaitXact >> + * >> + * Wait for xid completition if have xid. Otherwise try to find xid among >> + * two-phase procarray entries. >> + */ >> +static bool WaitXact(VirtualTransactionId vxid, TransactionId xid, bool wait) >> +{ >> + LockAcquireResult lar; >> + LOCKTAG tag; >> + XidListEntry xidlist; >> + XidListEntry *xidlist_ptr = NULL; /* pointer to TwoPhaseGetXidByVXid()s pallocs */ >> + bool result; >> + >> + if (TransactionIdIsValid(xid)) >> + { >> + /* We know exact xid - no need to search in 2PC state */ >> + xidlist.xid = xid; >> + xidlist.next = NULL; >> + } >> + else >> + { >> + /* You cant have vxid among 2PCs if you have no 2PCs */ >> + if (max_prepared_xacts == 0) >> + return true; >> + >> + /* >> + * We must search for vxids in 2pc state >> + * XXX: O(N*N) complexity where N is number of prepared xacts >> + */ >> + xidlist = TwoPhaseGetXidByVXid(vxid); >> + /* Return if transaction is gone entirely */ >> + if (!TransactionIdIsValid(xidlist.xid)) >> + return true; >> + xidlist_ptr = xidlist.next; >> + } > > Perhaps I missed this - but won't we constantly enter this path for > non-2pc transactions? E.g. I've restored heuristics that if it's not 2PC - we just exit from WaitPreparedXact(). 
> >> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait) >> */ >> proc = BackendIdGetProc(vxid.backendId); >> if (proc == NULL) >> - return true; >> + return WaitXact(vxid, InvalidTransactionId, wait); >> >> /* >> * We must acquire this lock before checking the backendId and lxid >> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait) >> || proc->fpLocalTransactionId != vxid.localTransactionId) >> { >> LWLockRelease(&proc->fpInfoLock); >> - return true; >> + return WaitXact(vxid, InvalidTransactionId, wait); >> } > > It seems like it's going to add a substantial amount of work even when > no 2PC xacts are involved? Only if 2PCs are enabled. >> diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h >> index e27e1a8fe8..a5f28d3a80 100644 >> --- a/src/include/access/twophase.h >> +++ b/src/include/access/twophase.h >> @@ -25,6 +25,17 @@ >> */ >> typedef struct GlobalTransactionData *GlobalTransaction; >> >> +/* >> + * XidListEntry is expected to be used as list very rarely. Under normal >> + * circumstances TwoPhaseGetXidByVXid() returns only one xid. >> + * But under certain conditions can return many xids or nothing. >> + */ >> +typedef struct XidListEntry >> +{ >> + TransactionId xid; >> + struct XidListEntry* next; >> +} XidListEntry; > > I don't think we should invent additional ad-hoc list types. Fixed, removed list here entirely. I'm attaching new version. It works fine on my machines. Thanks! Best regards, Andrey Borodin.
Attachments
- v13-0001-Introduce-TAP-test-for-CIC.patch
- v13-0006-Do-CIC-test-time-based-to-ensure-bug-repro.patch
- v13-0005-Reorder-steps-of-2PC-preparation.patch
- v13-0004-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxi.patch
- v13-0003-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch
- v13-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch
Hi, On 2021-08-29 12:27:31 +0500, Andrey Borodin wrote: > > On 15 Aug 2021, at 18:45, Andres Freund <andres@anarazel.de> wrote: > > On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote: > >> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait) > >> */ > >> proc = BackendIdGetProc(vxid.backendId); > >> if (proc == NULL) > >> - return true; > >> + return WaitXact(vxid, InvalidTransactionId, wait); > >> > >> /* > >> * We must acquire this lock before checking the backendId and lxid > >> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait) > >> || proc->fpLocalTransactionId != vxid.localTransactionId) > >> { > >> LWLockRelease(&proc->fpInfoLock); > >> - return true; > >> + return WaitXact(vxid, InvalidTransactionId, wait); > >> } > > > > It seems like it's going to add a substantial amount of work even when > > no 2PC xacts are involved? > Only if 2PCs are enabled. I don't think that's good enough. Plenty of systems have 2PC enabled but very few if any transactions end up as 2PC ones. Greetings, Andres Freund
> On 29 Aug 2021, at 23:09, Andres Freund <andres@anarazel.de> wrote: >>> >>> It seems like it's going to add a substantial amount of work even when >>> no 2PC xacts are involved? >> Only if 2PCs are enabled. > > I don't think that's good enough. Plenty of systems have 2PC enabled but very > few if any transactions end up as 2PC ones. I could not compose a significantly better solution. Let's consider what we actually pay for. GetLockConflicts() returns xids for 2PC transactions and vxids for any other transaction. Then we wait for each of them one-by-one. GetLockConflicts() cannot return xids for regular transactions: those xids may not have an xid lock yet. So we have to work with the vxid. Each wait case resolves to a set of three possible actions: test/wait for xid (cheap), test/wait for vxid (relatively cheap), search for the xid by vxid (expensive). When we have to wait for a known 2PC transaction, we wait on its xid (cheap). When we have a vxid: 1. If the backend is no longer on the same vxid, we have to search for the xid by vxid (expensive), then wait on the obtained xid, if any. 2. Otherwise, wait for the vxid, then for the xid (cheap * 2). So the really slow part is searching for xids of long-gone vxids that may not have an xid at all. We simply have no good mechanism to say that this vxid did not have an xid, if the vxid is long gone. The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED). Is it worth the complexity? Best regards, Andrey Borodin.
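[Editorial note: to make the decision tree above easier to follow, a compact self-contained sketch in plain C. The helper functions and the max_prepared_xacts variable are simplified stand-ins invented for this illustration; none of this is the actual patch code.]

    #include <stdio.h>

    typedef unsigned int ToyXid;
    typedef unsigned int ToyVxid;

    static int  max_prepared_xacts = 10;    /* stand-in for the GUC */

    /* Cheap: block on the xid lock of a known transaction. */
    static void
    wait_on_xid(ToyXid xid)
    {
        printf("wait on xid %u (cheap)\n", xid);
    }

    /* Relatively cheap: block until the backend ends this virtual transaction. */
    static void
    wait_on_vxid(ToyVxid vxid)
    {
        printf("wait on vxid %u (relatively cheap)\n", vxid);
    }

    /* Expensive: scan the prepared-transaction state for a matching vxid. */
    static ToyXid
    search_twophase_for_vxid(ToyVxid vxid)
    {
        printf("scan 2PC state for vxid %u (expensive)\n", vxid);
        return 0;               /* 0 plays the role of InvalidTransactionId */
    }

    /*
     * One conflict from GetLockConflicts(): prepared xacts arrive with a known
     * xid, everything else arrives as a vxid only.
     */
    static void
    wait_for_conflict(ToyVxid vxid, ToyXid xid)
    {
        if (xid != 0)
        {
            wait_on_xid(xid);   /* already known to be prepared */
            return;
        }

        wait_on_vxid(vxid);     /* ordinary backend: wait out its vxid */

        /* The vxid may have turned into a prepared xact at PREPARE time. */
        if (max_prepared_xacts > 0)
        {
            ToyXid      mapped = search_twophase_for_vxid(vxid);

            if (mapped != 0)
                wait_on_xid(mapped);
        }
    }

    int
    main(void)
    {
        wait_for_conflict(7, 0);    /* conflict reported as a vxid */
        wait_for_conflict(0, 123);  /* conflict reported as a prepared xid */
        return 0;
    }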
On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote: > > On 29 Aug 2021, at 23:09, Andres Freund <andres@anarazel.de> wrote: > >>> It seems like it's going to add a substantial amount of work even when > >>> no 2PC xacts are involved? > >> Only if 2PCs are enabled. > > > > I don't think that's good enough. Plenty of systems have 2PC enabled but very > > few if any transactions end up as 2PC ones. > The best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to TwoPhaseGetXidByVXid() with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED). > > Is it worth the complexity? https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a bottleneck. Nonetheless, if Andres Freund finds it's worth the complexity, then I'm content with it. I'd certainly expect some performance benefit. Andres, what do you think? We could start with an unlocked scan of the twophase shared memory. If the unlocked scan finds a potential match, acquire TwoPhaseStateLock and repeat the scan. Otherwise, we don't need the locked scan, because we can deduce that the locked scan would find nothing. I'm not fond of relying on such reasoning without a known-strong performance need, but it's an alternative.
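[Editorial note: a self-contained toy of the unlocked-scan-then-locked-scan idea proposed above. The array, lock stubs, and names are simplified stand-ins for the twophase.c structures, invented only to show the shape of the reasoning being discussed.]

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int ToyXid;
    typedef unsigned int ToyVxid;

    typedef struct
    {
        ToyVxid     vxid;
        ToyXid      xid;
    } ToyGxact;

    #define NUM_TOY_GXACTS 4

    /* Stand-in for the shared array of prepared transactions. */
    static ToyGxact toy_state[NUM_TOY_GXACTS] = {
        {10, 101}, {11, 102}, {12, 103}, {13, 104}
    };

    /* Stand-ins for LWLockAcquire/LWLockRelease on TwoPhaseStateLock. */
    static void toy_lock(void)   { }
    static void toy_unlock(void) { }

    /* Scan the array; the caller decides whether the lock is held. */
    static bool
    scan_for_vxid(ToyVxid target, ToyXid *xid)
    {
        for (int i = 0; i < NUM_TOY_GXACTS; i++)
        {
            if (toy_state[i].vxid == target)
            {
                *xid = toy_state[i].xid;
                return true;
            }
        }
        return false;
    }

    /*
     * Scan once without the lock.  Only when that finds a potential match do
     * we pay for the lock and scan again to get an answer we can trust; if
     * the unlocked scan finds nothing, the locked scan is assumed to find
     * nothing either (this is the deduction the discussion is wary of).
     */
    static bool
    lookup_xid_by_vxid(ToyVxid target, ToyXid *xid)
    {
        bool        found;

        if (!scan_for_vxid(target, xid))
            return false;       /* skip the locked scan entirely */

        toy_lock();
        found = scan_for_vxid(target, xid);
        toy_unlock();
        return found;
    }

    int
    main(void)
    {
        ToyXid      xid;

        if (lookup_xid_by_vxid(12, &xid))
            printf("vxid 12 maps to prepared xid %u\n", xid);
        if (!lookup_xid_by_vxid(99, &xid))
            printf("vxid 99 is not a prepared xact\n");
        return 0;
    }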
On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote: > --- a/src/backend/access/transam/twophase.c > +++ b/src/backend/access/transam/twophase.c > @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, > proc->pgprocno = gxact->pgprocno; > SHMQueueElemInit(&(proc->links)); > proc->waitStatus = PROC_WAIT_STATUS_OK; > - /* We set up the gxact's VXID as InvalidBackendId/XID */ > - proc->lxid = (LocalTransactionId) xid; > + /* We set up the gxact's VXID as real for CIC purposes */ > + proc->lxid = MyProc->lxid; This breaks the case where the server restarted after PREPARE TRANSACTION. MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is false. I'm attaching a test case addition. Can you repair this? > proc->xid = xid; > Assert(proc->xmin == InvalidTransactionId); > proc->delayChkpt = false; > proc->statusFlags = 0; > proc->pid = 0; > - proc->backendId = InvalidBackendId; > + /* May be backendId of startup process */ > + proc->backendId = MyBackendId; Incidentally, MyBackendId of startup process depends on other facts. When using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1. Otherwise, including clean startup of a non-standby node, MyBackendId is InvalidBackendId. This may be harmless. I didn't know about it. On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote: > On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote: > > > 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а): > > >>> It seems like it's going to add a substantial amount of work even when > > >>> no 2PC xacts are involved? > > >> Only if 2PCs are enabled. > > > > > > I don't think that's good enough. Plenty of systems have 2PC enabled but very > > > few if any transactions end up as 2PC ones. > > > Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED). > > > > Does it worth the complexity? > > https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r > suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a > bottleneck. Nonetheless, if Andres Freund finds it's worth the complexity, > then I'm content with it. I'd certainly expect some performance benefit. > Andres, what do you think? A few more benefits (beyond lock contention) come to mind: - Looking at the three VirtualXactLock() callers, waiting for final disposition of prepared transactions is necessary for WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and dead code for ResolveRecoveryConflictWithVirtualXIDs(). In WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a prepared transaction won't do further database reads. Waiting on the prepared transaction there could give CIC an arbitrarily-long, needless delay. ResolveRecoveryConflictWithVirtualXIDs() will never wait on a prepared transaction, because prepared transactions hold no locks during recovery. (If a prepared transaction originally acquired AccessExclusiveLock, the startup process holds that lock on its behalf.) Coordinating the XID search at a higher layer would let us change WaitForLockersMultiple() without changing the others. - v13 WaitPreparedXact() experiences starvation when a steady stream of prepared transactions have the same VXID. 
Since VXID reuse entails reconnecting, starvation will be unnoticeable in systems that follow best practices around connection lifespan. The 2021-08-23 patch version didn't have that hazard. None of those benefits clearly justify the complexity, but they're relevant to the decision.
Attachments
> 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а): > > On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote: >> --- a/src/backend/access/transam/twophase.c >> +++ b/src/backend/access/transam/twophase.c >> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, >> proc->pgprocno = gxact->pgprocno; >> SHMQueueElemInit(&(proc->links)); >> proc->waitStatus = PROC_WAIT_STATUS_OK; >> - /* We set up the gxact's VXID as InvalidBackendId/XID */ >> - proc->lxid = (LocalTransactionId) xid; >> + /* We set up the gxact's VXID as real for CIC purposes */ >> + proc->lxid = MyProc->lxid; > > This breaks the case where the server restarted after PREPARE TRANSACTION. > MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is > false. I'm attaching a test case addition. Can you repair this? Yup. Indeed, that's a bug. Root cause it that GetLockConflicts() does not try to extract real xid from gxact's PGPROC, whilevxid is not valid. I see two ways to solve this: 1. Always set valid vxid, but fake 'vxid from xid' for gxact. 2. Teach GetLockConflicts() to use xid if vxid is invalid. Both ways lead to identical GetLockConflicts() output. PFA implementation of approach 1. >> proc->xid = xid; >> Assert(proc->xmin == InvalidTransactionId); >> proc->delayChkpt = false; >> proc->statusFlags = 0; >> proc->pid = 0; >> - proc->backendId = InvalidBackendId; >> + /* May be backendId of startup process */ >> + proc->backendId = MyBackendId; > > Incidentally, MyBackendId of startup process depends on other facts. When > using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1. > Otherwise, including clean startup of a non-standby node, MyBackendId is > InvalidBackendId. This may be harmless. I didn't know about it. I think we should avoid using backendId during startup. The backend itself has nothing to do with the transaction. > On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote: >> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote: >>>> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а): >>>>>> It seems like it's going to add a substantial amount of work even when >>>>>> no 2PC xacts are involved? >>>>> Only if 2PCs are enabled. >>>> >>>> I don't think that's good enough. Plenty of systems have 2PC enabled but very >>>> few if any transactions end up as 2PC ones. >> >>> Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED). >>> >>> Does it worth the complexity? >> >> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r >> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a >> bottleneck. Nonetheless, if Andres Freund finds it's worth the complexity, >> then I'm content with it. I'd certainly expect some performance benefit. >> Andres, what do you think? > > A few more benefits (beyond lock contention) come to mind: > > - Looking at the three VirtualXactLock() callers, waiting for final > disposition of prepared transactions is necessary for > WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and > dead code for ResolveRecoveryConflictWithVirtualXIDs(). In > WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a > prepared transaction won't do further database reads. 
Waiting on the > prepared transaction there could give CIC an arbitrarily-long, needless > delay. ResolveRecoveryConflictWithVirtualXIDs() will never wait on a > prepared transaction, because prepared transactions hold no locks during > recovery. (If a prepared transaction originally acquired > AccessExclusiveLock, the startup process holds that lock on its behalf.) > Coordinating the XID search at a higher layer would let us change > WaitForLockersMultiple() without changing the others. BTW WaitForOlderSnapshots() is used in ATExecDetachPartitionFinalize(). Probably, we could indicate to VirtualXactLock() if we want to wait on 2PC or not. Does it make sense? > > > - v13 WaitPreparedXact() experiences starvation when a steady stream of > prepared transactions have the same VXID. Since VXID reuse entails > reconnecting, starvation will be unnoticeable in systems that follow best > practices around connection lifespan. The 2021-08-23 patch version didn't > have that hazard. I think the probability of such a stream is abysmal. You not only need a stream of 2PC with the same vxid, but a stream of overlapping 2PC with the same vxid. And the most critical thing that can happen - CIC waiting for the stream to become one-2PC-at-a-time for a moment. Thanks! Best regards, Andrey Borodin.
Attachments
- v14-0001-Introduce-TAP-test-for-CIC.patch
- v14-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch
- v14-0003-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch
- v14-0004-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxi.patch
- v14-0005-Reorder-steps-of-2PC-preparation.patch
- v14-0006-Do-CIC-test-time-based-to-ensure-bug-repro.patch
On Sat, Sep 25, 2021 at 10:25:05PM +0500, Andrey Borodin wrote: > > 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а): > > On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote: > >> --- a/src/backend/access/transam/twophase.c > >> +++ b/src/backend/access/transam/twophase.c > >> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, > >> proc->pgprocno = gxact->pgprocno; > >> SHMQueueElemInit(&(proc->links)); > >> proc->waitStatus = PROC_WAIT_STATUS_OK; > >> - /* We set up the gxact's VXID as InvalidBackendId/XID */ > >> - proc->lxid = (LocalTransactionId) xid; > >> + /* We set up the gxact's VXID as real for CIC purposes */ > >> + proc->lxid = MyProc->lxid; > > > > This breaks the case where the server restarted after PREPARE TRANSACTION. > > MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is > > false. I'm attaching a test case addition. Can you repair this? > Yup. Indeed, that's a bug. Root cause it that GetLockConflicts() does not try to extract real xid from gxact's PGPROC,while vxid is not valid. > I see two ways to solve this: > 1. Always set valid vxid, but fake 'vxid from xid' for gxact. > 2. Teach GetLockConflicts() to use xid if vxid is invalid. > Both ways lead to identical GetLockConflicts() output. > PFA implementation of approach 1. That's reasonable. I'll queue a task to review the rest of the patch. > > On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote: > >> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote: > >>>> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а): > >>>>>> It seems like it's going to add a substantial amount of work even when > >>>>>> no 2PC xacts are involved? > >>>>> Only if 2PCs are enabled. > >>>> > >>>> I don't think that's good enough. Plenty of systems have 2PC enabled but very > >>>> few if any transactions end up as 2PC ones. > >> > >>> Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED). > >>> > >>> Does it worth the complexity? > >> > >> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r > >> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a > >> bottleneck. Nonetheless, if Andres Freund finds it's worth the complexity, > >> then I'm content with it. I'd certainly expect some performance benefit. > >> Andres, what do you think? > > > > A few more benefits (beyond lock contention) come to mind: > > > > - Looking at the three VirtualXactLock() callers, waiting for final > > disposition of prepared transactions is necessary for > > WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and > > dead code for ResolveRecoveryConflictWithVirtualXIDs(). In > > WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a > > prepared transaction won't do further database reads. Waiting on the > > prepared transaction there could give CIC an arbitrarily-long, needless > > delay. ResolveRecoveryConflictWithVirtualXIDs() will never wait on a > > prepared transaction, because prepared transactions hold no locks during > > recovery. (If a prepared transaction originally acquired > > AccessExclusiveLock, the startup process holds that lock on its behalf.) > > Coordinating the XID search at a higher layer would let us change > > WaitForLockersMultiple() without changing the others. 
> BTW WaitForOlderSnapshots() is used in ATExecDetachPartitionFinalize(). Probably, we could indicate to VirtualXactLock() if we want to wait on 2PC or not. Does it make sense? For both CIC and ATExecDetachPartitionFinalize(), one needs unlucky timing for the operation to needlessly wait on a prepared xact. Specifically, the needless wait arises if a PREPARE happens while WaitForOlderSnapshots() is running, after its GetCurrentVirtualXIDs() and before its VirtualXactLock(). I would not choose to "indicate to VirtualXactLock() if we want to wait on 2PC or not", but code like that wouldn't be too bad. I probably wouldn't remove code like that if you chose to write it. The alternative I have in mind would work like the following pseudocode. I'm leaning toward thinking it's not worth doing, since none of the three benefits are known to be important. But maybe it is worth doing.

    struct LockConflictData
    {
        /* VXIDs seen to have XIDs */
        List       *vxid_of_xid;
        /* VXIDs without known XIDs */
        List       *vxid_no_xid;

        /*
         * XIDs.  Has one entry per entry in vxid_of_xid, and it may have up to
         * max_prepared_transactions additional entries.
         */
        List       *xid;
    };

    void
    WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
    {
        struct LockConflictData holders;
        List       *latest_xid = NIL;
        List       *need_mapping = NIL;
        ListCell   *lc;
        int         total = 0;
        int         done = 0;

        /* Collect the transactions we need to wait on */
        foreach(lc, locktags)
        {
            LOCKTAG    *locktag = lfirst(lc);
            int         count;

            GetLockConflicts(&holders, locktag, lockmode,
                             progress ? &count : NULL);
        }

        /* wait on VXIDs known to have XIDs, and wait on known XIDs */
        foreach(lc, holders.vxid_of_xid)
            VirtualXactLock(lfirst_int(lc), true, NULL);
        foreach(lc, holders.xid)
            XactLockTableWait(lfirst_int(lc), other_params);

        /* wait on remaining VXIDs, possibly discovering more XIDs */
        foreach(lc, holders.vxid_no_xid)
        {
            VirtualTransactionId *v = lfirst(lc);
            TransactionId xid = InvalidTransactionId;

            /*
             * Under this design, VirtualXactLock() would know nothing about
             * 2PC, but it would gain the ability to return proc->xid of the
             * waited proc.  Under this design, VirtualTransactionId is always
             * a local transaction, like it was before commit 8a54e12.
             */
            VirtualXactLock(*v, true, &xid);
            if (TransactionIdIsValid(xid))
                latest_xid = lappend_int(latest_xid, xid);
            else
                need_mapping = lappend_int(need_mapping, v);
        }

        /* wait on XIDs just discovered */
        foreach(lc, latest_xid)
            XactLockTableWait(lfirst_int(lc), other_params);

        /*
         * If we never saw an XID associated with a particular VXID, check
         * whether the VXID became a prepared xact.
         */
        latest_xid = TwoPhaseGetXidByVXid(need_mapping);
        foreach(lc, latest_xid)
            XactLockTableWait(lfirst_int(lc), other_params);
    }

> > - v13 WaitPreparedXact() experiences starvation when a steady stream of > > prepared transactions have the same VXID. Since VXID reuse entails > > reconnecting, starvation will be unnoticeable in systems that follow best > > practices around connection lifespan. The 2021-08-23 patch version didn't > > have that hazard. > I think the probability of such a stream is abysmal. You not only need a stream of 2PC with the same vxid, but a stream of overlapping 2PC with the same vxid. And the most critical thing that can happen - CIC waiting for the stream to become one-2PC-at-a-time for a moment. You're probably right about that.
On Sat, Sep 25, 2021 at 01:10:12PM -0700, Noah Misch wrote: > On Sat, Sep 25, 2021 at 10:25:05PM +0500, Andrey Borodin wrote: > > > 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а): > I'll queue a task to review the rest of the patch. I think the attached version is ready for commit. Notable differences vs. v14: - Made TwoPhaseGetXidByVXid() stop leaking TwoPhaseStateLock when it found a second match. - Instead of moving the ProcArrayClearTransaction() call within PrepareTransaction(), move the PostPrepare_Locks() call. I didn't find any bug in the other way, but it's a good principle to maximize similarity with CommitTransaction(). PostPrepare_Locks() has no counterpart in CommitTransaction(), so that principle is indifferent to moving it. - inval-build-race-v0.1.patch was incompatible with debug_discard_caches. The being-built relation would always get invalidated, making RelationBuildDesc() an infinite loop. I've fixed this by making RelationCacheInvalidate() invalidate in-progress rels only when called for sinval overflow, not when called for debug_discard_caches. This adds some function arguments that only matter in assert builds. That's not great, but InvalidateSystemCaches() is expensive anyway. I considered instead adding functions HoldDebugInval() and ResumeDebugInval(), which RelationBuildDesc() would use to suppress debug_discard_caches during any iteration after the first. That didn't seem better. - Discard the in-progress array after an ERROR during RelationBuildDesc(). - Made the relcache.c changes repalloc the list of in-progress rels as needed. - Changed the background_pgbench args from ($dbname, $stdin, \$stdout, $timer, $files, $opts) to ($opts, $files, \$stdout, $timer). $dbname was unused. pgbench doesn't make interesting use of its stdin, so I dropped that arg until we have a use case. $opts and $files seem akin to the $dbname arg of background_psql, so I moved them to the front. I'm not sure that last change was an improvement. - Made 001_pgbench_with_server.pl use PostgresNode::pgbench(), rather than duplicate code. Factored out a subroutine of PostgresNode::pgbench() and PostgresNode::background_pgbench(). - Lots of comment and naming changes. One thing not done here is to change the tests to use CREATE INDEX CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11 and earlier. I may do that before pushing, or I may just omit the tests from older branches. > > > - v13 WaitPreparedXact() experiences starvation when a steady stream of > > > prepared transactions have the same VXID. Since VXID reuse entails > > > reconnecting, starvation will be unnoticeable in systems that follow best > > > practices around connection lifespan. The 2021-08-23 patch version didn't > > > have that hazard. > > I think the probability of such a stream is abysmal. You not only need a stream of 2PC with the same vxid, but a streamof overlapping 2PC with the same vxid. And the most critical thing that can happen - CIC waiting for the stream tobecame one-2PC-at-a-time for a moment. > > You're probably right about that. I didn't know of the "stateP->nextLXID = nextLocalTransactionId;" in CleanupInvalidationState(), which indeed makes this all but impossible. On Sat, Aug 07, 2021 at 03:19:57PM -0700, Noah Misch wrote: > On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote: > > Changes: > > 1. Added assert in step 2 (fix for missed invalidation message). 
I wonder how deep possibly could be RelationBuildDesc() inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably, need a better data structure. > > I don't know either, hence that quick data structure to delay the question. > debug_discard_caches=3 may help answer the question. RelationBuildDesc() > reads pg_constraint, which is !rd_isnailed. Hence, I expect one can at least > get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table"). debug_discard_caches=5 yields a depth of eight when opening a relation having a CHECK constraint:

my_rel_having_check_constraint
pg_constraint_conrelid_contypid_conname_index
pg_index
pg_constraint
pg_constraint
pg_constraint
pg_constraint
pg_constraint

While debug_discard_caches doesn't permit higher values, I think one could reach depths greater than eight by, for example, having a number of sessions invalidate pg_constraint as often as possible. Hence, I'm glad the code no longer relies on a depth limit.
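[Editorial note: a minimal standalone toy of the track-in-progress-builds-and-retry idea that the relcache fix in this thread revolves around. The fixed-size array, the catalog-read stub, and all names are invented for this sketch; the real relcache.c logic is considerably more involved.]

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int ToyOid;

    struct InProgressEnt
    {
        ToyOid      relid;          /* 0 = free slot */
        bool        invalidated;
    };

    #define MAX_IN_PROGRESS 16
    static struct InProgressEnt in_progress[MAX_IN_PROGRESS];

    /* Called from the inval path when a message for relid arrives. */
    static void
    invalidate_in_progress(ToyOid relid)
    {
        for (int i = 0; i < MAX_IN_PROGRESS && in_progress[i].relid != 0; i++)
            if (in_progress[i].relid == relid)
                in_progress[i].invalidated = true;
    }

    /* Stand-in for reading catalogs; may deliver an inval mid-build. */
    static void
    read_catalogs(ToyOid relid, bool deliver_inval)
    {
        if (deliver_inval)
            invalidate_in_progress(relid);
    }

    static void
    build_descriptor(ToyOid relid)
    {
        int         slot = 0;
        int         attempts = 0;

        while (in_progress[slot].relid != 0)
            slot++;                 /* push onto the in-progress "stack" */
        in_progress[slot].relid = relid;

        do
        {
            in_progress[slot].invalidated = false;
            attempts++;
            /* Only the first attempt races with an invalidation in this toy. */
            read_catalogs(relid, attempts == 1);
        } while (in_progress[slot].invalidated);

        in_progress[slot].relid = 0;    /* pop */
        printf("built relid %u after %d attempt(s)\n", relid, attempts);
    }

    int
    main(void)
    {
        build_descriptor(16384);
        return 0;
    }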
Attachments
> On 17 Oct 2021, at 20:12, Noah Misch <noah@leadboat.com> wrote: > > I think the attached version is ready for commit. Notable differences > vs. v14: A few notes: 1. Maybe an Assert(in_progress_list_maxlen) when in_progress_list_maxlen is used? 2. -#define VirtualTransactionIdIsPreparedXact(vxid) \ +#define VirtualTransactionIdIsRecoveredPreparedXact(vxid) \ This is a very neat transition. Yes, the function argument will always be an xid only for recovered transactions. Maybe add a comment here that this function is expected to be used only for results of GetLockConflicts()? > One thing not done here is to change the tests to use CREATE INDEX > CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11 > and earlier. I may do that before pushing, or I may just omit the tests from > older branches. The tests refactors PostgresNode.pm and some tests. Back-patching this would be quite invasive. But swapping every "REINDEX INDEX CONCURRENTLY idx;" with DROP INDEX CONCURRENTLY idx; CREATE INDEX CONCURRENTLY idx on tbl(i); works. > <inval-build-race-v1.patch><prepared-transactions-cic-series202107-v15nm.patch> I've checked that these patches have been working for some time on my machines. I do not observe failures. Thanks! Best regards, Andrey Borodin.
On Mon, Oct 18, 2021 at 06:23:05PM +0500, Andrey Borodin wrote: > > On 17 Oct 2021, at 20:12, Noah Misch <noah@leadboat.com> wrote: > > I think the attached version is ready for commit. Notable differences > > vs. v14: > A few notes: > > 1. Maybe an Assert(in_progress_list_maxlen) when in_progress_list_maxlen is used? RelationCacheInitialize() initializes both in_progress_list_maxlen and the RelationIdCache hash table, and any ERROR there is promoted to FATAL. Without the hash table, nothing good can happen in relcache. Hence, I think such an assert is excessive. > 2. > -#define VirtualTransactionIdIsPreparedXact(vxid) \ > +#define VirtualTransactionIdIsRecoveredPreparedXact(vxid) \ > > This is a very neat transition. Yes, the function argument will always be an xid only for recovered transactions. Maybe add a comment here that this function is expected to be used only for results of GetLockConflicts()? One can use it on any VirtualTransactionId, though some sources only return values for which this returns false. It can return true for the result of GET_VXID_FROM_PGPROC(). I think it can return true for the result of GetCurrentVirtualXIDs(limitXmin = InvalidTransactionId), but core code doesn't make such a call. > > One thing not done here is to change the tests to use CREATE INDEX > > CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11 > > and earlier. I may do that before pushing, or I may just omit the tests from > > older branches. > > The tests refactors PostgresNode.pm and some tests. Back-patching this would be quite invasive. That's fine with me. Back-patching a fix without its tests is riskier than back-patching test infrastructure changes. > But swapping every "REINDEX INDEX CONCURRENTLY idx;" with > DROP INDEX CONCURRENTLY idx; > CREATE INDEX CONCURRENTLY idx on tbl(i); > works. > > > <inval-build-race-v1.patch><prepared-transactions-cic-series202107-v15nm.patch> > I've checked that these patches have been working for some time on my machines. I do not observe failures. Good. Thanks for checking.
On Mon, Oct 18, 2021 at 08:02:12PM -0700, Noah Misch wrote: > On Mon, Oct 18, 2021 at 06:23:05PM +0500, Andrey Borodin wrote: > > > 17 окт. 2021 г., в 20:12, Noah Misch <noah@leadboat.com> написал(а): > > > I think the attached version is ready for commit. Notable differences > > > vs. v14: Pushed. Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple "IPC::Run: timeout on timer" in the new tests. No other animal has. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09 is an example run. The pgbench queries finished quickly, but the $pgbench_h->finish() apparently timed out after 180s. I guess this would be consistent with pgbench blocking in write(), waiting for something to empty a pipe buffer so it can write more. I thought finish() will drain any incoming I/O, though. This phenomenon has been appearing regularly via src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to resolve it. A stack trace of the stuck pgbench should be informative, though. Compared to my last post, the push included two more test changes. I removed sleeps from a test. They could add significant time on a system with coarse sleep granularity. This did not change test sensitivity on my system. Second, I changed background_pgbench to include stderr lines in $stdout, as it had documented. This becomes important during the back-patch to v11, where server errors don't cause a nonzero pgbench exit status. background_psql still has the same bug, and I can fix it later. (The background_psql version of the bug is not affecting current usage.) FYI, the non-2PC test is less sensitive in older branches. It reproduces master's bug in 25-50% of runs, but it took about six minutes on v11 and v12. > > > One thing not done here is to change the tests to use CREATE INDEX > > > CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11 > > > and earlier. I may do that before pushing, or I may just omit the tests from > > > older branches. > > > > The tests refactors PostgresNode.pm and some tests. Back-patching this would be quite invasive. > > That's fine with me. Back-patching a fix without its tests is riskier than > back-patching test infrastructure changes. Back-patching the tests did end up tricky, for other reasons. Before v12 (d3c09b9), a TAP suite in a pgxs module wouldn't run during check-world. Before v11 (7f563c0), amcheck lacks the heapallindexed feature that the tests rely on. Hence, for v11, v10, and v9.6, I used a plpgsql implementation of the heapallindexed check, and I moved the tests to src/bin/pgbench. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-19%2012%3A58%3A08
> 24 окт. 2021 г., в 08:00, Noah Misch <noah@leadboat.com> написал(а): > > On Mon, Oct 18, 2021 at 08:02:12PM -0700, Noah Misch wrote: >> On Mon, Oct 18, 2021 at 06:23:05PM +0500, Andrey Borodin wrote: >>>> 17 окт. 2021 г., в 20:12, Noah Misch <noah@leadboat.com> написал(а): >>>> I think the attached version is ready for commit. Notable differences >>>> vs. v14: > > Pushed. Wow, that's great! Thank you! > Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple > "IPC::Run: timeout on timer" in the new tests. No other animal has. > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09 > is an example run. The pgbench queries finished quickly, but the > $pgbench_h->finish() apparently timed out after 180s. I guess this would be > consistent with pgbench blocking in write(), waiting for something to empty a > pipe buffer so it can write more. I thought finish() will drain any incoming > I/O, though. This phenomenon has been appearing regularly via > src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to > resolve it. A stack trace of the stuck pgbench should be informative, though. Some thoughts: 0. I doubt that psql\pgbench is stuck in these failures. 1. All observed similar failures seem to be related to finish() sub of IPC::Run harness 2. Finish must pump any pending data from process [0]. But it can hang if process is waiting for something. 3. There is reported bug of finish [1]. But the description is slightly different. > > Compared to my last post, the push included two more test changes. I removed > sleeps from a test. They could add significant time on a system with coarse > sleep granularity. This did not change test sensitivity on my system. > Second, I changed background_pgbench to include stderr lines in $stdout, as it > had documented. This becomes important during the back-patch to v11, where > server errors don't cause a nonzero pgbench exit status. background_psql > still has the same bug, and I can fix it later. (The background_psql version > of the bug is not affecting current usage.) > > FYI, the non-2PC test is less sensitive in older branches. It reproduces > master's bug in 25-50% of runs, but it took about six minutes on v11 and v12. It seem like loading Relation Descr to relcache becomes more expensive? >>>> One thing not done here is to change the tests to use CREATE INDEX >>>> CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11 >>>> and earlier. I may do that before pushing, or I may just omit the tests from >>>> older branches. >>> >>> The tests refactors PostgresNode.pm and some tests. Back-patching this would be quite invasive. >> >> That's fine with me. Back-patching a fix without its tests is riskier than >> back-patching test infrastructure changes. > > Back-patching the tests did end up tricky, for other reasons. Before v12 > (d3c09b9), a TAP suite in a pgxs module wouldn't run during check-world. > Before v11 (7f563c0), amcheck lacks the heapallindexed feature that the tests > rely on. Hence, for v11, v10, and v9.6, I used a plpgsql implementation of > the heapallindexed check, and I moved the tests to src/bin/pgbench. Cool! Thanks! Best regards, Andrey Borodin. [0] https://metacpan.org/dist/IPC-Run/source/lib/IPC/Run.pm#L3481 [1] https://github.com/toddr/IPC-Run/issues/57
On Sun, Oct 24, 2021 at 02:45:38PM +0300, Andrey Borodin wrote: > > 24 окт. 2021 г., в 08:00, Noah Misch <noah@leadboat.com> написал(а): > > Buildfarm member conchuela (DragonFly BSD 6.0) has gotten multiple > > "IPC::Run: timeout on timer" in the new tests. No other animal has. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-24%2003%3A05%3A09 > > is an example run. The pgbench queries finished quickly, but the > > $pgbench_h->finish() apparently timed out after 180s. I guess this would be > > consistent with pgbench blocking in write(), waiting for something to empty a > > pipe buffer so it can write more. I thought finish() will drain any incoming > > I/O, though. This phenomenon has been appearing regularly via > > src/test/recovery/t/017_shm.pl[1], so this thread doesn't have a duty to > > resolve it. A stack trace of the stuck pgbench should be informative, though. > > Some thoughts: > 0. I doubt that psql\pgbench is stuck in these failures. Got it. If pgbench is a zombie, the fault does lie in IPC::Run or the kernel. > 1. All observed similar failures seem to be related to finish() sub of IPC::Run harness > 2. Finish must pump any pending data from process [0]. But it can hang if process is waiting for something. > 3. There is reported bug of finish [1]. But the description is slightly different. Since that report is about a Perl-process child on Linux, I think we can treat it as unrelated. These failures started on 2021-10-09, the day conchuela updated from DragonFly v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug. Since the theorized kernel bug seems not to affect src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround from other tests. One thing in common with src/test/recovery/t/017_shm.pl and the newest failure sites is that they don't write anything to the child stdin. Does writing e.g. a single byte (that the child doesn't use) work around the problem? If not, does passing the script via stdin, like "pgbench -f- <script.sql", work around the problem? > [0] https://metacpan.org/dist/IPC-Run/source/lib/IPC/Run.pm#L3481 > [1] https://github.com/toddr/IPC-Run/issues/57
> On 24 Oct 2021, at 19:19, Noah Misch <noah@leadboat.com> wrote: > > These failures started on 2021-10-09, the day conchuela updated from DragonFly > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug. > Since the theorized kernel bug seems not to affect > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround > from other tests. One thing in common with src/test/recovery/t/017_shm.pl and > the newest failure sites is that they don't write anything to the child stdin. > Does writing e.g. a single byte (that the child doesn't use) work around the > problem? The following diff did not solve the problem on my VM, finish() still hangs sometimes.

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 86eb920ea1..0c550ff0dc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2112,6 +2112,7 @@ sub background_pgbench
 	my $harness = IPC::Run::start \@cmd, '<', \$stdin, '>', $stdout, '2>&1', $timer;
+	$stdin .= "some bytes";
 	return $harness;
 }

> If not, does passing the script via stdin, like "pgbench -f- <script.sql", work around the problem? I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts. Best regards, Andrey Borodin.
On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote: > Pushed. cm@enterprisedb.com, could you post a stack trace from buildfarm member gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like those seen today: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39 Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably broke something, but I've run out of ideas, and my hpux boxes died. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-10-24%2012%3A01%3A10 got an interesting v9.6 failure, unrelated to the gharial SIGSEGV failures: 2021-10-24 14:25:29.263 CEST [34569:175] pgbench ERROR: could not read two-phase state from xlog at 0/158F4E0 2021-10-24 14:25:29.263 CEST [34569:176] pgbench STATEMENT: COMMIT PREPARED 'c1'; I don't see this thread's commits changing code paths relevant to that failure, so I currently think this one is a new test showing an old bug.
Andrey Borodin <x4mmm@yandex-team.ru> writes: >> On 24 Oct 2021, at 19:19, Noah Misch <noah@leadboat.com> wrote: >> These failures started on 2021-10-09, the day conchuela updated from DragonFly >> v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug. prairiedog just reported the identical symptom: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-10-24%2016%3A05%3A58 I'm thinking "timing problem" more than "kernel bug". (I'd have also believed "ancient IPC::Run bug", except that I assume conchuela is running a pretty modern IPC::Run.) I can poke into it tomorrow. regards, tom lane
> On 25 Oct 2021, at 02:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I can poke into it tomorrow.

FWIW it's easy to make the issue reproduce faster with the following diff:

diff --git a/contrib/amcheck/t/003_cic_2pc.pl b/contrib/amcheck/t/003_cic_2pc.pl
index f4255c1fb8..0d68c53387 100644
--- a/contrib/amcheck/t/003_cic_2pc.pl
+++ b/contrib/amcheck/t/003_cic_2pc.pl
@@ -141,7 +141,7 @@ $node->safe_psql('postgres', q(REINDEX TABLE tbl;));
 my $pgbench_out = '';
 my $pgbench_timer = IPC::Run::timeout(180);
 my $pgbench_h = $node->background_pgbench(
-	'--no-vacuum --client=1 --transactions=100',
+	'--no-vacuum --client=1 --transactions=1',
 	{
 		'002_pgbench_concurrent_cic' => q(
 			DROP INDEX CONCURRENTLY idx;

I.e. if pgbench exits earlier than we call finish(), it always hangs. I observe the reproduction on my DragonFly VM and I can share the VM state if it's of any help.

Best regards, Andrey Borodin.
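To make that failure mode easier to picture outside the PostgreSQL test suite, here is a minimal stand-alone Perl sketch of the same plumbing background_pgbench uses, with a trivial child standing in for pgbench (the child command and the sleep are illustrative assumptions, not part of the reproduction above):

use strict;
use warnings;
use IPC::Run qw(start timeout);

my $stdin = '';
my $out   = '';
my $timer = timeout(180);

# Same shape as background_pgbench: stdin wired to an (empty) scalar,
# stdout/stderr captured into a scalar, and a 180 s timer attached.
my $h = start [ $^X, '-e', 'print "done\n"' ],
    '<', \$stdin, '>', \$out, '2>&1', $timer;

sleep 2;        # let the child exit (and become a zombie) before reaping it
$h->finish();   # the reported hang is in this call on the affected hosts
print $out;

On an unaffected platform this completes immediately; the conchuela symptom would appear here as the 180 s timer expiring inside finish().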
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> FWIW it's easy to make the issue reproduce faster with following diff
> - '--no-vacuum --client=1 --transactions=100',
> + '--no-vacuum --client=1 --transactions=1',

Hmm, didn't help here. It seems that even though prairiedog managed to fail on its first attempt, it's not terribly reproducible there; I've seen only one failure in about 30 manual attempts. In the one failure, the non-background pgbench completed fine (as determined by counting statements in the postmaster's log); but the background one had only finished about 90 transactions before seemingly getting stuck. No new SQL commands had been issued after about 10 seconds.

Nonetheless, I have a theory and a proposal. This coding pattern seems pretty silly:

$pgbench_h->pump_nb;
$pgbench_h->finish();

ISTM that if you need to call pump at all, you need a loop not just one call. So I'm guessing that when it fails, it's for lack of pumping.

The other thing I noticed is that at least on prairiedog's host, the number of invocations of the DROP/CREATE/bt_index_check transaction is ridiculously out of proportion to the number of invocations of the other transactions. It can only get through seven or eight iterations of the index transaction before the other transactions are all done, which means the last 190 iterations of that transaction are a complete waste of cycles.

What I think we should do in these two tests is nuke the use of background_pgbench entirely; that looks like a solution in search of a problem, and it seems unnecessary here. Why not run the DROP/CREATE/bt_index_check transaction as one of three script options in the main pgbench run? Aside from dodging this maybe-its-a-bug-or-maybe-not behavior in IPC::Run, this would make the test automatically scale the number of iterations of the different transactions to appropriate values, so that we'd not be wasting cycles.

regards, tom lane
On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > FWIW it's easy to make the issue reproduce faster with following diff
> > - '--no-vacuum --client=1 --transactions=100',
> > + '--no-vacuum --client=1 --transactions=1',
>
> Hmm, didn't help here. It seems that even though prairiedog managed to
> fail on its first attempt, it's not terribly reproducible there; I've
> seen only one failure in about 30 manual attempts. In the one failure,
> the non-background pgbench completed fine (as determined by counting
> statements in the postmaster's log); but the background one had only
> finished about 90 transactions before seemingly getting stuck. No new
> SQL commands had been issued after about 10 seconds.

Interesting. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-10-24%2016%3A05%3A58 also shows a short command count, just 131/200 completed. However, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-10-25%2000%3A35%3A27 shows the full 200/200. I'm starting to think the prairiedog failures have only superficial similarity to the conchuela failures.

> Nonetheless, I have a theory and a proposal. This coding pattern
> seems pretty silly:
>
> $pgbench_h->pump_nb;
> $pgbench_h->finish();
>
> ISTM that if you need to call pump at all, you need a loop not just
> one call. So I'm guessing that when it fails, it's for lack of
> pumping.

The pump_nb() is just unnecessary. We've not added anything destined for stdin, and finish() takes care of pumping outputs.

> The other thing I noticed is that at least on prairiedog's host, the
> number of invocations of the DROP/CREATE/bt_index_check transaction
> is ridiculously out of proportion to the number of invocations of the
> other transactions. It can only get through seven or eight iterations
> of the index transaction before the other transactions are all done,
> which means the last 190 iterations of that transaction are a complete
> waste of cycles.

That makes sense.

> What I think we should do in these two tests is nuke the use of
> background_pgbench entirely; that looks like a solution in search
> of a problem, and it seems unnecessary here. Why not run
> the DROP/CREATE/bt_index_check transaction as one of three script
> options in the main pgbench run?

The author tried that and got deadlocks: https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru

On prairiedog, the proximate trouble is pgbench getting stuck. IPC::Run is behaving normally given a stuck pgbench. When pgbench stops sending queries, does pg_stat_activity show anything at all running? If so, are those backends waiting on locks? If not, what's the pgbench stack trace at that time?
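As a reference for the IPC::Run details being discussed, a minimal sketch of that division of labor, with a placeholder child in place of pgbench (an illustration of the API only, not code from the PostgreSQL tree):

use strict;
use warnings;
use IPC::Run qw(start timeout);

my $out   = '';
my $timer = timeout(180);

# Child that produces plenty of output; nothing is ever queued for its
# stdin, which mirrors how background_pgbench drives pgbench.
my $h = start [ $^X, '-e', 'print "line $_\n" for 1 .. 1000' ],
    '>', \$out, '2>&1', $timer;

# finish() reaps the child and drains its output pipes by itself, so a
# lone pump_nb() beforehand adds nothing when no stdin data is pending.
$h->finish();

my @lines = split /\n/, $out;
print scalar(@lines), " lines captured, exit status ", $h->result(0), "\n";

Pump calls (in a loop) would only come into play if data were also being fed to the child's stdin while it runs.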
Noah Misch <noah@leadboat.com> writes:
> On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
>> What I think we should do in these two tests is nuke the use of
>> background_pgbench entirely; that looks like a solution in search
>> of a problem, and it seems unnecessary here. Why not run
>> the DROP/CREATE/bt_index_check transaction as one of three script
>> options in the main pgbench run?

> The author tried that and got deadlocks:
> https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru

Hmm, I guess that's because two concurrent CICs can deadlock against each other. I wonder if we could fix that ... or maybe we could teach pgbench that it mustn't launch more than one instance of that script? Or more practically, use advisory locks in that script to enforce that only one runs at once.

> On prairiedog, the proximate trouble is pgbench getting stuck. IPC::Run is
> behaving normally given a stuck pgbench. When pgbench stops sending queries,
> does pg_stat_activity show anything at all running? If so, are those backends
> waiting on locks? If not, what's the pgbench stack trace at that time?

The connected backend is idle, waiting for client input:

#0  0x9002ec88 in kevent ()
#1  0x003e3fa8 in WaitEventSetWait (set=0x3802258, timeout=-1, occurred_events=0xbfffe098, nevents=1, wait_event_info=0) at latch.c:1601
#2  0x0028d6a0 in secure_read (port=0x2501f40, ptr=0x7ad1d0, len=8192) at be-secure.c:186
#3  0x002958a4 in pq_recvbuf () at pqcomm.c:957
#4  0x00295994 in pq_getbyte () at pqcomm.c:1000
#5  0x004196a0 in PostgresMain

The last few entries in the postmaster log are:

2021-10-26 01:19:31.122 EDT [1013] 002_cic.pl LOG: statement: DROP INDEX CONCURRENTLY idx;
2021-10-26 01:19:31.136 EDT [1013] 002_cic.pl LOG: statement: CREATE INDEX CONCURRENTLY idx ON tbl(i);
2021-10-26 01:19:31.167 EDT [1013] 002_cic.pl LOG: statement: SELECT bt_index_check('idx',true);

which is consistent with pg_stat_activity:

 datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query_id | query | backend_type
-------+----------+------+------------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+----------+------------------------------------+------------------------------
 13019 | postgres | 1013 | | 10 | tgl | 002_cic.pl | | | -1 | 2021-10-26 01:19:18.49282-04 | | 2021-10-26 01:19:31.167508-04 | 2021-10-26 01:19:31.190256-04 | Client | ClientRead | idle | | | | SELECT bt_index_check('idx',true); | client backend

The pgbench process is sitting here:

#0  0x9001f488 in select ()
#1  0x0000758c in wait_on_socket_set (sa=0x300bb0, usecs=0) at pgbench.c:7098
#2  0x0000d514 in threadRun (arg=0x300420) at pgbench.c:6776
#3  0x00010c44 in main (argc=-1635084506, argv=0x0) at pgbench.c:6540

which means it thinks it's waiting for input. The PGconn object is in this state:

status = CONNECTION_OK,
asyncStatus = PGASYNC_BUSY,
xactStatus = PQTRANS_IDLE,
...
inBuffer = 0x6a000 "T",
inBufSize = 16384,
inStart = 0,
inCursor = 0,
inEnd = 0,
outBuffer = 0x6e000 "Q",
outBufSize = 16384,
outCount = 0,
outMsgStart = 1,
outMsgEnd = 34,

The inBuffer is logically empty, but its contents look like the last input was the result of the "SELECT bt_index_check" command, consistent with the postmaster log. The outBuffer is likewise logically empty, but what it contains appears to be the next DROP INDEX command:

(gdb) x/64c state->con->outBuffer
0x6e000: 81 'Q'  0 '\0'  0 '\0'  0 '\0'  33 '!'  68 'D'  82 'R'  79 'O'
0x6e008: 80 'P'  32 ' '  73 'I'  78 'N'  68 'D'  69 'E'  88 'X'  32 ' '
0x6e010: 67 'C'  79 'O'  78 'N'  67 'C'  85 'U'  82 'R'  82 'R'  69 'E'
0x6e018: 78 'N'  84 'T'  76 'L'  89 'Y'  32 ' '  105 'i'  100 'd'  120 'x'
0x6e020: 59 ';'  0 '\0'  114 'r'  117 'u'  101 'e'  41 ')'  59 ';'  0 '\0'
0x6e028: 108 'l'  40 '('  105 'i'  41 ')'  59 ';'  0 '\0'  95 '_'  110 'n'
0x6e030: 97 'a'  109 'm'  101 'e'  0 '\0'  48 '0'  48 '0'  50 '2'  95 '_'
0x6e038: 99 'c'  105 'i'  99 'c'  46 '.'  112 'p'  108 'l'  0 '\0'  0 '\0'

So what we have is that libpq thinks it's sent the next DROP INDEX, but the backend hasn't seen it.

It's fairly hard to blame that state of affairs on the IPC::Run harness. I'm wondering if we might be looking at some timing-dependent corner-case bug in the new libpq pipelining code. Pipelining isn't enabled:

pipelineStatus = PQ_PIPELINE_OFF,

but that doesn't mean that the pipelining code hasn't been anywhere near this command. I can see

cmd_queue_head = 0x300d40,
cmd_queue_tail = 0x300d40,
cmd_queue_recycle = 0x0,

(gdb) p *state->con->cmd_queue_head
$4 = {
  queryclass = PGQUERY_SIMPLE,
  query = 0x3004e0 "DROP INDEX CONCURRENTLY idx;",
  next = 0x0
}

The trouble with this theory, of course, is "if libpq is busted, why is only this test case showing it?". But AFAICS it would take some pretty spooky action-at-a-distance for the Perl harness to have caused this.

regards, tom lane
On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Oct 25, 2021 at 04:59:42PM -0400, Tom Lane wrote:
> >> What I think we should do in these two tests is nuke the use of
> >> background_pgbench entirely; that looks like a solution in search
> >> of a problem, and it seems unnecessary here. Why not run
> >> the DROP/CREATE/bt_index_check transaction as one of three script
> >> options in the main pgbench run?
>
> > The author tried that and got deadlocks:
> > https://postgr.es/m/5E041A70-4946-489C-9B6D-764DF627A92D@yandex-team.ru
>
> Hmm, I guess that's because two concurrent CICs can deadlock against each
> other. I wonder if we could fix that ... or maybe we could teach pgbench
> that it mustn't launch more than one instance of that script?

Both sound doable, but I don't expect either to fix prairiedog's trouble.

> Or more
> practically, use advisory locks in that script to enforce that only one
> runs at once.

The author did try that.

> So what we have is that libpq thinks it's sent the next DROP INDEX,
> but the backend hasn't seen it.

Thanks for isolating that.

> It's fairly hard to blame that state of affairs on the IPC::Run harness.
> I'm wondering if we might be looking at some timing-dependent corner-case
> bug in the new libpq pipelining code. Pipelining isn't enabled:
>
> pipelineStatus = PQ_PIPELINE_OFF,
>
> but that doesn't mean that the pipelining code hasn't been anywhere
> near this command. I can see
>
> cmd_queue_head = 0x300d40,
> cmd_queue_tail = 0x300d40,
> cmd_queue_recycle = 0x0,
>
> (gdb) p *state->con->cmd_queue_head
> $4 = {
>   queryclass = PGQUERY_SIMPLE,
>   query = 0x3004e0 "DROP INDEX CONCURRENTLY idx;",
>   next = 0x0
> }
>
> The trouble with this theory, of course, is "if libpq is busted, why is
> only this test case showing it?".

Agreed, it's not clear how the new tests would reveal a libpq bug that src/bin/pgbench/t/001_pgbench_with_server.pl has been unable to reveal. Does the problem reproduce on v13?

Grasping at straws, background_pgbench does differ by specifying stdin as a ref to an empty scalar. I think that makes IPC::Run open a pipe and never write to it. The older pgbench tests don't override stdin, so I think that makes pgbench inherit the original stdin. Given your pgbench stack trace, this seems awfully unlikely to be the relevant difference. If we run out of ideas, you could try some runs with that difference removed:

--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2110,7 +2110,7 @@ sub background_pgbench
 	# IPC::Run would otherwise append to existing contents:
 	$$stdout = "" if ref($stdout);
 
-	my $harness = IPC::Run::start \@cmd, '<', \$stdin, '>', $stdout, '2>&1',
+	my $harness = IPC::Run::start \@cmd, '>', $stdout, '2>&1',
 	  $timer;
 
 	return $harness;

> But AFAICS it would take some pretty
> spooky action-at-a-distance for the Perl harness to have caused this.

Agreed. We'll have to consider the harness innocent for the moment.
Noah Misch <noah@leadboat.com> writes:
> On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
>> Or more
>> practically, use advisory locks in that script to enforce that only one
>> runs at once.

> The author did try that.

Hmm ... that ought to have done the trick, I'd think. However:

> Both sound doable, but I don't expect either to fix prairiedog's trouble.

Yeah :-(. I think this test is somehow stumbling over a pre-existing bug.

>> So what we have is that libpq thinks it's sent the next DROP INDEX,
>> but the backend hasn't seen it.

> Thanks for isolating that.

The plot thickens. When I went back to look at that machine this morning, I found this in the postmaster log:

2021-10-26 02:52:09.324 EDT [1013] 002_cic.pl LOG: statement: DROP INDEX CONCURRENTLY idx;
2021-10-26 02:52:09.352 EDT [1013] 002_cic.pl LOG: could not send data to client: Broken pipe
2021-10-26 02:52:09.352 EDT [1013] 002_cic.pl FATAL: connection to client lost

The timestamps correspond (more or less anyway) to when I killed off the stuck test run and went to bed. So the DROP command *was* sent, and it was eventually received by the backend, but it seems to have taken killing the pgbench process to do it.

I think this probably exonerates the pgbench/libpq side of things, and instead we have to wonder about a backend or kernel bug. A kernel bug could possibly explain the unexplainable connection to what's happening on some other file descriptor. I'd be prepared to believe that prairiedog's ancient macOS version has some weird bug preventing kevent() from noticing available data ... but (a) surely conchuela wouldn't share such a bug, and (b) we've been using kevent() for a couple years now, so how come we didn't see this before?

Still baffled. I'm currently experimenting to see if the bug reproduces when latch.c is made to use poll() instead of kevent(). But the failure rate was low enough that it'll be hours before I can say confidently that it doesn't (unless, of course, it does).

regards, tom lane
Noah Misch <noah@leadboat.com> writes:
> On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
>> Or more
>> practically, use advisory locks in that script to enforce that only one
>> runs at once.

> The author did try that.

Ah, I see: if the other pgbench thread is waiting in pg_advisory_lock, then it is inside a transaction, so it blocks CIC from completing. We can get around that though, by using pg_try_advisory_lock and not proceeding if it fails. The attached POC does this for the 002 test; it looks like the same thing could be done to 003.

Now the problem with this is it will only work back to v12, because pgbench lacks the necessary features before that. However, I think it's worth doing it like this in versions where we can do so, because of the load-balancing aspect: this won't waste cycles running CICs after the inserts have stopped, nor vice versa.

Thoughts?

regards, tom lane

diff --git a/contrib/amcheck/t/002_cic.pl b/contrib/amcheck/t/002_cic.pl
index 0b14e66270..1f258dd59d 100644
--- a/contrib/amcheck/t/002_cic.pl
+++ b/contrib/amcheck/t/002_cic.pl
@@ -9,7 +9,7 @@ use Config;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 
-use Test::More tests => 4;
+use Test::More tests => 3;
 
 my ($node, $result);
 
@@ -25,32 +25,18 @@ $node->safe_psql('postgres', q(CREATE TABLE tbl(i int)));
 $node->safe_psql('postgres', q(CREATE INDEX idx ON tbl(i)));
 
 #
-# Stress CIC with pgbench
+# Stress CIC with pgbench.
+#
+# pgbench might try to launch more than one instance of the CIC
+# transaction concurrently. That would deadlock, so use an advisory
+# lock to ensure only one CIC runs at a time.
 #
-
-# Run background pgbench with CIC. We cannot mix-in this script into single
-# pgbench: CIC will deadlock with itself occasionally.
-my $pgbench_out = '';
-my $pgbench_timer = IPC::Run::timeout(180);
-my $pgbench_h = $node->background_pgbench(
-	'--no-vacuum --client=1 --transactions=200',
-	{
-		'002_pgbench_concurrent_cic' => q(
-			DROP INDEX CONCURRENTLY idx;
-			CREATE INDEX CONCURRENTLY idx ON tbl(i);
-			SELECT bt_index_check('idx',true);
-		)
-	},
-	\$pgbench_out,
-	$pgbench_timer);
-
-# Run pgbench.
 $node->pgbench(
 	'--no-vacuum --client=5 --transactions=200',
 	0,
 	[qr{actually processed}],
 	[qr{^$}],
-	'concurrent INSERTs',
+	'concurrent INSERTs and CIC',
 	{
 		'002_pgbench_concurrent_transaction' => q(
 			BEGIN;
@@ -62,17 +48,17 @@ $node->pgbench(
 			SAVEPOINT s1;
 			INSERT INTO tbl VALUES(0);
 			COMMIT;
+		),
+		'002_pgbench_concurrent_cic' => q(
+			SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+			\if :gotlock
+			DROP INDEX CONCURRENTLY idx;
+			CREATE INDEX CONCURRENTLY idx ON tbl(i);
+			SELECT bt_index_check('idx',true);
+			SELECT pg_advisory_unlock(42);
+			\endif
 		)
 	});
 
-$pgbench_h->pump_nb;
-$pgbench_h->finish();
-$result =
-  ($Config{osname} eq "MSWin32")
-  ? ($pgbench_h->full_results)[0]
-  : $pgbench_h->result(0);
-is($result, 0, "pgbench with CIC works");
-
-# done
 $node->stop;
 done_testing();
On Wed, Oct 27, 2021 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > [...] I'd be prepared to believe that prairiedog's > ancient macOS version has some weird bug preventing kevent() from noticing > available data ... but (a) surely conchuela wouldn't share such a bug, > and (b) we've been using kevent() for a couple years now, so how come > we didn't see this before? There was this case soon after our kqueue support landed: https://www.postgresql.org/message-id/CA%2BhUKGLzaR5cV0EmZWoVXJDO_XwZpmpQX_sYwCBRE1qLBEcGPQ%40mail.gmail.com There are a few discussions on the 'net about the flakiness of both kevent() and poll() around that vintage of macOS (both were new and shared infrastructure, separate from select()); for example in libcurl and libevent talked about this and blocked version ranges. I don't have any ideas about conchuela. For the next person who manages to reproduce this, just to sanity-check what we're passing in to kevent(), what do *port and waitfor look like when secure_read() blocks in WaitEventSetWait? It's good news that Andrey could reproduce this on a VM. I may look into setting one of those up too.
Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Oct 27, 2021 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [...] I'd be prepared to believe that prairiedog's
>> ancient macOS version has some weird bug preventing kevent() from noticing
>> available data ... but (a) surely conchuela wouldn't share such a bug,
>> and (b) we've been using kevent() for a couple years now, so how come
>> we didn't see this before?

> There was this case soon after our kqueue support landed:
> https://www.postgresql.org/message-id/CA%2BhUKGLzaR5cV0EmZWoVXJDO_XwZpmpQX_sYwCBRE1qLBEcGPQ%40mail.gmail.com

Oh yeah ... that looks like the exact same thing, doesn't it?

I've just finished doing 500 cycles of amcheck's 002_cic.pl without a failure after recompiling latch.c with -DWAIT_USE_POLL. With the kevent code path, it is hard to reproduce but not that hard. So unless we can learn something from the DragonFly case, I'm inclined to write this off as ancient-macOS-bug and start using -DWAIT_USE_POLL on prairiedog.

> I don't have any ideas about conchuela. For the next person who
> manages to reproduce this, just to sanity-check what we're passing in
> to kevent(), what do *port and waitfor look like when secure_read()
> blocks in WaitEventSetWait?

Not sure which variables you're referring to there, but here's a more complete gdb dump from the stuck backend:

(gdb) bt
#0  0x9002ec88 in kevent ()
#1  0x003e3fa8 in WaitEventSetWait (set=0x3802258, timeout=-1, occurred_events=0xbfffe098, nevents=1, wait_event_info=0) at latch.c:1601
#2  0x0028d6a0 in secure_read (port=0x2501f40, ptr=0x7ad1d0, len=8192) at be-secure.c:186
#3  0x002958a4 in pq_recvbuf () at pqcomm.c:957
#4  0x00295994 in pq_getbyte () at pqcomm.c:1000
#5  0x004196a0 in PostgresMain (dbname=0x3805894 "", username=0x789020 "...") at postgres.c:353
#6  0x00363df8 in PostmasterMain (argc=8059652, argv=0x61789299) at postmaster.c:4560
#7  0x00298b6c in main (argc=4, argv=0x2500a30) at main.c:198
(gdb) f 1
#1  0x003e3fa8 in WaitEventSetWait (set=0x3802258, timeout=-1, occurred_events=0xbfffe098, nevents=1, wait_event_info=0) at latch.c:1601
1601 rc = kevent(set->kqueue_fd, NULL, 0,
(gdb) i locals
cur_event = (WaitEvent *) 0x78ed40
cur_kqueue_event = (struct kevent *) 0x0
timeout = {
  tv_sec = 13,
  tv_nsec = 0
}
rc = 4
timeout_p = (struct timespec *) 0x0
rc = 4
returned_events = 4
start_time = {
  tv_sec = -1879041144,
  tv_usec = 58742932
}
cur_time = {
  tv_sec = 59028992,
  tv_usec = 2
}
cur_timeout = -1
(gdb) p *set
$1 = {
  nevents = 3,
  nevents_space = 3,
  events = 0x380227c,
  latch = 0xc954ebc,
  latch_pos = 1,
  exit_on_postmaster_death = 0 '\0',
  kqueue_fd = 4,
  kqueue_ret_events = 0x38022ac,
  report_postmaster_not_running = 0 '\0'
}
(gdb) p set->events[0]
$2 = {
  pos = 0,
  events = 2,
  fd = 7,  # matches MyProcPort->sock
  user_data = 0x0
}
(gdb) p set->events[1]
$3 = {
  pos = 1,
  events = 1,
  fd = -1,
  user_data = 0x0
}
(gdb) p set->events[2]
$4 = {
  pos = 2,
  events = 16,
  fd = 3,
  user_data = 0x0
}
(gdb) p set->kqueue_ret_events[0]
$6 = {
  ident = 7,  # matches MyProcPort->sock
  filter = -1,  # EVFILT_READ
  flags = 1,  # EV_ADD
  fflags = 0,
  data = 40,
  udata = 0x380227c
}

So AFAICS, the data we're passing to kevent() is all sane.

It then occurred to me to replicate the case I saw this morning (control-C'ing the test script) and ktrace all the involved processes. The perl test script and the pgbench process both just immediately die on SIGINT:

25500 pgbench RET select -1 errno 4 Interrupted system call
25500 pgbench PSIG SIGINT SIG_DFL

so they are clearly not doing anything to cause the DROP command to get delivered. Nonetheless, what the backend sees post-control-C is

25502 postgres RET kevent 1
25502 postgres CALL recvfrom(0x7,0x7ad1d0,0x2000,0,0,0)
25502 postgres GIO fd 7 wrote 34 bytes "Q\0\0\0!DROP INDEX CONCURRENTLY idx;\0"
25502 postgres RET recvfrom 34/0x22
... blah, blah, executing the command ...
25502 postgres CALL sendto(0x7,0x241d020,0x16,0,0,0)
25502 postgres RET sendto -1 errno 32 Broken pipe
... blah, blah, shutting down ...

It's really hard to look at this and not call it a kernel bug.

regards, tom lane
On Tue, Oct 26, 2021 at 03:41:58PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Tue, Oct 26, 2021 at 02:03:54AM -0400, Tom Lane wrote:
> >> Or more
> >> practically, use advisory locks in that script to enforce that only one
> >> runs at once.
>
> > The author did try that.
>
> Ah, I see: if the other pgbench thread is waiting in pg_advisory_lock,
> then it is inside a transaction, so it blocks CIC from completing.
> We can get around that though, by using pg_try_advisory_lock and not
> proceeding if it fails.

Good thought.

> The attached POC does this for the 002 test;
> it looks like the same thing could be done to 003.
>
> Now the problem with this is it will only work back to v12, because
> pgbench lacks the necessary features before that. However, I think
> it's worth doing it like this in versions where we can do so, because
> of the load-balancing aspect: this won't waste cycles running CICs
> after the inserts have stopped, nor vice versa.
>
> Thoughts?

I tried it out with the fix removed (git checkout fdd965d; git checkout HEAD^ src/include src/backend; git apply CIC-test-with-one-pgbench.patch). Sensitivity (~25%) and runtime (~900ms) were in the same ballpark. Still, even if it doesn't buy back notable cycles, the test code seems easier to read and understand with your change.
Noah Misch <noah@leadboat.com> writes: > On Tue, Oct 26, 2021 at 03:41:58PM -0400, Tom Lane wrote: >> Now the problem with this is it will only work back to v12, because >> pgbench lacks the necessary features before that. However, I think >> it's worth doing it like this in versions where we can do so, because >> of the load-balancing aspect: this won't waste cycles running CICs >> after the inserts have stopped, nor vice versa. > I tried it out with the fix removed (git checkout fdd965d; git checkout HEAD^ > src/include src/backend; git apply CIC-test-with-one-pgbench.patch). > Sensitivity (~25%) and runtime (~900ms) were in the same ballpark. Still, > even if it doesn't buy back notable cycles, the test code seems easier to read > and understand with your change. The point wasn't to reduce the runtime: it was to get more useful test iterations in the same runtime. A test iteration that runs CIC while there are no concurrent inserts isn't providing any useful coverage. It's possible that with this patch, we could dial it back to less than 1000 (200x5) total transactions and still get the same effective number of test iterations as we have in HEAD. I didn't really look into that. I did notice that even on a much faster machine than prairiedog, HEAD wastes at least three-quarters of the CIC iterations. regards, tom lane
> On 26 Oct 2021, at 22:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> We can get around that though, by using pg_try_advisory_lock and not
> proceeding if it fails. The attached POC does this for the 002 test;
> it looks like the same thing could be done to 003.

That's a neat idea. PFA a patch with a copy of these changes for 003. Also I've added a script to stress not only CREATE INDEX CONCURRENTLY, but also REINDEX CONCURRENTLY, because it was easy and checks slightly more code paths.

I do not observe failures on DragonFly 6.0 with the patch, but I haven't run it for very long yet.

Best regards, Andrey Borodin.
Attachments
Andrey Borodin <x4mmm@yandex-team.ru> writes: >> 26 окт. 2021 г., в 22:41, Tom Lane <tgl@sss.pgh.pa.us> написал(а): >> We can get around that though, by using pg_try_advisory_lock and not >> proceeding if it fails. The attached POC does this for the 002 test; >> it looks like the same thing could be done to 003. > That's a neat idea. PFA patch with copy of this changes to 003. Pushed. It'll be interesting to see if conchuela's behavior changes. regards, tom lane
On Fri, Oct 29, 2021 at 3:05 AM Noah Misch <noah@leadboat.com> wrote: > > Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably > > broke something, but I've run out of ideas, and my hpux boxes died. I recently managed to scrounge up an account in another generous project's operating system graveyard, and could possibly try to repro that or put you in touch. On first attempt just now I couldn't build because of weird problems with ifaddr.c's need for structs in <net/if.h> that seem to be kernel/internal-only that I unfortunately won't have time to debug until next week (I've successfully built here before so I must be doing something stupid).
On Fri, Oct 29, 2021 at 09:59:14AM +1300, Thomas Munro wrote: > On Fri, Oct 29, 2021 at 3:05 AM Noah Misch <noah@leadboat.com> wrote: > > > Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably > > > broke something, but I've run out of ideas, and my hpux boxes died. > > I recently managed to scrounge up an account in another generous > project's operating system graveyard, and could possibly try to repro > that or put you in touch. On first attempt just now I couldn't build > because of weird problems with ifaddr.c's need for structs in > <net/if.h> that seem to be kernel/internal-only that I unfortunately > won't have time to debug until next week (I've successfully built here > before so I must be doing something stupid). Thanks. Either would help. If you can put me in touch this week, let's try that since there's little time left before release.
On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote: > > 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а): > > These failures started on 2021-10-09, the day conchuela updated from DragonFly > > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug. > > Since the theorized kernel bug seems not to affect > > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround > > from other tests. One thing in common with src/test/recovery/t/017_shm.pl and > > the newest failure sites is that they don't write anything to the child stdin. > > If not, does passing the script via stdin, like "pgbench -f- > > <script.sql", work around the problem? > > I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts. Did that work? Commit 7f580aa should make this unnecessary for v12+ contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl needs a fix in all branches. A backup plan is just to skip affected tests on dragonfly 6+. Since the breakage has been limited to so few tests, I'm optimistic that a better workaround will present itself.
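If that backup plan were ever needed, the guard could be as small as the following sketch (assuming Perl reports the OS as "dragonfly" in $^O; the skip message is made up, and distinguishing DragonFly 6+ from older releases would take an extra version check that is omitted here):

use strict;
use warnings;
use Test::More;

# Hypothetical guard: skip the whole test file on DragonFly BSD, where
# IPC::Run::finish() has been seen to time out on otherwise-finished children.
if ($^O eq 'dragonfly')
{
	plan skip_all => 'IPC::Run::finish() hangs on DragonFly BSD';
}

ok(1, 'placeholder for the real test body');
done_testing();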
Noah Misch <noah@leadboat.com> writes: > Did that work? Commit 7f580aa should make this unnecessary for v12+ > contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl > needs a fix in all branches. A backup plan is just to skip affected tests on > dragonfly 6+. Since the breakage has been limited to so few tests, I'm > optimistic that a better workaround will present itself. It indeed is looking like 7f580aa made the problem go away on conchuela, but do we understand why? The only theory I can think of is "kernel bug", but while that's plausible for prairiedog it seems hard to credit for a late-model BSD kernel. regards, tom lane
On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > It indeed is looking like 7f580aa made the problem go away on conchuela, > but do we understand why? The only theory I can think of is "kernel bug", > but while that's plausible for prairiedog it seems hard to credit for a > late-model BSD kernel. I have yet to even log into a DBSD system (my attempt to install the 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I do intend to get it working at some point. But I can offer a poorly researched wildly speculative hypothesis: DBSD forked from FBSD in 2003. macOS 10.3 took FBSD's kqueue code in... 2003. So maybe a bug was fixed later that they both inherited? Or perhaps that makes no sense, I dunno. It'd be nice to try to write a repro and send them a report, if we can.
> 29 окт. 2021 г., в 08:42, Thomas Munro <thomas.munro@gmail.com> написал(а): > > On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It indeed is looking like 7f580aa made the problem go away on conchuela, >> but do we understand why? The only theory I can think of is "kernel bug", >> but while that's plausible for prairiedog it seems hard to credit for a >> late-model BSD kernel. > > I have yet to even log into a DBSD system (my attempt to install the > 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I > do intend to get it working at some point. Here's an exported VM https://storage.yandexcloud.net/freebsd/dffb.ova if it's of any use. root password is P@ssw0rd To reproduce cd postgres/contrib/amcheck su x4m /tesh.sh Best regards, Andrey Borodin.
On Fri, Oct 29, 2021 at 04:42:31PM +1300, Thomas Munro wrote: > On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > It indeed is looking like 7f580aa made the problem go away on conchuela, > > but do we understand why? I don't. > > The only theory I can think of is "kernel bug", > > but while that's plausible for prairiedog it seems hard to credit for a > > late-model BSD kernel. DragonFly BSD is a niche OS, so I'm more willing than usual to conclude that. Could be a bug in IPC::Run or in the port of Perl to DragonFly, but those feel less likely than the kernel. The upgrade from DragonFly v4.4.3 to DragonFly v6.0.0, which introduced this form of PostgreSQL test breakage, also updated Perl from v5.20.3 to 5.32.1. > I have yet to even log into a DBSD system (my attempt to install the > 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I > do intend to get it working at some point. But I can offer a poorly > researched wildly speculative hypothesis: DBSD forked from FBSD in > 2003. macOS 10.3 took FBSD's kqueue code in... 2003. So maybe a bug > was fixed later that they both inherited? Or perhaps that makes no > sense, I dunno. It'd be nice to try to write a repro and send them a > report, if we can. The conchuela bug and the prairiedog bug both present with a timeout in IPC::Run::finish, but the similarity ends there. On prairiedog, the postmaster was stuck when it should have been reading a query from pgbench. On conchuela, pgbench ran to completion and became a zombie, and IPC::Run got stuck when it should have been reaping that zombie. Good thought, however.
On Fri, Oct 29, 2021 at 7:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > Here's an exported VM https://storage.yandexcloud.net/freebsd/dffb.ova if it's of any use. Thanks! I may yet need to use that because I haven't seen the problem yet, but I finally managed to get the OS running with a reusable Vagrant file after solving a couple of stupid problems[1]. [1] https://github.com/macdice/postgresql-dev-vagrant/blob/master/dragonfly6/Vagrantfile
On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > Pushed.
>
> cm@enterprisedb.com, could you post a stack trace from buildfarm member
> gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like
> those seen today:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
>
> Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably
> broke something, but I've run out of ideas, and my hpux boxes died.
cm@enterprisedb.com, is this still in your queue, or not? I am somewhat
concerned that the 2021-11-11 releases will contain a regression if the
gharial failures remain undiagnosed. I don't know how to diagnose them
without your help. (I currently can't justify reverting the suspect bug fix
on the basis of two master-branch-only failures in one buildfarm member.)
> On 30 Oct 2021, at 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not? I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed. I don't know how to diagnose them
> without your help. (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-( I checked the reports on the dashboard and there was a failure then but the reports after 25th Oct looks fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something
Hi Sandeep, thank you for the response!
Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were observed only on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this would affect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this runs fail it's vital to collect backtraces of run.
Many thanks!
Best regards, Andrey Borodin.
On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> On 30 Oct 2021, at 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not? I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed. I don't know how to diagnose them
> without your help. (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-( I checked the reports on the dashboard and there was a failure then but the reports after 25th Oct looks fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something
Hi Sandeep, thank you for the response!
Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were observed only on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this would affect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this runs fail it's vital to collect backtraces of run.

Hi Andrew,

OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to run multiple times in a day. Unfortunately, I can't run it ~100 times because it's a legacy (slow) server and also we run another animal (anole - with a different compiler) on the same server. Depending on the time every run on HEAD takes, I'll increase the frequency. Hope this helps.
Many thanks!
Best regards, Andrey Borodin.

--
Sandeep Thakkar
On Sat, Oct 30, 2021 at 01:44:56PM +0530, Sandeep Thakkar wrote:
> On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
> > On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Some failed runs on this animal indicate SegFault in places changed by
> >> bugfix relevant to this thread. Fails were observed only on HEAD, but this
> >> may be a result of concurrent nature of possible new bug. If there is a new
> >> bug - this would affect upcoming point release. But we do not know for
> >> sure. Other animals show no signs of any related SegFault.
> >> Can you please run make check phase on this animal on HEAD multiple time
> >> (preferably ~hundred times)? If some of this runs fail it's vital to
> >> collect backtraces of run.
> >
> > OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to
> > run multiple times in a day. Unfortunately, I can't run it ~100 times
> > because it's a legacy (slow) server and also we run another animal (anole -
> > with a different compiler) on the same server. Depending on the time every
> > run on HEAD takes, I'll increase the frequency. Hope this helps.
>
> I've used "--force" option so that it ignores the last running status.
> ./run_build.pl --verbose --force HEAD
Thanks, but I don't think that gets us closer to having a stack trace for the
SIGSEGV. If some run gets a SIGSEGV, the core dump will get unlinked. I'd
disable both buildfarm member cron jobs and then run this script (derived from
gharial's buildfarm configuration):
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git checkout 70bef49
export LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
cat >temp.conf <<EOF
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
force_parallel_mode = regress
EOF
./configure --enable-cassert --enable-debug --with-perl --without-readline \
--with-libxml --with-libxslt --with-libs=/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64 \
--with-includes=/opt/packages/zlib-1.2.5/inst/include:/opt/packages/libxslt-1.1.26/inst/include:/opt/packages/libxml2-2.7.8/inst/include \
--with-pgport=5678 CFLAGS=-mlp64 CC=gcc
make
while make check TEMP_CONFIG=$PWD/temp.conf NO_LOCALE=1; do :; done
# When the loop stops, use the core file to make a stack trace. If it runs
# for a week without stopping, give up.
-rw------- 1 pgbfarm users 6672384 Oct 30 22:04 ./postgresql/src/test/regress/tmp_check/data/core.postgres.1267
-rw------- 1 pgbfarm users 3088384 Oct 30 22:09 ./postgresql/src/test/regress/tmp_check/data/core.postgres.5422
bash-4.1$ gdb ./postgresql/tmp_install/usr/local/pgsql/bin/postgres ./postgresql/src/test/regress/tmp_check/data/core.postgres.5422
HP gdb 6.1 for HP Itanium (32 or 64 bit) and target HP-UX 11iv2 and 11iv3.
Copyright 1986 - 2009 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 6.1 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
SEGV_MAPERR - Address not mapped to object
#0 0x3fffffffff3fdbf0 in <unknown_procedure> ()
(gdb) bt
#0 0x3fffffffff3fdbf0 in <unknown_procedure> ()
warning: Attempting to unwind past bad PC 0x3fffffffff3fdbf0
#1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
tupdesc2=0x60000000001fba08)
#2 0x40000000017f9660:0 in RelationClearRelation (
relation=0x60000000001f3730, rebuild=true)
#3 0x40000000017fa730:0 in RelationFlushRelation (relation=0x60000000001f3730)
#4 0x40000000017fabb0:0 in RelationCacheInvalidateEntry (relationId=27272)
#5 0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
msg=0x60000000001a46b8)
#6 0x40000000017c84e0:0 in ProcessInvalidationMessages (
group=0x60000000001a43e4, func=0x87ffffffef7b7250)
#7 0x40000000017cb420:0 in CommandEndInvalidationMessages ()
#8 0x40000000006b1c50:0 in AtCCI_LocalCache ()
#9 0x40000000006b0910:0 in CommandCounterIncrement ()
#10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
OIDOldToast=0)
#11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
#12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
reloptions=0, lockmode=8) at toasting.c:62
#13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
lockmode=8, context=0x87ffffffffffc590)
Attachments
> On 31 Oct 2021, at 12:31, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
> Here is the stack trace:

Thanks Sandeep!

So far I haven't come up with a clear understanding of how this might happen. The only ideas I have:
1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
2. Maybe relation->rd_rel (allocated just before relation->rd_att) was used incorrectly?
3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to the added sizeof(relminmxid)?

#define CLASS_TUPLE_SIZE \
	(offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))

The idea doesn't seem very strong, but that's the only 2 cents I have.

Noah, what do you think?

Best regards, Andrey Borodin.
Hi,
On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > #1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> >
> > tupdesc2=0x60000000001fba08)
Could you print out *tupdesc1, *tupdesc2? And best also
p tupdesc1->attrs[0]
p tupdesc1->attrs[1]
p tupdesc1->attrs[2]
p tupdesc2->attrs[0]
p tupdesc2->attrs[1]
p tupdesc2->attrs[2]
> > #2 0x40000000017f9660:0 in RelationClearRelation (
> >
> > relation=0x60000000001f3730, rebuild=true)
Hm, too bad that we don't have working line numbers for some reason?
> > #3 0x40000000017fa730:0 in RelationFlushRelation
> > (relation=0x60000000001f3730)
> >
> > #4 0x40000000017fabb0:0 in RelationCacheInvalidateEntry
> > (relationId=27272)
> >
> > #5 0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
> >
> > msg=0x60000000001a46b8)
> >
> > #6 0x40000000017c84e0:0 in ProcessInvalidationMessages (
> >
> > group=0x60000000001a43e4, func=0x87ffffffef7b7250)
> >
> > #7 0x40000000017cb420:0 in CommandEndInvalidationMessages ()
> >
> > #8 0x40000000006b1c50:0 in AtCCI_LocalCache ()
> >
> > #9 0x40000000006b0910:0 in CommandCounterIncrement ()
> >
> > #10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
> >
> > toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
> >
> > OIDOldToast=0)
> >
> > #11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
> >
> > reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
> >
> > #12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
> >
> > reloptions=0, lockmode=8) at toasting.c:62
> >
> > #13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
> >
> > lockmode=8, context=0x87ffffffffffc590)
This crash could suggest that somehow the catalogs were corrupted and
that we're not reading back valid tupledescs from them. Hopefully we'll
know more after looking at the tupledescs.
Greetings,
Andres Freund
> On 1 Nov 2021, at 18:10, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
>
>
>
> On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > > #1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> > >
> > > tupdesc2=0x60000000001fba08)
>
> Could you print out *tupdesc1, *tupdesc2? And best also
> p tupdesc1->attrs[0]
> p tupdesc1->attrs[1]
> p tupdesc1->attrs[2]
> p tupdesc2->attrs[0]
> p tupdesc2->attrs[1]
> p tupdesc2->attrs[2]
>
> you mean make the changes in the .c files to print these values and rerun the build? Can you please share the files where this needs to be done or give me a patch?
You can paste these commands into gdb. Just as you did with bt. It will print values.

Thank you!

Best regards, Andrey Borodin.
(gdb) bt
#0 0x3fffffffff3fdbf0 in <unknown_procedure> ()
#1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
tupdesc2=0x60000000001fba08)
#2 0x40000000017f9660:0 in RelationClearRelation (
relation=0x60000000001f3730, rebuild=true)
#3 0x40000000017fa730:0 in RelationFlushRelation (relation=0x60000000001f3730)
#4 0x40000000017fabb0:0 in RelationCacheInvalidateEntry (relationId=27272)
#5 0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
msg=0x60000000001a46b8)
#6 0x40000000017c84e0:0 in ProcessInvalidationMessages (
group=0x60000000001a43e4, func=0x87ffffffef7b7250)
#7 0x40000000017cb420:0 in CommandEndInvalidationMessages ()
#8 0x40000000006b1c50:0 in AtCCI_LocalCache ()
#9 0x40000000006b0910:0 in CommandCounterIncrement ()
#10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
OIDOldToast=0)
#11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
#12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
reloptions=0, lockmode=8) at toasting.c:62
#13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
lockmode=8, context=0x87ffffffffffc590)
---Type <return> to continue, or q <return> to quit---
#14 0x4000000000aa6fc0:0 in ATController (parsetree=0x60000000000e04e0,
rel=0x60000000001f3730, cmds=0x60000000000e0488, recurse=true, lockmode=8,
context=0x87ffffffffffc590)
#15 0x4000000000aa6410:0 in AlterTable (stmt=0x60000000000e04e0, lockmode=8,
context=0x87ffffffffffc590)
#16 0x40000000012f9a50:0 in ProcessUtilitySlow (pstate=0x6000000000117210,
pstmt=0x60000000000e0830,
queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x60000000000e0920, qc=0x87ffffffffffd838)
#17 0x40000000012f8890:0 in standard_ProcessUtility (pstmt=0x60000000000e0830,
queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x60000000000e0920, qc=0x87ffffffffffd838)
#18 0x40000000012f6150:0 in ProcessUtility (pstmt=0x60000000000e0830,
queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x60000000000e0920, qc=0x87ffffffffffd838)
#19 0x40000000012f24c0:0 in PortalRunUtility (portal=0x600000000015baf0,
pstmt=0x60000000000e0830, isTopLevel=true, setHoldSnapshot=false,
dest=0x60000000000e0920, qc=0x87ffffffffffd838)
#20 0x40000000012f2c90:0 in PortalRunMulti (portal=0x600000000015baf0,
isTopLevel=true, setHoldSnapshot=false, dest=0x60000000000e0920,
---Type <return> to continue, or q <return> to quit---
altdest=0x60000000000e0920, qc=0x87ffffffffffd838)
#21 0x40000000012f08d0:0 in PortalRun (portal=0x600000000015baf0,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x60000000000e0920, altdest=0x60000000000e0920, qc=0x87ffffffffffd838)
#22 0x40000000012dcf70:0 in exec_simple_query (
query_string=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;")
#23 0x40000000012eaaf0:0 in PostgresMain (
dbname=0x60000000000aaa78 "regression",
username=0x60000000000a8ff8 "pgbfarm")
#24 0x40000000010751b0:0 in BackendRun (port=0x60000000001168e0)
#25 0x4000000001074040:0 in BackendStartup (port=0x60000000001168e0)
#26 0x4000000001068e60:0 in ServerLoop ()
#27 0x40000000010679b0:0 in PostmasterMain (argc=8, argv=0x87ffffffffffe610)
#28 0x4000000000d45660:0 in main (argc=8, argv=0x87ffffffffffe610)
at main.c:146
(gdb) p tupdesc1->attrs[1]
No symbol "tupdesc1" in current context.
(gdb) p tupdesc1->attrs[2]
No symbol "tupdesc1" in current context.
(gdb) p tupdesc2->attrs[0]
No symbol "tupdesc2" in current context.
(gdb) p tupdesc2->attrs[1]
No symbol "tupdesc2" in current context.
(gdb)
On Mon, Nov 1, 2021 at 9:33 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> (gdb) p tupdesc1->attrs[1]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc1->attrs[2]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc2->attrs[0]
> No symbol "tupdesc2" in current context.
> (gdb) p tupdesc2->attrs[1]
> No symbol "tupdesc2" in current context.
> (gdb)
I think you need to select stack frame 1 before running these
commands. I believe just running "frame 1" before you run these print
commands should do the trick.

--
Robert Haas
EDB: http://www.enterprisedb.com
(gdb) frame 1
#1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
tupdesc2=0x60000000001fba08)
(gdb) p tupdesc1->attrs[0]
$1 = {attrelid = 27272, attname = {
data = "initial", '\000' <repeats 56 times>}, atttypid = 23,
attstattarget = -1, attlen = 4, attnum = 1, attndims = 0, attcacheoff = 0,
atttypmod = -1, attbyval = true, attalign = 105 'i', attstorage = 112 'p',
attcompression = 0 '\000', attnotnull = false, atthasdef = false,
atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}
(gdb) p tupdesc1->attrs[1]
$2 = {attrelid = 27272, attname = {data = "a", '\000' <repeats 62 times>},
atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0,
attcacheoff = -1, atttypmod = -1, attbyval = true, attalign = 105 'i',
attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
atthasdef = true, atthasmissing = true, attidentity = 0 '\000',
attgenerated = 0 '\000', attisdropped = false, attislocal = true,
attinhcount = 0, attcollation = 0}
(gdb) p tupdesc1->attrs[2]
$3 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
attgenerated = 0 '\000', attisdropped = false, attislocal = true,
attinhcount = 0, attcollation = 950}
(gdb) p tupdesc2->attrs[0]
$4 = {attrelid = 27272, attname = {
data = "initial", '\000' <repeats 56 times>}, atttypid = 23,
attstattarget = -1, attlen = 4, attnum = 1, attndims = 0, attcacheoff = 0,
atttypmod = -1, attbyval = true, attalign = 105 'i', attstorage = 112 'p',
attcompression = 0 '\000', attnotnull = false, atthasdef = false,
atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}
(gdb) p tupdesc2->attrs[1]
$5 = {attrelid = 27272, attname = {data = "a", '\000' <repeats 62 times>},
atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0,
attcacheoff = -1, atttypmod = -1, attbyval = true, attalign = 105 'i',
attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
atthasdef = true, atthasmissing = true, attidentity = 0 '\000',
attgenerated = 0 '\000', attisdropped = false, attislocal = true,
attinhcount = 0, attcollation = 0}
(gdb) p tupdesc2->attrs[2]
$6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
attgenerated = 0 '\000', attisdropped = false, attislocal = true,
attinhcount = 0, attcollation = 950}
(gdb)
Attachments
On Tue, Nov 02, 2021 at 06:20:42AM +0530, Sandeep Thakkar wrote:
> (gdb) frame 1
>
> #1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
>
> tupdesc2=0x60000000001fba08)
>
...
> (gdb) p tupdesc2->attrs[2]
>
> $6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
> atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
> attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
> attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
> atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
> attgenerated = 0 '\000', attisdropped = false, attislocal = true,
> attinhcount = 0, attcollation = 950}
That looks healthy. Since gdb isn't giving line numbers, let's single-step
from the start of the function and see if that is informative. Please apply
the attached patch, which just adds a test file. Then run "make -C
src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl" and
attach
src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs in a
reply to this email.
On Mon, Nov 01, 2021 at 12:01:08AM +0500, Andrey Borodin wrote:
> So far I didn't come up with a clear understanding how this might happen.
Agreed.
> The only idea I have:
> 1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
> 2. Maybe relation->rd_rel (allocated just before relation->rd_att) was used incorrectly?
> 3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to added sizeof(relminmxid)?
> #define CLASS_TUPLE_SIZE \
> (offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))
See the comment at overread_tuplestruct_pg_cast for the reason why I think
that can't cause an actual malfunction. Still, there's some possibility of
this being the explanation.
Semab
On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> checking for fop... no
> checking for dbtoepub... no
> checking for perl module IPC::Run 0.79... no
> checking for perl module Test::More 0.87... no
> checking for perl module Time::HiRes 1.52... ok
> configure: error: Additional Perl modules are required to run TAP tests
>
> Seems like can't enable tap tests with current frameworks

You could always install that locally with CPAN, with something like:

export PERL5LIB=~/perl5/lib/perl5
cpan -T IPC::Run

Hmm, speaking of missing package management, GCC 4.6.0 is long dead and
missing many bug fixes, and in fact the whole 4.6.x line is finished. I
remembered another phase of weird segmentation faults that was never
really explained, on that system:

https://www.postgresql.org/message-id/flat/CA%2BhUKGLukanJE9W8C%2B0n8iRsZDpbuhcWOxBMjGaUO-RNHhBGXw%40mail.gmail.com
On Thu, Nov 04, 2021 at 01:50:16PM +1300, Thomas Munro wrote:
> On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> > checking for fop... no
> > checking for dbtoepub... no
> > checking for perl module IPC::Run 0.79... no
> > checking for perl module Test::More 0.87... no
> > checking for perl module Time::HiRes 1.52... ok
> > configure: error: Additional Perl modules are required to run TAP tests
> >
> > Seems like can't enable tap tests with current frameworks
>
> You could always install that locally with CPAN, with something like:
>
> export PERL5LIB=~/perl5/lib/perl5
> cpan -T IPC::Run
Here's something more self-contained:
mkdir $HOME/perl-for-tap
(cd $HOME/perl-for-tap &&
wget http://backpan.perl.org/modules/by-authors/id/M/MS/MSCHWERN/Test-Simple-0.87_01.tar.gz &&
gzip -d <Test-Simple-0.87_01.tar.gz | tar xf - &&
wget https://cpan.metacpan.org/authors/id/R/RS/RSOD/IPC-Run-0.79.tar.gz &&
gzip -d <IPC-Run-0.79.tar.gz | tar xf -)
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
perl -MIPC::Run -MTest::More -e 'print $IPC::Run::VERSION, "\n", $Test::More::VERSION, "\n"'

Semab, if you run those commands and then rerun the "configure" that failed, does that help?
Semab
Attachments
Attachments
On Fri, Nov 05, 2021 at 11:02:16AM +0500, Semab Tariq wrote:
> After configure when i run make -C sarc/test/subscription check
> PROVE_TESTS=t/080_step_equalTupleDescs.pl it got stucked I am attaching
> regress_log_080_step_equalTupleDescs file and log file of make command
Thanks. Please replace the previous test case patch with this one, and run it
again. (On your platform, gdb needs an explicit path to the executable.)
Semab
Attachments
On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
>     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)

The addresses there are weird. tupdesc1 is neither a stack address nor a heap
address; it may be in a program text section. tupdesc2 is a stack address.
In the earlier stack trace from
https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
both tupdesc1 and tupdesc2 were heap addresses.

> /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8: Error in sourced command file:
> Error accessing memory address 0x40010006f968: Bad address.

Thanks. Please try the attached test version, which avoids exiting too early
like the last version did.
Attachments
Attachments
On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
>
> The addresses there are weird. tupdesc1 is neither a stack address nor a heap
> address; it may be in a program text section. tupdesc2 is a stack address.
> In the earlier stack trace from
> https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> both tupdesc1 and tupdesc2 were heap addresses.
>
> > /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8: Error in sourced command file:
> > Error accessing memory address 0x40010006f968: Bad address.
>
> Thanks. Please try the attached test version, which avoids exiting too early
> like the last version did.
If you haven't run that yet, please use this version instead. It collects
more data. The log will probably be too big to be proper for the mailing
list, so please compress it.
Semab
Attachments
Attachments
On Sun, Nov 07, 2021 at 08:25:09PM +0500, Semab Tariq wrote:
> On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> > > On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > > > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > > > tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> > >
> > > The addresses there are weird. tupdesc1 is neither a stack address nor a heap
> > > address; it may be in a program text section. tupdesc2 is a stack address.
> > > In the earlier stack trace from
> > > https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> > > both tupdesc1 and tupdesc2 were heap addresses.
That turned out to be a false alarm. On gharial, a breakpoint at the start of
the function doesn't see the real arguments. After a ten-instruction
prologue, the real arguments appear, and they are heap addresses.
> PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch
Thanks. That shows the crash happened sometime after strcmp(defval1->adbin,
defval2->adbin). Please run the attached version, which collects yet more information.
Semab
Attachments
On Mon, Nov 08, 2021 at 11:22:45AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks. That shows the crash happened sometime after strcmp(defval1->adbin,
> > defval2->adbin). Please run the attached version,
>
> PFA the new log file
> 0x40000000003fdc30:2 in equalTupleDescs (tupdesc1=0x60000000001fdb98, tupdesc2=0x6000000000202da8)
> 0x40000000003fdc30:2 <equalTupleDescs+0x10e2>: br.call.sptk.many rp=0x3fffffffff3fdc30
> 0x3fffffffff3fdc30 in <unknown_procedure> ()
> 0x3fffffffff3fdc30: Error accessing memory address 0x3fffffffff3fdc30: Bad address.
This postgres binary apparently contains an explicit branch to
0x3fffffffff3fdc30, which is not an address reasonably expected to contain
code. (It's not a known heap, a known stack, or a CODE section from the
binary file.) This probably confirms a toolchain bug.
Would you do "git checkout 166f943" in the source directory you've been
testing, then rerun the test and post the compressed tmp_check/log directory?
I'm guessing that will show the bad branch instruction no longer present.
Semab
Attachments
On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > This postgres binary apparently contains an explicit branch to
> > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > code. (It's not a known heap, a known stack, or a CODE section from the
> > binary file.) This probably confirms a toolchain bug.
> >
> > Would you do "git checkout 166f943" in the source directory you've been
> > testing, then rerun the test and post the compressed
> > tmp_check/log directory?
> >
>
> So make command fails after I applied the v0.4 patch to 166f943 also it did
> not create any tmp_check/log directory
> t/080_step_equalTupleDescs....Can't locate IPC/Run.pm in @INC (@INC contains: /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /opt/perl_64/lib/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/5.8.8 /opt/perl_64/lib/site_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/site_perl/5.8.8 /opt/perl_64/lib/site_perl /opt/perl_64/lib/vendor_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/vendor_perl/5.8.8 /opt/perl_64/lib/vendor_perl .) at /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 102.
It looks like this attempt did not use a shell with the environment variables
discussed upthread. Use these commands first to bring back the required
environment:

export LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
Yes, I missed the trick. PFA tmp_check.tar.bz2
Semab
Attachments
On Tue, Nov 09, 2021 at 11:55:57AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 7:46 PM Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> > > On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > > > This postgres binary apparently contains an explicit branch to
> > > > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > > > code. (It's not a known heap, a known stack, or a CODE section from the
> > > > binary file.) This probably confirms a toolchain bug.
> > > >
> > > > Would you do "git checkout 166f943" in the source directory you've been
> > > > testing, then rerun the test and post the compressed tmp_check/log directory?
>
> PFA tmp_check.tar.bz2

Excellent. No crash, and the only difference in equalTupleDescs() code
generation is the branch destination addresses. At commit 70bef49, gharial's
toolchain generates the invalid branch destination:

$ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
--- /dev/fd/63 2021-11-09 06:11:20.927444437 -0800
+++ /dev/fd/62 2021-11-09 06:11:20.926444428 -0800
@@ -100 +100 @@
- br.call.sptk.many rp=0x40000000003cdc20
+ br.call.sptk.many rp=0x40000000003cde20
@@ -658 +658 @@
- br.call.sptk.many rp=0x40000000003cdc20
+ br.call.sptk.many rp=0x40000000003cde20
@@ -817 +817 @@
- br.call.sptk.many rp=0x3fffffffff3fdc30
+ br.call.sptk.many rp=0x4000000000400c30
@@ -949 +949 @@
- br.call.sptk.many rp=0x40000000003cdc20
+ br.call.sptk.many rp=0x40000000003cde20
@@ -970 +970 @@
- br.call.sptk.many rp=0x40000000003cdc20
+ br.call.sptk.many rp=0x40000000003cde20

Since "git diff 70bef49 166f943" contains nothing that correlates with such a
change, I'm concluding that this is a bug in gharial's toolchain.

It looks like gharial's automatic buildfarm runs have been paused for nine
days. Feel free to unpause it. Also, I recommend using the buildfarm client
setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'. Months or
years pass between these events. Here are all gharial "signal 11" failures,
likely some of which have other causes:

 sysname │      snapshot       │    branch     │ bfurl
─────────┼─────────────────────┼───────────────┼─────────────────────────────────────────────────────────────────────────────────────────────
 gharial │ 2018-04-10 00:32:08 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-10%2000%3A32%3A08
 gharial │ 2019-03-08 01:30:45 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2001%3A30%3A45
 gharial │ 2019-03-08 08:55:31 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2008%3A55%3A31
 gharial │ 2019-03-08 19:55:38 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2019%3A55%3A38
 gharial │ 2019-08-20 09:57:27 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-20%2009%3A57%3A27
 gharial │ 2019-08-21 08:04:58 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-21%2008%3A04%3A58
 gharial │ 2019-08-22 00:37:03 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-22%2000%3A37%3A03
 gharial │ 2019-08-22 12:42:02 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-22%2012%3A42%3A02
 gharial │ 2019-08-24 18:43:52 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-24%2018%3A43%3A52
 gharial │ 2019-08-25 11:14:36 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-25%2011%3A14%3A36
 gharial │ 2019-08-25 18:44:04 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-25%2018%3A44%3A04
 gharial │ 2019-08-26 08:47:19 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-26%2008%3A47%3A19
 gharial │ 2019-08-26 22:30:23 │ REL_12_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-26%2022%3A30%3A23
 gharial │ 2021-04-08 03:21:42 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-08%2003%3A21%3A42
 gharial │ 2021-04-09 06:40:31 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-09%2006%3A40%3A31
 gharial │ 2021-10-24 16:19:05 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
 gharial │ 2021-10-24 20:38:39 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
(17 rows)
On Wed, Nov 10, 2021 at 3:40 AM Noah Misch <noah@leadboat.com> wrote:
> $ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
> --- /dev/fd/63 2021-11-09 06:11:20.927444437 -0800
> +++ /dev/fd/62 2021-11-09 06:11:20.926444428 -0800
> @@ -100 +100 @@
> - br.call.sptk.many rp=0x40000000003cdc20
> + br.call.sptk.many rp=0x40000000003cde20
> @@ -658 +658 @@
> - br.call.sptk.many rp=0x40000000003cdc20
> + br.call.sptk.many rp=0x40000000003cde20
> @@ -817 +817 @@
> - br.call.sptk.many rp=0x3fffffffff3fdc30
> + br.call.sptk.many rp=0x4000000000400c30
> @@ -949 +949 @@
> - br.call.sptk.many rp=0x40000000003cdc20
> + br.call.sptk.many rp=0x40000000003cde20
> @@ -970 +970 @@
> - br.call.sptk.many rp=0x40000000003cdc20
> + br.call.sptk.many rp=0x40000000003cde20
>
> Since "git diff 70bef49 166f943" contains nothing that correlates with such a
> change, I'm concluding that this is a bug in gharial's toolchain.

Excellent detective work.

> It looks like gharial's automatic buildfarm runs have been paused for nine
> days. Feel free to unpause it. Also, I recommend using the buildfarm client
> setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'. Months or
> years pass between these events. Here are all gharial "signal 11" failures,
> likely some of which have other causes:

Yeah I spent time investigating some of these at the time and I know
others did too.

IMHO we should hunt down dead toolchains and gently see if we can get
them updated. There is no point in testing PostgreSQL on every
commit on a compiler someone built from a tarball many years ago and
never updated with bug fixes.
On Wed, Nov 10, 2021 at 10:55:13AM +1300, Thomas Munro wrote:
> On Wed, Nov 10, 2021 at 3:40 AM Noah Misch <noah@leadboat.com> wrote:
> > $ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
> > --- /dev/fd/63 2021-11-09 06:11:20.927444437 -0800
> > +++ /dev/fd/62 2021-11-09 06:11:20.926444428 -0800
> > @@ -100 +100 @@
> > - br.call.sptk.many rp=0x40000000003cdc20
> > + br.call.sptk.many rp=0x40000000003cde20
> > @@ -658 +658 @@
> > - br.call.sptk.many rp=0x40000000003cdc20
> > + br.call.sptk.many rp=0x40000000003cde20
> > @@ -817 +817 @@
> > - br.call.sptk.many rp=0x3fffffffff3fdc30
> > + br.call.sptk.many rp=0x4000000000400c30
> > @@ -949 +949 @@
> > - br.call.sptk.many rp=0x40000000003cdc20
> > + br.call.sptk.many rp=0x40000000003cde20
> > @@ -970 +970 @@
> > - br.call.sptk.many rp=0x40000000003cdc20
> > + br.call.sptk.many rp=0x40000000003cde20
> >
> > Since "git diff 70bef49 166f943" contains nothing that correlates with such a
> > change, I'm concluding that this is a bug in gharial's toolchain.
>
> Excellent detective work.

Thanks.

> > It looks like gharial's automatic buildfarm runs have been paused for nine
> > days. Feel free to unpause it. Also, I recommend using the buildfarm client
> > setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'. Months or
> > years pass between these events. Here are all gharial "signal 11" failures,
> > likely some of which have other causes:
>
> Yeah I spent time investigating some of these at the time and I know
> others did too.
>
> IMHO we should hunt down dead toolchains and gently see if we can get
> them updated. There is no point in testing PostgreSQL on every
> commit on a compiler someone built from a tarball many years ago and
> never updated with bug fixes.

+1 for "gently see if". It's good that we have a few old-gcc animals, but
having versions of intermediate age is less important. Some owners will
decline, and that's okay.
On Thu, Oct 28, 2021 at 07:56:00PM -0700, Noah Misch wrote:
> On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
> > > On 24 Oct 2021, at 19:19, Noah Misch <noah@leadboat.com> wrote:
> > > These failures started on 2021-10-09, the day conchuela updated from DragonFly
> > > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug.
> > > Since the theorized kernel bug seems not to affect
> > > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
> > > from other tests. One thing in common with src/test/recovery/t/017_shm.pl and
> > > the newest failure sites is that they don't write anything to the child stdin.
>
> > > If not, does passing the script via stdin, like "pgbench -f-
> > > <script.sql", work around the problem?
> >
> > I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.
>
> Did that work? Commit 7f580aa should make this unnecessary for v12+
> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
> needs a fix in all branches. A backup plan is just to skip affected tests on
> dragonfly 6+. Since the breakage has been limited to so few tests, I'm
> optimistic that a better workaround will present itself.

Is this still in your queue, or not? The conchuela breakage in
src/test/recovery and v11+v10 src/bin/pgbench is a large source of buildfarm
noise right now.
> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
>
> On Thu, Oct 28, 2021 at 07:56:00PM -0700, Noah Misch wrote:
>> On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
>>>> On 24 Oct 2021, at 19:19, Noah Misch <noah@leadboat.com> wrote:
>>>> These failures started on 2021-10-09, the day conchuela updated from DragonFly
>>>> v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE. It smells like a kernel bug.
>>>> Since the theorized kernel bug seems not to affect
>>>> src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
>>>> from other tests. One thing in common with src/test/recovery/t/017_shm.pl and
>>>> the newest failure sites is that they don't write anything to the child stdin.
>>
>>>> If not, does passing the script via stdin, like "pgbench -f-
>>>> <script.sql", work around the problem?
>>>
>>> I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.
>>
>> Did that work? Commit 7f580aa should make this unnecessary for v12+
>> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
>> needs a fix in all branches. A backup plan is just to skip affected tests on
>> dragonfly 6+. Since the breakage has been limited to so few tests, I'm
>> optimistic that a better workaround will present itself.
>
> Is this still in your queue, or not? The conchuela breakage in
> src/test/recovery and v11+v10 src/bin/pgbench is a large source of buildfarm
> noise right now.

Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.

Best regards, Andrey Borodin.
>> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
>>
> Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.
>

I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
But so far I didn't come up with some clever mutex idea for REL_10.

Best regards, Andrey Borodin.
Attachments
On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> >> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
> > Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.
>
> I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
> But so far I didn't come up with some clever mutex idea for REL_10.

That's a reasonable sentiment, but removing background_pgbench() isn't going
to fix 017_shm.pl. I'm not enthusiastic about any fix that repairs
src/bin/pgbench without repairing 017_shm.pl. I'm okay with skipping affected
test files on dragonfly >= 6 if you decide to cease figuring out how to make
them pass like the others do.
On Sun, Nov 14, 2021 at 1:17 PM Noah Misch <noah@leadboat.com> wrote:
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> > I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> > I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
> > But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl. I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl. I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Hmm, so if "IPC::Run got stuck when it should have been reaping that
zombie", what's it stuck in, I guess select() or waitpid()? Maybe
there's a kernel bug but it seems hard to believe that a Unix system
would have bugs in such fundamental facilities and still be able to
build itself and ship a release... Otherwise I guess Perl, or perl
scripts, would need to be confusing fds or pids or something? But
that's hard to believe on its own, too, given the lack of problems on
other systems that are pretty similar.

If Andrey can still reproduce this, it'd be interesting to see a gdb
backtrace, and also "ps O wchan" or perhaps kill -INFO $pid, and lsof
for the process (or according to old pages found with google, perhaps
the equivalent tool is "fstat" on that system).
> On 14 Nov 2021, at 05:09, Noah Misch <noah@leadboat.com> wrote:
>
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
>>>> On 10 Nov 2021, at 09:15, Noah Misch <noah@leadboat.com> wrote:
>>> Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.
>>
>> I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
>> I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
>> But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl. I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl. I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Let's skip these tests. How can this be accomplished? Should we mute only 022_cic.pl, 023_cic_2pc.pl, and 017_shm.pl? Or everything that calls harness->finish?

I've sent some diagnostics info to Thomas offlist, but I do not understand how it can be used...

Best regards, Andrey Borodin.
On Mon, Nov 15, 2021 at 8:20 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I've sent some diagnostics info to Thomas offlist, but I do not understand how it can be used...

Summary for the record: defunct child, and gdb showed perl blocked in
select(), and ps revealed that Dragonfly internally maps select() onto
kevent() (wchan = kqread), which is interesting, but apparently not
new. (Hmmm, /me eyes the recent changes to events reported on
half-closed pipes[1].)

[1] https://github.com/DragonFlyBSD/DragonFlyBSD/commits/master/sys/kern/sys_generic.c
I managed to reproduce the 017_shm.pl hang using my Vagrant file (see
earlier). It happens for "gmake -s check-world", but not "gmake -s -C
src/test/recovery check", which might have confused me last time. The
symptoms are as described already, though this time I learned with truss
that it's in a retry loop waiting 1 second at a time.

I don't know how to get debug symbols for libc on this operating system
(well, probably needed to select debug option while installing the OS, but
I didn't install the OS, I'm using a lazy bones pre-rolled Vagrant image).
So in order to be able to inspect the arguments to select, I made my own
LD_PRELOAD wrapper for select (see end), and I observed that it was
calling select() with an empty fd_set:

(gdb) bt
#0  0x00000008014c1e6c in select () from /lib/libc.so.8
#1  0x00000008009f0fe6 in select () from /usr/lib/libpthread.so.0
#2  0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0,
    writefds=0x8017f6d50, exceptfds=0x0, timeout=0x7fffffdfd4a0) at select.c:13
#3  0x00000008007bfaf9 in Perl_pp_sselect () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#4  0x000000080076b036 in Perl_runops_standard () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#5  0x00000008006da3b7 in perl_run () from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#6  0x0000000000400de4 in main ()
(gdb) f 2
#2  0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0,
    writefds=0x8017f6d50, exceptfds=0x0, timeout=0x7fffffdfd4a0) at select.c:13
13          return real_function(nfds, readfds, writefds, exceptfds, timeout);
(gdb) print nfds
$1 = 16
(gdb) print *readfds
$2 = {fds_bits = {0 <repeats 16 times>}}
(gdb) print *writefds
$3 = {fds_bits = {0 <repeats 16 times>}}

So it looks a lot like something on the perl side has lost track of the
pipe it's supposed to be selecting on.
If I understand correctly, it's supposed to be waiting for one of the
following pipes to appear as readable, whichever is the one that the
zombie psql process previously held the other end of:

$ fstat -p 73032
USER     CMD        PID   FD PATH                 INUM MODE  SZ|DV R/W
vagrant  perl     73032 root /                       1 drwxr-xr-x offset:0  r
vagrant  perl     73032   wd /home/vagrant/postgres/src/test/recovery 4434524058 drwxr-xr-x offset:0  r
vagrant  perl     73032 text /pfs/@@-1:00001/local/bin/perl 4329919842 -rwxr-xr-x offset:0  r
vagrant  perl     73032    0 /dev/pts/0 1335 crw--w---- pts/0:155171 rw
vagrant  perl     73032    1 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm 4477427235 -rw-r--r-- offset:9457  w
vagrant  perl     73032    2 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm 4477427235 -rw-r--r-- offset:9457  w
vagrant  perl     73032    3 /home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm 4477427235 -rw-r--r-- offset:9457  w
vagrant  perl     73032    4* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw
vagrant  perl     73032    5* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw
vagrant  perl     73032    6* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw
vagrant  perl     73032    7* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw
vagrant  perl     73032    8* pipe fffff800aac1dca0 (B<->A) ravail 0 wavail 0 rw
vagrant  perl     73032    9* pipe fffff800ab6e9020 (B<->A) ravail 0 wavail 0 rw
vagrant  perl     73032   13* pipe fffff800ab157560 (B<->A) ravail 0 wavail 0 rw

=== select wrapper ===

$ cat select.c
#include <dlfcn.h>
#include <sys/select.h>

int
select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
       struct timeval *timeout)
{
    static int (*real_function)(int, fd_set *, fd_set *, fd_set *,
                                struct timeval *);

    if (!real_function)
        real_function = dlsym(RTLD_NEXT, "select");
    return real_function(nfds, readfds, writefds, exceptfds, timeout);
}
$ cc -Wall -fPIC -shared -g -o myselect.so select.c -ldl
$ cd postgres
$ LD_PRELOAD=/home/vagrant/myselect.so gmake check-world -s
On Fri, Nov 19, 2021 at 5:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> (gdb) print *writefds
> $3 = {fds_bits = {0 <repeats 16 times>}}

Oops, that was after it had been cleared already by the OS; duh. On
entry to my wrapper, writefds does in fact contain the bit pattern for
fd 13. That led me to try a very simple C program which runs to
completion on Linux and FreeBSD, but hangs forever on Dragonfly.
Attachments
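[Editor's illustration] The reproducer itself was attached rather than quoted, so it isn't visible here. As a hedged reconstruction of what such a program might look like (based on Noah's summary below that the write end of a pipe with no reader was reported non-writable, even though a write would fail immediately with EPIPE rather than block), a minimal sketch could be:

/*
 * Hypothetical reconstruction, not the actual attachment: after the read
 * end of a pipe is closed, POSIX says select() must report the write end
 * as ready for writing (a write would not block; it would fail with
 * EPIPE), so this should print and exit promptly on a conforming system.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>

int
main(void)
{
	int		fds[2];
	fd_set	writefds;

	if (pipe(fds) != 0)
	{
		perror("pipe");
		return 1;
	}
	close(fds[0]);				/* no reader left; writes would give EPIPE */

	FD_ZERO(&writefds);
	FD_SET(fds[1], &writefds);

	/* Block until the write end is reported writable. */
	if (select(fds[1] + 1, NULL, &writefds, NULL, NULL) < 0)
	{
		perror("select");
		return 1;
	}
	printf("select() reported the broken pipe as writable\n");
	return 0;
}

On Linux and FreeBSD this returns immediately; per the thread, on the affected DragonFly release the select() call apparently never returns, matching the hang seen in IPC::Run.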
On Fri, Nov 19, 2021 at 05:52:59PM +1300, Thomas Munro wrote:
> led me to try a very simple C program which runs to
> completion on Linux and FreeBSD, but hangs forever on Dragonfly.

> [pipe FD considered non-writable, but writing would give EPIPE]

Looks like optimal material for a kernel bug report. Excellent discovery.
> On 19 Nov 2021, at 11:13, Noah Misch <noah@leadboat.com> wrote:
>
> Looks like optimal material for a kernel bug report. Excellent discovery.

Are we going to wait for the fix or disable tests partially on conchuela?

Best regards, Andrey Borodin.
On Fri, Nov 19, 2021 at 7:13 PM Noah Misch <noah@leadboat.com> wrote:
> On Fri, Nov 19, 2021 at 05:52:59PM +1300, Thomas Munro wrote:
> > led me to try a very simple C program which runs to
> > completion on Linux and FreeBSD, but hangs forever on Dragonfly.
>
> > [pipe FD considered non-writable, but writing would give EPIPE]
>
> Looks like optimal material for a kernel bug report. Excellent discovery.

https://bugs.dragonflybsd.org/issues/3307
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Nov 19, 2021 at 7:13 PM Noah Misch <noah@leadboat.com> wrote:
>> Looks like optimal material for a kernel bug report. Excellent discovery.

> https://bugs.dragonflybsd.org/issues/3307

I see they're pushing back on whether this is a bug. I failed to find
a way to quickly comment on the bug report, but I suggest you quote
POSIX select(2) at them:

    A descriptor shall be considered ready for reading when a call to
    an input function with O_NONBLOCK clear would not block, whether or
    not the function would transfer data successfully. (The function
    might return data, an end-of-file indication, or an error other than
    one indicating that it is blocked, and in each of these cases the
    descriptor shall be considered ready for reading.)

    A descriptor shall be considered ready for writing when a call to
    an output function with O_NONBLOCK clear would not block, whether or
    not the function would transfer data successfully.

I don't know whether it'd help to point out that the test program works
as expected on other BSDen. (I'm planning to go try it on a few more
platforms in a bit, but in any case the standards-compliance question
looks pretty open-and-shut to me.)

			regards, tom lane
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> Are we going to wait for the fix or disable tests partially on conchuela?

Let's wait a bit and see if a fix will be forthcoming promptly.

			regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> Oops, that was after it had been cleared already by the OS; duh. On
> entry to my wrapper, writefds does in fact contain the bit pattern for
> fd 13. That led me to try a very simple C program which runs to
> completion on Linux and FreeBSD, but hangs forever on Dragonfly.

For completeness, I checked this on macOS, OpenBSD, and NetBSD,
and they all give the expected results.

			regards, tom lane
On Sat, Nov 20, 2021 at 3:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > Are we going to wait for the fix or disable tests partially on conchuela?
>
> Let's wait a bit and see if a fix will be forthcoming promptly.

The fix came forth. I'll wait for the back-patch and then ask Mikael
to upgrade conchuela.
On Sat, Nov 20, 2021 at 8:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> The fix came forth. I'll wait for the back-patch and then ask Mikael
> to upgrade conchuela.

Done; thanks Mikael.
On 2021-12-29 20:53, Thomas Munro wrote:
> On Sat, Nov 20, 2021 at 8:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> The fix came forth. I'll wait for the back-patch and then ask Mikael
>> to upgrade conchuela.
>
> Done; thanks Mikael.

It has only had 1 pass through the buildfarm so far but all branches
passed. So looking promising. Let's see how it looks after a few more
passes.

/Mikael