Обсуждение: CREATE INDEX CONCURRENTLY does not index prepared xact's data

Поиск

Список

Период

Сортировка

CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

19 декабря 2020 г., 17:13:39

Hi hackers!

$subj.

Steps to reproduce:
create extension if not exists amcheck;
create table if not exists t1(i int);
begin;
insert into t1 values(1);
prepare transaction 'x';
create index concurrently i1 on t1(i);
commit prepared 'x';
select bt_index_check('i1', true);

I observe:
NOTICE:  heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
I expect: awaiting 'x' commit before index is created, correct index after.

This happens because WaitForLockersMultiple() does not take prepared xacts into account. Meanwhile CREATE INDEX
CONCURRENTLYexpects that locks are dropped only when transaction commit is visible. 

This issue affects pg_repack and similar machinery based on CIC.

PFA draft of a fix.

Best regards, Andrey Borodin.

Вложения

v1-0001-Wait-for-prepared-xacts-in-CREATE-INDEX-CONCURREN.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Victor Yegorov

Дата:

19 декабря 2020 г., 17:22:14

сб, 19 дек. 2020 г. в 18:13, Andrey Borodin <x4mmm@yandex-team.ru>:

Steps to reproduce:
create extension if not exists amcheck;
create table if not exists t1(i int);
begin;
insert into t1 values(1);
prepare transaction 'x';
create index concurrently i1 on t1(i);
commit prepared 'x';
select bt_index_check('i1', true);

I observe:
NOTICE: heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
I expect: awaiting 'x' commit before index is created, correct index after.

CREATE INDEX CONCURRENTLY isn't supposed to be run inside a transaction?..

Victor Yegorov

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

19 декабря 2020 г., 17:25:22


> 19 дек. 2020 г., в 22:22, Victor Yegorov <vyegorov@gmail.com> написал(а):
>
> CREATE INDEX CONCURRENTLY isn't supposed to be run inside a transaction?..

CREATE INDEX CONCURRENTLY cannot run inside a transaction block.

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Victor Yegorov

Дата:

19 декабря 2020 г., 17:48:13

сб, 19 дек. 2020 г. в 18:13, Andrey Borodin <x4mmm@yandex-team.ru>:

I observe:
NOTICE: heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
I expect: awaiting 'x' commit before index is created, correct index after.

I agree, that behaviour is unexpected. But getting a notice that requires me
to re-create the index some time later is not better (from DBA perspective).

Maybe it'd be better to wait on prepared xacts like on other open ordinary transactions?

Victor Yegorov

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Tom Lane

Дата:

19 декабря 2020 г., 17:57:41

Andrey Borodin <x4mmm@yandex-team.ru> writes:
> This happens because WaitForLockersMultiple() does not take prepared
> xacts into account.

Ugh, clearly an oversight.

> Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only
> when transaction commit is visible.

Don't follow your point here --- I'm pretty sure that prepared xacts
continue to hold their locks.

> PFA draft of a fix.

Haven't you completely broken VirtualXactLock()?  Certainly, whether the
target is a normal or prepared transaction shouldn't alter the meaning
of the "wait" flag.

In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
need to gain an additional parameter indicating whether to consider
prepared xacts.  It's not clear to me that their current behavior is wrong
for all possible uses.

            regards, tom lane

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

19 декабря 2020 г., 18:25:11


> 19 дек. 2020 г., в 22:48, Victor Yegorov <vyegorov@gmail.com> написал(а):
>
> сб, 19 дек. 2020 г. в 18:13, Andrey Borodin <x4mmm@yandex-team.ru>:
> I observe:
> NOTICE:  heap tuple (1,8) from table "t1" lacks matching index tuple within index "i1"
> I expect: awaiting 'x' commit before index is created, correct index after.
>
> I agree, that behaviour is unexpected. But getting a notice that requires me
> to re-create the index some time later is not better (from DBA perspective).
>
> Maybe it'd be better to wait on prepared xacts like on other open ordinary transactions?

This is not a real NOTICE. I just used a bit altered amcheck to diagnose the problem. It's an incorrect index. It lacks
sometuples. It will not find existing data, failing to provide "read committed" consistency guaranties. 
Fix waits for prepared xacts just like any other transaction.



> 19 дек. 2020 г., в 22:57, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
>> Meanwhile CREATE INDEX CONCURRENTLY expects that locks are dropped only
>> when transaction commit is visible.
>
> Don't follow your point here --- I'm pretty sure that prepared xacts
> continue to hold their locks.
Uhmm, yes, locks are there. Relation is locked with RowExclusiveLock, but this lock is ignored by
WaitForLockers(heaplocktag,ShareLock, true). Locking relation with ShareLock works as expected. 

>> PFA draft of a fix.
>
> Haven't you completely broken VirtualXactLock()?  Certainly, whether the
> target is a normal or prepared transaction shouldn't alter the meaning
> of the "wait" flag.
You are right, the patch has a bug here.

> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
> need to gain an additional parameter indicating whether to consider
> prepared xacts.  It's not clear to me that their current behavior is wrong
> for all possible uses.
I don't see usages besides indexing stuff. But maybe it worth to analyse each case...

BTW do we need a test for this? Will isolation test be good at checking this?

Thanks!

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

20 декабря 2020 г., 18:10:37


> 19 дек. 2020 г., в 23:25, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> BTW do we need a test for this? Will isolation test be good at checking this?

PFA patch set with isolation test for the $subj and fix for VirtualXactLock() bug.

I think I'll register the thread on January CF.

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Michael Paquier

Дата:

21 декабря 2020 г., 05:40:03

On Sat, Dec 19, 2020 at 12:57:41PM -0500, Tom Lane wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> This happens because WaitForLockersMultiple() does not take prepared
>> xacts into account.
>
> Ugh, clearly an oversight.

This looks to be the case since 295e639 where virtual XIDs have been
introduced.  So this is an old bug.

> Don't follow your point here --- I'm pretty sure that prepared xacts
> continue to hold their locks.

Yes, that's what I recall as well.

> Haven't you completely broken VirtualXactLock()?  Certainly, whether the
> target is a normal or prepared transaction shouldn't alter the meaning
> of the "wait" flag.

Yep.

> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
> need to gain an additional parameter indicating whether to consider
> prepared xacts.  It's not clear to me that their current behavior is wrong
> for all possible uses.

WaitForLockers is used only by REINDEX and CREATE/DROP CONCURRENTLY,
where it seems to me we need to care about all the cases related to
concurrent build, validation and index drop.  The other caller of
GetLockConflicts() is for conflict resolution in standbys where it is
fine to ignore 2PC transactions as these cannot be cancelled.  So I
agree that we are going to need more control with a new option
argument to be able to control if 2PC transactions are ignored or
not.

Hmm.  The approach taken by the patch looks to be back-patchable.
Based on the lack of complaints on the matter, we could consider
instead putting an error in WaitForLockersMultiple() if there is at
least one numPrepXact which would at least avoid inconsistent data.
But I don't think what's proposed here is bad either.

VirtualTransactionIdIsValidOrPreparedXact() is confusing IMO, knowing
that VirtualTransactionIdIsPreparedXact() combined with
LocalTransactionIdIsValid() would be enough to do the job.

-       Assert(VirtualTransactionIdIsValid(vxid));
+       Assert(VirtualTransactionIdIsValidOrPreparedXact(vxid));
+
+       if (VirtualTransactionIdIsPreparedXact(vxid))
[...]
#define VirtualTransactionIdIsPreparedXact(vxid) \
    ((vxid).backendId == InvalidBackendId)
This would allow the case where backendId and localTransactionId are
both invalid.  So it would be better to also check in
VirtualTransactionIdIsPreparedXact() that the XID is not invalid, no?
--
Michael

Вложения

signature.asc

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

21 декабря 2020 г., 07:24:55


> 21 дек. 2020 г., в 10:40, Michael Paquier <michael@paquier.xyz> написал(а):
>
>> In general, I wonder whether WaitForLockersMultiple and GetLockConflicts
>> need to gain an additional parameter indicating whether to consider
>> prepared xacts.  It's not clear to me that their current behavior is wrong
>> for all possible uses.
>
> WaitForLockers is used only by REINDEX and CREATE/DROP CONCURRENTLY,
> where it seems to me we need to care about all the cases related to
> concurrent build, validation and index drop.  The other caller of
> GetLockConflicts() is for conflict resolution in standbys where it is
> fine to ignore 2PC transactions as these cannot be cancelled.

I don't think that fact that we cannot cancel transaction is sufficient here to ignore prepared transaction. I think
thereshould not exist any prepared transaction that we need to cancel in standby conflict resolution. And if it exists
-it's a sign of corruption, we could emit warning or something like that. 
But I'm really not an expert here, just a common sense that prepared transaction is just like regular transaction that
survivescrash. If we wait for any transaction - probably we should wait for prepared too. I'm not insisting on anything
though.

>  So I
> agree that we are going to need more control with a new option
> argument to be able to control if 2PC transactions are ignored or
> not.
>
> Hmm.  The approach taken by the patch looks to be back-patchable.
> Based on the lack of complaints on the matter, we could consider
> instead putting an error in WaitForLockersMultiple() if there is at
> least one numPrepXact which would at least avoid inconsistent data.
> But I don't think what's proposed here is bad either.
>
> VirtualTransactionIdIsValidOrPreparedXact() is confusing IMO, knowing
> that VirtualTransactionIdIsPreparedXact() combined with
> LocalTransactionIdIsValid() would be enough to do the job.
>
> -       Assert(VirtualTransactionIdIsValid(vxid));
> +       Assert(VirtualTransactionIdIsValidOrPreparedXact(vxid));
> +
> +       if (VirtualTransactionIdIsPreparedXact(vxid))
> [...]
> #define VirtualTransactionIdIsPreparedXact(vxid) \
>    ((vxid).backendId == InvalidBackendId)
> This would allow the case where backendId and localTransactionId are
> both invalid.  So it would be better to also check in
> VirtualTransactionIdIsPreparedXact() that the XID is not invalid, no?
Seems valid. Removed VirtualTransactionIdIsValidOrPreparedXact() from patch.

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

21 декабря 2020 г., 08:19:59


> 21 дек. 2020 г., в 12:24, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> Seems valid. Removed VirtualTransactionIdIsValidOrPreparedXact() from patch.

Sorry for the noise, removal was not complete.

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

23 декабря 2020 г., 07:23:28


> 21 дек. 2020 г., в 10:40, Michael Paquier <michael@paquier.xyz> написал(а):
>
> Hmm.  The approach taken by the patch looks to be back-patchable.
I was checking that patch\test works in supported branches and everything seems to be fine down to REL_10_STABLE.

But my machines (Ubuntu18 and MacOS) report several fails in isolation tests on REL9_6_20\REL9_6_STABLE. Particularly
eval-plan-qual-trigger,drop-index-concurrently-1, and async-notify. I do not observe problems with regular isolation
teststhough. 

Do I understand correctly that check-world tests on buildfarm 'make check-prepared-txns' and the problem is somewhere
inmy machines? Or something is actually broken\outdated? 

Thanks!

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Michael Paquier

Дата:

23 декабря 2020 г., 07:52:02

On Wed, Dec 23, 2020 at 12:23:28PM +0500, Andrey Borodin wrote:
> Do I understand correctly that check-world tests on buildfarm 'make
> check-prepared-txns' and the problem is somewhere in my machines? Or
> something is actually broken\outdated?

The buildfarm code has no trace of check-prepared-txns.  And, FWIW, I
see no failures at the top of REL9_6_STABLE.  Do you mean that this
happens only with your patch?  Or do you mean that you see failures
using the stable branch of upstream?  I have not tested the former,
but the latter works fine on my end.
--
Michael

Вложения

signature.asc

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

25 декабря 2020 г., 09:05:04

> 23 дек. 2020 г., в 12:52, Michael Paquier <michael@paquier.xyz> написал(а):
>
> On Wed, Dec 23, 2020 at 12:23:28PM +0500, Andrey Borodin wrote:
>> Do I understand correctly that check-world tests on buildfarm 'make
>> check-prepared-txns' and the problem is somewhere in my machines? Or
>> something is actually broken\outdated?
>
> FWIW, I
> see no failures at the top of REL9_6_STABLE.

Thanks, Michael! The problem was indeed within my machines. maintainer-cleanup is not enough for make
check-prepared-txns.Fresh real installation is necessary. 

I've checked that test works down to REL9_5_STABLE with patch.

Thanks!

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

22 января 2021 г., 05:44:35

Thanks for looking into this!

> 17 янв. 2021 г., в 12:24, Noah Misch <noah@leadboat.com> написал(а):
>
>> --- a/src/backend/storage/lmgr/lock.c
>> +++ b/src/backend/storage/lmgr/lock.c
>> @@ -2931,15 +2929,17 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
>>
>>     /*
>>      * Allocate memory to store results, and fill with InvalidVXID.  We only
>> -     * need enough space for MaxBackends + a terminator, since prepared xacts
>> -     * don't count. InHotStandby allocate once in TopMemoryContext.
>> +     * need enough space for MaxBackends + max_prepared_xacts + a
>> +     * terminator. InHotStandby allocate once in TopMemoryContext.
>>      */
>>     if (InHotStandby)
>>     {
>>         if (vxids == NULL)
>>             vxids = (VirtualTransactionId *)
>>                 MemoryContextAlloc(TopMemoryContext,
>> -                                   sizeof(VirtualTransactionId) * (MaxBackends + 1));
>> +                                   sizeof(VirtualTransactionId) * (MaxBackends
>> +                                   + max_prepared_xacts
>> +                                   + 1));
>
> PostgreSQL typically puts the operator before the newline.  Also, please note
> the whitespace error that "git diff --check origin/master" reports.
Fixed.
>
>>     }
>>     else
>>         vxids = (VirtualTransactionId *)
>
> This is updating only the InHotStandby branch.  Both branches need the change.
Fixed.
>
>> @@ -4461,9 +4462,21 @@ bool
>> VirtualXactLock(VirtualTransactionId vxid, bool wait)
>> {
>>     LOCKTAG        tag;
>> -    PGPROC       *proc;
>> +    PGPROC       *proc = NULL;
>>
>> -    Assert(VirtualTransactionIdIsValid(vxid));
>> +    Assert(VirtualTransactionIdIsValid(vxid) ||
>> +            VirtualTransactionIdIsPreparedXact(vxid));
>> +
>> +    if (VirtualTransactionIdIsPreparedXact(vxid))
>> +    {
>> +        LockAcquireResult lar;
>> +        /* If it's prepared xact - just wait for it */
>> +        SET_LOCKTAG_TRANSACTION(tag, vxid.localTransactionId);
>> +        lar = LockAcquire(&tag, ShareLock, false, !wait);
>> +        if (lar == LOCKACQUIRE_OK)
>
> This should instead test "!= LOCKACQUIRE_NOT_AVAIL", lest
> LOCKACQUIRE_ALREADY_HELD happen.  (It perhaps can't happen, but skipping the
> LockRelease() would be wrong if it did.)
I think that code that successfully acquired lock should release it. Other way we risk to release someone's else lock
heldfor a reason. We only acquire lock to release it instantly anyway. 

>
>> +            LockRelease(&tag, ShareLock, false);
>> +        return lar != LOCKACQUIRE_NOT_AVAIL;
>> +    }
>>
>>     SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid);
>>
>> diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
>> index 1c3e9c1999..cedb9d6d2a 100644
>> --- a/src/include/storage/lock.h
>> +++ b/src/include/storage/lock.h
>> @@ -70,6 +70,8 @@ typedef struct
>> #define VirtualTransactionIdIsValid(vxid) \
>>     (((vxid).backendId != InvalidBackendId) && \
>>      LocalTransactionIdIsValid((vxid).localTransactionId))
>> +#define VirtualTransactionIdIsPreparedXact(vxid) \
>> +    ((vxid).backendId == InvalidBackendId)
>
> Your patch is introducing VirtualTransactionId values that represent prepared
> xacts, and it has VirtualTransactionIdIsValid() return false for those values.
> Let's make VirtualTransactionIdIsValid() return true for them, since they're
> as meaningful as any other value.  The GetLockConflicts() header comment says
> "The result array is palloc'd and is terminated with an invalid VXID."  Patch
> v4 falsifies that comment.  The array can contain many of these new "invalid"
> VXIDs, and an all-zeroes VXID terminates the array.  Rather than change the
> comment, let's change VirtualTransactionIdIsValid() to render the comment true
> again.  Most (perhaps all) VirtualTransactionIdIsValid() callers won't need to
> distinguish the prepared-transaction case.
Makes sense, fixed. I was afraid that there's something I'm not aware about. I've iterated over
VirtualTransactionIdIsValid()calls and did not find suspicious cases. 

> An alternative to redefining VXID this way would be to have some new type,
> each instance of which holds either a valid VXID or a valid
> prepared-transaction XID.  That alternative feels inferior to me, though.
> What do you think?
I think we should not call valid vxids "invalid".

By the way maybe rename "check-prepared-txns" to "check-prepared-xacts" for consistency?

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

22 января 2021 г., 05:49:13


> 22 янв. 2021 г., в 10:44, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> <v5-0001-Wait-for-prepared-xacts-in-CREATE-INDEX-CONCURREN.patch><v5-0002-Add-test-for-CIC-with-prepared-xacts.patch>
Uh, vscode did not save files and I've send incorrect version. Disregard v5.
Sorry for the noise.

Best regards, Andrey Borodin.


> 1 мая 2021 г., в 17:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> FWIW I have 2 new reported cases on 12.6.

Sorry to say that, but $subj persists. Here's a simple reproduction.
To get corrupted index you need 3 psql sessions A, B and C. By default command is executed in A.

create extension amcheck ;
create table t1(i int);
create index on t1(i);

begin;
insert into t1 values(0);

-- session C: reindex table concurrently t1;

prepare transaction 'a';
begin;
insert into t1 values(0);
-- session B: commit prepared 'a';
prepare transaction 'b';
begin;
insert into t1 values(0);
-- session B: commit prepared 'b';
prepare transaction 'c';

begin;
insert into t1 values(0);
-- session B: commit prepared 'c';
prepare transaction 'd';
commit prepared 'd';

-- session C: postgres=# select bt_index_check('i1',true);
ERROR:  heap tuple (0,2) from table "t1" lacks matching index tuple within index "i1"
HINT:  Retrying verification using the function bt_index_parent_check() might provide a more specific error.

The problem is WaitForLockersMultiple() gathers running vxids and 2pc xids. Then it waits, but if vxid is converted to
2pcit fails to wait. 
I could not compose isolation test for this, because we need to do "prepare transaction 'a';" only when "reindex table
concurrentlyt1;" is already working for some time. 

To fix it we can return locking xids along with vxids from GetLockConflicts() like in attached diff. But this approach
seemsugly. 


Best regards, Andrey Borodin.

Вложения

return_xids_with_vxids.diff

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

18 июля 2021 г., 05:17:28


> 18 июля 2021 г., в 01:12, Noah Misch <noah@leadboat.com> написал(а):
>
> Suppose some transaction has a vxid but no xid.  Shortly after
> GetLockConflicts(), it acquires an xid, modifies the table, and issues PREPARE
> TRANSACTION.  Could that cause a corrupt index even with this diff?

Firstly I've tried to stress things out. This little go program [0] easily reproduces corruption on patched code.
Meanwhile vxid->xid->2px program does not [1] (both patched and unpatched).

I think CIC does not care much about VXIDs at all. It's only important when real XID started: before GetLockConflicts()
orafter. 

Thanks!


Best regards, Andrey Borodin.

[0] https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc#file-stress-xid-2px
[1] https://gist.github.com/x4m/8b6025995eedf29bf588727375014dfc#file-stress-vxid-xid-2px

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

19 июля 2021 г., 07:10:52


> 19 июля 2021 г., в 05:30, Noah Misch <noah@leadboat.com> написал(а):
>
> To fix $SUBJECT, it sounds like we need a way to identify a transaction,
> usable as early as the transaction's first catalog access and remaining valid
> until COMMIT PREPARED finishes.  We may initially see a transaction as having
> a VXID and no XID, then later need to wait for that transaction when it has
> entered prepared state, having an XID and no VXID.  How might we achieve that?

PFA draft with vxid->xid mapping and subsequent wait for it. The patch, obviously, lacks a ton of comments explaining
whatis going on. 
We write actual VXID into dummy proc entries of prepared xact.
When we wait for vxid we try to convert it to xid through real proc entry. If we cannot do so - we lookup in shared 2pc
state.If vxid is not there - it means it is already gone and there's nothing to wait. 

Thanks!

Best regards, Andrey Borodin.

Вложения

v2-0001-Wait-for-vxids-in-CIC.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

19 июля 2021 г., 18:41:28

> 19 июля 2021 г., в 23:10, Noah Misch <noah@leadboat.com> написал(а):
>
> On Mon, Jul 19, 2021 at 12:10:52PM +0500, Andrey Borodin wrote:
>>>> 19 июля 2021 г., в 05:30, Noah Misch <noah@leadboat.com> написал(а):
>>>
>>> To fix $SUBJECT, it sounds like we need a way to identify a transaction,
>>> usable as early as the transaction's first catalog access and remaining valid
>>> until COMMIT PREPARED finishes.  We may initially see a transaction as having
>>> a VXID and no XID, then later need to wait for that transaction when it has
>>> entered prepared state, having an XID and no VXID.  How might we achieve that?
>>
>> PFA draft with vxid->xid mapping and subsequent wait for it. The patch, obviously, lacks a ton of comments
explainingwhat is going on. 
>> We write actual VXID into dummy proc entries of prepared xact.
>> When we wait for vxid we try to convert it to xid through real proc entry. If we cannot do so - we lookup in shared
2pcstate. If vxid is not there - it means it is already gone and there's nothing to wait. 
>
> When the system reuses BackendId values, it reuses VXID values.  In the
> general case, two prepared transactions could exist simultaneously with the
> same BackendId+LocalTransactionId.  Hmm.  It could be okay to have a small
> probability that CIC waits on more transactions than necessary.  Suppose we
> have three PGPROC entries with the same VXID, two prepared transactions and
> one regular transaction.  Waiting for all three could be tolerable, though
> avoiding that would be nice.  Should we follow transactions differently to
> avoid that?

We don’t have to wait for regular Xid in this case at all. Because it would be finished with VXID. But I think that we
haveto wait for all 2PCs with the same VXID. 

We are looking for transaction that was only VXID during GetLockConflicts(). In conflicts array we may have each VXID
onlyonce. 
Other 2PCs with same VXID may be older or newer than target 2PC.
Older 2PCs must be with XID in conflicts array. So we might wait for all 2PC with known XIDs. Then for each ambiguous
VXID->XIDmapping choose oldest XID. 

But this logic seem to me overly complicated. Or isn’t it?

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

20 июля 2021 г., 19:25:29


> 19 июля 2021 г., в 23:41, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> We are looking for transaction that was only VXID during GetLockConflicts(). In conflicts array we may have each VXID
onlyonce. 
> Other 2PCs with same VXID may be older or newer than target 2PC.
> Older 2PCs must be with XID in conflicts array. So we might wait for all 2PC with known XIDs. Then for each ambiguous
VXID->XIDmapping choose oldest XID. 
>
> But this logic seem to me overly complicated. Or isn’t it?

Here's the PoC to asses complexity of this solution.

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

20 июля 2021 г., 19:38:25


> 19 июля 2021 г., в 23:41, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> We are looking for transaction that was only VXID during GetLockConflicts(). In conflicts array we may have each VXID
onlyonce. 
> Other 2PCs with same VXID may be older or newer than target 2PC.
> Older 2PCs must be with XID in conflicts array. So we might wait for all 2PC with known XIDs. Then for each ambiguous
VXID->XIDmapping choose oldest XID. 
>
> But this logic seem to me overly complicated. Or isn’t it?

> Other 2PCs with same VXID may be older or newer than target 2PC. Older 2PCs must be with XID in conflicts array.
Unfortunately, this is just wrong. Older 2PC with same VXID don't have to be in conflicts array. They might be of some
otherunrelated 2PC working with different relations. 

Sorry for the noise.

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

21 июля 2021 г., 08:13:08


> 21 июля 2021 г., в 02:49, Noah Misch <noah@leadboat.com> написал(а):
>
> On Wed, Jul 21, 2021 at 12:38:25AM +0500, Andrey Borodin wrote:
>>> 19 июля 2021 г., в 23:41, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>>>> 19 июля 2021 г., в 23:10, Noah Misch <noah@leadboat.com> написал(а):
>>>> Suppose we
>>>> have three PGPROC entries with the same VXID, two prepared transactions and
>>>> one regular transaction.  Waiting for all three could be tolerable, though
>>>> avoiding that would be nice.  Should we follow transactions differently to
>>>> avoid that?
>>>
>>> We don’t have to wait for regular Xid in this case at all. Because it would be finished with VXID.
>
> I don't understand those two sentences.  Could you write more?
Suppose we have a VXID conflicting with reindexed relation lock, and have a PGPROC with regular Xid (not 2PC) for this
VXID.
We do not need to return this xid from TwoPhaseGetXidByVXid() for extra wait. This situation is covered by normal vxid
handlingin VirtualXactLock(). 
+    /* Save the xid to test if transaction coverted to 2pc later */
+    xid = proc->xid;


>>> We are looking for transaction that was only VXID during GetLockConflicts(). In conflicts array we may have each
VXIDonly once. 
>>> Other 2PCs with same VXID may be older or newer than target 2PC.
>>> Older 2PCs must be with XID in conflicts array. So we might wait for all 2PC with known XIDs. Then for each
ambiguousVXID->XID mapping choose oldest XID. 
>
>> Unfortunately, this is just wrong. Older 2PC with same VXID don't have to be in conflicts array. They might be of
someother unrelated 2PC working with different relations. 
>
> I agree.  Would it work to do the following sequence in WaitForLockers()?
>
> 1. In GetLockConflicts(), record a list of conflicting XIDs.  Also record a
>   list of conflicting LXIDs having no XID.
> 2. Wait on all LXIDs recorded in (1).  They have either ended or converted to
>   prepared transactions.
> 3. Inner-join the present list of prepared transactions with the list of
>   LXIDs from (1).  Record the XIDs of matched entries.
> 4. Wait on all XIDs recorded in (1) and (3).
>
> While that may wait on some prepared XIDs needlessly, it can't degrade into
> persistent starvation.  We could reduce the chance of a needless XID wait by
> remembering the list of old prepared XIDs before (1) and not adding any of
> those remembered, old XIDs in (3).  That last bit probably would be
> unjustified complexity, but maybe not.
I think this protocol is equivalent to waiting on all Xids with VXid.
I consider this protocol safe. FPA implementation.
Patch 0001 is intact version of previous patch.
There are two additions:
1. Prefer xids to vxids in GetLockConflicts()
2. Wait on all 2PCs with given VXid.

Thanks!

Best regards, Andrey Borodin.


> 24 июля 2021 г., в 03:30, Noah Misch <noah@leadboat.com> написал(а):
>
> It could be okay, but I think it's better to add the test under amcheck.  You
> could still use pgbench in the test.
Currently it's still WIP.
I've added two tests: deterministic with psql and probabilistic with pgbench.
And I really do not like pgbench test:
1. It does not seem stable enough, it can turn buildfarm red as a good watermelon.
2. Names for 2PCs are choosen at random and have probability of collision.
3. It still breaks the fix and I have no idea why.

Can you please take a look on added TAP test? Probably I'm doing wrong a lot of things, it's the very first program on
Perlwritten by me... 
background_pgbench is a local invention. sub pgbench is a copy from nearby test. Should I refactor it somewhere?

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

29 июля 2021 г., 12:30:56


> 27 июля 2021 г., в 17:50, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> 3. It still breaks the fix and I have no idea why.

I'm still coping with the bug. I have a stable reproduction, but to the date unable to identify roots of the problem.
Here's the sample trace:
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl LOG:  statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 1
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 2
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3493, vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3490, vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3492
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3492, vxid -1/3492
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 3
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 4
2021-07-29 17:14:29.000 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 5
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3512
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3512, vxid -1/3512
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3513, vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  Phase 6
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3516, vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3515, vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 3517, vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  Backend is doing something else
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: PreparedXactLock xid 0, vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  XXX: Sucessfully found xid by vxid 0
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING:  Phase Final
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl LOG:  statement: SELECT bt_index_check('idx',true);
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl ERROR:  heap tuple (18,74) from table "tbl" lacks matching index
tuplewithin index "idx" xmin 3504 xmax 0 
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl STATEMENT:  SELECT bt_index_check('idx',true);

Rogue tuple was committed by xid 3504 which was never returned by GetLockConflicts(). I'm attaching patch set just for
thereference of the trace, not expecting code review now. 

I've fixed unrelated bug in previous patchset.
-                       SET_LOCKTAG_TRANSACTION(tag, xid);
+                       SET_LOCKTAG_TRANSACTION(tag, xidlist.xid);
                        lar = LockAcquire(&tag, ShareLock, false, !wait);
                        if (lar != LOCKACQUIRE_NOT_AVAIL)
                                LockRelease(&tag, ShareLock, false);
-                       return lar != LOCKACQUIRE_NOT_AVAIL;
+                       if (lar == LOCKACQUIRE_NOT_AVAIL)
+                               return false;

But it does not help. Maybe I've broken something by my fix...Searching further.

Thanks for reading! Would be happy to hear any ideas why xid was not locked by GetLockConflicts() bug committed a tuple
whichwas not indexed. 

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

29 июля 2021 г., 14:21:44


> 29 июля 2021 г., в 17:30, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> Searching further.

Currently CIC test breaks indexes even without 2PC. How is it supposed to work if vxid stays vxid through
GetLockConflicts()\WaitForLockersMultiple()barrier and then suddenly converts to xid and commits before index
validated?

Best regards, Andrey Borodin.

Вложения

v7-0001-Do-not-use-Prepared-transactions-in-CIC-test.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

30 июля 2021 г., 10:42:10


> 30 июля 2021 г., в 07:25, Noah Misch <noah@leadboat.com> написал(а):
> What alternative fix designs should we consider?

I observe that provided patch fixes CIC under normal transactions, but test with 2PC still fails similarly.
Unindexed tuple was committed somewhere at the end of Phase 3 or 4.
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl LOG:  statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 1
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 2
2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6735
2021-07-30 15:35:31.807 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6736
2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 3
2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6750
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 4
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 5
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6762
2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6763
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  Phase 6
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid 6/2166
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6767
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6764
2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING:  XXX: VirtualXactLock vxid -1/6765
2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl WARNING:  Phase Final
2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl LOG:  statement: SELECT bt_index_check('idx',true);
2021-07-30 15:35:31.813 +05 [25987] 002_cic_2pc.pl ERROR:  heap tuple (46,16) from table "tbl" lacks matching index
tuplewithin index "idx" xmin 6751 xmax 0 

Attaching debug logging patch, amcheck is modified to return xmin. Trace is gathered by "grep -e ERROR -e REINDEX -e
WARN-e SELECT tmp_check/log/002_cic_2pc_CIC_2PC_test.log". 

How deep the rabbit hole goes?

Thanks!

Best regards, Andrey Borodin.


> 6 авг. 2021 г., в 23:03, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> But how to get why something did not happen?
> During GetLockConflicts() proclocks SHMQueue do not include locks of concurrently running xid within
PrepareTransaction()[seems like it's with PostPrepare_locks(), but I'm not sure]. 
> The problem is reproducible within 1 second by the script leading to amcheck-detected corruption.

Ok, finally I've figured out this charade.

AtPrepare_Locks() transfers fast-path locks to normal locks pointing to PGPROC with xid in a process of conversion to
2PC.
ProcArrayClearTransaction(MyProc) resets xid int PGROCs.
GetLockConflicts() sees empty xid and vxid.
PostPrepare_Locks(xid) hids things with new PGPROC for 2PC.

PFA PoC fix. But I'm entirely not sure it's OK to do reorder things at PrepareTransaction() this way.

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

07 августа 2021 г., 19:00:55


> 7 авг. 2021 г., в 20:33, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> <v9-0001-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch>

> <v9-0004-PoC-fix-clear-xid.patch>

> <v9-0003-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxid.patch>

> <v9-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcach.patch>

Here's v10.
Changes:
1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be
RelationBuildDesc()inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably,
needa better data structure. 
2. Editorialised step 3 (vxid->xid lookup). Fixing typos and some small bugs.

Tomorrow I'll try to cleanup step 1 (tap tests).

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

07 августа 2021 г., 22:19:57

On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> Changes:
> 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be
RelationBuildDesc()inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably,
needa better data structure.

I don't know either, hence that quick data structure to delay the question.
debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").

> Tomorrow I'll try to cleanup step 1 (tap tests).

For the final commits, I would like to have two commits.  The first will fix
the non-2PC bug and add the test for that fix.  The second will fix the 2PC
bug and add the additional test.  Feel free to structure your changes into
more patch files, so long as it's straightforward to squash them into two
commits that way.

A few things I noticed earlier:

> --- a/contrib/amcheck/Makefile
> +++ b/contrib/amcheck/Makefile
> @@ -12,7 +12,7 @@ PGFILEDESC = "amcheck - function for verifying relation integrity"
>  
>  REGRESS = check check_btree check_heap
>  
> -TAP_TESTS = 1
> +TAP_TESTS = 2

TAP_TESTS is a Boolean flag, not a count.  Leave it set to 1.

> --- /dev/null
> +++ b/contrib/amcheck/t/002_cic_2pc.pl

> +$node->append_conf('postgresql.conf', 'lock_timeout = 5000');

> +my $main_timer = IPC::Run::timeout(5);

> +my $reindex_timer = IPC::Run::timeout(5);

Values this short cause spurious buildfarm failures.  Generally, use 180s for
timeouts that do not expire in a successful run.

> +# Run background pgbench with CIC. We cannot mix-in this script into single pgbench:
> +# CIC will deadlock with itself occasionally.

Consider fixing that by taking an advisory lock before running CIC.  However,
if use of separate pgbench executions works better, feel free to keep that.

> +pgbench(
> +    '--no-vacuum --client=5 --time=1',

Consider using a transaction count instead of --time.  That way, slow machines
still catch the bug, and fast machines waste less time.  For the "concurrent
OID generation" test, I tuned the transaction count so the test failed about
half the time[1] on a buggy build.

> +            \set txid random(1, 1000000000)
> +            BEGIN;
> +            INSERT INTO tbl VALUES(0);
> +            PREPARE TRANSACTION 'tx:txid';

Try "PREPARE TRANSACTION 'c:client_id'" (or c:client_id:txid) to eliminate the
chance of collision.

[1] https://postgr.es/m/20160215171129.GA347322@tornado.leadboat.com

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

08 августа 2021 г., 11:31:07


> 8 авг. 2021 г., в 03:19, Noah Misch <noah@leadboat.com> написал(а):
>
> On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
>> Changes:
>> 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be
RelationBuildDesc()inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably,
needa better data structure. 
>
> I don't know either, hence that quick data structure to delay the question.
> debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
> reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
> get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").
I've toyed around with
$node->append_conf('postgresql.conf', 'debug_invalidate_system_caches_always = 3'); in test.
I had failures only at most with
Assert(in_progress_offset < 4);
See [0] for stack trace. But I do not think that it proves that deeper calls are impossible with other DB schemas.

>> Tomorrow I'll try to cleanup step 1 (tap tests).
>
> For the final commits, I would like to have two commits.  The first will fix
> the non-2PC bug and add the test for that fix.  The second will fix the 2PC
> bug and add the additional test.  Feel free to structure your changes into
> more patch files, so long as it's straightforward to squash them into two
> commits that way.
Done so.
Step 1. Test for CIC with regular transactions.
Step 2. Fix
Step 3. Test for CIC with 2PC
Step 4. Part of the fix that I'm sure about
Step 5. Dubious part of fix...

I had to refactor subs pgbench and pgbench_background to be reused among tests. Though I did not refactor pgbench tests
touse this functions: they have two different versions for sub pgbench. 

> A few things I noticed earlier:
>
>> --- a/contrib/amcheck/Makefile
>> +++ b/contrib/amcheck/Makefile
>> @@ -12,7 +12,7 @@ PGFILEDESC = "amcheck - function for verifying relation integrity"
>>
>> REGRESS = check check_btree check_heap
>>
>> -TAP_TESTS = 1
>> +TAP_TESTS = 2
>
> TAP_TESTS is a Boolean flag, not a count.  Leave it set to 1.
Fixed.

>> --- /dev/null
>> +++ b/contrib/amcheck/t/002_cic_2pc.pl
>
>> +$node->append_conf('postgresql.conf', 'lock_timeout = 5000');
>
>> +my $main_timer = IPC::Run::timeout(5);
>
>> +my $reindex_timer = IPC::Run::timeout(5);
>
> Values this short cause spurious buildfarm failures.  Generally, use 180s for
> timeouts that do not expire in a successful run.
Fixed.

>> +# Run background pgbench with CIC. We cannot mix-in this script into single pgbench:
>> +# CIC will deadlock with itself occasionally.
>
> Consider fixing that by taking an advisory lock before running CIC.  However,
> if use of separate pgbench executions works better, feel free to keep that.
I've tried this approach. Advisory lock owns vxid, thus deadlocking with CIC.

>> +pgbench(
>> +    '--no-vacuum --client=5 --time=1',
>
> Consider using a transaction count instead of --time.  That way, slow machines
> still catch the bug, and fast machines waste less time.  For the "concurrent
> OID generation" test, I tuned the transaction count so the test failed about
> half the time[1] on a buggy build.
I've tuned tests for my laptop to observe probabilistic test to pass sometimes.

>> +            \set txid random(1, 1000000000)
>> +            BEGIN;
>> +            INSERT INTO tbl VALUES(0);
>> +            PREPARE TRANSACTION 'tx:txid';
>
> Try "PREPARE TRANSACTION 'c:client_id'" (or c:client_id:txid) to eliminate the
> chance of collision.
Fixed.

How to you think, do we have a chance to fix things before next release on August 12th?

Thanks!

Best regards, Andrey Borodin.


> 15 авг. 2021 г., в 13:45, Noah Misch <noah@leadboat.com> написал(а):
>>
>> Do you see failures with that loop?  If so, can you diagnose them?
>
> See below.
I do not observe failure on my laptop, though reproduced it on a linux server.
I've fixed one bug in TwoPhaseGetXidByVXid(). Also rebased on actual master.

>
>> (So far, I've not seen a failure from the 1PC test.)
>
> I eventually did see one.  Just one 1PC failure in six hours of 1PC test
> runtime, though.
I've attached a patch that reproduces the problem in 30sec on my server.


Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andres Freund

Дата:

15 августа 2021 г., 13:45:47

Hi,

On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote:
> From 929736512ebf8eb9ac6ddaaf49b9e6148d72cfbb Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Fri, 30 Jul 2021 14:40:16 +0500
> Subject: [PATCH v12 2/6] PoC fix for race in RelationBuildDesc() and relcache
>  invalidation
> 
> ---
>  src/backend/utils/cache/relcache.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
> index 13d9994af3..7eec7b1f41 100644
> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -997,9 +997,16 @@ equalRSDesc(RowSecurityDesc *rsdesc1, RowSecurityDesc *rsdesc2)
>   *        (suggesting we are trying to access a just-deleted relation).
>   *        Any other error is reported via elog.
>   */
> +typedef struct InProgressRels {
> +    Oid relid;
> +    bool invalidated;
> +} InProgressRels;
> +static InProgressRels inProgress[100];
> +
>  static Relation
>  RelationBuildDesc(Oid targetRelId, bool insertIt)
>  {
> +    int in_progress_offset;
>      Relation    relation;
>      Oid            relid;
>      HeapTuple    pg_class_tuple;
> @@ -1033,6 +1040,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>      }
>  #endif
>  
> +    for (in_progress_offset = 0;
> +         OidIsValid(inProgress[in_progress_offset].relid);
> +         in_progress_offset++)
> +        ;
> +    inProgress[in_progress_offset].relid = targetRelId;
> +retry:
> +    inProgress[in_progress_offset].invalidated = false;
> +
>      /*
>       * find the tuple in pg_class corresponding to the given relation id
>       */
> @@ -1213,6 +1228,12 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>       */
>      heap_freetuple(pg_class_tuple);
>  
> +    if (inProgress[in_progress_offset].invalidated)
> +        goto retry;                /* TODO free old one */
> +    /* inProgress is in fact the stack, we can safely remove it's top */
> +    inProgress[in_progress_offset].relid = InvalidOid;
> +    Assert(inProgress[in_progress_offset + 1].relid == InvalidOid);
> +
>      /*
>       * Insert newly created relation into relcache hash table, if requested.
>       *
> @@ -2805,6 +2826,13 @@ RelationCacheInvalidateEntry(Oid relationId)
>          relcacheInvalsReceived++;
>          RelationFlushRelation(relation);
>      }
> +    else
> +    {
> +        int i;
> +        for (i = 0; OidIsValid(inProgress[i].relid); i++)
> +            if (inProgress[i].relid == relationId)
> +                inProgress[i].invalidated = true;
> +    }
>  }

Desperately needs comments. Without a commit message and without
comments it's hard to review this without re-reading the entire thread -
which approximately nobody will do.


> From 7e47dae2828d88ddb2161fda0c3b08a158c6cf37 Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Sat, 7 Aug 2021 20:27:14 +0500
> Subject: [PATCH v12 5/6] PoC fix clear xid
> 
> ---
>  src/backend/access/transam/xact.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
> index 387f80419a..9b19d939eb 100644
> --- a/src/backend/access/transam/xact.c
> +++ b/src/backend/access/transam/xact.c
> @@ -2500,7 +2500,6 @@ PrepareTransaction(void)
>       * done *after* the prepared transaction has been marked valid, else
>       * someone may think it is unlocked and recyclable.
>       */
> -    ProcArrayClearTransaction(MyProc);
>  
>      /*
>       * In normal commit-processing, this is all non-critical post-transaction
> @@ -2535,6 +2534,8 @@ PrepareTransaction(void)
>      PostPrepare_MultiXact(xid);
>  
>      PostPrepare_Locks(xid);
> +
> +    ProcArrayClearTransaction(MyProc);
>      PostPrepare_PredicateLocks(xid);

The comment above ProcArrayClearTransaction would need to be moved and
updated...

> From 6db9cafd146db1a645bb6885157b0e1f032765e0 Mon Sep 17 00:00:00 2001
> From: Andrey Borodin <amborodin@acm.org>
> Date: Mon, 19 Jul 2021 11:50:02 +0500
> Subject: [PATCH v12 4/6] Fix CREATE INDEX CONCURRENTLY in precence of vxids
>  converted to 2pc
...

> +/*
> + * TwoPhaseGetXidByVXid
> + *        Try to lookup for vxid among prepared xacts
> + */
> +XidListEntry
> +TwoPhaseGetXidByVXid(VirtualTransactionId vxid)
> +{
> +    int                i;
> +    XidListEntry    result;
> +    result.next = NULL;
> +    result.xid = InvalidTransactionId;
> +
> +    LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
> +
> +    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +    {
> +        GlobalTransaction    gxact = TwoPhaseState->prepXacts[i];
> +        PGPROC               *proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +        VirtualTransactionId proc_vxid;
> +        GET_VXID_FROM_PGPROC(proc_vxid, *proc);
> +
> +        if (VirtualTransactionIdEquals(vxid, proc_vxid))
> +        {
> +            if (result.xid != InvalidTransactionId)
> +            {
> +                /* Already has candidate - need to alloc some space */
> +                XidListEntry *copy = palloc(sizeof(XidListEntry));
> +                copy->next = result.next;
> +                copy->xid = result.xid;
> +                result.next = copy;
> +            }
> +            result.xid = gxact->xid;
> +        }
> +    }
> +
> +    LWLockRelease(TwoPhaseStateLock);
> +
> +    return result;
> +}

Dynamic memory allocations while holding a fairly central lock - one
which is now going to be more contended - doesn't seem great.

Is the memory context this is called in guaranteed to be of a proper
duration?  Including being reset in case there's an error at some point
before the memory is freed?


> +/*
> + *        WaitXact
> + *
> + *        Wait for xid completition if have xid. Otherwise try to find xid among
> + *        two-phase procarray entries.
> + */
> +static bool WaitXact(VirtualTransactionId vxid, TransactionId xid, bool wait)
> +{
> +    LockAcquireResult    lar;
> +    LOCKTAG                tag;
> +    XidListEntry        xidlist;
> +    XidListEntry       *xidlist_ptr = NULL; /* pointer to TwoPhaseGetXidByVXid()s pallocs */
> +    bool                result;
> +
> +    if (TransactionIdIsValid(xid))
> +    {
> +        /* We know exact xid - no need to search in 2PC state */
> +        xidlist.xid = xid;
> +        xidlist.next = NULL;
> +    }
> +    else
> +    {
> +        /* You cant have vxid among 2PCs if you have no 2PCs */
> +        if (max_prepared_xacts == 0)
> +            return true;
> +
> +        /*
> +         * We must search for vxids in 2pc state
> +         * XXX: O(N*N) complexity where N is number of prepared xacts
> +         */
> +        xidlist = TwoPhaseGetXidByVXid(vxid);
> +        /* Return if transaction is gone entirely */
> +        if (!TransactionIdIsValid(xidlist.xid))
> +            return true;
> +        xidlist_ptr = xidlist.next;
> +    }

Perhaps I missed this - but won't we constantly enter this path for
non-2pc transactions? E.g.

> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>       */
>      proc = BackendIdGetProc(vxid.backendId);
>      if (proc == NULL)
> -        return true;
> +        return WaitXact(vxid, InvalidTransactionId, wait);
>  
>      /*
>       * We must acquire this lock before checking the backendId and lxid
> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>          || proc->fpLocalTransactionId != vxid.localTransactionId)
>      {
>          LWLockRelease(&proc->fpInfoLock);
> -        return true;
> +        return WaitXact(vxid, InvalidTransactionId, wait);
>      }

It seems like it's going to add a substantial amount of work even when
no 2PC xacts are involved?


> diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
> index e27e1a8fe8..a5f28d3a80 100644
> --- a/src/include/access/twophase.h
> +++ b/src/include/access/twophase.h
> @@ -25,6 +25,17 @@
>   */
>  typedef struct GlobalTransactionData *GlobalTransaction;
>  
> +/* 
> + * XidListEntry is expected to be used as list very rarely. Under normal
> + * circumstances TwoPhaseGetXidByVXid() returns only one xid.
> + * But under certain conditions can return many xids or nothing.
> + */
> +typedef struct XidListEntry
> +{
> +    TransactionId xid;
> +    struct XidListEntry* next;
> +} XidListEntry;

I don't think we should invent additional ad-hoc list types.

Greetings,

Andres Freund

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

16 августа 2021 г., 05:13:17

On Sun, Aug 15, 2021 at 04:09:37PM +0500, Andrey Borodin wrote:
> > 15 авг. 2021 г., в 13:45, Noah Misch <noah@leadboat.com> написал(а):
> >> Do you see failures with that loop?  If so, can you diagnose them?

> I do not observe failure on my laptop, though reproduced it on a linux server.
> I've fixed one bug in TwoPhaseGetXidByVXid(). Also rebased on actual master.

> > Just one 1PC failure in six hours of 1PC test runtime, though.
> I've attached a patch that reproduces the problem in 30sec on my server.

Having studied the broader inval situation, I found just one additional gap
that seemed potentially relevant.  It didn't stop the failures under current
tests, however.  The attached patch replaces my last patch on this thread, so
it should replace
v12-0002-PoC-fix-for-race-in-RelationBuildDesc-and-relcac.patch in your
series.  (Like its predecessor, it's an unfinished proof-of-concept.)

With v12, on my machine, the same loop took 2000s to get three failures, both
of which were in the 1PC tests.  I ran out of time to study the failure
mechanism.  Would you diagnose what happens when it fails on your server?
Also see the larger review from Andres.

Вложения

inval-build-race-v0.1.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

22 августа 2021 г., 12:30:47

Andres, Noah, thanks for the review.

> 16 авг. 2021 г., в 10:13, Noah Misch <noah@leadboat.com> написал(а):
>
> With v12, on my machine, the same loop took 2000s to get three failures, both
> of which were in the 1PC tests.  I ran out of time to study the failure
> mechanism.  Would you diagnose what happens when it fails on your server?
I'm still in progress of investigation. With your fix for RelationBuildDesc() invals my reproduction is also quite slow
-I get error within 10 minutes or so. 

I've observed few times that same Xid was WaitXact()'es twice. Is it possible that taking ShareLock on running xid
(takenfrom PGPGROC of vxid) is not a good way to wait for transaction completition? 
Is it possible that xid is in PGPROC before backend acquires its own lock on xid?


> Also see the larger review from Andres.

Sure, I will address Andres's notes as long as patches will finally work without errors. I agree with the idea of not
inventingdynamic list and other pointed problems. 

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

22 августа 2021 г., 17:42:01


> 22 авг. 2021 г., в 17:30, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
> I've observed few times that same Xid was WaitXact()'es twice. Is it possible that taking ShareLock on running xid
(takenfrom PGPGROC of vxid) is not a good way to wait for transaction completition? 
> Is it possible that xid is in PGPROC before backend acquires its own lock on xid?

Oh, that's it. We first publish xid in PGPROC and only then take a lock in lock manager on it.
Ok, I know how to fix this.


Currently when testing combination of all fixes I observe things like
'error running SQL: 'psql:<stdin>:1: ERROR:  prepared transaction with identifier "a" is busy''
Looks like kind of race condition in tests.

Thanks!

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

23 августа 2021 г., 17:38:00


> 22 авг. 2021 г., в 22:42, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):
>
>
> Currently when testing combination of all fixes I observe things like
> 'error running SQL: 'psql:<stdin>:1: ERROR:  prepared transaction with identifier "a" is busy''
> Looks like kind of race condition in tests.

There was a race condition in deterministic 2PC test. Fixed with synchronisation points.
PFA patches that work on my machines.
I'm going to start fixing review notes if it will not break until tomorrow.

BTW are subtransaction anything special wrt CIC? is it worth to sprinkle some SAVEPOINTs here and there, just to be
sure?

Thanks!

Best regards, Andrey Borodin.

Hi!

> 15 авг. 2021 г., в 18:45, Andres Freund <andres@anarazel.de> написал(а):
>
> Hi,
>
> On 2021-08-15 16:09:37 +0500, Andrey Borodin wrote:
>> From 929736512ebf8eb9ac6ddaaf49b9e6148d72cfbb Mon Sep 17 00:00:00 2001
>> From: Andrey Borodin <amborodin@acm.org>
>> Date: Fri, 30 Jul 2021 14:40:16 +0500
>> Subject: [PATCH v12 2/6] PoC fix for race in RelationBuildDesc() and relcache
>> invalidation
>>
>> ---
>> src/backend/utils/cache/relcache.c | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
>> index 13d9994af3..7eec7b1f41 100644
>> --- a/src/backend/utils/cache/relcache.c
>> +++ b/src/backend/utils/cache/relcache.c
>> @@ -997,9 +997,16 @@ equalRSDesc(RowSecurityDesc *rsdesc1, RowSecurityDesc *rsdesc2)
>>  *        (suggesting we are trying to access a just-deleted relation).
>>  *        Any other error is reported via elog.
>>  */
>> +typedef struct InProgressRels {
>> +    Oid relid;
>> +    bool invalidated;
>> +} InProgressRels;
>> +static InProgressRels inProgress[100];
>> +
>> static Relation
>> RelationBuildDesc(Oid targetRelId, bool insertIt)
>> {
>> +    int in_progress_offset;
>>     Relation    relation;
>>     Oid            relid;
>>     HeapTuple    pg_class_tuple;
>> @@ -1033,6 +1040,14 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>>     }
>> #endif
>>
>> +    for (in_progress_offset = 0;
>> +         OidIsValid(inProgress[in_progress_offset].relid);
>> +         in_progress_offset++)
>> +        ;
>> +    inProgress[in_progress_offset].relid = targetRelId;
>> +retry:
>> +    inProgress[in_progress_offset].invalidated = false;
>> +
>>     /*
>>      * find the tuple in pg_class corresponding to the given relation id
>>      */
>> @@ -1213,6 +1228,12 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
>>      */
>>     heap_freetuple(pg_class_tuple);
>>
>> +    if (inProgress[in_progress_offset].invalidated)
>> +        goto retry;                /* TODO free old one */
>> +    /* inProgress is in fact the stack, we can safely remove it's top */
>> +    inProgress[in_progress_offset].relid = InvalidOid;
>> +    Assert(inProgress[in_progress_offset + 1].relid == InvalidOid);
>> +
>>     /*
>>      * Insert newly created relation into relcache hash table, if requested.
>>      *
>> @@ -2805,6 +2826,13 @@ RelationCacheInvalidateEntry(Oid relationId)
>>         relcacheInvalsReceived++;
>>         RelationFlushRelation(relation);
>>     }
>> +    else
>> +    {
>> +        int i;
>> +        for (i = 0; OidIsValid(inProgress[i].relid); i++)
>> +            if (inProgress[i].relid == relationId)
>> +                inProgress[i].invalidated = true;
>> +    }
>> }
>
> Desperately needs comments. Without a commit message and without
> comments it's hard to review this without re-reading the entire thread -
> which approximately nobody will do.
I've added some comments. But it seems we should use dynamic allocations instead of 100-based array.

>
>
>> From 7e47dae2828d88ddb2161fda0c3b08a158c6cf37 Mon Sep 17 00:00:00 2001
>> From: Andrey Borodin <amborodin@acm.org>
>> Date: Sat, 7 Aug 2021 20:27:14 +0500
>> Subject: [PATCH v12 5/6] PoC fix clear xid
>>
>> ---
>> src/backend/access/transam/xact.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
>> index 387f80419a..9b19d939eb 100644
>> --- a/src/backend/access/transam/xact.c
>> +++ b/src/backend/access/transam/xact.c
>> @@ -2500,7 +2500,6 @@ PrepareTransaction(void)
>>      * done *after* the prepared transaction has been marked valid, else
>>      * someone may think it is unlocked and recyclable.
>>      */
>> -    ProcArrayClearTransaction(MyProc);
>>
>>     /*
>>      * In normal commit-processing, this is all non-critical post-transaction
>> @@ -2535,6 +2534,8 @@ PrepareTransaction(void)
>>     PostPrepare_MultiXact(xid);
>>
>>     PostPrepare_Locks(xid);
>> +
>> +    ProcArrayClearTransaction(MyProc);
>>     PostPrepare_PredicateLocks(xid);
>
> The comment above ProcArrayClearTransaction would need to be moved and
> updated...
Fixed.

>
>> From 6db9cafd146db1a645bb6885157b0e1f032765e0 Mon Sep 17 00:00:00 2001
>> From: Andrey Borodin <amborodin@acm.org>
>> Date: Mon, 19 Jul 2021 11:50:02 +0500
>> Subject: [PATCH v12 4/6] Fix CREATE INDEX CONCURRENTLY in precence of vxids
>> converted to 2pc
> ...
>
>> +/*
>> + * TwoPhaseGetXidByVXid
>> + *        Try to lookup for vxid among prepared xacts
>> + */
>> +XidListEntry
>> +TwoPhaseGetXidByVXid(VirtualTransactionId vxid)
>> +{
>> +    int                i;
>> +    XidListEntry    result;
>> +    result.next = NULL;
>> +    result.xid = InvalidTransactionId;
>> +
>> +    LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
>> +
>> +    for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
>> +    {
>> +        GlobalTransaction    gxact = TwoPhaseState->prepXacts[i];
>> +        PGPROC               *proc = &ProcGlobal->allProcs[gxact->pgprocno];
>> +        VirtualTransactionId proc_vxid;
>> +        GET_VXID_FROM_PGPROC(proc_vxid, *proc);
>> +
>> +        if (VirtualTransactionIdEquals(vxid, proc_vxid))
>> +        {
>> +            if (result.xid != InvalidTransactionId)
>> +            {
>> +                /* Already has candidate - need to alloc some space */
>> +                XidListEntry *copy = palloc(sizeof(XidListEntry));
>> +                copy->next = result.next;
>> +                copy->xid = result.xid;
>> +                result.next = copy;
>> +            }
>> +            result.xid = gxact->xid;
>> +        }
>> +    }
>> +
>> +    LWLockRelease(TwoPhaseStateLock);
>> +
>> +    return result;
>> +}
>
> Dynamic memory allocations while holding a fairly central lock - one
> which is now going to be more contended - doesn't seem great.
>
> Is the memory context this is called in guaranteed to be of a proper
> duration?  Including being reset in case there's an error at some point
> before the memory is freed?
I've removed custom list and all memory allocations. If there are multiple 2PCs with same vxid - we just wait for one,
thenretry. 

>> +/*
>> + *        WaitXact
>> + *
>> + *        Wait for xid completition if have xid. Otherwise try to find xid among
>> + *        two-phase procarray entries.
>> + */
>> +static bool WaitXact(VirtualTransactionId vxid, TransactionId xid, bool wait)
>> +{
>> +    LockAcquireResult    lar;
>> +    LOCKTAG                tag;
>> +    XidListEntry        xidlist;
>> +    XidListEntry       *xidlist_ptr = NULL; /* pointer to TwoPhaseGetXidByVXid()s pallocs */
>> +    bool                result;
>> +
>> +    if (TransactionIdIsValid(xid))
>> +    {
>> +        /* We know exact xid - no need to search in 2PC state */
>> +        xidlist.xid = xid;
>> +        xidlist.next = NULL;
>> +    }
>> +    else
>> +    {
>> +        /* You cant have vxid among 2PCs if you have no 2PCs */
>> +        if (max_prepared_xacts == 0)
>> +            return true;
>> +
>> +        /*
>> +         * We must search for vxids in 2pc state
>> +         * XXX: O(N*N) complexity where N is number of prepared xacts
>> +         */
>> +        xidlist = TwoPhaseGetXidByVXid(vxid);
>> +        /* Return if transaction is gone entirely */
>> +        if (!TransactionIdIsValid(xidlist.xid))
>> +            return true;
>> +        xidlist_ptr = xidlist.next;
>> +    }
>
> Perhaps I missed this - but won't we constantly enter this path for
> non-2pc transactions? E.g.
I've restored heuristics that if it's not 2PC - we just exit from WaitPreparedXact().


>
>> @@ -4573,7 +4649,7 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>>      */
>>     proc = BackendIdGetProc(vxid.backendId);
>>     if (proc == NULL)
>> -        return true;
>> +        return WaitXact(vxid, InvalidTransactionId, wait);
>>
>>     /*
>>      * We must acquire this lock before checking the backendId and lxid
>> @@ -4587,9 +4663,12 @@ VirtualXactLock(VirtualTransactionId vxid, bool wait)
>>         || proc->fpLocalTransactionId != vxid.localTransactionId)
>>     {
>>         LWLockRelease(&proc->fpInfoLock);
>> -        return true;
>> +        return WaitXact(vxid, InvalidTransactionId, wait);
>>     }
>
> It seems like it's going to add a substantial amount of work even when
> no 2PC xacts are involved?
Only if 2PCs are enabled.

>> diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
>> index e27e1a8fe8..a5f28d3a80 100644
>> --- a/src/include/access/twophase.h
>> +++ b/src/include/access/twophase.h
>> @@ -25,6 +25,17 @@
>>  */
>> typedef struct GlobalTransactionData *GlobalTransaction;
>>
>> +/*
>> + * XidListEntry is expected to be used as list very rarely. Under normal
>> + * circumstances TwoPhaseGetXidByVXid() returns only one xid.
>> + * But under certain conditions can return many xids or nothing.
>> + */
>> +typedef struct XidListEntry
>> +{
>> +    TransactionId xid;
>> +    struct XidListEntry* next;
>> +} XidListEntry;
>
> I don't think we should invent additional ad-hoc list types.
Fixed, removed list here entirely.

I'm attaching new version. It works fine on my machines.

Thanks!

Best regards, Andrey Borodin.

On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>      proc->pgprocno = gxact->pgprocno;
>      SHMQueueElemInit(&(proc->links));
>      proc->waitStatus = PROC_WAIT_STATUS_OK;
> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
> -    proc->lxid = (LocalTransactionId) xid;
> +    /* We set up the gxact's VXID as real for CIC purposes */
> +    proc->lxid = MyProc->lxid;

This breaks the case where the server restarted after PREPARE TRANSACTION.
MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
false.  I'm attaching a test case addition.  Can you repair this?

>      proc->xid = xid;
>      Assert(proc->xmin == InvalidTransactionId);
>      proc->delayChkpt = false;
>      proc->statusFlags = 0;
>      proc->pid = 0;
> -    proc->backendId = InvalidBackendId;
> +    /* May be backendId of startup process */
> +    proc->backendId = MyBackendId;

Incidentally, MyBackendId of startup process depends on other facts.  When
using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1.
Otherwise, including clean startup of a non-standby node, MyBackendId is
InvalidBackendId.  This may be harmless.  I didn't know about it.

On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> > > 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
> > >>> It seems like it's going to add a substantial amount of work even when
> > >>> no 2PC xacts are involved?
> > >> Only if 2PCs are enabled.
> > > 
> > > I don't think that's good enough. Plenty of systems have 2PC enabled but very
> > > few if any transactions end up as 2PC ones.
> 
> > Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
 
> > 
> > Does it worth the complexity?
> 
> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
> then I'm content with it.  I'd certainly expect some performance benefit.
> Andres, what do you think?

A few more benefits (beyond lock contention) come to mind:

- Looking at the three VirtualXactLock() callers, waiting for final
  disposition of prepared transactions is necessary for
  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
  prepared transaction won't do further database reads.  Waiting on the
  prepared transaction there could give CIC an arbitrarily-long, needless
  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
  prepared transaction, because prepared transactions hold no locks during
  recovery.  (If a prepared transaction originally acquired
  AccessExclusiveLock, the startup process holds that lock on its behalf.)
  Coordinating the XID search at a higher layer would let us change
  WaitForLockersMultiple() without changing the others.

- v13 WaitPreparedXact() experiences starvation when a steady stream of
  prepared transactions have the same VXID.  Since VXID reuse entails
  reconnecting, starvation will be unnoticeable in systems that follow best
  practices around connection lifespan.  The 2021-08-23 patch version didn't
  have that hazard.

None of those benefits clearly justify the complexity, but they're relevant to
the decision.

Вложения

test-2pc-wait-after-recovery-v0.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

25 сентября 2021 г., 17:25:05


> 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а):
>
> On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
>> --- a/src/backend/access/transam/twophase.c
>> +++ b/src/backend/access/transam/twophase.c
>> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>>     proc->pgprocno = gxact->pgprocno;
>>     SHMQueueElemInit(&(proc->links));
>>     proc->waitStatus = PROC_WAIT_STATUS_OK;
>> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
>> -    proc->lxid = (LocalTransactionId) xid;
>> +    /* We set up the gxact's VXID as real for CIC purposes */
>> +    proc->lxid = MyProc->lxid;
>
> This breaks the case where the server restarted after PREPARE TRANSACTION.
> MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
> false.  I'm attaching a test case addition.  Can you repair this?
Yup. Indeed, that's a bug. Root cause it that GetLockConflicts() does not try to extract real xid from gxact's PGPROC,
whilevxid is not valid. 
I see two ways to solve this:
1. Always set valid vxid, but fake 'vxid from xid' for gxact.
2. Teach GetLockConflicts() to use xid if vxid is invalid.
Both ways lead to identical GetLockConflicts() output.
PFA implementation of approach 1.

>>     proc->xid = xid;
>>     Assert(proc->xmin == InvalidTransactionId);
>>     proc->delayChkpt = false;
>>     proc->statusFlags = 0;
>>     proc->pid = 0;
>> -    proc->backendId = InvalidBackendId;
>> +    /* May be backendId of startup process */
>> +    proc->backendId = MyBackendId;
>
> Incidentally, MyBackendId of startup process depends on other facts.  When
> using hot standby, InitRecoveryTransactionEnvironment() sets MyBackendId=1.
> Otherwise, including clean startup of a non-standby node, MyBackendId is
> InvalidBackendId.  This may be harmless.  I didn't know about it.
I think we should avoid using backendId during startup. The backend itself has nothing to do with the transaction.

> On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
>> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
>>>> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
>>>>>> It seems like it's going to add a substantial amount of work even when
>>>>>> no 2PC xacts are involved?
>>>>> Only if 2PCs are enabled.
>>>>
>>>> I don't think that's good enough. Plenty of systems have 2PC enabled but very
>>>> few if any transactions end up as 2PC ones.
>>
>>> Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED). 
>>>
>>> Does it worth the complexity?
>>
>> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
>> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
>> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
>> then I'm content with it.  I'd certainly expect some performance benefit.
>> Andres, what do you think?
>
> A few more benefits (beyond lock contention) come to mind:
>
> - Looking at the three VirtualXactLock() callers, waiting for final
>  disposition of prepared transactions is necessary for
>  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
>  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
>  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
>  prepared transaction won't do further database reads.  Waiting on the
>  prepared transaction there could give CIC an arbitrarily-long, needless
>  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
>  prepared transaction, because prepared transactions hold no locks during
>  recovery.  (If a prepared transaction originally acquired
>  AccessExclusiveLock, the startup process holds that lock on its behalf.)
>  Coordinating the XID search at a higher layer would let us change
>  WaitForLockersMultiple() without changing the others.
BTW WaitForOlderSnapshots() is used in ATExecDetachPartitionFinalize(). Probably, we could indicate to
VirtualXactLock()if we want to wait on 2PC or not. Does it make sense? 
>
>
> - v13 WaitPreparedXact() experiences starvation when a steady stream of
>  prepared transactions have the same VXID.  Since VXID reuse entails
>  reconnecting, starvation will be unnoticeable in systems that follow best
>  practices around connection lifespan.  The 2021-08-23 patch version didn't
>  have that hazard.
I think the probability of such a stream is abysmal. You not only need a stream of 2PC with the same vxid, but a stream
ofoverlapping 2PC with the same vxid. And the most critical thing that can happen - CIC waiting for the stream to
becameone-2PC-at-a-time for a moment. 

Thanks!

Best regards, Andrey Borodin.

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

25 сентября 2021 г., 20:10:12

On Sat, Sep 25, 2021 at 10:25:05PM +0500, Andrey Borodin wrote:
> > 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а):
> > On Mon, Aug 23, 2021 at 10:38:00PM +0500, Andrey Borodin wrote:
> >> --- a/src/backend/access/transam/twophase.c
> >> +++ b/src/backend/access/transam/twophase.c
> >> @@ -459,14 +459,15 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
> >>     proc->pgprocno = gxact->pgprocno;
> >>     SHMQueueElemInit(&(proc->links));
> >>     proc->waitStatus = PROC_WAIT_STATUS_OK;
> >> -    /* We set up the gxact's VXID as InvalidBackendId/XID */
> >> -    proc->lxid = (LocalTransactionId) xid;
> >> +    /* We set up the gxact's VXID as real for CIC purposes */
> >> +    proc->lxid = MyProc->lxid;
> > 
> > This breaks the case where the server restarted after PREPARE TRANSACTION.
> > MyProc->lxid is 0 in the startup process, and LocalTransactionIdIsValid(0) is
> > false.  I'm attaching a test case addition.  Can you repair this?
> Yup. Indeed, that's a bug. Root cause it that GetLockConflicts() does not try to extract real xid from gxact's
PGPROC,while vxid is not valid.
 
> I see two ways to solve this:
> 1. Always set valid vxid, but fake 'vxid from xid' for gxact.
> 2. Teach GetLockConflicts() to use xid if vxid is invalid.
> Both ways lead to identical GetLockConflicts() output.
> PFA implementation of approach 1.

That's reasonable.  I'll queue a task to review the rest of the patch.

> > On Tue, Sep 07, 2021 at 11:45:15PM -0700, Noah Misch wrote:
> >> On Sun, Aug 29, 2021 at 11:38:03PM +0500, Andrey Borodin wrote:
> >>>> 29 авг. 2021 г., в 23:09, Andres Freund <andres@anarazel.de> написал(а):
> >>>>>> It seems like it's going to add a substantial amount of work even when
> >>>>>> no 2PC xacts are involved?
> >>>>> Only if 2PCs are enabled.
> >>>> 
> >>>> I don't think that's good enough. Plenty of systems have 2PC enabled but very
> >>>> few if any transactions end up as 2PC ones.
> >> 
> >>> Best optimisation I can imagine here is to gather all vxids with unknown xids and search for them in one call to
TwoPhaseGetXidByVXid()with one LWLockAcquire(TwoPhaseStateLock, LW_SHARED).
 
> >>> 
> >>> Does it worth the complexity?
> >> 
> >> https://www.postgresql.org/search/?m=1&q=TwoPhaseStateLock&l=&d=-1&s=r
> >> suggests this is the first postgresql.org discussion of TwoPhaseStateLock as a
> >> bottleneck.  Nonetheless, if Andres Freund finds it's worth the complexity,
> >> then I'm content with it.  I'd certainly expect some performance benefit.
> >> Andres, what do you think?
> > 
> > A few more benefits (beyond lock contention) come to mind:
> > 
> > - Looking at the three VirtualXactLock() callers, waiting for final
> >  disposition of prepared transactions is necessary for
> >  WaitForLockersMultiple(), disadvantageous for WaitForOlderSnapshots(), and
> >  dead code for ResolveRecoveryConflictWithVirtualXIDs().  In
> >  WaitForOlderSnapshots(), PREPARE is as good as COMMIT/ABORT, because a
> >  prepared transaction won't do further database reads.  Waiting on the
> >  prepared transaction there could give CIC an arbitrarily-long, needless
> >  delay.  ResolveRecoveryConflictWithVirtualXIDs() will never wait on a
> >  prepared transaction, because prepared transactions hold no locks during
> >  recovery.  (If a prepared transaction originally acquired
> >  AccessExclusiveLock, the startup process holds that lock on its behalf.)
> >  Coordinating the XID search at a higher layer would let us change
> >  WaitForLockersMultiple() without changing the others.
> BTW WaitForOlderSnapshots() is used in ATExecDetachPartitionFinalize(). Probably, we could indicate to
VirtualXactLock()if we want to wait on 2PC or not. Does it make sense?
 

For both CIC and ATExecDetachPartitionFinalize(), one needs unlucky timing for
the operation to needlessly wait on a prepared xact.  Specifically, the
needless wait arises if a PREPARE happens while WaitForOlderSnapshots() is
running, after its GetCurrentVirtualXIDs() and before its VirtualXactLock().
I would not choose to "indicate to VirtualXactLock() if we want to wait on 2PC
or not", but code like that wouldn't be too bad.  I probably wouldn't remove
code like that if you chosen to write it.

The alternative I have in mind would work like the following pseudocode.  I'm
leaning toward thinking it's not worth doing, since none of the three benefits
are known to be important.  But maybe it is worth doing.

struct LockConflictData
{
    /* VXIDs seen to have XIDs */
    List *vxid_of_xid;
    /* VXIDs without known XIDs */
    List *vxid_no_xid;
    /*
     * XIDs.  Has one entry per entry in vxid_of_xid, and it may have up to
     * max_prepared_transactions additional entries.
     */
    List *xid;
};
void
WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
{
    struct LockConflictData holders;
    List *latest_xid = NIL;
    List *need_mapping = NIL;
    ListCell   *lc;
    int            total = 0;
    int            done = 0;

    /* Collect the transactions we need to wait on */
    foreach(lc, locktags)
    {
        LOCKTAG    *locktag = lfirst(lc);
        int            count;

        GetLockConflicts(&holders, locktag, lockmode,
                         progress ? &count : NULL);
    }

    /* wait on VXIDs known to have XIDs, and wait on known XIDs */
    foreach(lc, holders.vxid_of_xid)
        VirtualXactLock(lfirst_int(lc), true, NULL);
    foreach(lc, holders.xid)
        XactLockTableWait(lfirst_int(lc), other_params);
    /* wait on remaining VXIDs, possibly discovering more XIDs */
    foreach(lc, holders.vxid_no_xid)
    {
        VirtualTransactionId *v = lfirst(lc);
        TransactionId xid = InvalidTransactionId;
        /*
         * Under this design, VirtualXactLock() would know nothing about 2PC,
         * but it would gain the ability to return proc->xid of the waited
         * proc.  Under this design, VirtualTransactionId is always a local
         * transaction, like it was before commit 8a54e12.
         */
        VirtualXactLock(*v, true, &xid);
        if (TransactionIdIsValid(xid))
            latest_xid = lappend_int(late_xid, xid);
        else
            need_mapping = lappend_int(need_mapping, v);
    }
    /* wait on XIDs just discovered */
    foreach(lc, latest_xid)
        XactLockTableWait(lfirst_int(lc), other_params);
    /*
     * If we never saw an XID associated with a particular VXID, check whether
     * the VXID became a prepared xact.
     */
    latest_xid = TwoPhaseGetXidByVXid(need_mapping);
    foreach(lc, latest_xid)
        XactLockTableWait(lfirst_int(lc), other_params);
}

> > - v13 WaitPreparedXact() experiences starvation when a steady stream of
> >  prepared transactions have the same VXID.  Since VXID reuse entails
> >  reconnecting, starvation will be unnoticeable in systems that follow best
> >  practices around connection lifespan.  The 2021-08-23 patch version didn't
> >  have that hazard.
> I think the probability of such a stream is abysmal. You not only need a stream of 2PC with the same vxid, but a
streamof overlapping 2PC with the same vxid. And the most critical thing that can happen - CIC waiting for the stream
tobecame one-2PC-at-a-time for a moment.
 

You're probably right about that.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

17 октября 2021 г., 15:12:05

On Sat, Sep 25, 2021 at 01:10:12PM -0700, Noah Misch wrote:
> On Sat, Sep 25, 2021 at 10:25:05PM +0500, Andrey Borodin wrote:
> > > 20 сент. 2021 г., в 09:41, Noah Misch <noah@leadboat.com> написал(а):

> I'll queue a task to review the rest of the patch.

I think the attached version is ready for commit.  Notable differences
vs. v14:

- Made TwoPhaseGetXidByVXid() stop leaking TwoPhaseStateLock when it found a
  second match.

- Instead of moving the ProcArrayClearTransaction() call within
  PrepareTransaction(), move the PostPrepare_Locks() call.  I didn't find any
  bug in the other way, but it's a good principle to maximize similarity with
  CommitTransaction().  PostPrepare_Locks() has no counterpart in
  CommitTransaction(), so that principle is indifferent to moving it.

- inval-build-race-v0.1.patch was incompatible with debug_discard_caches.  The
  being-built relation would always get invalidated, making
  RelationBuildDesc() an infinite loop.  I've fixed this by making
  RelationCacheInvalidate() invalidate in-progress rels only when called for
  sinval overflow, not when called for debug_discard_caches.  This adds some
  function arguments that only matter in assert builds.  That's not great, but
  InvalidateSystemCaches() is expensive anyway.  I considered instead adding
  functions HoldDebugInval() and ResumeDebugInval(), which RelationBuildDesc()
  would use to suppress debug_discard_caches during any iteration after the
  first.  That didn't seem better.

- Discard the in-progress array after an ERROR during RelationBuildDesc().

- Made the relcache.c changes repalloc the list of in-progress rels as needed.

- Changed the background_pgbench args from ($dbname, $stdin, \$stdout, $timer,
  $files, $opts) to ($opts, $files, \$stdout, $timer).  $dbname was unused.
  pgbench doesn't make interesting use of its stdin, so I dropped that arg
  until we have a use case.  $opts and $files seem akin to the $dbname arg of
  background_psql, so I moved them to the front.  I'm not sure that last
  change was an improvement.

- Made 001_pgbench_with_server.pl use PostgresNode::pgbench(), rather than
  duplicate code.  Factored out a subroutine of PostgresNode::pgbench() and
  PostgresNode::background_pgbench().

- Lots of comment and naming changes.

One thing not done here is to change the tests to use CREATE INDEX
CONCURRENTLY instead of REINDEX CONCURRENTLY, so they're back-patchable to v11
and earlier.  I may do that before pushing, or I may just omit the tests from
older branches.

> > > - v13 WaitPreparedXact() experiences starvation when a steady stream of
> > >  prepared transactions have the same VXID.  Since VXID reuse entails
> > >  reconnecting, starvation will be unnoticeable in systems that follow best
> > >  practices around connection lifespan.  The 2021-08-23 patch version didn't
> > >  have that hazard.
> > I think the probability of such a stream is abysmal. You not only need a stream of 2PC with the same vxid, but a
streamof overlapping 2PC with the same vxid. And the most critical thing that can happen - CIC waiting for the stream
tobecame one-2PC-at-a-time for a moment.
 
> 
> You're probably right about that.

I didn't know of the "stateP->nextLXID = nextLocalTransactionId;" in
CleanupInvalidationState(), which indeed makes this all but impossible.

On Sat, Aug 07, 2021 at 03:19:57PM -0700, Noah Misch wrote:
> On Sun, Aug 08, 2021 at 12:00:55AM +0500, Andrey Borodin wrote:
> > Changes:
> > 1. Added assert in step 2 (fix for missed invalidation message). I wonder how deep possibly could be
RelationBuildDesc()inside RelationBuildDesc() inside RelationBuildDesc() ... ? If the depth is unlimited we, probably,
needa better data structure.
 
> 
> I don't know either, hence that quick data structure to delay the question.
> debug_discard_caches=3 may help answer the question.  RelationBuildDesc()
> reads pg_constraint, which is !rd_isnailed.  Hence, I expect one can at least
> get RelationBuildDesc("pg_constraint") inside RelationBuildDesc("user_table").

debug_discard_caches=5 yields a depth of eight when opening a relation having
a CHECK constraint:

my_rel_having_check_constraint
pg_constraint_conrelid_contypid_conname_index
pg_index
pg_constraint
pg_constraint
pg_constraint
pg_constraint
pg_constraint

While debug_discard_caches doesn't permit higher values, I think one could
reach depths greater than eight by, for example, having a number of sessions
invalidate pg_constraint as often as possible.  Hence, I'm glad the code no
longer relies on a depth limit.


> 26 окт. 2021 г., в 22:41, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>
> We can get around that though, by using pg_try_advisory_lock and not
> proceeding if it fails.  The attached POC does this for the 002 test;
> it looks like the same thing could be done to 003.

That's a neat idea. PFA patch with copy of this changes to 003.
Also I've added a script to stress not only CREATE INDEX CONCURRENTLY, but also REINDEX CONCURRENTLY.
Because it was easy and checks slightly more code paths.
I do not observe failures on Dragonfly 6.0 with the patch, but I didn't run for a long yet time.

Best regards, Andrey Borodin.

Вложения

v2-0001-CIC-test-without-background-pgbench.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

28 октября 2021 г., 14:04:59

On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > Pushed.
> 
> cm@enterprisedb.com, could you post a stack trace from buildfarm member
> gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> those seen today:
> 
>   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
>   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> 
> Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> broke something, but I've run out of ideas, and my hpux boxes died.

cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
concerned that the 2021-11-11 releases will contain a regression if the
gharial failures remain undiagnosed.  I don't know how to diagnose them
without your help.  (I currently can't justify reverting the suspect bug fix
on the basis of two master-branch-only failures in one buildfarm member.)

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Tom Lane

Дата:

28 октября 2021 г., 15:47:01

Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> 26 окт. 2021 г., в 22:41, Tom Lane <tgl@sss.pgh.pa.us> написал(а):
>> We can get around that though, by using pg_try_advisory_lock and not
>> proceeding if it fails.  The attached POC does this for the 002 test;
>> it looks like the same thing could be done to 003.

> That's a neat idea. PFA patch with copy of this changes to 003.

Pushed.  It'll be interesting to see if conchuela's behavior changes.

            regards, tom lane

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Thomas Munro

Дата:

28 октября 2021 г., 20:59:14

On Fri, Oct 29, 2021 at 3:05 AM Noah Misch <noah@leadboat.com> wrote:
> > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.

I recently managed to scrounge up an account in another generous
project's operating system graveyard, and could possibly try to repro
that or put you in touch.  On first attempt just now I couldn't build
because of weird problems with ifaddr.c's need for structs in
<net/if.h> that seem to be kernel/internal-only that I unfortunately
won't have time to debug until next week (I've successfully built here
before so I must be doing something stupid).

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

29 октября 2021 г., 02:49:44

On Fri, Oct 29, 2021 at 09:59:14AM +1300, Thomas Munro wrote:
> On Fri, Oct 29, 2021 at 3:05 AM Noah Misch <noah@leadboat.com> wrote:
> > > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > > broke something, but I've run out of ideas, and my hpux boxes died.
> 
> I recently managed to scrounge up an account in another generous
> project's operating system graveyard, and could possibly try to repro
> that or put you in touch.  On first attempt just now I couldn't build
> because of weird problems with ifaddr.c's need for structs in
> <net/if.h> that seem to be kernel/internal-only that I unfortunately
> won't have time to debug until next week (I've successfully built here
> before so I must be doing something stupid).

Thanks.  Either would help.  If you can put me in touch this week, let's try
that since there's little time left before release.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Noah Misch

Дата:

29 октября 2021 г., 02:56:00

On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
> > 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а):
> > These failures started on 2021-10-09, the day conchuela updated from DragonFly
> > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
> > Since the theorized kernel bug seems not to affect
> > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
> > from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
> > the newest failure sites is that they don't write anything to the child stdin.

> >  If not, does passing the script via stdin, like "pgbench -f-
> > <script.sql", work around the problem?
> 
> I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.

Did that work?  Commit 7f580aa should make this unnecessary for v12+
contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
needs a fix in all branches.  A backup plan is just to skip affected tests on
dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
optimistic that a better workaround will present itself.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Tom Lane

Дата:

29 октября 2021 г., 03:19:56

Noah Misch <noah@leadboat.com> writes:
> Did that work?  Commit 7f580aa should make this unnecessary for v12+
> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
> needs a fix in all branches.  A backup plan is just to skip affected tests on
> dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
> optimistic that a better workaround will present itself.

It indeed is looking like 7f580aa made the problem go away on conchuela,
but do we understand why?  The only theory I can think of is "kernel bug",
but while that's plausible for prairiedog it seems hard to credit for a
late-model BSD kernel.

            regards, tom lane

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

29 октября 2021 г., 03:42:31

On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It indeed is looking like 7f580aa made the problem go away on conchuela,
> but do we understand why?  The only theory I can think of is "kernel bug",
> but while that's plausible for prairiedog it seems hard to credit for a
> late-model BSD kernel.

I have yet to even log into a DBSD system (my attempt to install the
6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
do intend to get it working at some point.  But I can offer a poorly
researched wildly speculative hypothesis: DBSD forked from FBSD in
2003.  macOS 10.3 took FBSD's kqueue code in... 2003.  So maybe a bug
was fixed later that they both inherited?  Or perhaps that makes no
sense, I dunno.  It'd be nice to try to write a repro and send them a
report, if we can.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Andrey Borodin

Дата:

29 октября 2021 г., 06:39:34


> 29 окт. 2021 г., в 08:42, Thomas Munro <thomas.munro@gmail.com> написал(а):
>
> On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It indeed is looking like 7f580aa made the problem go away on conchuela,
>> but do we understand why?  The only theory I can think of is "kernel bug",
>> but while that's plausible for prairiedog it seems hard to credit for a
>> late-model BSD kernel.
>
> I have yet to even log into a DBSD system (my attempt to install the
> 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
> do intend to get it working at some point.

Here's an exported VM https://storage.yandexcloud.net/freebsd/dffb.ova if it's of any use.
root password is P@ssw0rd

To reproduce
cd postgres/contrib/amcheck
su x4m
/tesh.sh

Best regards, Andrey Borodin.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Noah Misch

Дата:

29 октября 2021 г., 11:57:25

On Fri, Oct 29, 2021 at 04:42:31PM +1300, Thomas Munro wrote:
> On Fri, Oct 29, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It indeed is looking like 7f580aa made the problem go away on conchuela,
> > but do we understand why?

I don't.

> > The only theory I can think of is "kernel bug",
> > but while that's plausible for prairiedog it seems hard to credit for a
> > late-model BSD kernel.

DragonFly BSD is a niche OS, so I'm more willing than usual to conclude that.
Could be a bug in IPC::Run or in the port of Perl to DragonFly, but those feel
less likely than the kernel.  The upgrade from DragonFly v4.4.3 to DragonFly
v6.0.0, which introduced this form of PostgreSQL test breakage, also updated
Perl from v5.20.3 to 5.32.1.

> I have yet to even log into a DBSD system (my attempt to install the
> 6.0.1 ISO on bhyve failed for lack of a driver, or something), but I
> do intend to get it working at some point.  But I can offer a poorly
> researched wildly speculative hypothesis: DBSD forked from FBSD in
> 2003.  macOS 10.3 took FBSD's kqueue code in... 2003.  So maybe a bug
> was fixed later that they both inherited?  Or perhaps that makes no
> sense, I dunno.  It'd be nice to try to write a repro and send them a
> report, if we can.

The conchuela bug and the prairiedog bug both present with a timeout in
IPC::Run::finish, but the similarity ends there.  On prairiedog, the
postmaster was stuck when it should have been reading a query from pgbench.
On conchuela, pgbench ran to completion and became a zombie, and IPC::Run got
stuck when it should have been reaping that zombie.  Good thought, however.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

30 октября 2021 г., 03:57:57

On Fri, Oct 29, 2021 at 7:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here's an exported VM https://storage.yandexcloud.net/freebsd/dffb.ova if it's of any use.

Thanks!  I may yet need to use that because I haven't seen the problem
yet, but I finally managed to get the OS running with a reusable
Vagrant file after solving a couple of stupid problems[1].

[1] https://github.com/macdice/postgresql-dev-vagrant/blob/master/dragonfly6/Vagrantfile

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Sandeep Thakkar

Дата:

30 октября 2021 г., 06:09:18

Hi Noah,

On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:

On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > Pushed.
>
> cm@enterprisedb.com, could you post a stack trace from buildfarm member
> gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like
> those seen today:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
>
> Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably
> broke something, but I've run out of ideas, and my hpux boxes died.

cm@enterprisedb.com, is this still in your queue, or not? I am somewhat
concerned that the 2021-11-11 releases will contain a regression if the
gharial failures remain undiagnosed. I don't know how to diagnose them
without your help. (I currently can't justify reverting the suspect bug fix
on the basis of two master-branch-only failures in one buildfarm member.)

Sorry for responding late as this email was missed somehow :-( I checked the reports on the dashboard and there was a failure then but the reports after 25th Oct looks fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD

Please let me know if I missed something

Sandeep Thakkar

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

30 октября 2021 г., 07:26:01


> 30 окт. 2021 г., в 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"?  I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> >   https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE.  My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not?  I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed.  I don't know how to diagnose them
> without your help.  (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-(  I checked the reports on the dashboard and there was
afailure then but the reports after 25th Oct looks fine. The back branches all look fine as well.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something

Hi Sandeep, thank you for the response!

Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were
observedonly on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this
wouldaffect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault. 
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this
runsfail it's vital to collect backtraces of run. 

Many thanks!

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Sandeep Thakkar

Дата:

30 октября 2021 г., 07:59:27

On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

> 30 окт. 2021 г., в 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not? I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed. I don't know how to diagnose them
> without your help. (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-( I checked the reports on the dashboard and there was a failure then but the reports after 25th Oct looks fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something

Hi Sandeep, thank you for the response!

Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were observed only on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this would affect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this runs fail it's vital to collect backtraces of run.

Hi Andrew,

OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to run multiple times in a day. Unfortunately, I can't run it ~100 times because it's a legacy (slow) server and also we run another animal (anole - with a different compiler) on the same server. Depending on the time every run on HEAD takes, I'll increase the frequency. Hope this helps.

Many thanks!

Best regards, Andrey Borodin.

Sandeep Thakkar

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Sandeep Thakkar

Дата:

30 октября 2021 г., 08:14:56

On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:

On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

> 30 окт. 2021 г., в 11:09, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
> On Thu, Oct 28, 2021 at 7:35 PM Noah Misch <noah@leadboat.com> wrote:
> On Sun, Oct 24, 2021 at 04:35:02PM -0700, Noah Misch wrote:
> > On Sat, Oct 23, 2021 at 10:00:28PM -0700, Noah Misch wrote:
> > > Pushed.
> >
> > cm@enterprisedb.com, could you post a stack trace from buildfarm member
> > gharial failing in "make check"? I'm looking for a trace from a SIGSEGV like
> > those seen today:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
> >
> > Those SIGSEGV are all in ALTER TABLE. My commits from this thread probably
> > broke something, but I've run out of ideas, and my hpux boxes died.
>
> cm@enterprisedb.com, is this still in your queue, or not? I am somewhat
> concerned that the 2021-11-11 releases will contain a regression if the
> gharial failures remain undiagnosed. I don't know how to diagnose them
> without your help. (I currently can't justify reverting the suspect bug fix
> on the basis of two master-branch-only failures in one buildfarm member.)
>
> Sorry for responding late as this email was missed somehow :-( I checked the reports on the dashboard and there was a failure then but the reports after 25th Oct looks fine. The back branches all look fine as well. https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=gharial&br=HEAD
>
> Please let me know if I missed something

Hi Sandeep, thank you for the response!

Some failed runs on this animal indicate SegFault in places changed by bugfix relevant to this thread. Fails were observed only on HEAD, but this may be a result of concurrent nature of possible new bug. If there is a new bug - this would affect upcoming point release. But we do not know for sure. Other animals show no signs of any related SegFault.
Can you please run make check phase on this animal on HEAD multiple time (preferably ~hundred times)? If some of this runs fail it's vital to collect backtraces of run.

Hi Andrew,

OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to run multiple times in a day. Unfortunately, I can't run it ~100 times because it's a legacy (slow) server and also we run another animal (anole - with a different compiler) on the same server. Depending on the time every run on HEAD takes, I'll increase the frequency. Hope this helps.

I've used "--force" option so that it ignores the last running status.

./run_build.pl --verbose --force HEAD

Many thanks!

Best regards, Andrey Borodin.

--
Sandeep Thakkar

Sandeep Thakkar

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

30 октября 2021 г., 19:19:19

On Sat, Oct 30, 2021 at 01:44:56PM +0530, Sandeep Thakkar wrote:
> On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
> > On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Some failed runs on this animal indicate SegFault in places changed by
> >> bugfix relevant to this thread. Fails were observed only on HEAD, but this
> >> may be a result of concurrent nature of possible new bug. If there is a new
> >> bug - this would affect upcoming point release. But we do not know for
> >> sure. Other animals show no signs of any related SegFault.
> >> Can you please run make check phase on this animal on HEAD multiple time
> >> (preferably ~hundred times)? If some of this runs fail it's vital to
> >> collect backtraces of run.
> >
> > OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to
> > run multiple times in a day. Unfortunately, I can't run it ~100 times
> > because it's a legacy (slow) server and also we run another animal (anole -
> > with a different compiler) on the same server. Depending on the time every
> > run on HEAD takes, I'll increase the frequency. Hope this helps.
>
> I've used "--force" option so that it ignores the last running status.
> ./run_build.pl --verbose --force HEAD

Thanks, but I don't think that gets us closer to having a stack trace for the
SIGSEGV.  If some run gets a SIGSEGV, the core dump will get unlinked.  I'd
disable both buildfarm member cron jobs and then run this script (derived from
gharial's buildfarm configuration):

git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git checkout 70bef49
export
LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
cat >temp.conf <<EOF
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
force_parallel_mode = regress
EOF
./configure --enable-cassert --enable-debug --with-perl --without-readline \
        --with-libxml --with-libxslt
--with-libs=/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64
\

--with-includes=/opt/packages/zlib-1.2.5/inst/include:/opt/packages/libxslt-1.1.26/inst/include:/opt/packages/libxml2-2.7.8/inst/include
\
        --with-pgport=5678 CFLAGS=-mlp64 CC=gcc
make
while make check TEMP_CONFIG=$PWD/temp.conf NO_LOCALE=1; do :; done
# When the loop stops, use the core file to make a stack trace.  If it runs
# for a week without stopping, give up.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Sandeep Thakkar

Дата:

31 октября 2021 г., 07:31:00

Hi Noah,

On Sun, Oct 31, 2021 at 12:49 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Oct 30, 2021 at 01:44:56PM +0530, Sandeep Thakkar wrote:
> On Sat, Oct 30, 2021 at 1:29 PM Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> wrote:
> > On Sat, Oct 30, 2021 at 12:56 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >> Some failed runs on this animal indicate SegFault in places changed by
> >> bugfix relevant to this thread. Fails were observed only on HEAD, but this
> >> may be a result of concurrent nature of possible new bug. If there is a new
> >> bug - this would affect upcoming point release. But we do not know for
> >> sure. Other animals show no signs of any related SegFault.
> >> Can you please run make check phase on this animal on HEAD multiple time
> >> (preferably ~hundred times)? If some of this runs fail it's vital to
> >> collect backtraces of run.
> >
> > OK I got it now. I've scheduled "./run_build.pl HEAD" on this animal to
> > run multiple times in a day. Unfortunately, I can't run it ~100 times
> > because it's a legacy (slow) server and also we run another animal (anole -
> > with a different compiler) on the same server. Depending on the time every
> > run on HEAD takes, I'll increase the frequency. Hope this helps.
>
> I've used "--force" option so that it ignores the last running status.
> ./run_build.pl --verbose --force HEAD

Thanks, but I don't think that gets us closer to having a stack trace for the
SIGSEGV. If some run gets a SIGSEGV, the core dump will get unlinked. I'd
disable both buildfarm member cron jobs and then run this script (derived from
gharial's buildfarm configuration):

git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git checkout 70bef49
export LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
cat >temp.conf <<EOF
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
force_parallel_mode = regress
EOF
./configure --enable-cassert --enable-debug --with-perl --without-readline \
--with-libxml --with-libxslt --with-libs=/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64 \
--with-includes=/opt/packages/zlib-1.2.5/inst/include:/opt/packages/libxslt-1.1.26/inst/include:/opt/packages/libxml2-2.7.8/inst/include \
--with-pgport=5678 CFLAGS=-mlp64 CC=gcc
make
while make check TEMP_CONFIG=$PWD/temp.conf NO_LOCALE=1; do :; done
# When the loop stops, use the core file to make a stack trace. If it runs
# for a week without stopping, give up.

Yes, that makes sense. I did what you suggested (except that I used git:// instead of https:// during cloning as the latter returns with an error; I'll fix it later) and in the first run the core files got generated:

bash-4.1$ ls -l ./postgresql/src/test/regress/tmp_check/data/core.postgres.*
-rw------- 1 pgbfarm users 6672384 Oct 30 22:04 ./postgresql/src/test/regress/tmp_check/data/core.postgres.1267
-rw------- 1 pgbfarm users 3088384 Oct 30 22:09 ./postgresql/src/test/regress/tmp_check/data/core.postgres.5422

Here is the stack trace:

bash-4.1$ gdb ./postgresql/tmp_install/usr/local/pgsql/bin/postgres ./postgresql/src/test/regress/tmp_check/data/core.postgres.5422
HP gdb 6.1 for HP Itanium (32 or 64 bit) and target HP-UX 11iv2 and 11iv3.
Copyright 1986 - 2009 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 6.1 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
SEGV_MAPERR - Address not mapped to object
#0 0x3fffffffff3fdbf0 in <unknown_procedure> ()
(gdb) bt
#0 0x3fffffffff3fdbf0 in <unknown_procedure> ()
warning: Attempting to unwind past bad PC 0x3fffffffff3fdbf0
#1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
tupdesc2=0x60000000001fba08)
#2 0x40000000017f9660:0 in RelationClearRelation (
relation=0x60000000001f3730, rebuild=true)
#3 0x40000000017fa730:0 in RelationFlushRelation (relation=0x60000000001f3730)
#4 0x40000000017fabb0:0 in RelationCacheInvalidateEntry (relationId=27272)
#5 0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
msg=0x60000000001a46b8)
#6 0x40000000017c84e0:0 in ProcessInvalidationMessages (
group=0x60000000001a43e4, func=0x87ffffffef7b7250)
#7 0x40000000017cb420:0 in CommandEndInvalidationMessages ()
#8 0x40000000006b1c50:0 in AtCCI_LocalCache ()
#9 0x40000000006b0910:0 in CommandCounterIncrement ()
#10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
OIDOldToast=0)
#11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
#12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
reloptions=0, lockmode=8) at toasting.c:62
#13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
lockmode=8, context=0x87ffffffffffc590)

Дата:

01 ноября 2021 г., 13:10:04

On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > #1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> >
> > tupdesc2=0x60000000001fba08)

Could you print out *tupdesc1, *tupdesc2? And best also
p tupdesc1->attrs[0]
p tupdesc1->attrs[1]
p tupdesc1->attrs[2]
p tupdesc2->attrs[0]
p tupdesc2->attrs[1]
p tupdesc2->attrs[2]

you mean make the changes in the .c files to print these values and rerun the build? Can you please share the files where this needs to be done or give me a patch?

> > #2 0x40000000017f9660:0 in RelationClearRelation (
> >
> > relation=0x60000000001f3730, rebuild=true)

Hm, too bad that we don't have working line numbers for some reason?

> > #3 0x40000000017fa730:0 in RelationFlushRelation
> > (relation=0x60000000001f3730)
> >
> > #4 0x40000000017fabb0:0 in RelationCacheInvalidateEntry
> > (relationId=27272)
> >
> > #5 0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (
> >
> > msg=0x60000000001a46b8)
> >
> > #6 0x40000000017c84e0:0 in ProcessInvalidationMessages (
> >
> > group=0x60000000001a43e4, func=0x87ffffffef7b7250)
> >
> > #7 0x40000000017cb420:0 in CommandEndInvalidationMessages ()
> >
> > #8 0x40000000006b1c50:0 in AtCCI_LocalCache ()
> >
> > #9 0x40000000006b0910:0 in CommandCounterIncrement ()
> >
> > #10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,
> >
> > toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,
> >
> > OIDOldToast=0)
> >
> > #11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,
> >
> > reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88
> >
> > #12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,
> >
> > reloptions=0, lockmode=8) at toasting.c:62
> >
> > #13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,
> >
> > lockmode=8, context=0x87ffffffffffc590)

This crash could suggest that somehow the catalogs were corrupted and
that we're not reading back valid tupledescs from them. Hopefully we'll
know more after looking at the tupledescs.

Greetings,

Andres Freund

Sandeep Thakkar

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Andrey Borodin

Дата:

01 ноября 2021 г., 13:18:18


> 1 нояб. 2021 г., в 18:10, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
>
>
> On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > > #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> > >
> > >     tupdesc2=0x60000000001fba08)
>
> Could you print out *tupdesc1, *tupdesc2? And best also
> p tupdesc1->attrs[0]
> p tupdesc1->attrs[1]
> p tupdesc1->attrs[2]
> p tupdesc2->attrs[0]
> p tupdesc2->attrs[1]
> p tupdesc2->attrs[2]
>
> you mean make the changes in the .c files to print these values and rerun the build? Can you please share the files
wherethis needs to be done or give me a patch? 

You can paste these commands into gdb. Just as you did with bt. It will print values.

Thank you!

Best regards, Andrey Borodin.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Sandeep Thakkar

Дата:

01 ноября 2021 г., 13:33:26

On Mon, Nov 1, 2021 at 6:48 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

> 1 нояб. 2021 г., в 18:10, Sandeep Thakkar <sandeep.thakkar@enterprisedb.com> написал(а):
>
>
>
> On Mon, Nov 1, 2021 at 12:48 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2021-10-31 13:01:00 +0530, Sandeep Thakkar wrote:
> > > #1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> > >
> > > tupdesc2=0x60000000001fba08)
>
> Could you print out *tupdesc1, *tupdesc2? And best also
> p tupdesc1->attrs[0]
> p tupdesc1->attrs[1]
> p tupdesc1->attrs[2]
> p tupdesc2->attrs[0]
> p tupdesc2->attrs[1]
> p tupdesc2->attrs[2]
>
> you mean make the changes in the .c files to print these values and rerun the build? Can you please share the files where this needs to be done or give me a patch?

You can paste these commands into gdb. Just as you did with bt. It will print values.

Here you go (and with full bt):

(gdb) bt

#0 0x3fffffffff3fdbf0 in <unknown_procedure> ()

#1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,

tupdesc2=0x60000000001fba08)

#2 0x40000000017f9660:0 in RelationClearRelation (

relation=0x60000000001f3730, rebuild=true)

#3 0x40000000017fa730:0 in RelationFlushRelation (relation=0x60000000001f3730)

#4 0x40000000017fabb0:0 in RelationCacheInvalidateEntry (relationId=27272)

#5 0x40000000017c8f20:0 in LocalExecuteInvalidationMessage (

msg=0x60000000001a46b8)

#6 0x40000000017c84e0:0 in ProcessInvalidationMessages (

group=0x60000000001a43e4, func=0x87ffffffef7b7250)

#7 0x40000000017cb420:0 in CommandEndInvalidationMessages ()

#8 0x40000000006b1c50:0 in AtCCI_LocalCache ()

#9 0x40000000006b0910:0 in CommandCounterIncrement ()

#10 0x4000000000807130:0 in create_toast_table (rel=0x60000000001f3730,

toastOid=0, toastIndexOid=0, reloptions=0, lockmode=8, check=true,

OIDOldToast=0)

#11 0x4000000000805ac0:0 in CheckAndCreateToastTable (relOid=27272,

reloptions=0, lockmode=8, check=true, OIDOldToast=0) at toasting.c:88

#12 0x4000000000805850:0 in AlterTableCreateToastTable (relOid=27272,

reloptions=0, lockmode=8) at toasting.c:62

#13 0x4000000000aa9a30:0 in ATRewriteCatalogs (wqueue=0x87ffffffffffc3b0,

lockmode=8, context=0x87ffffffffffc590)

---Type <return> to continue, or q <return> to quit---

#14 0x4000000000aa6fc0:0 in ATController (parsetree=0x60000000000e04e0,

rel=0x60000000001f3730, cmds=0x60000000000e0488, recurse=true, lockmode=8,

context=0x87ffffffffffc590)

#15 0x4000000000aa6410:0 in AlterTable (stmt=0x60000000000e04e0, lockmode=8,

context=0x87ffffffffffc590)

#16 0x40000000012f9a50:0 in ProcessUtilitySlow (pstate=0x6000000000117210,

pstmt=0x60000000000e0830,

queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;",

context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,

dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#17 0x40000000012f8890:0 in standard_ProcessUtility (pstmt=0x60000000000e0830,

queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;",

readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,

queryEnv=0x0, dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#18 0x40000000012f6150:0 in ProcessUtility (pstmt=0x60000000000e0830,

queryString=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;",

readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,

queryEnv=0x0, dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#19 0x40000000012f24c0:0 in PortalRunUtility (portal=0x600000000015baf0,

pstmt=0x60000000000e0830, isTopLevel=true, setHoldSnapshot=false,

dest=0x60000000000e0920, qc=0x87ffffffffffd838)

#20 0x40000000012f2c90:0 in PortalRunMulti (portal=0x600000000015baf0,

isTopLevel=true, setHoldSnapshot=false, dest=0x60000000000e0920,

---Type <return> to continue, or q <return> to quit---

altdest=0x60000000000e0920, qc=0x87ffffffffffd838)

#21 0x40000000012f08d0:0 in PortalRun (portal=0x600000000015baf0,

count=9223372036854775807, isTopLevel=true, run_once=true,

dest=0x60000000000e0920, altdest=0x60000000000e0920, qc=0x87ffffffffffd838)

#22 0x40000000012dcf70:0 in exec_simple_query (

query_string=0x60000000000df7f0 "ALTER TABLE attmp ADD COLUMN c text;")

#23 0x40000000012eaaf0:0 in PostgresMain (

dbname=0x60000000000aaa78 "regression",

username=0x60000000000a8ff8 "pgbfarm")

#24 0x40000000010751b0:0 in BackendRun (port=0x60000000001168e0)

#25 0x4000000001074040:0 in BackendStartup (port=0x60000000001168e0)

#26 0x4000000001068e60:0 in ServerLoop ()

#27 0x40000000010679b0:0 in PostmasterMain (argc=8, argv=0x87ffffffffffe610)

#28 0x4000000000d45660:0 in main (argc=8, argv=0x87ffffffffffe610)

at main.c:146

(gdb) p tupdesc1->attrs[1]

No symbol "tupdesc1" in current context.

(gdb) p tupdesc1->attrs[2]

No symbol "tupdesc1" in current context.

(gdb) p tupdesc2->attrs[0]

No symbol "tupdesc2" in current context.

(gdb) p tupdesc2->attrs[1]

No symbol "tupdesc2" in current context.

(gdb)

Thank you!

Best regards, Andrey Borodin.

Sandeep Thakkar

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Robert Haas

Дата:

01 ноября 2021 г., 14:17:46

On Mon, Nov 1, 2021 at 9:33 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> (gdb) p tupdesc1->attrs[1]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc1->attrs[2]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc2->attrs[0]
> No symbol "tupdesc2" in current context.
> (gdb) p tupdesc2->attrs[1]
> No symbol "tupdesc2" in current context.
> (gdb)

I think you need to select stack frame 1 before running these
commands. I believe just running "frame 1" before you run these print
commands should do the trick.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Sandeep Thakkar

Дата:

02 ноября 2021 г., 00:50:42

On Mon, Nov 1, 2021 at 7:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Nov 1, 2021 at 9:33 AM Sandeep Thakkar
<sandeep.thakkar@enterprisedb.com> wrote:
> (gdb) p tupdesc1->attrs[1]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc1->attrs[2]
> No symbol "tupdesc1" in current context.
> (gdb) p tupdesc2->attrs[0]
> No symbol "tupdesc2" in current context.
> (gdb) p tupdesc2->attrs[1]
> No symbol "tupdesc2" in current context.
> (gdb)

I think you need to select stack frame 1 before running these
commands. I believe just running "frame 1" before you run these print
commands should do the trick.

Thanks Robert, that worked. Here is the output:

(gdb) frame 1

#1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,

tupdesc2=0x60000000001fba08)

(gdb) p tupdesc1->attrs[0]

$1 = {attrelid = 27272, attname = {

data = "initial", '\000' <repeats 56 times>}, atttypid = 23,

attstattarget = -1, attlen = 4, attnum = 1, attndims = 0, attcacheoff = 0,

atttypmod = -1, attbyval = true, attalign = 105 'i', attstorage = 112 'p',

attcompression = 0 '\000', attnotnull = false, atthasdef = false,

atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',

attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}

(gdb) p tupdesc1->attrs[1]

$2 = {attrelid = 27272, attname = {data = "a", '\000' <repeats 62 times>},

atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0,

attcacheoff = -1, atttypmod = -1, attbyval = true, attalign = 105 'i',

attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,

atthasdef = true, atthasmissing = true, attidentity = 0 '\000',

attgenerated = 0 '\000', attisdropped = false, attislocal = true,

attinhcount = 0, attcollation = 0}

(gdb) p tupdesc1->attrs[2]

$3 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},

atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,

attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',

attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,

atthasdef = false, atthasmissing = false, attidentity = 0 '\000',

attgenerated = 0 '\000', attisdropped = false, attislocal = true,

attinhcount = 0, attcollation = 950}

(gdb) p tupdesc2->attrs[0]

$4 = {attrelid = 27272, attname = {

data = "initial", '\000' <repeats 56 times>}, atttypid = 23,

attstattarget = -1, attlen = 4, attnum = 1, attndims = 0, attcacheoff = 0,

atttypmod = -1, attbyval = true, attalign = 105 'i', attstorage = 112 'p',

attcompression = 0 '\000', attnotnull = false, atthasdef = false,

atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',

attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 0}

(gdb) p tupdesc2->attrs[1]

$5 = {attrelid = 27272, attname = {data = "a", '\000' <repeats 62 times>},

atttypid = 23, attstattarget = -1, attlen = 4, attnum = 2, attndims = 0,

attcacheoff = -1, atttypmod = -1, attbyval = true, attalign = 105 'i',

attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,

atthasdef = true, atthasmissing = true, attidentity = 0 '\000',

attgenerated = 0 '\000', attisdropped = false, attislocal = true,

attinhcount = 0, attcollation = 0}

(gdb) p tupdesc2->attrs[2]

$6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},

atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,

attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',

attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,

atthasdef = false, atthasmissing = false, attidentity = 0 '\000',

attgenerated = 0 '\000', attisdropped = false, attislocal = true,

attinhcount = 0, attcollation = 950}

(gdb)

--
Robert Haas
EDB: http://www.enterprisedb.com

Sandeep Thakkar

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

03 ноября 2021 г., 04:42:03

On Tue, Nov 02, 2021 at 06:20:42AM +0530, Sandeep Thakkar wrote:
> (gdb) frame 1
> 
> #1  0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
> 
>     tupdesc2=0x60000000001fba08)
> 
...
> (gdb) p tupdesc2->attrs[2]
> 
> $6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
>   atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
>   attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
>   attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
>   atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
>   attgenerated = 0 '\000', attisdropped = false, attislocal = true,
>   attinhcount = 0, attcollation = 950}

That looks healthy.  Since gdb isn't giving line numbers, let's single-step
from the start of the function and see if that is informative.  Please apply
the attached patch, which just adds a test file.  Then run "make -C
src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl" and
attach
src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs in a
reply to this email.

On Mon, Nov 01, 2021 at 12:01:08AM +0500, Andrey Borodin wrote:
> So far I didn't come up with a clear understanding how this might happen.

Agreed.

> The only idea I have:
> 1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
> 2. Maybe relation->rd_rel (alloceted just before relation->rd_att) was used incorrectly?
> 3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to added
sizeof(relminmxid)?
> #define CLASS_TUPLE_SIZE \
>      (offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))

See the comment at overread_tuplestruct_pg_cast for the reason why I think
that can't cause an actual malfunction.  Still, there's some possibility of
this being the explanation.

Вложения

step-to-crash-v0.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

03 ноября 2021 г., 08:03:56

Hi Noah

Sandeep is on vacation. So I am looking into this

I am facing some issues while applying the patch

bash-4.1$ git apply step-to-crash-v0.patch

fatal: git apply: bad git-diff - expected /dev/null on line 4

bash-4.1$ patch -p1 < step-to-crash-v0.patch

Hmm... I can't seem to find a patch in there anywhere.

Then i decided to manually create that src/test/subscription/t/080_step_equalTupleDescs.pl file and placed the content there once file is created i run this command make -C src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl but it didn't generated any src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs file

Also, I get this at the end of the make command

TAP tests not enabled. Try configuring with --enable-tap-tests

After enabling tap tests and executing the configure command again I get this error message

checking for fop... no

checking for dbtoepub... no

checking for perl module IPC::Run 0.79... no

checking for perl module Test::More 0.87... no

checking for perl module Time::HiRes 1.52... ok

configure: error: Additional Perl modules are required to run TAP tests

Seems like can't enable tap tests with current frameworks

On Wed, Nov 3, 2021 at 9:42 AM Noah Misch <noah@leadboat.com> wrote:

On Tue, Nov 02, 2021 at 06:20:42AM +0530, Sandeep Thakkar wrote:
> (gdb) frame 1
>
> #1 0x40000000003fdc00:0 in equalTupleDescs (tupdesc1=0x60000000001f65e0,
>
> tupdesc2=0x60000000001fba08)
>
...
> (gdb) p tupdesc2->attrs[2]
>
> $6 = {attrelid = 27272, attname = {data = "b", '\000' <repeats 62 times>},
> atttypid = 19, attstattarget = -1, attlen = 64, attnum = 3, attndims = 0,
> attcacheoff = -1, atttypmod = -1, attbyval = false, attalign = 99 'c',
> attstorage = 112 'p', attcompression = 0 '\000', attnotnull = false,
> atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
> attgenerated = 0 '\000', attisdropped = false, attislocal = true,
> attinhcount = 0, attcollation = 950}

That looks healthy. Since gdb isn't giving line numbers, let's single-step
from the start of the function and see if that is informative. Please apply
the attached patch, which just adds a test file. Then run "make -C
src/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl" and
attach
src/test/subscription/tmp_check/log/regress_log_080_step_equalTupleDescs in a
reply to this email.

On Mon, Nov 01, 2021 at 12:01:08AM +0500, Andrey Borodin wrote:
> So far I didn't come up with a clear understanding how this might happen.

Agreed.

> The only idea I have:
> 1. It seems equalTupleDescs() got two valid pointers, probably with broken data.
> 2. Maybe relation->rd_rel (alloceted just before relation->rd_att) was used incorrectly?
> 3. This could happen if CLASS_TUPLE_SIZE is calculated wrong. Don't we need to MAXALIGN everything due to added sizeof(relminmxid)?
> #define CLASS_TUPLE_SIZE \
> (offsetof(FormData_pg_class,relminmxid) + sizeof(TransactionId))

See the comment at overread_tuplestruct_pg_cast for the reason why I think
that can't cause an actual malfunction. Still, there's some possibility of
this being the explanation.

Thanks & Regards,
Semab

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Thomas Munro

Дата:

04 ноября 2021 г., 00:50:16

On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> checking for fop... no
> checking for dbtoepub... no
> checking for perl module IPC::Run 0.79... no
> checking for perl module Test::More 0.87... no
> checking for perl module Time::HiRes 1.52... ok
> configure: error: Additional Perl modules are required to run TAP tests
>
> Seems like can't enable tap tests with current frameworks

You could always install that locally with CPAN, with something like:

export PERL5LIB=~/perl5/lib/perl5
cpan -T IPC::Run

Hmm, speaking of missing package management, GCC 4.6.0 is long dead
and missing many bug fixes, and in fact the whole 4.6.x line is
finished.

I remembered another phase of weird segmentation faults that was never
really explained, on that system:

https://www.postgresql.org/message-id/flat/CA%2BhUKGLukanJE9W8C%2B0n8iRsZDpbuhcWOxBMjGaUO-RNHhBGXw%40mail.gmail.com

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

05 ноября 2021 г., 01:35:53

On Thu, Nov 04, 2021 at 01:50:16PM +1300, Thomas Munro wrote:
> On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> > checking for fop... no
> > checking for dbtoepub... no
> > checking for perl module IPC::Run 0.79... no
> > checking for perl module Test::More 0.87... no
> > checking for perl module Time::HiRes 1.52... ok
> > configure: error: Additional Perl modules are required to run TAP tests
> >
> > Seems like can't enable tap tests with current frameworks
> 
> You could always install that locally with CPAN, with something like:
> 
> export PERL5LIB=~/perl5/lib/perl5
> cpan -T IPC::Run

Here's something more self-contained:

mkdir $HOME/perl-for-tap
(cd $HOME/perl-for-tap &&
 wget http://backpan.perl.org/modules/by-authors/id/M/MS/MSCHWERN/Test-Simple-0.87_01.tar.gz &&
 gzip -d <Test-Simple-0.87_01.tar.gz | tar xf - &&
 wget https://cpan.metacpan.org/authors/id/R/RS/RSOD/IPC-Run-0.79.tar.gz &&
 gzip -d <IPC-Run-0.79.tar.gz | tar xf -)
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
perl -MIPC::Run -MTest::More -e 'print $IPC::Run::VERSION, "\n", $Test::More::VERSION, "\n"'

Semab, if you run those commands and then rerun the "configure" that failed,
does that help?

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

05 ноября 2021 г., 06:02:16

On Fri, Nov 5, 2021 at 6:35 AM Noah Misch <noah@leadboat.com> wrote:

On Thu, Nov 04, 2021 at 01:50:16PM +1300, Thomas Munro wrote:
> On Wed, Nov 3, 2021 at 9:04 PM Semab Tariq <semab.tariq@enterprisedb.com> wrote:
> > checking for fop... no
> > checking for dbtoepub... no
> > checking for perl module IPC::Run 0.79... no
> > checking for perl module Test::More 0.87... no
> > checking for perl module Time::HiRes 1.52... ok
> > configure: error: Additional Perl modules are required to run TAP tests
> >
> > Seems like can't enable tap tests with current frameworks
>
> You could always install that locally with CPAN, with something like:
>
> export PERL5LIB=~/perl5/lib/perl5
> cpan -T IPC::Run

Here's something more self-contained:

mkdir $HOME/perl-for-tap
(cd $HOME/perl-for-tap &&
wget http://backpan.perl.org/modules/by-authors/id/M/MS/MSCHWERN/Test-Simple-0.87_01.tar.gz &&
gzip -d <Test-Simple-0.87_01.tar.gz | tar xf - &&
wget https://cpan.metacpan.org/authors/id/R/RS/RSOD/IPC-Run-0.79.tar.gz &&
gzip -d <IPC-Run-0.79.tar.gz | tar xf -)
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib
perl -MIPC::Run -MTest::More -e 'print $IPC::Run::VERSION, "\n", $Test::More::VERSION, "\n"'

Yes configure works with these commands thanks

After configure when i run make -C sarc/test/subscription check PROVE_TESTS=t/080_step_equalTupleDescs.pl it got stucked I am attaching regress_log_080_step_equalTupleDescs file and log file of make command

Thanks & Regards,
Semab

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

05 ноября 2021 г., 09:03:47

On Fri, Nov 05, 2021 at 11:02:16AM +0500, Semab Tariq wrote:
> After configure when i run make -C sarc/test/subscription check
> PROVE_TESTS=t/080_step_equalTupleDescs.pl it got stucked I am attaching
> regress_log_080_step_equalTupleDescs file and log file of make command

Thanks.  Please replace the previous test case patch with this one, and run it
again.  (On your platform, gdb needs an explicit path to the executable.)

Вложения

step-to-crash-v0.1.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

05 ноября 2021 г., 10:21:15

On Fri, Nov 5, 2021 at 2:03 PM Noah Misch <noah@leadboat.com> wrote:

On Fri, Nov 05, 2021 at 11:02:16AM +0500, Semab Tariq wrote:
> After configure when i run make -C sarc/test/subscription check
> PROVE_TESTS=t/080_step_equalTupleDescs.pl it got stucked I am attaching
> regress_log_080_step_equalTupleDescs file and log file of make command

Thanks. Please replace the previous test case patch with this one, and run it
again.

Did the same with the updated patch and this time make command exit with an error

Bailout called. Further testing stopped: command "pg_ctl -D /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/pgdata -m immediate stop" exited with value 1 make: *** [check] Error 1 make: Leaving directory `/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription'

PFA make command log and regress_log_080_step_equalTupleDescs files

Thanks & Regards,
Semab

Вложения

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

05 ноября 2021 г., 11:22:36

On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
>     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)

The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
address; it may be in a program text section.  tupdesc2 is a stack address.
In the earlier stack trace from
https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
both tupdesc1 and tupdesc2 were heap addresses.

>
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8:
Errorin sourced command file:

> Error accessing memory address 0x40010006f968: Bad address.

Thanks.  Please try the attached test version, which avoids exiting too early
like the last version did.

Вложения

step-to-crash-v0.2.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

06 ноября 2021 г., 23:08:46

On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> >     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> 
> The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
> address; it may be in a program text section.  tupdesc2 is a stack address.
> In the earlier stack trace from
> https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> both tupdesc1 and tupdesc2 were heap addresses.
> 
> >
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8:
Errorin sourced command file:
 
> > Error accessing memory address 0x40010006f968: Bad address.
> 
> Thanks.  Please try the attached test version, which avoids exiting too early
> like the last version did.

If you haven't run that yet, please use this version instead.  It collects
more data.  The log will probably be too big to be proper for the mailing
list, so please compress it.

Вложения

step-to-crash-v0.3.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

07 ноября 2021 г., 15:25:09

On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:

On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
>
> The addresses there are weird. tupdesc1 is neither a stack address nor a heap
> address; it may be in a program text section. tupdesc2 is a stack address.
> In the earlier stack trace from
> https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> both tupdesc1 and tupdesc2 were heap addresses.
>
> > /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/tmp_check/t_080_step_equalTupleDescs_main_data/commands-gdb:8: Error in sourced command file:
> > Error accessing memory address 0x40010006f968: Bad address.
>
> Thanks. Please try the attached test version, which avoids exiting too early
> like the last version did.

If you haven't run that yet, please use this version instead. It collects
more data. The log will probably be too big to be proper for the mailing
list, so please compress it.

Hi Noah

PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch

Thanks & Regards,
Semab

Вложения

log.tar.bz2

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

07 ноября 2021 г., 23:22:02

On Sun, Nov 07, 2021 at 08:25:09PM +0500, Semab Tariq wrote:
> On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> > > On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > > > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > > >     tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> > >
> > > The addresses there are weird.  tupdesc1 is neither a stack address nor a heap
> > > address; it may be in a program text section.  tupdesc2 is a stack address.
> > > In the earlier stack trace from
> > > https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> > > both tupdesc1 and tupdesc2 were heap addresses.

That turned out to be a false alarm.  On gharial, a breakpoint at the start of
the function doesn't see the real arguments.  After a ten-instruction
prologue, the real arguments appear, and they are heap addresses.

> PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch

Thanks.  That shows the crash happened sometime after strcmp(defval1->adbin,
defval2->adbin).  Please run the attached version, which collects yet more
information.

Вложения

step-to-crash-v0.4.patch

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

08 ноября 2021 г., 06:22:45

On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:

On Sun, Nov 07, 2021 at 08:25:09PM +0500, Semab Tariq wrote:
> On Sun, Nov 7, 2021 at 4:08 AM Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Nov 05, 2021 at 04:22:36AM -0700, Noah Misch wrote:
> > > On Fri, Nov 05, 2021 at 03:21:15PM +0500, Semab Tariq wrote:
> > > > Breakpoint 1, 0x40000000003fcb50:0 in equalTupleDescs (
> > > > tupdesc1=0x40010006f968, tupdesc2=0x87ffffffffffac50)
> > >
> > > The addresses there are weird. tupdesc1 is neither a stack address nor a heap
> > > address; it may be in a program text section. tupdesc2 is a stack address.
> > > In the earlier stack trace from
> > > https://postgr.es/m/CANFyU94Xa8a5+4sZ7PxOiDLq+yN89g6y-9nNk-eLEvX6YUXbXA@mail.gmail.com
> > > both tupdesc1 and tupdesc2 were heap addresses.

That turned out to be a false alarm. On gharial, a breakpoint at the start of
the function doesn't see the real arguments. After a ten-instruction
prologue, the real arguments appear, and they are heap addresses.

> PFA new regress_log_080_step_equalTupleDescs file generated from your latest patch

Thanks. That shows the crash happened sometime after strcmp(defval1->adbin,
defval2->adbin). Please run the attached version,

PFA the new log file

Thanks & Regards,
Semab

Вложения

log-1.tar.bz2

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

08 ноября 2021 г., 07:09:50

On Mon, Nov 08, 2021 at 11:22:45AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks.  That shows the crash happened sometime after strcmp(defval1->adbin,
> > defval2->adbin).  Please run the attached version,
> 
> PFA the new log file

> 0x40000000003fdc30:2 in equalTupleDescs (tupdesc1=0x60000000001fdb98, tupdesc2=0x6000000000202da8)
> 0x40000000003fdc30:2 <equalTupleDescs+0x10e2>:          br.call.sptk.many rp=0x3fffffffff3fdc30
> 0x3fffffffff3fdc30 in <unknown_procedure> ()
> 0x3fffffffff3fdc30:    Error accessing memory address 0x3fffffffff3fdc30: Bad address.

This postgres binary apparently contains an explicit branch to
0x3fffffffff3fdc30, which is not an address reasonably expected to contain
code.  (It's not a known heap, a known stack, or a CODE section from the
binary file.)  This probably confirms a toolchain bug.

Would you do "git checkout 166f943" in the source directory you've been
testing, then rerun the test and post the compressed tmp_check/log directory?
I'm guessing that will show the bad branch instruction no longer present.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

08 ноября 2021 г., 09:09:15

On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Nov 08, 2021 at 11:22:45AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 4:22 AM Noah Misch <noah@leadboat.com> wrote:
> > Thanks. That shows the crash happened sometime after strcmp(defval1->adbin,
> > defval2->adbin). Please run the attached version,
>
> PFA the new log file

> 0x40000000003fdc30:2 in equalTupleDescs (tupdesc1=0x60000000001fdb98, tupdesc2=0x6000000000202da8)
> 0x40000000003fdc30:2 <equalTupleDescs+0x10e2>: br.call.sptk.many rp=0x3fffffffff3fdc30
> 0x3fffffffff3fdc30 in <unknown_procedure> ()
> 0x3fffffffff3fdc30: Error accessing memory address 0x3fffffffff3fdc30: Bad address.

This postgres binary apparently contains an explicit branch to
0x3fffffffff3fdc30, which is not an address reasonably expected to contain
code. (It's not a known heap, a known stack, or a CODE section from the
binary file.) This probably confirms a toolchain bug.

Would you do "git checkout 166f943" in the source directory you've been
testing, then rerun the test and post the compressed tmp_check/log directory?

So make command fails after I applied the v0.4 patch to 166f943 also it did not create any tmp_check/log directory

PFA make output

I'm guessing that will show the bad branch instruction no longer present.

Thanks & Regards,
Semab

Вложения

make-2.log

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

08 ноября 2021 г., 14:46:39

On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > This postgres binary apparently contains an explicit branch to
> > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > code.  (It's not a known heap, a known stack, or a CODE section from the
> > binary file.)  This probably confirms a toolchain bug.
> >
> > Would you do "git checkout 166f943" in the source directory you've been
> > testing, then rerun the test and post the compressed
> > tmp_check/log directory?
> >
>
> So make command fails after I applied the v0.4 patch to 166f943 also it did
> not create any tmp_check/log directory

> t/080_step_equalTupleDescs....Can't locate IPC/Run.pm in @INC (@INC contains:
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription
/opt/perl_64/lib/5.8.8/IA64.ARCHREV_0-thread-multi-LP64/opt/perl_64/lib/5.8.8
/opt/perl_64/lib/site_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64/opt/perl_64/lib/site_perl/5.8.8
/opt/perl_64/lib/site_perl/opt/perl_64/lib/vendor_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64
/opt/perl_64/lib/vendor_perl/5.8.8/opt/perl_64/lib/vendor_perl .) at
/home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl/PostgreSQL/Test/Cluster.pm
line102. 

It looks like this attempt did not use a shell with the environment variables
discussed upthread.  Use these commands first to bring back the required
environment:

export
LD_LIBRARY_PATH='/opt/uuid-1.6.2/inst/lib:/opt/packages/uuid-1.6.2/inst/lib:/opt/packages/krb5-1.11.3/inst/lib/hpux64:/opt/packages/libmemcached-0.46/inst/lib/hpux64:/opt/packages/libevent-2.0.10/inst/lib/hpux64:/opt/packages/expat-2.0.1/inst/lib/hpux64:/opt/packages/gdbm-1.8.3/inst/lib/hpux64:/opt/packages/openldap-2.4.24/inst/lib/hpux64:/opt/packages/proj-4.7.0/inst/lib:/opt/packages/geos-3.2.2/inst/lib:/opt/packages/db-5.1.19/inst/lib/hpux64:/opt/packages/freetype-2.4.4/inst/lib/hpux64:/opt/packages/tcltk-8.5.9/inst/lib/hpux64:/opt/packages/openssl-1.0.0d/inst/lib/hpux64:/opt/packages/editline-2.9/inst/lib/hpux64:/opt/packages/mutt-1.5.21/inst/lib/hpux64:/opt/packages/libidn-1.20/inst/lib/hpux64:/opt/packages/libxslt-1.1.26/inst/lib/hpux64:/opt/packages/libgcrypt-1.4.6/inst/lib/hpux64:/opt/packages/libgpg_error-1.10/inst/lib/hpux64:/opt/packages/libxml2-2.7.8/inst/lib/hpux64:/opt/packages/zlib-1.2.5/inst/lib/hpux64:/opt/packages/grep-2.7/inst/lib/hpux64:/opt/packages/pcre-8.12/inst/lib/hpux64:/opt/packages/ncurses-5.8/inst/lib/hpux64:/opt/packages/termcap-1.3.1/inst/lib/hpux64:/opt/packages/gettext-0.18.1.1/inst/lib/hpux64:/opt/packages/libiconv-1.13.1/inst/lib/hpux64:/opt/packages/sdk-10.2.0.5.0-hpux-ia64/instantclient_10_2/lib'
export PERL5LIB=$HOME/perl-for-tap/Test-Simple-0.87_01/lib:$HOME/perl-for-tap/IPC-Run-0.79/lib

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Semab Tariq

Дата:

09 ноября 2021 г., 06:55:57

On Mon, Nov 8, 2021 at 7:46 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > This postgres binary apparently contains an explicit branch to
> > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > code. (It's not a known heap, a known stack, or a CODE section from the
> > binary file.) This probably confirms a toolchain bug.
> >
> > Would you do "git checkout 166f943" in the source directory you've been
> > testing, then rerun the test and post the compressed
> > tmp_check/log directory?
> >
>
> So make command fails after I applied the v0.4 patch to 166f943 also it did
> not create any tmp_check/log directory

> t/080_step_equalTupleDescs....Can't locate IPC/Run.pm in @INC (@INC contains: /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription /opt/perl_64/lib/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/5.8.8 /opt/perl_64/lib/site_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/site_perl/5.8.8 /opt/perl_64/lib/site_perl /opt/perl_64/lib/vendor_perl/5.8.8/IA64.ARCHREV_0-thread-multi-LP64 /opt/perl_64/lib/vendor_perl/5.8.8 /opt/perl_64/lib/vendor_perl .) at /home/pgbfarm/buildroot-gharial-HEAD/postgresql/src/test/subscription/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 102.

It looks like this attempt did not use a shell with the environment variables
discussed upthread.

Yes, I missed the trick PFA tmp_check.tar.bz2

Thanks & Regards,
Semab

Вложения

tmp_check.tar.bz2

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

09 ноября 2021 г., 14:40:21

On Tue, Nov 09, 2021 at 11:55:57AM +0500, Semab Tariq wrote:
> On Mon, Nov 8, 2021 at 7:46 PM Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Nov 08, 2021 at 02:09:15PM +0500, Semab Tariq wrote:
> > > On Mon, Nov 8, 2021 at 12:09 PM Noah Misch <noah@leadboat.com> wrote:
> > > > This postgres binary apparently contains an explicit branch to
> > > > 0x3fffffffff3fdc30, which is not an address reasonably expected to contain
> > > > code.  (It's not a known heap, a known stack, or a CODE section from the
> > > > binary file.)  This probably confirms a toolchain bug.
> > > >
> > > > Would you do "git checkout 166f943" in the source directory you've been
> > > > testing, then rerun the test and post the compressed
> > > > tmp_check/log directory?

> PFA tmp_check.tar.bz2

Excellent.  No crash, and the only difference in equalTupleDescs() code
generation is the branch destination addresses.  At commit 70bef49, gharial's
toolchain generates the invalid branch destination:

$ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
--- /dev/fd/63  2021-11-09 06:11:20.927444437 -0800
+++ /dev/fd/62  2021-11-09 06:11:20.926444428 -0800
@@ -100 +100 @@
-      br.call.sptk.many rp=0x40000000003cdc20
+      br.call.sptk.many rp=0x40000000003cde20
@@ -658 +658 @@
-      br.call.sptk.many rp=0x40000000003cdc20
+      br.call.sptk.many rp=0x40000000003cde20
@@ -817 +817 @@
-             br.call.sptk.many rp=0x3fffffffff3fdc30
+             br.call.sptk.many rp=0x4000000000400c30
@@ -949 +949 @@
-             br.call.sptk.many rp=0x40000000003cdc20
+             br.call.sptk.many rp=0x40000000003cde20
@@ -970 +970 @@
-             br.call.sptk.many rp=0x40000000003cdc20
+             br.call.sptk.many rp=0x40000000003cde20

Since "git diff 70bef49 166f943" contains nothing that correlates with such a
change, I'm concluding that this is a bug in gharial's toolchain.

It looks like gharial's automatic buildfarm runs have been paused for nine
days.  Feel free to unpause it.  Also, I recommend using the buildfarm client
setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'.  Months or
years pass between these events.  Here are all gharial "signal 11" failures,
likely some of which have other causes:

 sysname │      snapshot       │    branch     │                                            bfurl
                    
 

─────────┼─────────────────────┼───────────────┼─────────────────────────────────────────────────────────────────────────────────────────────
 gharial │ 2018-04-10 00:32:08 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-10%2000%3A32%3A08
 gharial │ 2019-03-08 01:30:45 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2001%3A30%3A45
 gharial │ 2019-03-08 08:55:31 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2008%3A55%3A31
 gharial │ 2019-03-08 19:55:38 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-03-08%2019%3A55%3A38
 gharial │ 2019-08-20 09:57:27 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-20%2009%3A57%3A27
 gharial │ 2019-08-21 08:04:58 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-21%2008%3A04%3A58
 gharial │ 2019-08-22 00:37:03 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-22%2000%3A37%3A03
 gharial │ 2019-08-22 12:42:02 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-22%2012%3A42%3A02
 gharial │ 2019-08-24 18:43:52 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-24%2018%3A43%3A52
 gharial │ 2019-08-25 11:14:36 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-25%2011%3A14%3A36
 gharial │ 2019-08-25 18:44:04 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-25%2018%3A44%3A04
 gharial │ 2019-08-26 08:47:19 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-26%2008%3A47%3A19
 gharial │ 2019-08-26 22:30:23 │ REL_12_STABLE │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2019-08-26%2022%3A30%3A23
 gharial │ 2021-04-08 03:21:42 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-08%2003%3A21%3A42
 gharial │ 2021-04-09 06:40:31 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-09%2006%3A40%3A31
 gharial │ 2021-10-24 16:19:05 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2016%3A19%3A05
 gharial │ 2021-10-24 20:38:39 │ HEAD          │
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-10-24%2020%3A38%3A39
(17 rows)

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Thomas Munro

Дата:

09 ноября 2021 г., 21:55:13

On Wed, Nov 10, 2021 at 3:40 AM Noah Misch <noah@leadboat.com> wrote:
> $ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
> --- /dev/fd/63  2021-11-09 06:11:20.927444437 -0800
> +++ /dev/fd/62  2021-11-09 06:11:20.926444428 -0800
> @@ -100 +100 @@
> -      br.call.sptk.many rp=0x40000000003cdc20
> +      br.call.sptk.many rp=0x40000000003cde20
> @@ -658 +658 @@
> -      br.call.sptk.many rp=0x40000000003cdc20
> +      br.call.sptk.many rp=0x40000000003cde20
> @@ -817 +817 @@
> -             br.call.sptk.many rp=0x3fffffffff3fdc30
> +             br.call.sptk.many rp=0x4000000000400c30
> @@ -949 +949 @@
> -             br.call.sptk.many rp=0x40000000003cdc20
> +             br.call.sptk.many rp=0x40000000003cde20
> @@ -970 +970 @@
> -             br.call.sptk.many rp=0x40000000003cdc20
> +             br.call.sptk.many rp=0x40000000003cde20
>
> Since "git diff 70bef49 166f943" contains nothing that correlates with such a
> change, I'm concluding that this is a bug in gharial's toolchain.

Excellent detective work.

> It looks like gharial's automatic buildfarm runs have been paused for nine
> days.  Feel free to unpause it.  Also, I recommend using the buildfarm client
> setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'.  Months or
> years pass between these events.  Here are all gharial "signal 11" failures,
> likely some of which have other causes:

Yeah I spent time investigating some of these at the time and I know
others did too.

IMHO we should hunt down dead toolchains and gently see if we can get
them updated.   There is no point in testing PostgreSQL on every
commit on a compiler someone built from a tarball many years ago and
never updated with bug fixes.

Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

От

Noah Misch

Дата:

10 ноября 2021 г., 03:53:41

On Wed, Nov 10, 2021 at 10:55:13AM +1300, Thomas Munro wrote:
> On Wed, Nov 10, 2021 at 3:40 AM Noah Misch <noah@leadboat.com> wrote:
> > $ diff -U0 <(cut -b47- disasm/70bef49/equalTupleDescs) <(cut -b47- disasm/166f943/equalTupleDescs)
> > --- /dev/fd/63  2021-11-09 06:11:20.927444437 -0800
> > +++ /dev/fd/62  2021-11-09 06:11:20.926444428 -0800
> > @@ -100 +100 @@
> > -      br.call.sptk.many rp=0x40000000003cdc20
> > +      br.call.sptk.many rp=0x40000000003cde20
> > @@ -658 +658 @@
> > -      br.call.sptk.many rp=0x40000000003cdc20
> > +      br.call.sptk.many rp=0x40000000003cde20
> > @@ -817 +817 @@
> > -             br.call.sptk.many rp=0x3fffffffff3fdc30
> > +             br.call.sptk.many rp=0x4000000000400c30
> > @@ -949 +949 @@
> > -             br.call.sptk.many rp=0x40000000003cdc20
> > +             br.call.sptk.many rp=0x40000000003cde20
> > @@ -970 +970 @@
> > -             br.call.sptk.many rp=0x40000000003cdc20
> > +             br.call.sptk.many rp=0x40000000003cde20
> >
> > Since "git diff 70bef49 166f943" contains nothing that correlates with such a
> > change, I'm concluding that this is a bug in gharial's toolchain.
> 
> Excellent detective work.

Thanks.

> > It looks like gharial's automatic buildfarm runs have been paused for nine
> > days.  Feel free to unpause it.  Also, I recommend using the buildfarm client
> > setnotes.pl to add a note like 'Rare signal 11 from toolchain bug'.  Months or
> > years pass between these events.  Here are all gharial "signal 11" failures,
> > likely some of which have other causes:
> 
> Yeah I spent time investigating some of these at the time and I know
> others did too.
> 
> IMHO we should hunt down dead toolchains and gently see if we can get
> them updated.   There is no point in testing PostgreSQL on every
> commit on a compiler someone built from a tarball many years ago and
> never updated with bug fixes.

+1 for "gently see if".  It's good that we have a few old-gcc animals, but
having versions of intermediate age is less important.  Some owners will
decline, and that's okay.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Noah Misch

Дата:

10 ноября 2021 г., 04:15:41

On Thu, Oct 28, 2021 at 07:56:00PM -0700, Noah Misch wrote:
> On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
> > > 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а):
> > > These failures started on 2021-10-09, the day conchuela updated from DragonFly
> > > v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
> > > Since the theorized kernel bug seems not to affect
> > > src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
> > > from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
> > > the newest failure sites is that they don't write anything to the child stdin.
> 
> > >  If not, does passing the script via stdin, like "pgbench -f-
> > > <script.sql", work around the problem?
> > 
> > I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.
> 
> Did that work?  Commit 7f580aa should make this unnecessary for v12+
> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
> needs a fix in all branches.  A backup plan is just to skip affected tests on
> dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
> optimistic that a better workaround will present itself.

Is this still in your queue, or not?  The conchuela breakage in
src/test/recovery and v11+v10 src/bin/pgbench is a large source of buildfarm
noise right now.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Andrey Borodin

Дата:

10 ноября 2021 г., 05:49:33


> 10 нояб. 2021 г., в 09:15, Noah Misch <noah@leadboat.com> написал(а):
>
> On Thu, Oct 28, 2021 at 07:56:00PM -0700, Noah Misch wrote:
>> On Sun, Oct 24, 2021 at 09:03:27PM +0300, Andrey Borodin wrote:
>>>> 24 окт. 2021 г., в 19:19, Noah Misch <noah@leadboat.com> написал(а):
>>>> These failures started on 2021-10-09, the day conchuela updated from DragonFly
>>>> v4.4.3-RELEASE to DragonFly v6.0.0-RELEASE.  It smells like a kernel bug.
>>>> Since the theorized kernel bug seems not to affect
>>>> src/test/subscription/t/015_stream.pl, I wonder if we can borrow a workaround
>>>> from other tests.  One thing in common with src/test/recovery/t/017_shm.pl and
>>>> the newest failure sites is that they don't write anything to the child stdin.
>>
>>>> If not, does passing the script via stdin, like "pgbench -f-
>>>> <script.sql", work around the problem?
>>>
>>> I'll test it tomorrow, the refactoring does not seem trivial given we use many simultaneous scripts.
>>
>> Did that work?  Commit 7f580aa should make this unnecessary for v12+
>> contrib/amcheck tests, but older branches still need a fix, and 017_shm.pl
>> needs a fix in all branches.  A backup plan is just to skip affected tests on
>> dragonfly 6+.  Since the breakage has been limited to so few tests, I'm
>> optimistic that a better workaround will present itself.
>
> Is this still in your queue, or not?  The conchuela breakage in
> src/test/recovery and v11+v10 src/bin/pgbench is a large source of buildfarm
> noise right now.
Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.

Best regards, Andrey Borodin.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Andrey Borodin

Дата:

13 ноября 2021 г., 18:47:43


>> 10 нояб. 2021 г., в 09:15, Noah Misch <noah@leadboat.com> написал(а):
>>
> Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next week.
> 

I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer branches.
But so far I didn't come up with some clever mutex idea for REL_10.

Best regards, Andrey Borodin.

Вложения

0001-Get-rid-of-background_pgbench-in-CIC-tests.patch

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Noah Misch

Дата:

14 ноября 2021 г., 00:09:52

On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> >> 10 нояб. 2021 г., в 09:15, Noah Misch <noah@leadboat.com> написал(а):
> > Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next
week.
> 
> I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer
branches.
> But so far I didn't come up with some clever mutex idea for REL_10.

That's a reasonable sentiment, but removing background_pgbench() isn't going
to fix 017_shm.pl.  I'm not enthusiastic about any fix that repairs
src/bin/pgbench without repairing 017_shm.pl.  I'm okay with skipping affected
test files on dragonfly >= 6 if you decide to cease figuring out how to make
them pass like the others do.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

14 ноября 2021 г., 01:24:54

On Sun, Nov 14, 2021 at 1:17 PM Noah Misch <noah@leadboat.com> wrote:
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
> > I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
> > I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer
branches.
> > But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl.  I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl.  I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Hmm, so if "IPC::Run got stuck when it should have been reaping that
zombie", what's it stuck in, I guess select() or waitpid()?  Maybe
there' s a kernel bug but it seems hard to believe that a Unix system
would have bugs in such fundamental facilities and still be able to
build itself and ship a release...  Otherwise I guess Perl, or perl
scripts, would need to be confusing fds or pids or something?  But
that's hard to believe on its own, too, given the lack of problems on
other systems that are pretty similar.  If Andrey can still reproduce
this, it'd be interesting to see a gdb backtrace, and also "ps O
wchan" or perhaps kill -INFO $pid, and lsof for the process (or
according to old pages found with google, perhaps the equivalent tool
is "fstat" on that system).

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Andrey Borodin

Дата:

15 ноября 2021 г., 07:19:56


> 14 нояб. 2021 г., в 05:09, Noah Misch <noah@leadboat.com> написал(а):
>
> On Sat, Nov 13, 2021 at 11:47:43PM +0500, Andrey Borodin wrote:
>>>> 10 нояб. 2021 г., в 09:15, Noah Misch <noah@leadboat.com> написал(а):
>>> Uh, sorry, this problem fell out of my attention somehow. I'll try to do something with 10 and 11 this or next
week.
>>
>> I've adapted 7f580aa to functionality of REL_11 using "\if 0 = :client_id" metacommand.
>> I really do not like idea of supporting background_pgbench() in older branches without counterpart in newer
branches.
>> But so far I didn't come up with some clever mutex idea for REL_10.
>
> That's a reasonable sentiment, but removing background_pgbench() isn't going
> to fix 017_shm.pl.  I'm not enthusiastic about any fix that repairs
> src/bin/pgbench without repairing 017_shm.pl.  I'm okay with skipping affected
> test files on dragonfly >= 6 if you decide to cease figuring out how to make
> them pass like the others do.

Let's skip these tests. How this can be accomplished?
Should we mute only 022_cic.pl, 023_cic_2pc.pl, and 017_shm.pl? Or everything that calls harness->finish?

I've sent some diagnostics info to Thomas offlist, but I do not understand how it can be used...

Best regards, Andrey Borodin.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

15 ноября 2021 г., 21:23:02

On Mon, Nov 15, 2021 at 8:20 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I've sent some diagnostics info to Thomas offlist, but I do not understand how it can be used...

Summary for the record: defunct child, and gdb showed perl blocked in
select(), and ps revealed that Dragonfly internally maps select() onto
kevent() (wchan = kqread), which is interesting, but apparently not
new.  (Hmmm, /me eyes the recent changes to events reported on
half-closed pipes[1].)

[1] https://github.com/DragonFlyBSD/DragonFlyBSD/commits/master/sys/kern/sys_generic.c

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

19 ноября 2021 г., 04:09:04

I managed to reproduce the 017_shm.pl hang using my Vagrant file (see
earlier).  It happens for "gmake -s check-world", but not "gmake -s -C
src/test/recovery check", which might have confused me last time.  The
symptoms are as described already, though this time I learned with
truss that it's in a retry loop waiting 1 second at a time.

I don't know how to get debug symbols for libc on this operating
system (well, probably needed to select debug option while installing
the OS, but I didn't install the OS, I'm using a lazy bones pre-rolled
Vagrant image).  So in order to be able to inspect the arguments to
select, I made my own LD_PRELOAD wrapper for select (see end), and I
observed that it was calling select() with an empty fd_set:

(gdb) bt
#0  0x00000008014c1e6c in select () from /lib/libc.so.8
#1  0x00000008009f0fe6 in select () from /usr/lib/libpthread.so.0
#2  0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0,
writefds=0x8017f6d50, exceptfds=0x0,
    timeout=0x7fffffdfd4a0) at select.c:13
#3  0x00000008007bfaf9 in Perl_pp_sselect () from
/usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#4  0x000000080076b036 in Perl_runops_standard ()
   from /usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#5  0x00000008006da3b7 in perl_run () from
/usr/local/lib/perl5/5.32/mach/CORE/libperl.so.5.32
#6  0x0000000000400de4 in main ()
(gdb) f 2
#2  0x000000080044c622 in select (nfds=16, readfds=0x8017f6de0,
writefds=0x8017f6d50, exceptfds=0x0,
    timeout=0x7fffffdfd4a0) at select.c:13
13        return real_function(nfds, readfds, writefds, exceptfds, timeout);
(gdb) print nfds
$1 = 16
(gdb) print *readfds
$2 = {fds_bits = {0 <repeats 16 times>}}
(gdb) print *writefds
$3 = {fds_bits = {0 <repeats 16 times>}}

So it looks a lot like something on the perl side has lost track of
the pipe it's supposed to be selecting on.  If I understand correctly,
it's supposed to be waiting for one of the following pipes to appear
as readable, whichever is the one that the zombie psql process
previously held the other end of:

$ fstat -p 73032
USER     CMD          PID   FD PATH                  INUM MODE
  SZ|DV R/W
vagrant  perl       73032 root /                        1 drwxr-xr-x
 offset:0  r
vagrant  perl       73032   wd
/home/vagrant/postgres/src/test/recovery     4434524058 drwxr-xr-x
offset:0  r
vagrant  perl       73032 text /pfs/@@-1:00001/local/bin/perl
     4329919842 -rwxr-xr-x    offset:0  r
vagrant  perl       73032    0 /dev/pts/0
          1335 crw--w----     pts/0:155171 rw
vagrant  perl       73032    1
/home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm
  4477427235 -rw-r--r--    offset:9457  w
vagrant  perl       73032    2
/home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm
  4477427235 -rw-r--r--    offset:9457  w
vagrant  perl       73032    3
/home/vagrant/postgres/src/test/recovery/tmp_check/log/regress_log_017_shm
  4477427235 -rw-r--r--    offset:9457  w
vagrant  perl       73032    4* pipe fffff800aac1dca0 (B<->A) ravail 0
wavail 0 rw
vagrant  perl       73032    5* pipe fffff800ab6e9020 (B<->A) ravail 0
wavail 0 rw
vagrant  perl       73032    6* pipe fffff800aac1dca0 (B<->A) ravail 0
wavail 0 rw
vagrant  perl       73032    7* pipe fffff800ab6e9020 (B<->A) ravail 0
wavail 0 rw
vagrant  perl       73032    8* pipe fffff800aac1dca0 (B<->A) ravail 0
wavail 0 rw
vagrant  perl       73032    9* pipe fffff800ab6e9020 (B<->A) ravail 0
wavail 0 rw
vagrant  perl       73032   13* pipe fffff800ab157560 (B<->A) ravail 0
wavail 0 rw


=== select wrapper ===

$ cat select.c
#include <dlfcn.h>
#include <sys/select.h>

int
select(int nfds,fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
    struct timeval *timeout)
{
    static int (*real_function)(int, fd_set *, fd_set *, fd_set *,
        struct timeval *);

    if (!real_function)
        real_function = dlsym(RTLD_NEXT, "select");

    return real_function(nfds, readfds, writefds, exceptfds, timeout);
}

$ cc -Wall -fPIC -shared -g -o myselect.so select.c -ldl
$ cd postgres
$ LD_PRELOAD=/home/vagrant/myselect.so gmake check-world -s

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

19 ноября 2021 г., 04:52:59

On Fri, Nov 19, 2021 at 5:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> (gdb) print *writefds
> $3 = {fds_bits = {0 <repeats 16 times>}}

Oops, that was after it had been cleared already by the OS; duh.  On
entry to my wrapper, writefds does in fact contain the bit pattern for
fd 13.  That led me to try a very simply C program which runs to
completion on Linux and FreeBSD, but hangs forever on Dragonfly.

Вложения

test.c

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Noah Misch

Дата:

19 ноября 2021 г., 06:13:25

On Fri, Nov 19, 2021 at 05:52:59PM +1300, Thomas Munro wrote:
> led me to try a very simply C program which runs to
> completion on Linux and FreeBSD, but hangs forever on Dragonfly.

> [pipe FD considered non-writable, but writing would give EPIPE]

Looks like optimal material for a kernel bug report.  Excellent discovery.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Andrey Borodin

Дата:

19 ноября 2021 г., 08:43:56


> 19 нояб. 2021 г., в 11:13, Noah Misch <noah@leadboat.com> написал(а):
>
> Looks like optimal material for a kernel bug report.  Excellent discovery.

Are we going to wait for the fix or disable tests partially on conchuela?

Best regards, Andrey Borodin.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

19 ноября 2021 г., 09:07:42

On Fri, Nov 19, 2021 at 7:13 PM Noah Misch <noah@leadboat.com> wrote:
> On Fri, Nov 19, 2021 at 05:52:59PM +1300, Thomas Munro wrote:
> > led me to try a very simply C program which runs to
> > completion on Linux and FreeBSD, but hangs forever on Dragonfly.
>
> > [pipe FD considered non-writable, but writing would give EPIPE]
>
> Looks like optimal material for a kernel bug report.  Excellent discovery.

https://bugs.dragonflybsd.org/issues/3307

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Tom Lane

Дата:

19 ноября 2021 г., 14:56:31

Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Nov 19, 2021 at 7:13 PM Noah Misch <noah@leadboat.com> wrote:
>> Looks like optimal material for a kernel bug report.  Excellent discovery.

> https://bugs.dragonflybsd.org/issues/3307

I see they're pushing back on whether this is a bug.  I failed to find a
way to quickly comment on the bug report, but I suggest you quote POSIX
select(2) at them:

    A descriptor shall be considered ready for reading when a call to an
    input function with O_NONBLOCK clear would not block, whether or not
    the function would transfer data successfully. (The function might
    return data, an end-of-file indication, or an error other than one
    indicating that it is blocked, and in each of these cases the
    descriptor shall be considered ready for reading.)

    A descriptor shall be considered ready for writing when a call to an
    output function with O_NONBLOCK clear would not block, whether or not
    the function would transfer data successfully.

I don't know whether it'd help to point out that the test program works
as expected on other BSDen.  (I'm planning to go try it on a few more
platforms in a bit, but in any case the standards-compliance question
looks pretty open-and-shut to me.)

            regards, tom lane

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Tom Lane

Дата:

19 ноября 2021 г., 14:57:12

Andrey Borodin <x4mmm@yandex-team.ru> writes:
> Are we going to wait for the fix or disable tests partially on conchuela?

Let's wait a bit and see if a fix will be forthcoming promptly.

            regards, tom lane

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Tom Lane

Дата:

19 ноября 2021 г., 16:22:26

Thomas Munro <thomas.munro@gmail.com> writes:
> Oops, that was after it had been cleared already by the OS; duh.  On
> entry to my wrapper, writefds does in fact contain the bit pattern for
> fd 13.  That led me to try a very simply C program which runs to
> completion on Linux and FreeBSD, but hangs forever on Dragonfly.

For completeness, I checked this on macOS, OpenBSD, and NetBSD,
and they all give the expected results.

            regards, tom lane

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

19 ноября 2021 г., 19:35:15

On Sat, Nov 20, 2021 at 3:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrey Borodin <x4mmm@yandex-team.ru> writes:
> > Are we going to wait for the fix or disable tests partially on conchuela?
>
> Let's wait a bit and see if a fix will be forthcoming promptly.

The fix came forth.  I'll wait for the back-patch and then ask Mikael
to upgrade conchuela.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Thomas Munro

Дата:

29 декабря 2021 г., 19:53:13

On Sat, Nov 20, 2021 at 8:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> The fix came forth.  I'll wait for the back-patch and then ask Mikael
> to upgrade conchuela.

Done; thanks Mikael.

Re: conchuela timeouts since 2021-10-09 system upgrade

От

Mikael Kjellström

Дата:

29 декабря 2021 г., 22:58:22

On 2021-12-29 20:53, Thomas Munro wrote:
> On Sat, Nov 20, 2021 at 8:35 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> The fix came forth.  I'll wait for the back-patch and then ask Mikael
>> to upgrade conchuela.
> 
> Done; thanks Mikael.

It has only had 1 pass through the buildfarm so far but all branches passed.

So looking promising.

Let's see how it looks after a few more passes.

/Mikael

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: CREATE INDEX CONCURRENTLY does not index prepared xact's data

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения