Re: recent failures on lorikeet

From: Andrew Dunstan
Subject: Re: recent failures on lorikeet
Date:
Msg-id: ce94774f-0583-7be2-8ec3-2bb161b959fd@dunslane.net
In reply to: Re: recent failures on lorikeet (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: recent failures on lorikeet (Andrew Dunstan <andrew@dunslane.net>)
         Re: recent failures on lorikeet (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 6/14/21 9:39 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I've been looking at the recent spate of intermittent failures on my
>> Cygwin animal lorikeet. Most of them look something like this, where
>> there's "VACUUM FULL pg_class" and an almost simultaneous "CREATE TABLE"
>> which fails.
> Do you have any idea what "exit code 127" signifies on that platform?
> (BTW, not all of them look like that; many are reported as plain
> segfaults.)  I hadn't spotted the association with a concurrent "VACUUM
> FULL pg_class" before, that does seem interesting.
>
>> Getting stack traces on this platform can be very difficult. I'm going
>> to try forcing complete serialization of the regression tests
>> (MAX_CONNECTIONS=1) to see if the problem goes away. Any other
>> suggestions might be useful. Note that we're not getting the same issue
>> on REL_13_STABLE, where the same group of tests run together (inherit,
>> typed_table, vacuum).
> If it does go away, that'd be interesting, but I don't see how it gets
> us any closer to a fix.  Seems like a stack trace is a necessity to
> narrow it down.
>


Some of the failures have produced stack traces and some have not; I'm
not sure why. The one from June 13 has this:


---- backtrace ----
??
??:0
WaitOnLock
src/backend/storage/lmgr/lock.c:1831
LockAcquireExtended
src/backend/storage/lmgr/lock.c:1119
LockRelationOid
src/backend/storage/lmgr/lmgr.c:135
relation_open
src/backend/access/common/relation.c:59
table_open
src/backend/access/table/table.c:43
ScanPgRelation
src/backend/utils/cache/relcache.c:322
RelationBuildDesc
src/backend/utils/cache/relcache.c:1039
RelationIdGetRelation
src/backend/utils/cache/relcache.c:2045
relation_open
src/backend/access/common/relation.c:59
table_open
src/backend/access/table/table.c:43
ExecInitPartitionInfo
src/backend/executor/execPartition.c:510
ExecPrepareTupleRouting
src/backend/executor/nodeModifyTable.c:2311
ExecModifyTable
src/backend/executor/nodeModifyTable.c:2559
ExecutePlan
src/backend/executor/execMain.c:1557



The line in lock.c (WaitOnLock, lock.c:1831) is where the process title
gets changed to "waiting". I recently stopped setting the process title
on this animal for REL_13_STABLE, and the similar errors there have
largely gone away. I can do the same on HEAD. But it does make me wonder
what the heck has changed to make this code fragile.
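For readers following along, the "waiting" title update can be sketched roughly as below. This is a simplified, hypothetical reconstruction of the pattern, not the actual lock.c code: the names sketch_get_ps_display, sketch_set_ps_display, sketch_begin_wait, sketch_end_wait and the fixed ps_activity buffer are illustrative stand-ins for the backend's ps_status.c machinery. It copies the current title, appends " waiting" before the backend sleeps on the lock, and restores the saved text afterwards. On a real server this path is skipped when update_process_title is off, which is presumably the setting changed on lorikeet (an assumption, not stated above).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the activity part of the process title.
 * The real backend keeps this in ps_status.c behind get_ps_display()
 * and set_ps_display(); a fixed buffer is a simplification.
 */
static char ps_activity[256] = "INSERT";

/* Return the current activity text and report its length via *displen. */
static const char *
sketch_get_ps_display(int *displen)
{
	*displen = (int) strlen(ps_activity);
	return ps_activity;
}

/* Replace the activity text, truncating rather than overflowing. */
static void
sketch_set_ps_display(const char *activity)
{
	assert(activity != NULL);
	strncpy(ps_activity, activity, sizeof(ps_activity) - 1);
	ps_activity[sizeof(ps_activity) - 1] = '\0';
}

/*
 * Before sleeping on a lock: append " waiting" to the title and return a
 * heap copy of the original text so it can be restored later.
 */
static char *
sketch_begin_wait(void)
{
	int			len;
	const char *old_status = sketch_get_ps_display(&len);
	/* room for the old text, " waiting" (8 bytes) and the terminating NUL */
	char	   *new_status = malloc((size_t) len + 8 + 1);

	if (new_status == NULL)
		return NULL;
	memcpy(new_status, old_status, (size_t) len);
	strcpy(new_status + len, " waiting");
	sketch_set_ps_display(new_status);
	new_status[len] = '\0';		/* truncate off " waiting" for the restore */
	return new_status;
}

/* After the lock is granted (or the wait is abandoned): restore the title. */
static void
sketch_end_wait(char *saved_status)
{
	if (saved_status == NULL)
		return;
	sketch_set_ps_display(saved_status);
	free(saved_status);
}
```

The point of the sketch is only that the title is rewritten at the moment the backend starts waiting, which is the frame at the top of the June 13 trace.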


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com



