Re: IPC/MultixactCreation on the Standby server
| От | Andrey Borodin |
|---|---|
| Тема | Re: IPC/MultixactCreation on the Standby server |
| Дата | |
| Msg-id | F1EE5EBE-D7E0-41C6-900B-2D650CC6E014@yandex-team.ru обсуждение исходный текст |
| Ответ на | Re: IPC/MultixactCreation on the Standby server (Ivan Bykov <i.bykov@modernsys.ru>) |
| Список | pgsql-hackers |
Hi Ivan! Thanks for the review. > On 24 Oct 2025, at 19:33, Ivan Bykov <I.Bykov@modernsys.ru> wrote: > > I believe that we should provide more comprehensible feedback to developers > by interrupting the deadlock state with a psql timeout. 15 seconds seems safe > enough to distinguish between slow node operation and deadlock issue. Makes sense, but I used $PostgreSQL::Test::Utils::timeout_default seconds. > On 24 Oct 2025, at 22:16, Bykov Ivan <i.bykov@ftdata.ru> wrote: > > In GetNewMultiXactId() this code may lead to error > --- > ExtendMultiXactOffset(MultiXactState->nextMXact + 1); > --- > If MultiXactState->nextMXact = MaxMultiXactId (0xFFFFFFFF) > we will not extend MultiXactOffset as we should It will extend SLRU, this calculations are handled in ExtendMultiXactOffset(). Moreover, we should eliminate "multi != FirstMultiXactId"bailout condition too. I've extended comments and added Assert(). I've added test for this, but it's whacky-hackywith dd, resetwal and such. But it uncovered wrond assertion in this code: /* This is an overflow-only branch */ Assert(next_entryno == 0 || next == FirstMultiXactId); This "next == FirstMultiXactId" was missing. I also included Kirill's suggestions. GPT is also worried by initialization of page 0, but I don't fully understand it's concerns: "current MXID’s page is never extended: GetNewMultiXactId() now calls ExtendMultiXactOffset(MultiXactState->nextMXact + 1)instead of extending the page that will hold result. This skips zeroing the page for the MXID we are about to assign, sothe first allocation on a fresh cluster (or after wrap) tries to write into an uninitialized SLRU page and will fail/crashonce the buffer manager attempts I/O." I think we have now good coverage of what happens on fresh cluster and after a wraparound. "wraparound can’t recreate page 0: ExtendMultiXactOffset() now asserts multi != FirstMultiXactId, yet after wrap the firstID on page 0 is exactly FirstMultiXactId. Because callers no longer pass that value, page 0 is never re-created andwe keep reusing stale offsets." Page 0 is actually created via different path on cluster initialization. Though everything works fine on wraparound. Thanks! Best regards, Andrey Borodin.
Вложения
В списке pgsql-hackers по дате отправления: