Re: Random pg_upgrade 004_subscription test failure on drongo
From | Heikki Linnakangas |
---|---|
Subject | Re: Random pg_upgrade 004_subscription test failure on drongo |
Date | |
Msg-id | cd3189f8-aaed-4ef6-a6b6-da72c1251f34@iki.fi |
In response to | Random pg_upgrade 004_subscription test failure on drongo (vignesh C <vignesh21@gmail.com>) |
Responses |
Re: Random pg_upgrade 004_subscription test failure on drongo
Re: Random pg_upgrade 004_subscription test failure on drongo |
List | pgsql-hackers |
On 13/03/2025 11:04, vignesh C wrote:
> ## Analysis
> I think it was caused due to the STATUS_DELETE_PENDING failure, not
> related with recent updates for pg_upgrade.
>
> The file "base/1/2683" is an index file for
> pg_largeobject_loid_pn_index, and the output meant that file creation
> failed. Below is a backtrace.
>
> ```
> pgwin32_open() // <-- this returns -1
> open()
> BasicOpenFilePerm()
> PathNameOpenFilePerm()
> PathNameOpenFile()
> mdcreate()
> smgrcreate()
> RelationCreateStorage()
> RelationSetNewRelfilenumber()
> ExecuteTruncateGuts()
> ExecuteTruncate()
> ```
>
> But this is strange. Before calling mdcreate(), we surely unlink the
> file which have the same name. Below is a trace until unlink.
>
> ```
> pgunlink()
> unlink()
> mdunlinkfork()
> mdunlink()
> smgrdounlinkall()
> RelationSetNewRelfilenumber() // common path with above
> ExecuteTruncateGuts()
> ExecuteTruncate()
> ```
>
> I found Thomas said that [4] pgunlink sometimes could not remove a
> file even if it returns OK, at that time NTSTATUS is
> STATUS_DELETE_PENDING. Also, a comment in pgwin32_open_handle()
> mentions the same thing:
>
> ```
> /*
>  * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
>  * gone (Windows NT status code is STATUS_DELETE_PENDING). In that
>  * case, we'd better ask for the NT status too so we can translate it
>  * to a more Unix-like error. We hope that nothing clobbers the NT
>  * status in between the internal NtCreateFile() call and CreateFile()
>  * returning.
>  *
> ```
>
> The definition of STATUS_DELETE_PENDING can be seen in [5]. Based on
> that, indeed, open() would be able to fail with STATUS_DELETE_PENDING
> if the deletion is pending but it is trying to open.
> ---------------------------------------------
>
> This was fixed by the following change in the target upgrade nodes:
> bgwriter_lru_maxpages = 0
> checkpoint_timeout = 1h
>
> Attached is a patch in similar lines for 004_subscription.

Hmm, this problem isn't limited to this one pg_upgrade test, right? It
could happen with any pg_upgrade invocation. And perhaps in a running
server too, if a relfilenumber is reused quickly.

In dropdb() and DropTableSpace() we do this:

    WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

Should we do the same here? Not sure where exactly to put that; perhaps
in mdcreate(), if the creation fails with STATUS_DELETE_PENDING.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
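
To make the suggested control flow concrete, here is a minimal stand-alone sketch of "retry the creation after a release barrier". It is a simulation only, not PostgreSQL code: sim_create_file(), sim_smgr_release_barrier(), create_with_barrier_retry() and the CreateResult enum are hypothetical names invented for illustration, and the real barrier call and error detection inside mdcreate() would look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes; the real code would inspect errno / NT status. */
typedef enum
{
    CREATE_OK,
    CREATE_DELETE_PENDING
} CreateResult;

/*
 * Simulated state: true while some other process still holds an open
 * handle to the unlinked file, so the name cannot be reused yet.
 */
static bool delete_pending = true;
static int  barriers_emitted = 0;

/* Stand-in for the file creation attempted by mdcreate() (assumption). */
static CreateResult
sim_create_file(void)
{
    return delete_pending ? CREATE_DELETE_PENDING : CREATE_OK;
}

/*
 * Stand-in for emitting PROCSIGNAL_BARRIER_SMGRRELEASE and waiting for it:
 * here it simply pretends every other process closed its handles.
 */
static void
sim_smgr_release_barrier(void)
{
    barriers_emitted++;
    delete_pending = false;
}

/* Try to create; on "delete pending", run the barrier once and retry. */
static CreateResult
create_with_barrier_retry(void)
{
    CreateResult result = sim_create_file();

    if (result == CREATE_DELETE_PENDING)
    {
        sim_smgr_release_barrier();
        result = sim_create_file();
    }
    return result;
}
```

In this sketch the barrier is only paid for on the failure path, so the common case (no pending delete) costs nothing extra; whether that placement is right for the real mdcreate() is exactly the open question above.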