Re: md.c vs elog.c vs smgrreleaseall() in barrier
От | Andres Freund |
---|---|
Тема | Re: md.c vs elog.c vs smgrreleaseall() in barrier |
Дата | |
Msg-id | kpopzahilmjtcpeefrsbrtzvwddzbhadsfn3cvdelnpi24gbgq@5ty2xbzoyua3 обсуждение исходный текст |
Ответ на | Re: md.c vs elog.c vs smgrreleaseall() in barrier (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: md.c vs elog.c vs smgrreleaseall() in barrier
Re: md.c vs elog.c vs smgrreleaseall() in barrier |
Список | pgsql-hackers |
Hi, On 2025-03-14 12:57:51 +1300, Thomas Munro wrote: > On Fri, Mar 14, 2025 at 12:23 PM Andres Freund <andres@anarazel.de> wrote: > > > > 3. Considering errfinish()'s stated goal, a sort of backstop to help > > > you cancel looping message-spewing code only, I wonder if we should > > > just restrict the kinds of interrupts it processes, so that it only > > > calls CFI() if we're definitely throwing ERROR or FATAL? > > > > I'm not even sure it'd be safe to ERROR out in all the relevant places... > > > > E.g. > > /* implementation-specific initialization */ > > smgrsw[reln->smgr_which].smgr_open(reln); > > > > /* it is not pinned yet */ > > reln->pincount = 0; > > dlist_push_tail(&unpinned_relns, &reln->node); > > > > doesn't this mean that ->pincount is uninitialized in case smgr_open() errors > > out and that we'd loose track of the reln because it wasn't yet added to > > unpinned_rels? > > Ugh, right. Patch for that attached. > > > > If we go for that, I can see an argument for doing that in smgr.c instead of > > > > md.c, afaict any plausible smgr implementation would run into this, as the > > > > smgrcloseall() is triggered from the smgr level. > > > > > > Seems like maybe not a great idea to assume that future smgrs will be > > > OK with being non-interruptible, if it can be avoided? > > > > I think you'd need a fairly large surgery of smgr.c to make that viable - I > > rather doubt that smgr.c itself is actually safe against interrupts. > > > > I can see some arguable uses of smgr handling interrupts, say an smgr > > implementation based on a network backed store, but you'd need rather massive > > changes to actually be sure that smgr.c is called while accepting interrupts - > > e.g. today sgmrwritev() will largely be called with interrupts held. Plenty > > reads too. I doubt that undoing a few HOLD_INTERRUPTS() is a meaningful part > > of the necessary work. > > Right, exactly the case I was thinking of: if some hypothetical future > network smgr wants to be able to process *some* kinds of things > carefully while waiting for the network. I don't think we want to > constrain ourselves to NFS-style "pretend it's local and make it > non-interruptible" without any escape hatches, but you're quite right > that that's probably a whole research project of its own and we just > haven't refined the interrupt system enough for that yet, so yeah I > see how you arrived at that conclusion and it makes sense. Here's a proposed patch for this. It turns out that the bug might already be reachable, even without defining FDDEBUG. There's a debug ereport() in register_dirty_segment() - but it's hard to reach in practice. I don't really know whether that means we ought to backpatch or not - which makes me want to be conservative and not backpatch. Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: