Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
| From | Andres Freund | 
|---|---|
| Subject | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible | 
| Date | |
| Msg-id | rkej7nbahcoeaonn3dxdbk2wzsi2gi3l75mse7txzigleggk3c@egrki7isobok | 
| In reply to | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible (Jacob Champion <jacob.champion@enterprisedb.com>) | 
| Replies | Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible | 
| List | pgsql-hackers | 
On 2025-03-13 10:29:49 -0700, Jacob Champion wrote:
> On Thu, Mar 13, 2025 at 9:56 AM Andres Freund <andres@anarazel.de> wrote:
> > I am wondering if PAM is so fundamentally incompatible with handling
> > interrupts / a non-blocking interface that we have little choice but to
> > eventually remove it...
>
> Given the choice between a usually-working PAM module with known
> architectural flaws, and not having PAM at all, I think many users
> would rather continue using what's working for them.

authentication_timeout currently doesn't reliably work while in some auth
methods, nor does pg_terminate_backend() etc. That's IMO rather bad from a
DOSability perspective.

The fact that some auth methods are broken like that has had a sizable
negative impact on postgres for a long time. Not just when those methods are
used, but also architecturally. It's e.g. one of the main reasons we need the
ugly escalating logic in postmaster shutdowns to send SIGQUITs and then
SIGKILL after a while, because we don't have a reliable way of terminating
backends normally.

This used to be way worse because historically postgres considered it sane
(why, I have no idea) to ereport() in timeout functions, which then
occasionally led to backends stuck in malloc locks etc.

> > FWIW, I continue to think that it's better to invest in making more auth
> > methods non-blocking, rather than adding wait events for code that could maybe
> > sometimes wait on different things internally.
>
> I think we disagree on the either/or nature of that. If I can get
> proof that a certain thing is causing bugs in the wild, then I have
> ammunition to fix that thing.

FWIW, I have repeatedly seen production issues due to authentication timeout
not working for some auth methods. It's not hard to see why - e.g. a
non-responsive RADIUS server just leaves the backend hanging in select().
Even though it would get interrupted by signals, we'll just retry without
even checking interrupts / timeouts :(.
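The RADIUS failure mode is easiest to see in code. Below is a minimal, hypothetical C sketch, not PostgreSQL's actual RADIUS code. The names `wait_readable`, `interrupt_pending` and `install_interrupt_handler` are invented for illustration. It shows a wait loop that does not simply retry `select()` after `EINTR`. Instead, it loops back to check a pending-interrupt flag and an overall deadline first. The signal handler only sets a `volatile sig_atomic_t` flag and does nothing like `ereport()` from signal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>   /* pipe()/write(), handy for exercising the sketch */

/* Hypothetical "interrupt pending" flag, set only from a signal handler. */
static volatile sig_atomic_t interrupt_pending = 0;

static void handle_interrupt(int signo)
{
    (void) signo;
    interrupt_pending = 1;   /* only async-signal-safe work in a handler */
}

/* sa_flags = 0: no SA_RESTART, so an interrupted select() reports EINTR. */
static int install_interrupt_handler(int signo)
{
    struct sigaction sa;
    sa.sa_handler = handle_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Current time in milliseconds on a monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

enum wait_result
{
    WAIT_READY = 1,
    WAIT_TIMEOUT = 0,
    WAIT_ERROR = -1,
    WAIT_INTERRUPTED = -2
};

/*
 * Wait until fd is readable, the overall deadline passes, or an interrupt
 * is pending. The flag and the remaining time are re-checked on every
 * iteration, so an EINTR never turns into a blind retry.
 */
static enum wait_result wait_readable(int fd, long long timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    for (;;)
    {
        if (interrupt_pending)
            return WAIT_INTERRUPTED;

        long long remaining = deadline - now_ms();
        if (remaining <= 0)
            return WAIT_TIMEOUT;

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        struct timeval tv;
        tv.tv_sec = remaining / 1000;
        tv.tv_usec = (remaining % 1000) * 1000;

        int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return WAIT_READY;
        if (rc == 0)
            return WAIT_TIMEOUT;
        if (errno != EINTR)
            return WAIT_ERROR;
        /* EINTR: loop around, re-checking interrupt_pending and the deadline. */
    }
}
```

To try it out, create a `pipe()` and call `wait_readable()` on the read end. Then deliver a signal whose handler was installed with `install_interrupt_handler()`. Inside PostgreSQL itself, such a wait would presumably use the latch machinery (e.g. `WaitLatchOrSocket()`) together with `CHECK_FOR_INTERRUPTS()` rather than raw `select()`. The sketch only illustrates why state must be re-checked after `EINTR`.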
> Right now there is no visibility, and my interest in rewriting old
> authentication methods without bug reports to motivate that work is pretty
> low. I'm not willing to sign up for that at the moment.

Fair enough.

> (But I do really appreciate the review. I'm just feeling crispy about
> the overall result...)

Also fair enough :)

Greetings,

Andres Freund