Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
| From | Jelte Fennema-Nio |
|---|---|
| Subject | Re: Add GoAway protocol message for graceful but fast server shutdown/switchover |
| Date | |
| Msg-id | DHB2ZT3ZN8L5.21CRG9GA9317G@jeltef.nl |
| In reply to | Re: Add GoAway protocol message for graceful but fast server shutdown/switchover (Tomas Vondra <tomas@vondra.me>) |
| List | pgsql-hackers |
On Fri, 20 Mar 2026 at 20:20, Tomas Vondra <tomas@vondra.me> wrote:
> It'd be very helpful if there was some sort of PoC
> support on the pooler/client side, so that I can experiment with it and
> see how helpful the new protocol message is. But I realize that's a bit
> too much to ask for.

I'll see if I can whip something up; it shouldn't be too hard.

> Why not to have a pg_goaway_backend() function, that'd send the
> message to a single backend?

I like this idea a lot, so I added it in the attached v8 patch. This also allowed me to add low-level tests using the libpq_pipeline test suite.

> * In fact, does it improve the smart shutdown case in practice? Let's
> say we have a single instance, and we're restarting it. It'll send
> GoAway to all the clients, the good clients will try to reconnect. But
> if there's even a single "bad" client ignoring the GoAway, all the
> well-behaved clients will get stuck. Ofc, that can happen without the
> GoAway message too - a client may disconnect because of timeout etc. But
> it makes it more likely, and it'll affect the well-behaved clients.

For primary server restarts, I don't think anyone should be using smart shutdown right now either: any new connections to the database will fail for an indeterminate amount of time. I agree that sending GoAway might worsen the problem in some cases, but it's already terrible to start with. Fast shutdown is the only sensible restart mode for a primary server. This seems to be generally accepted knowledge, given that we use SIGINT (fast shutdown) in our systemd example[1].

Sending a GoAway on smart shutdown makes that shutdown mode very useful for read replicas during a planned switchover to another replica. Clients can then finish their work and quickly reconnect to the new read replica, minimizing switchover time while preventing errors.
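To make "finish their work and quickly reconnect" concrete, here is a minimal sketch of how a client-side pool might react to a GoAway. It is not taken from the patch or from any existing pooler: `PooledConnection`, `Pool`, and `on_goaway` are hypothetical names, and the sketch assumes the pool is told about the message through a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PooledConnection:
    """Hypothetical pooled connection; not a libpq or pooler API."""
    name: str
    in_use: bool = False
    draining: bool = False  # set once the server has sent GoAway
    closed: bool = False

    def on_goaway(self) -> None:
        # Never interrupt work in progress: only remember the request.
        self.draining = True
        if not self.in_use:
            self.closed = True  # idle connections can go away immediately


@dataclass
class Pool:
    connect: Callable[[], PooledConnection]  # opens a fresh connection
    idle: List[PooledConnection] = field(default_factory=list)

    def acquire(self) -> PooledConnection:
        # Skip connections that were closed while sitting idle.
        while self.idle:
            conn = self.idle.pop()
            if not conn.closed:
                conn.in_use = True
                return conn
        conn = self.connect()
        conn.in_use = True
        return conn

    def release(self, conn: PooledConnection) -> None:
        conn.in_use = False
        if conn.draining:
            conn.closed = True  # unit of work finished: disconnect now
        else:
            self.idle.append(conn)
```

The point of the design is that the application never sees an error: a draining connection is retired at a boundary the client chooses, and the next `acquire()` transparently opens a connection to whichever server is accepting them.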
Even when restarting primary servers, triggering a smart shutdown has benefits, as long as it's followed by a fast shutdown after a short delay (e.g., 1 second). This causes slightly longer downtime (the additional delay), but it allows most clients to disconnect on their own terms instead of in the middle of a query. Connection errors can often be retried transparently more easily than errors in the middle of a query. In effect, for many applications, this could mean fewer errors and only an increase in latency during a restart.

> * Would it make sense to have some payload in the GoAway message? I'm
> thinking about (a) some deadline by which the client should disconnect,
> e.g. time of planned restart / shutdown, (b) priority, expressing how
> much the client should try to disconnect (and maybe take more drastic
> actions).

I thought some more about this, but ultimately the payloads you suggest only seem useful if a client has something in between "disconnect hard now" and "disconnect when the connection is unused". I cannot think of any such cases. That is, what other "drastic actions" could a client take besides simply closing the connection? And if that's the only possibility, why not simply have the server close the connection in that case?

Overall, I agree that having no payload in this new message feels a bit weird. But ultimately, clients don't need any payload to do something useful.

> Also, two minor comments:

Fixed.
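For reference, the smart-then-fast restart sequence described earlier in this message could be scripted roughly as follows. This is only a sketch: `smart_then_fast_shutdown` is a hypothetical helper, not part of the patch, and it relies on the postmaster's documented signal handling (SIGTERM requests a smart shutdown, SIGINT a fast shutdown). The `kill` and `sleep` parameters exist only so the logic can be exercised without a running server.

```python
import os
import signal
import time
from typing import Callable


def smart_then_fast_shutdown(
    postmaster_pid: int,
    delay_seconds: float = 1.0,
    kill: Callable[[int, int], None] = os.kill,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Request a smart shutdown, then escalate to a fast shutdown.

    Returns True if the escalation signal was sent, False if the
    postmaster had already exited during the grace period.
    """
    # SIGTERM: smart shutdown. No new connections are accepted and
    # existing sessions are allowed to finish on their own.
    kill(postmaster_pid, signal.SIGTERM)

    # Grace period in which well-behaved clients can disconnect.
    sleep(delay_seconds)

    try:
        # SIGINT: fast shutdown. Remaining sessions are terminated.
        kill(postmaster_pid, signal.SIGINT)
    except ProcessLookupError:
        # Every client left in time and the postmaster is already gone.
        return False
    return True
```

The same sequence can be run by hand with `pg_ctl stop -m smart` followed by `pg_ctl stop -m fast`.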