Thread: Add GoAway protocol message for graceful but fast server shutdown/switchover
Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
This change introduces a new GoAway backend-to-frontend protocol message (byte 'g') that the server can send to the client to politely request that client to disconnect/reconnect when convenient. This message is advisory only - the connection remains fully functional and clients may continue executing queries and starting new transactions. "When convenient" is obviously not very well defined, but the primary target clients are clients that maintain a connection pool. Such clients should disconnect/reconnect a connection in the pool when there's no user of that connection. This is similar to how such clients often currently remove a connection from the pool after the connection hits a maximum lifetime of e.g. 1 hour.
This new message is used by Postgres during the already existing "smart" shutdown procedure (i.e. when postmaster receives SIGTERM). When Postgres is in "smart" shutdown mode existing clients can continue to run queries as usual but new connection attempts are rejected. This mode is primarily useful when triggering a switchover of a read replica. A load balancer can route new connections only to the new read replica, while the old load balancer keeps serving the existing connections until they disconnect. The problem is that this draining of connections could often take a long time. Even when clients only run very short queries/transactions because the session can be kept open much longer (many connection pools use 1 hour max lifetime of a connection by default). With the introduction of the GoAway message Postgres now sends this message to all connected clients when it enters smart shutdown mode. If these clients respond to the message by reconnecting/disconnecting earlier than their maximum connection lifetime the draining can complete much quicker. Similar benefits to switchover duration can be achieved for other applications or proxies implementing the Postgres protocol, like when switching over a cluster of PgBouncer machines to a newer version.
Applications/clients that use libpq can periodically check the result of the new PQgoAwayReceived() function to find out whether they have been asked to reconnect.
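As a rough illustration of the pool behaviour described above, the following is a minimal sketch of the "retire when idle" decision. The PooledConn struct and pool_should_retire() are hypothetical names invented for this example. In a real libpq-based pool the goaway_received field would presumably be refreshed from the proposed PQgoAwayReceived(), which is not part of any released libpq, so the sketch uses a plain field instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical pool entry. goaway_received stands in for the result of the
 * proposed PQgoAwayReceived(conn); it is a plain field here so the sketch
 * does not depend on the unreleased patch.
 */
typedef struct PooledConn
{
    bool    in_use;             /* currently checked out by a caller */
    bool    goaway_received;    /* server asked us to reconnect */
    time_t  created_at;         /* when the connection was opened */
} PooledConn;

/*
 * Decide whether a connection should be closed and replaced. A GoAway
 * request is treated like an expired maximum lifetime: the connection is
 * only retired while nobody is using it.
 */
bool
pool_should_retire(const PooledConn *pc, time_t now, double max_lifetime_secs)
{
    assert(pc != NULL);

    if (pc->in_use)
        return false;           /* never interrupt an active user */
    if (pc->goaway_received)
        return true;            /* advisory request: reconnect when idle */
    return difftime(now, pc->created_at) >= max_lifetime_secs;
}
```

A pool would typically call such a check when a connection is returned and during its periodic idle sweep, which is where maximum-lifetime checks usually live already.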
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Kirill Reshke
Date:
On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio <me@jeltef.nl> wrote:
>
> This change introduces a new GoAway backend-to-frontend protocol
> message (byte 'g') that the server can send to the client to politely
> request that client to disconnect/reconnect when convenient. This message is
> advisory only - the connection remains fully functional and clients may
> continue executing queries and starting new transactions. "When
> convenient" is obviously not very well defined, but the primary target
> clients are clients that maintain a connection pool. Such clients should
> disconnect/reconnect a connection in the pool when there's no user of
> that connection. This is similar to how such clients often currently
> remove a connection from the pool after the connection hits a maximum
> lifetime of e.g. 1 hour.
>
> This new message is used by Postgres during the already existing "smart"
> shutdown procedure (i.e. when postmaster receives SIGTERM). When
> Postgres is in "smart" shutdown mode existing clients can continue to
> run queries as usual but new connection attempts are rejected. This mode
> is primarily useful when triggering a switchover of a read replica. A
> load balancer can route new connections only to the new read replica,
> while the old load balancer keeps serving the existing connections until
> they disconnect. The problem is that this draining of connections could
> often take a long time. Even when clients only run very short
> queries/transactions because the session can be kept open much longer
> (many connection pools use 1 hour max lifetime of a connection by default).
> With the introduction of the GoAway message Postgres now sends this
> message to all connected clients when it enters smart shutdown mode.
> If these clients respond to the message by reconnecting/disconnecting
> earlier than their maximum connection lifetime the draining can complete
> much quicker. Similar benefits to switchover duration can be achieved
> for other applications or proxies implementing the Postgres protocol,
> like when switching over a cluster of PgBouncer machines to a newer
> version.
>
> Applications/clients that use libpq can periodically check the result of
> the new PQgoAwayReceived() function to find out whether they have been
> asked to reconnect.
Hi!
I'm +1 on this idea. This is something I wanted back in 2020, when
implementing the 'online restart' feature for odyssey[0], but never
bothered to create a thread.
Due to its async engine complexity, odyssey cannot simply reuse tcp
connections from 'old' binary, so we accept new connections in new
binary and try to drop connections in old binary with some rate.
About patches:
in 0001:
>+
>+ if (strcmp(value, "latest") == 0)
>+ {
>+ *result = PG_PROTOCOL_LATEST;
>+ return true;
>+ }
Not needed? we already have this check at the beginning of
pqParseProtocolVersion
In 0002:
> + The <literal>GoAway</literal> message is sent by the server during a
> + smart shutdown to politely request that clients disconnect.
I'm not sure this wording is super-foolproof. First of all, is it
'client', not 'clients'? Looks like we should describe single client
to single server interaction in this doc.
Maybe also change the last sentence to ' ... to instruct clients to
disconnect.' ? Maybe this wording is not great also, but I want to
reflect in doc that disconnection is
strongly advised, yet not obligatory
> + Applications should check this flag
> + periodically and disconnect gracefully when possible, such as after
> + completing the current transaction or unit of work.
What flag? Also, 'Applications should' - no, they shouldn't, is it
just an option? Maybe we should change wording to something like
'Applications can decide that it is recommendatory to close (or maybe
re-open) their connection with the server as soon as they get at least
one 'GoAway' msg.'
Also, can the server send more than one 'GoAway' msg? If yes, should
we document this?
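On the question of repeated GoAway messages, one client-side option is to record the request as a sticky flag, so that duplicates are harmless. The sketch below is an illustration only, not the patch's code. It assumes GoAway uses the standard v3 framing (a type byte followed by a big-endian int32 length that counts itself) and carries no payload. scan_for_goaway() and MSG_GOAWAY are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_GOAWAY 'g'          /* type byte named in the proposal */

/*
 * Scan a buffer of v3 protocol messages (1 type byte followed by a
 * big-endian int32 length that counts itself but not the type byte).
 * Sets *goaway to true if a GoAway message is seen; the flag is sticky, so
 * a second GoAway changes nothing. Returns the number of bytes consumed.
 * Scanning stops at a trailing partial message or a malformed length.
 */
size_t
scan_for_goaway(const uint8_t *buf, size_t len, bool *goaway)
{
    size_t  pos = 0;

    assert(buf != NULL && goaway != NULL);

    while (len - pos >= 5)
    {
        uint32_t msglen = ((uint32_t) buf[pos + 1] << 24) |
                          ((uint32_t) buf[pos + 2] << 16) |
                          ((uint32_t) buf[pos + 3] << 8) |
                          (uint32_t) buf[pos + 4];

        if (msglen < 4 || msglen > len - pos - 1)
            break;              /* malformed or incomplete message */

        if (buf[pos] == MSG_GOAWAY)
            *goaway = true;     /* idempotent: duplicates are harmless */

        pos += 1 + (size_t) msglen;
    }
    return pos;
}
```

With this shape the client never needs to count GoAway messages. It only needs to clear the flag when it opens a new connection.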
> - * notice. (An ERROR is very possibly the backend telling us why
> + * notice. (An ERROR is very possibly the backend telling us why
This change is unrelated
Other coding changes look straightforward and are fine to me.
[0] https://github.com/yandex/odyssey
--
Best regards,
Kirill Reshke
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Fri Oct 24, 2025 at 7:04 AM CEST, Kirill Reshke wrote:
> On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio <me@jeltef.nl> wrote:
> Im +1 on this idea. This is something I wanted back in 2020, when
> implementing the 'online restart' feature for odyssey[0], but never
> bothered to create a thread.
Yeah, to be clear: A big goal of this is definitely to be used by
poolers/proxies/middleware. Those systems will often be more frequently
restarted than the actual database servers, so being able to do that
quickly without disrupting active connections is much more important
there than with plain PostgreSQL servers.
> About patches:
Thanks for the review. Attached is a new patchset. I think I addressed
all of your comments (I almost fully rewrote the docs). I also fixed
two other issues that I found:
- updating docs for 3.3 in more places
- handling the GoAway message in more code paths on the client side
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Ajit Awekar
Date:
Hi Jelte,
Thank you for proposing the GoAway protocol message.
I've developed a patch that serves as a strong, immediate use case for its inclusion. https://www.postgresql.org/message-id/flat/CAER375OvH3_ONmc-SgUFpA6gv_d6eNj2KdZktzo-f_uqNwwWNw%40mail.gmail.com
Thanks & Best Regards,
Ajit
On Fri, 24 Oct 2025 at 17:24, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Fri Oct 24, 2025 at 7:04 AM CEST, Kirill Reshke wrote:
> On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio <me@jeltef.nl> wrote:
> Im +1 on this idea. This is something I wanted back in 2020, when
> implementing the 'online restart' feature for odyssey[0], but never
> bothered to create a thread.
Yeah, to be clear: A big goal of this is definitely to be used by
poolers/proxies/middleware. Those systems will often be more frequently
restarted than the actual database servers, so being able to do that
quickly without disrupting active connections is much more important
there than with plain PostgreSQL servers.
> About patches:
Thanks for the review. Attached is a new patchset. I think I addressed
all of your comments (I almost fully rewrote the docs). I also fixed
two other issues that I found:
- updating docs for 3.3 in more places
- handling the GoAway message in more code paths on the client side
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Thu Oct 23, 2025 at 3:04 PM CEST, Jelte Fennema-Nio wrote:
> This change introduces a new GoAway backend-to-frontend protocol
> message (byte 'g') that the server can send to the client to politely
> request that client to disconnect/reconnect when convenient.
After pushback on another thread about introducing additional minor protocol versions[1], I've decided to change this patch to use a protocol extension instead of a minor version bump. I personally don't think this patch is any better now, but that's fine. If this means it has a chance of going into PG19, that's totally worth it to me. (Also, I'd like to stop spending time on discussions where clearly neither side will agree with each other.)
The automated test requires the not yet committed pytest changes[2]. I don't think the automated test is required for a merge, so I don't think this is blocked on pytest support getting in. It's here mainly as an example and as a regression test during development, to know I did not break the goaway functionality while changing the implementation.
[1]: https://www.postgresql.org/message-id/flat/CADK3HHKe1PA1U6aB5-7tWBQ0yZGgNvY7H=ECDD9955Pas_zx_Q@mail.gmail.com
[2]: https://commitfest.postgresql.org/patch/6045/
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Thu Jan 8, 2026 at 10:15 AM CET, Jelte Fennema-Nio wrote:
> After pushback on another thread about introducing additional minor
> protocol versions[1], I've decided to change this patch to use a
> protocol extension instead of a minor version bump.
Turns out there were still some leftovers from using a version bump in the libpq_pipeline tests. Removed those too now.
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Zsolt Parragi
Date:
Hello!
I only have a few stylistic comments and one question about the wal
sender part - but maybe I don't understand something there.
+ /*
+ * Only signal regular backends and walsenders. Skip
+ * auxiliary processes and dead-end backends.
+ */
+ if (bp->bkend_type == B_BACKEND ||
+ bp->bkend_type == B_WAL_SENDER)
+ {
+ SendProcSignal(bp->pid, PROCSIG_SMART_SHUTDOWN,
+ INVALID_PROC_NUMBER);
I don't see related changes in walsenders, am I missing something,
shouldn't this have some handling in WalSndLoop? Also, shouldn't
walsenders exit later, after normal backends have already stopped? So
I'm not sure how this is supposed to improve them.
+ /*
+ * Parse any available data to see if a GoAway message has arrived.
+ */
+ pqParseInput3(conn);
This is just stylistic, but other places seem to call parseInput instead.
There are also two typos/mistakes in the documentation:
* "rquest"
* "server requests the server"
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Tue, 27 Jan 2026 at 13:57, Zsolt Parragi <zsolt.parragi@percona.com> wrote:
> I only have a few stylistic comments and one question about the wal
> sender part - but maybe I don't understand something there.
Thanks for the review. I updated the stylistic things. And I agree with you on the walsender note, so I stopped sending the signal to those backends.
I also rebased to use Jacob's new table and tweaked the docs in a few places.
Attached is v5.
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Sun Feb 8, 2026 at 11:03 PM CET, Jelte Fennema-Nio wrote:
> Attached is v5
Attached is v6 which fixes a rebase conflict.
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Zsolt Parragi
Date:
+ /*
+ * Only signal regular backends, since those need to notify
+ * their clients using a GoAway message.
+ */
+ if (bp->bkend_type == B_BACKEND)
This condition is slightly different to how SignalChildren works, is that intentional? I don't think it causes any practical difference, and I don't see an easy way to reuse SignalChildren for this, but maybe it could still follow the same pattern.
Otherwise I don't see any other issues, and this also doesn't seem to be an important comment.
Since the pytest framework seems unlikely to be included in PG19, have you considered a different test implementation, to have at least some minimal coverage?
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Wed Feb 25, 2026 at 4:08 PM CET, Zsolt Parragi wrote:
> + /*
> + * Only signal regular backends, since those need to notify
> + * their clients using a GoAway message.
> + */
> + if (bp->bkend_type == B_BACKEND)
>
> This condition is slightly different to how SignalChildren works, is
> that intentional? I don't think it causes any practical difference,
> and I don't see an easy way to reuse SignalChildren for this, but
> maybe it could still follow the same pattern.
Changed it to be consistent now, and resolved a rebase conflict.
> Since the pytest framework seems unlikely to be included in PG19, have
> you considered a different test implementation, to have at least some
> minimal coverage?
I now included some basic support for GoAway in psql and added a perl test based on that.
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Tomas Vondra
Date:
Hi,
I've looked at this patch today, to check if there's something we could get done in PG19. I find it a bit we didn't get much feedback from people working on the client/downstream stuff - clients, connection poolers/middleware, that sort of stuff. OK, we did hear from Kirill, and he seems to like it.
I'm not very involved in the protocol stuff, so I'm sure there's a lot of details I'm missing. It'd be very helpful if there was some sort of PoC support on the pooler/client side, so that I can experiment with it and see how helpful the new protocol message is. But I realize that's a bit too much to ask for.
A couple thoughts about this (some of this may be missing what the patch aims to do).
* Does it make sense to tie this to smart shutdowns? I realize it's just an example, and it probably makes sense to send the GoAway message before a shutdown. But isn't this a bit similar to cancel/terminate of a backend? Why not to have a pg_goaway_backend() function, that'd send the message to a single backend? It might be useful for load-balancing, if we could pick a "heavy" backend and ask it to reconnect / move to a different replica. (Could that be handled by a middleware?)
* In fact, does it improve the smart shutdown case in practice? Let's say we have a single instance, and we're restarting it. It'll send GoAway to all the clients, the good clients will try to reconnect. But if there's even a single "bad" client ignoring the GoAway, all the well-behaved clients will get stuck. Ofc, that can happen without the GoAway message too - a client may disconnect because of timeout etc. But it makes it more likely, and it'll affect the well-behaved clients.
* Would it make sense to have some payload in the GoAway message? I'm thinking about (a) some deadline by which the client should disconnect, e.g. time of planned restart / shutdown, (b) priority, expressing how much the client should try to disconnect (and maybe take more drastic actions).
Also, two minor comments:
* The sgml docs say the function is defined as int PQgoAwayReceived(const PGconn *conn); but in the .h file it's defined without the "const".
* The new entry in protocol.sgml (in the "Supported Protocol Extensions" table) says <entry><literal>goaway</literal></entry> but the following table includes "_pq_" in the entry name. Should the new entry do the same?
regards
--
Tomas Vondra
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Jacob Champion
Date:
On Fri, Mar 20, 2026 at 12:20 PM Tomas Vondra <tomas@vondra.me> wrote:
> * In fact, does it improve the smart shutdown case in practice? Let's
> say we have a single instance, and we're restarting it. It'll send
> GoAway to all the clients, the good clients will try to reconnect. But
> if there's even a single "bad" client ignoring the GoAway, all the
> well-behaved clients will get stuck. Ofc, that can happen without the
> GoAway message too - a client may disconnect because of timeout etc. But
> it makes it more likely, and it'll affect the well-behaved clients.
>
> * Would it make sense to have some payload in the GoAway message? I'm
> thinking about (a) some deadline by which the client should disconnect,
> e.g. time of planned restart / shutdown, (b) priority, expressing how
> much the client should try to disconnect (and maybe take more drastic
> actions).
I'd been wondering about these as well, but in the context of the tangential thread [1]. HTTP has much stronger semantics for its GOAWAY frames, for instance.
--Jacob
[1] https://postgr.es/m/CAOYmi%2BmSn8xQ7ExqY07V6G2oFXN2nY%2B7f4yf_RV2%3D7xNCKwW-Q%40mail.gmail.com
On Fri, Mar 20, 2026 at 2:20 PM Tomas Vondra <tomas@vondra.me> wrote:
* Does it make sense to tie this to smart shutdowns? I realize it's just
an example, and it probably makes sense to send the GoAway message
before a shutdown. But isn't this a bit similar to cancel/terminate of a
backend? Why not to have a pg_goaway_backend() function, that'd send the
message to a single backend? It might be useful for load-balancing, if
we could pick a "heavy" backend and ask it to reconnect / move to a
different replica. (Could that be handled by a middleware?)
+1. Another scenario that comes to mind is asking for a reconnect based on backend memory consumption, since there's a bunch of internal structures (relcache, etc) that can grow in an unbounded fashion.
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Fri, 20 Mar 2026 at 20:20, Tomas Vondra <tomas@vondra.me> wrote:
> It'd be very helpful if there was some sort of PoC
> support on the pooler/client side, so that I can experiment with it and
> see how helpful the new protocol message is. But I realize that's a bit
> too much to ask for.
I'll see if I can whip something up, it shouldn't be too hard.
> Why not to have a pg_goaway_backend() function, that'd send the
> message to a single backend?
I like this idea a lot. So I added it in the attached v8 patch. This also allowed me to add low-level tests using the libpq_pipeline testsuite.
> * In fact, does it improve the smart shutdown case in practice? Let's
> say we have a single instance, and we're restarting it. It'll send
> GoAway to all the clients, the good clients will try to reconnect. But
> if there's even a single "bad" client ignoring the GoAway, all the
> well-behaved clients will get stuck. Ofc, that can happen without the
> GoAway message too - a client may disconnect because of timeout etc. But
> it makes it more likely, and it'll affect the well-behaved clients.
For primary server restarts, I don't think anyone should be using smart shutdown right now either. Any new connections to the database will be failing for an indeterminate amount of time. I agree that sending GoAway might worsen the problem in some cases, but it's already terrible to start with. Fast shutdown is the only sensible restart mode for a primary server. This seems to be generally accepted knowledge, given that we use SIGINT (fast shutdown) in our systemd example[1].
Sending a GoAway on smart shutdown makes that shutdown mode very useful for read replicas during a planned switch-over to another replica. Now clients can finish their work and quickly reconnect to the new read replica, minimizing switchover time while preventing errors.
Even when restarting primary servers, triggering a smart shutdown has benefits, as long as it's followed by a fast shutdown after a short delay (e.g., 1 second). This causes slightly longer downtime (the additional delay), but it allows most clients to disconnect on their own terms instead of in the middle of a query. Connection errors can often be retried transparently more easily than errors in the middle of a query. In effect, for many applications, this could mean a reduction in errors and only an increase in latency during a restart.
> * Would it make sense to have some payload in the GoAway message? I'm
> thinking about (a) some deadline by which the client should disconnect,
> e.g. time of planned restart / shutdown, (b) priority, expressing how
> much the client should try to disconnect (and maybe take more drastic
> actions).
I thought some more about this, but ultimately, the payloads you suggest only seem useful if a client has something in between "disconnect hard now" and "disconnect when the connection is unused". I cannot think of any such cases, i.e. what other "drastic actions" could a client take instead of simply closing the connection? If that's the only possibility, why not simply have the server close the connection in that case?
Overall, I agree that having no payload in this new message feels a bit weird. But ultimately, clients don't need any payload to do something useful.
> Also, two minor comments:
Fixed.
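The smart-then-fast restart sequence described in this message could be scripted roughly as follows. This is a POSIX-only sketch, not part of the patch: smart_then_fast_shutdown(), signal_fn and the dry-run helper are hypothetical names. It relies only on the signal mapping stated in the thread (SIGTERM for smart shutdown, SIGINT for fast shutdown).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Signature shared by kill(2) and the dry-run stub below. */
typedef int (*signal_fn) (pid_t pid, int sig);

/* Dry-run sender: records the signals instead of delivering them. */
static int  dry_run_log[4];
static int  dry_run_count = 0;

int
dry_run_signal(pid_t pid, int sig)
{
    (void) pid;
    if (dry_run_count < 4)
        dry_run_log[dry_run_count++] = sig;
    return 0;
}

/*
 * Ask the postmaster for a smart shutdown (SIGTERM), wait grace_secs so
 * clients can act on GoAway, then escalate to a fast shutdown (SIGINT).
 * Pass kill as send_sig for real use. Returns 0 on success, -1 if a
 * signal could not be sent.
 */
int
smart_then_fast_shutdown(pid_t postmaster_pid, unsigned grace_secs,
                         signal_fn send_sig)
{
    assert(send_sig != NULL);

    if (send_sig(postmaster_pid, SIGTERM) != 0)
        return -1;
    sleep(grace_secs);
    if (send_sig(postmaster_pid, SIGINT) != 0)
        return -1;
    return 0;
}
```

With kill as the sender, a failure of the second signal may simply mean the server already exited during the grace period, so a real script would probably inspect errno before treating it as an error.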
Attachments
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
"Jelte Fennema-Nio"
Date:
On Fri, 20 Mar 2026 at 21:11, Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> I'd been wondering about these as well, but in the context of the
> tangential thread [1]. HTTP has much stronger semantics for its GOAWAY
> frames, for instance.
I reread the HTTP/3 GOAWAY spec[1], but I think our protocol is too different from HTTP/3 to take any lessons from it (at the moment at least). HTTP/3 "streams" are independent; we have no such concept. Our whole session is a single stream, due to all of our session state. So the semantics that on a single connection a client cannot open newer streams does not really mean anything useful for us, i.e. there's no way to open a new stream. Even the "which messages have definitely not been processed" feature can already be inferred from the server right now, by tracking what responses have been received before the server closes the connection.
So I cannot think of any useful payload to add to the GoAway message.
[1]: https://www.rfc-editor.org/rfc/rfc9114.html#connection-shutdown
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Zsolt Parragi
Date:
+ if (!goaway_reported && PQgoAwayReceived(pset.db))
+ {
+ pg_log_info("Server sent GoAway, requesting disconnect when convenient.");
+ goaway_reported = true;
+ }
Shouldn't this variable be reset in pqDropServerData?
+ It is possible for NoticeResponse, ParameterStatus and GoAway messages to be
interspersed between CopyData messages; frontends must handle these cases,
The patch currently doesn't actually do this, did you add this as
future proofing?
> I thought some more about this, but ultimately, the payloads you suggest
> only seem useful if a client has something inbetween "disconnect hard
> now" and "disconnect when the connection is unused"
Couldn't a client optimize the reconnect time if it knows about the
deadline? If it knows that it still has 10 minutes before the server
kicks it out, it might choose to finish a 3-4 minute task, reconnect,
and then continue.
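To make the deadline argument concrete, here is a small sketch of the decision a client could make if GoAway carried a deadline. No such payload exists in the patch as discussed, so the deadline input is purely hypothetical, as are the names plan_with_deadline() and GoAwayPlan.

```c
#include <assert.h>

typedef enum
{
    RECONNECT_FIRST,            /* reconnect now, run the task afterwards */
    FINISH_THEN_RECONNECT       /* the task fits before the deadline */
} GoAwayPlan;

/*
 * Choose between finishing the pending task on the current connection and
 * reconnecting first. secs_until_deadline would come from a hypothetical
 * GoAway payload; est_task_secs is the client's own estimate, and
 * safety_margin_secs covers estimation error.
 */
GoAwayPlan
plan_with_deadline(double secs_until_deadline, double est_task_secs,
                   double safety_margin_secs)
{
    assert(est_task_secs >= 0 && safety_margin_secs >= 0);

    if (est_task_secs + safety_margin_secs <= secs_until_deadline)
        return FINISH_THEN_RECONNECT;
    return RECONNECT_FIRST;
}
```

With the numbers from the example above (10 minutes until the deadline, a task of about 4 minutes, and an assumed 30-second margin), the function chooses to finish the task first.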
Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
From
Tatsuo Ishii
Date:
Reading the proposal, I have some questions. Sorry if they have been already discussed.
1. If clients do not disconnect a session even if they have received the GoAway message, PostgreSQL server will give up the shutdown sequence. In this case, shouldn't the PostgreSQL server send a message indicating "I have given up the smart shutdown request"? Otherwise, the fact that GoAway has been received will remain in the client, and if the client does not check for it in a timely manner, the client may exit the session unnecessarily.
2. Can we use a NOTICE message instead of the new protocol GoAway for the purpose? Frontends are expected to receive and handle NOTICE messages while processing frontend/backend protocol. So I think it is fair to expect clients to find the NOTICE message and behave as we discussed. This way, we don't need the GoAway message.
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp