Thread: Add GoAway protocol message for graceful but fast server shutdown/switchover

Add GoAway protocol message for graceful but fast server shutdown/switchover

From
"Jelte Fennema-Nio"
Date:
This change introduces a new GoAway backend-to-frontend protocol
message (byte 'g') that the server can send to the client to politely
request that client to disconnect/reconnect when convenient. This message is
advisory only - the connection remains fully functional and clients may
continue executing queries and starting new transactions. "When
convenient" is obviously not very well defined, but the primary target
clients are clients that maintain a connection pool. Such clients should
disconnect/reconnect a connection in the pool when there's no user of
that connection. This is similar to how such clients often currently
remove a connection from the pool after the connection hits a maximum
lifetime of e.g. 1 hour.

This new message is used by Postgres during the already existing "smart"
shutdown procedure (i.e. when postmaster receives SIGTERM). When
Postgres is in "smart" shutdown mode existing clients can continue to
run queries as usual but new connection attempts are rejected. This mode
is primarily useful when triggering a switchover of a read replica. A
load balancer can route new connections only to the new read replica,
while the old read replica keeps serving the existing connections until
they disconnect. The problem is that this draining of connections could
often take a long time, even when clients only run very short
queries/transactions, because the session can be kept open much longer
(many connection pools use 1 hour max lifetime of a connection by default).
With the introduction of the GoAway message Postgres now sends this
message to all connected clients when it enters smart shutdown mode.
If these clients respond to the message by reconnecting/disconnecting
earlier than their maximum connection lifetime the draining can complete
much quicker. Similar benefits to switchover duration can be achieved
for other applications or proxies implementing the Postgres protocol,
like when switching over a cluster of PgBouncer machines to a newer
version.

Applications/clients that use libpq can periodically check the result of
the new PQgoAwayReceived() function to find out whether they have been
asked to reconnect.
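
As an illustration of the pooling behaviour described above, here is a minimal sketch of a pool that recycles a connection only when nobody is using it. It does not link against libpq: PoolConn, conn_goaway_received() and conn_reconnect() are hypothetical stand-ins, and the exact signature of the proposed PQgoAwayReceived() is an assumption, since this message does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pooled PGconn. */
typedef struct PoolConn
{
    bool goaway_received;   /* what PQgoAwayReceived() would report (assumed) */
    bool in_use;            /* currently checked out by a user of the pool */
    int  generation;        /* bumped on every simulated reconnect */
} PoolConn;

/* Stand-in for the proposed PQgoAwayReceived(); real signature is assumed. */
static bool
conn_goaway_received(const PoolConn *c)
{
    return c->goaway_received;
}

/* Stand-in for closing and reopening, e.g. PQfinish() + PQconnectdb(). */
static void
conn_reconnect(PoolConn *c)
{
    c->goaway_received = false;
    c->generation++;
}

/* Called when a user hands a connection back to the pool. */
static void
pool_release(PoolConn *c)
{
    assert(c != NULL && c->in_use);
    c->in_use = false;
    if (conn_goaway_received(c))
        conn_reconnect(c);      /* no user right now, so this is "convenient" */
}

/* Periodic sweep: recycle idle connections that were asked to go away. */
static size_t
pool_sweep_idle(PoolConn *conns, size_t n)
{
    size_t recycled = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!conns[i].in_use && conn_goaway_received(&conns[i]))
        {
            conn_reconnect(&conns[i]);
            recycled++;
        }
    }
    return recycled;
}
```

A connection that is checked out is never touched by the sweep; it is recycled at the moment it is released, which mirrors how pools typically enforce a maximum connection lifetime.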

Attachments

Re: Add GoAway protocol message for graceful but fast server shutdown/switchover

From
Kirill Reshke
Date:
On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio <me@jeltef.nl> wrote:
>
> This change introduces a new GoAway backend-to-frontend protocol
> message (byte 'g') that the server can send to the client to politely
> request that client to disconnect/reconnect when convenient. This message is
> advisory only - the connection remains fully functional and clients may
> continue executing queries and starting new transactions. "When
> convenient" is obviously not very well defined, but the primary target
> clients are clients that maintain a connection pool. Such clients should
> disconnect/reconnect a connection in the pool when there's no user of
> that connection. This is similar to how such clients often currently
> remove a connection from the pool after the connection hits a maximum
> lifetime of e.g. 1 hour.
>
> This new message is used by Postgres during the already existing "smart"
> shutdown procedure (i.e. when postmaster receives SIGTERM). When
> Postgres is in "smart" shutdown mode existing clients can continue to
> run queries as usual but new connection attempts are rejected. This mode
> is primarily useful when triggering a switchover of a read replica. A
> load balancer can route new connections only to the new read replica,
> while the old read replica keeps serving the existing connections until
> they disconnect. The problem is that this draining of connections could
> often take a long time, even when clients only run very short
> queries/transactions, because the session can be kept open much longer
> (many connection pools use 1 hour max lifetime of a connection by default).
> With the introduction of the GoAway message Postgres now sends this
> message to all connected clients when it enters smart shutdown mode.
> If these clients respond to the message by reconnecting/disconnecting
> earlier than their maximum connection lifetime the draining can complete
> much quicker. Similar benefits to switchover duration can be achieved
> for other applications or proxies implementing the Postgres protocol,
> like when switching over a cluster of PgBouncer machines to a newer
> version.
>
> Applications/clients that use libpq can periodically check the result of
> the new PQgoAwayReceived() function to find out whether they have been
> asked to reconnect.

Hi!
I'm +1 on this idea. This is something I wanted back in 2020, when
implementing the 'online restart' feature for odyssey[0], but I never
bothered to create a thread.
Due to the complexity of its async engine, odyssey cannot simply reuse
TCP connections from the 'old' binary, so we accept new connections in
the new binary and try to drop connections in the old binary at some rate.

About patches:

in 0001:

>+
>+ if (strcmp(value, "latest") == 0)
>+ {
>+ *result = PG_PROTOCOL_LATEST;
>+ return true;
>+ }

Not needed? We already have this check at the beginning of
pqParseProtocolVersion.

In 0002:

> +       The <literal>GoAway</literal> message is sent by the server during a
> +       smart shutdown to politely request that clients disconnect.

I'm not sure this wording is foolproof. First of all, should it be
'client', not 'clients'? It looks like this doc should describe the
interaction between a single client and a single server.
Maybe also change the end of the sentence to '... to instruct clients to
disconnect.'? Maybe that wording is not great either, but I want the
doc to reflect that disconnection is strongly advised, yet not
obligatory.

> +       Applications should check this flag
> +       periodically and disconnect gracefully when possible, such as after
> +       completing the current transaction or unit of work.

What flag? Also, 'Applications should' - no, they don't have to; isn't it
just an option? Maybe we should change the wording to something like

'Applications can treat a 'GoAway' message as a recommendation to close
(or maybe re-open) their connection with the server as soon as they
receive at least one such message.'

Also, can the server send more than one 'GoAway' msg? If yes, should
we document this?
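
Whatever the answer on the server side, a client can make repeated messages harmless by treating GoAway as setting a sticky flag. The sketch below scans a buffer of backend messages using the standard protocol framing (one type byte followed by a 4-byte big-endian length that counts itself) and sets the flag whenever it sees type 'g'. It assumes GoAway carries no payload; ClientState and scan_backend_messages() are hypothetical names for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection client state. */
typedef struct ClientState
{
    bool goaway_received;   /* sticky: set once, never cleared by a repeat */
} ClientState;

/*
 * Consume all complete messages in buf and return the number of bytes
 * consumed. Scanning stops at the first incomplete or malformed message;
 * a real client would report the malformed case as a protocol error.
 */
static size_t
scan_backend_messages(ClientState *st, const uint8_t *buf, size_t len)
{
    size_t pos = 0;

    assert(st != NULL && (buf != NULL || len == 0));

    while (len - pos >= 5)
    {
        uint8_t  type = buf[pos];
        uint32_t msglen = ((uint32_t) buf[pos + 1] << 24) |
                          ((uint32_t) buf[pos + 2] << 16) |
                          ((uint32_t) buf[pos + 3] << 8) |
                          (uint32_t) buf[pos + 4];

        /* The length includes its own 4 bytes but not the type byte. */
        if (msglen < 4 || msglen > len - pos - 1)
            break;

        if (type == 'g')
            st->goaway_received = true;     /* idempotent on repeats */

        pos += 1 + (size_t) msglen;
    }
    return pos;
}
```

With this shape, a second or third GoAway changes nothing on the client, so the documentation would only need to say that clients must tolerate repeats.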


> - * notice.  (An ERROR is very possibly the backend telling us why
> + * notice. (An ERROR is very possibly the backend telling us why

This change is unrelated

The other code changes look straightforward and are fine to me.


[0] https://github.com/yandex/odyssey

-- 
Best regards,
Kirill Reshke



Re: Add GoAway protocol message for graceful but fast server shutdown/switchover

From
"Jelte Fennema-Nio"
Date:
On Fri Oct 24, 2025 at 7:04 AM CEST, Kirill Reshke wrote:
> On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio <me@jeltef.nl> wrote:
> I'm +1 on this idea. This is something I wanted back in 2020, when
> implementing the 'online restart' feature for odyssey[0], but never
> bothered to create a thread.

Yeah, to be clear: A big goal of this is definitely to be used by
poolers/proxies/middleware. Those systems will often be more frequently
restarted than the actual database servers, so being able to do that
quickly without disrupting active connections is much more important
there than with plain PostgreSQL servers.

> About patches:

Thanks for the review. Attached is a new patchset. I think I addressed
all of your comments (I almost fully rewrote the docs). I also fixed
two other issues that I found:
- updating docs for 3.3 in more places
- handling the GoAway message in more code paths on the client side


Attachments