Discussion: client_connection_check_interval default value
What would people here think about changing the default value of client_connection_check_interval to 2000 ms? Right now this is disabled by default.

The background is that I recently saw an incident where a blocking-lock brownout escalated from a row-level problem to a complete system outage, due to a combination of factors including a bug in golang's pgx postgres client (PR 2481 has now been merged with a fix) and a pgbouncer setup that was missing peers configuration. As a result, cancel messages were getting dropped while postgres connections were waiting on a blocked lock, golang aggressively timed out on context deadlines and retried, and once the database reached max_connections the whole system ground to a halt.

At the time I thought it was weird that postgres wasn't checking for dead connections while those conns were waiting for locks; I spent a bunch of time investigating this, reproduced it, and wrote up what I was able to figure out. Then, yesterday, I saw a LinkedIn post from Marat at Data Egret who mentioned that client_connection_check_interval exists. Plugged this into my repro and confirmed it can prevent postgres from escalating the blocking-lock brownout into a complete outage due to connection exhaustion.

While a fix has been merged in pgx for the most direct root cause of the incident I saw, this setting just seems like a good behavior to make Postgres more robust in general. 2000 ms seemed like a fairly safe/conservative starting point for discussion.

Thoughts?

-Jeremy

PS. Some more details and graphs are at https://ardentperf.com/2026/02/04/postgres-client_connection_check_interval/

--
To know the thoughts and deeds that have marked man's progress is to feel the great heart throbs of humanity through the centuries; and if one does not feel in these pulsations a heavenward striving, one must indeed be deaf to the harmonies of life.

Helen Keller, The Story Of My Life, 1902, 1903, 1905, introduction by Ralph Barton Perry (Garden City, NY: Doubleday & Company, 1954), p90.
Attachments
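For reference, the setting under discussion would look like this in postgresql.conf (a sketch only; 2000 is the value proposed in this thread, and 0 is today's default):

```
# postgresql.conf fragment -- sketch of the proposed change
# Interval between checks that the client is still connected, while running
# queries. Takes milliseconds by default; 0 (the current default) disables it.
client_connection_check_interval = 2000
```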
On Wed, 4 Feb 2026 21:30:32 -0800 Jeremy Schneider <schneider@ardentperf.com> wrote:
> What would people here think about changing the default value of
> client_connection_check_interval to 2000 ms? Right now this is
> disabled by default.

Forgot the doc update in the attached patch. Updated

-Jeremy
Attachments
I'm a weak -1 on this. Certainly not 2s! That's a lot of context switching for a busy system for no real reason. Also see this past discussion:
--
Cheers,
Greg
On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider <schneider@ardentperf.com> wrote:
> While a fix has been merged in pgx for the most direct root cause of
> the incident I saw, this setting just seems like a good behavior to
> make Postgres more robust in general.

At the risk of making perfect the enemy of better, the protocol-level heartbeat mentioned in the original thread [1] would cover more use cases, which might give it a better chance of eventually becoming default behavior. It might also be a lot of work, though.

--Jacob

[1] https://postgr.es/m/CA%2BhUKGLyj5Aqt6ojYfSc%2BqSeB1x%3D3RbU61hnus5sL0BKqEBsLw%40mail.gmail.com
One interesting thing to me - it seems like all of the past mail threads were focused on a situation different from mine. Lots of discussion about freeing resources like CPU.

In the outage I saw, the system was idle and we completely ran out of max_connections because all sessions were waiting on a row lock.

Importantly, the app was closing these conns but we had sockets stacking up on the server in CLOSE-WAIT state - and postgres simply never cleaned them up until we had an outage. The processes were completely idle waiting for a row lock that was not going to be released.

Impact could have been isolated to sessions hitting that row (with this GUC), but it escalated to a system outage. It's pretty simple to reproduce this:
https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion

On Thu, 5 Feb 2026 09:26:34 -0800 Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider
> <schneider@ardentperf.com> wrote:
> > While a fix has been merged in pgx for the most direct root cause of
> > the incident I saw, this setting just seems like a good behavior to
> > make Postgres more robust in general.
>
> At the risk of making perfect the enemy of better, the protocol-level
> heartbeat mentioned in the original thread [1] would cover more use
> cases, which might give it a better chance of eventually becoming
> default behavior. It might also be a lot of work, though.

It seems like a fair bit of discussion is around OS coverage - even Thomas' message there references keepalive working as expected on Linux. Tom objects in 2023 that "the default behavior would then be platform-dependent and that's a documentation problem we could do without."

But it's been five years - has there been further work on implementing a postgres-level heartbeat? And I see other places in the docs where we note platform differences; is it really such a big problem to change the default here?
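The escalation pattern described above can be caricatured in a few lines. This is a hypothetical toy model (not the linked repro, and not real postgres behavior): each client timeout-and-retry adds a new backend blocked on the same row lock, and without a dead-client check the abandoned backends linger in CLOSE-WAIT forever.

```python
MAX_CONNECTIONS = 100  # stand-in for the server's max_connections limit

def simulate(retries: int, dead_client_check: bool) -> str:
    """Toy model of the escalation: each client timeout+retry opens a new
    server backend that then blocks on the same row lock. Without a dead-
    client check the abandoned backends linger (CLOSE-WAIT sockets), so
    the count only grows; with the check they are reaped between retries."""
    waiting = 0
    for _ in range(retries):
        if dead_client_check:
            waiting = 0   # backends whose clients went away were terminated
        waiting += 1      # the retry adds one newly blocked backend
        if waiting >= MAX_CONNECTIONS:
            return "outage"    # connection slots exhausted, system-wide impact
    return "brownout"          # impact stays limited to sessions hitting the row
```

The point of the sketch is only that the check converts unbounded growth into a bounded steady state; the real dynamics depend on client retry rates and the check interval.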
On Thu, 5 Feb 2026 10:00:29 -0500 Greg Sabino Mullane <htamfids@gmail.com> wrote:
> I'm a weak -1 on this. Certainly not 2s! That's a lot of context
> switching for a busy system for no real reason. Also see this past
> discussion:

In the other thread, it looks like the larger perf concerns were with some early implementations, before they refactored the patch? Konstantin's message on 2019-08-02 said he didn't see much difference, and the value of the timeout didn't seem to matter; if anything the marginal effect was simply from the presence of any timer (same effect as setting statement_timeout) - and later on the thread it seems like Thomas also saw minimal performance concern here.

I did see a real system outage that could have been prevented by an appropriate default value here, since I didn't yet know to change it.

-Jeremy
Attachments
On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider <schneider@ardentperf.com> wrote:
>
> One interesting thing to me - it seems like all of the past mail
> threads were focused on a situation different from mine. Lots of
> discussion about freeing resources like CPU.
>
> In the outage I saw, the system was idle and we completely ran out of
> max_connections because all sessions were waiting on a row lock.
>
> Importantly, the app was closing these conns but we had sockets stacking
> up on the server in CLOSE-WAIT state - and postgres simply never
> cleaned them up until we had an outage. The processes were completely
> idle waiting for a row lock that was not going to be released.
>
> Impact could have been isolated to sessions hitting that row (with this
> GUC), but it escalated to a system outage. It's pretty simple to
> reproduce this:
> https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion
>
> It seems like a fair bit of discussion is around OS coverage - even
> Thomas' message there references keepalive working as expected on
> Linux. Tom objects in 2023 that "the default behavior would then be
> platform-dependent and that's a documentation problem we could do
> without."
>
> But it's been five years - has there been further work on implementing
> a postgres-level heartbeat? And I see other places in the docs where we
> note platform differences, is it really such a big problem to change
> the default here?
>
> In the other thread I see larger perf concerns with some early
> implementations before they refactored the patch? Konstantin's message
> on 2019-08-02 said he didn't see much difference, and the value of the
> timeout didn't seem to matter, and if anything the marginal effect was
> simply from the presence of any timer (same effect as setting
> statement_timeout) - and later on the thread it seems like Thomas also
> saw minimal performance concern here.
>
> I did see a real system outage that could have been prevented by an
> appropriate default value here, since I didn't yet know to change it.

I'm not sure that client_connection_check_interval needs to be enabled by default.

However, if we do agree to change the default and apply it, I think we should first address the related issue: with log_lock_waits enabled by default, setting client_connection_check_interval to 2s would cause "still waiting" messages to be logged every 2 seconds while waiting on the lock. That could result in a lot of noisy logging under default settings.

The issue is that backends blocked in ProcSleep() are woken up every client_connection_check_interval and may emit a "still waiting" message each time if log_lock_waits is enabled. To mitigate this, just one idea is to add a flag to track whether the "still waiting" message has already been emitted during a call to ProcSleep(), and suppress further messages once it has been logged.

Regards,

--
Fujii Masao
Fujii Masao <masao.fujii@gmail.com> writes:
> On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
> <schneider@ardentperf.com> wrote:
>> I did see a real system outage that could have been prevented by an
>> appropriate default value here, since I didn't yet know to change it.
> I'm not sure that client_connection_check_interval needs to be enabled
> by default.
I think enabling it by default is a nonstarter, because it changes
behavior in a significant way. Specifically, it's always been the
case that if the client disconnects during a non-SELECT query (or
anything that doesn't produce output), the backend would complete that
query before ending the session. I think it's very likely that there
are users depending on that behavior. Jeremy is describing an
application that evidently was built on the assumption that
disconnecting early would abort a wait, but that assumption was not
based on any testing. I think it's good that we now have an option
to make such an assumption hold, but that does not translate to
"we should force that behavior on everybody". Whether or not the
overhead is insignificant, there's a good chance that the change
would make more people unhappy than happy.
> The issue is that backends blocked in ProcSleep() are woken up every
> client_connection_check_interval and may emit a "still waiting" message
> each time if log_lock_waits is enabled. To mitigate this, just one idea is
> to add a flag to track whether the "still waiting" message has already been
> emitted during a call to ProcSleep(), and suppress further messages
> once it has been logged.
Independently of what's the default, it seems like it'd be valuable to
make that interaction better. I think it is reasonable to keep on
emitting "still waiting" every so often, but we could probably
rate-limit that to a lot less than every 2 seconds.
regards, tom lane
On Thu, Feb 5, 2026 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way. Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.
Ha, I hadn't even thought about the possibility of fire-and-forget...
> I think it's very likely that there
> are users depending on that behavior.
From a quick search, yeah, probably:
https://dba.stackexchange.com/questions/247206/will-query-continue-to-run-even-after-network-times-out
--Jacob
On Thu, 05 Feb 2026 19:01:52 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.  I think it's very likely that there
> are users depending on that behavior.  Jeremy is describing an
> application that evidently was built on the assumption that
> disconnecting early would abort a wait, but that assumption was not
> based on any testing.  I think it's good that we now have an option
> to make such an assumption hold, but that does not translate to
> "we should force that behavior on everybody".  Whether or not the
> overhead is insignificant, there's a good chance that the change
> would make more people unhappy than happy.

This application was trying to kill the connection: the postgres client attempted to send a cancel message, but the cancel message was lost and the client naturally falls back to just hard-killing the network connection.

I totally understand the cancel messages should not have gotten lost. But I'm surprised that we'd assume most users who kill -9 their hung client want their 10 hour query to keep running forever? Yes, we always try to gracefully stop things first and let cancel messages get through, but I'd think if the server received a FIN over the network then we could interpret that similarly to a cancel message.

This would be a change in a new major version; we wouldn't backpatch a GUC default. I see the concern around the behavior change, but I don't feel like this difference between CTRL-C and kill -9 should be sacrosanct, if having them behave similarly seems generally reasonable. Postgres has made more significant changes than this before in major versions.

-Jeremy
Some space cadet thoughts I recall wondering about back then:

If one process had all the sockets, I suppose something could watch for POLLRDHUP/KEV_EOF from all of them and send an interrupt to the owner to handle in its next CFI, so that we don't have to have every backend polling its socket. Once we have a multithreaded server we could try that, but it might become more reasonable to consider with hypothetical models of socket management where there are executor threads that are not tied to a session and socket, which implies a new bit of architecture handling network events on behalf of sessions.

I suppose in theory it could be the postmaster today (just don't close the client sockets after fork, and teach the postmaster's main loop to do that), and I vaguely recall something along those lines being shot down in some other thread... it might cause some extra contention in the kernel depending how it's implemented (IDK), but it'd obviously eat a lot of fd table entries, which is probably what killed the notion... though we'll need that large socket table eventually (and perhaps we'll pay for it by sharing file fds somehow...).

But either way, heartbeats might indeed make more sense, whatever the architecture.
On Thu, 2026-02-05 at 19:01 -0500, Tom Lane wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
> > On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
> > <schneider@ardentperf.com> wrote:
> > > I did see a real system outage that could have been prevented by an
> > > appropriate default value here, since I didn't yet know to change it.
>
> > I'm not sure that client_connection_check_interval needs to be enabled
> > by default.
>
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.  I think it's very likely that there
> are users depending on that behavior.

*Perhaps* there are some users who depend on the current behavior, but my experience is that the vast majority of users don't want statements started by a connection that went dead to keep running. I mean, it would be a change in behavior, but that is normal during a major upgrade, and users who actively want the current behavior can disable client_connection_check_interval. I think that enabling client_connection_check_interval would be a net win, as far as the core functionality is concerned.

Fujii Masao's concern that log_lock_waits would issue a message every client_connection_check_interval is much more serious in my opinion, now that log_lock_waits is enabled by default (at - erm - my insistence). Why does the deadlock detector kick in every client_connection_check_interval?

Yours,
Laurenz Albe
On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The issue is that backends blocked in ProcSleep() are woken up every
> > client_connection_check_interval and may emit a "still waiting" message
> > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > to add a flag to track whether the "still waiting" message has already been
> > emitted during a call to ProcSleep(), and suppress further messages
> > once it has been logged.
>
> Independently of what's the default, it seems like it'd be valuable to
> make that interaction better.  I think it is reasonable to keep on
> emitting "still waiting" every so often, but we could probably
> rate-limit that to a lot less than every 2 seconds.

Attached is a patch that rate-limits the "still waiting on lock" message to at most once every 10s.

I chose 10s instead of the suggested 2s, since 2s felt too short. But we can discuss the appropriate interval and adjust it if needed. The value is currently hard-coded, as making it configurable does not seem necessary.

Thoughts?

--
Fujii Masao
Attachments
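The rate-limiting behavior being proposed can be sketched as a toy model (hypothetical Python, not the actual C patch): however often the sleeping backend is woken up, the message is emitted at most once per interval.

```python
RATE_LIMIT_S = 10.0  # the hard-coded interval from the patch under discussion

class StillWaitingLogger:
    """Toy model of rate-limiting the 'still waiting on lock' message.
    Wakeups may arrive every client_connection_check_interval (e.g. 2s),
    but the message is emitted at most once per RATE_LIMIT_S."""

    def __init__(self):
        self._last_emitted = None  # timestamp of the last emitted message

    def should_log(self, now: float) -> bool:
        """Called on each wakeup; returns True only when enough time has
        passed since the last emitted 'still waiting' message."""
        if self._last_emitted is None or now - self._last_emitted >= RATE_LIMIT_S:
            self._last_emitted = now
            return True
        return False
```

With 2-second wakeups, only every fifth wakeup would log; the intermediate ones are suppressed.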
On Wed, 2026-02-18 at 14:30 +0900, Fujii Masao wrote:
> On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > The issue is that backends blocked in ProcSleep() are woken up every
> > > client_connection_check_interval and may emit a "still waiting" message
> > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > to add a flag to track whether the "still waiting" message has already been
> > > emitted during a call to ProcSleep(), and suppress further messages
> > > once it has been logged.
> >
> > Independently of what's the default, it seems like it'd be valuable to
> > make that interaction better.  I think it is reasonable to keep on
> > emitting "still waiting" every so often, but we could probably
> > rate-limit that to a lot less than every 2 seconds.
>
> Attached is a patch that rate-limits the "still waiting on lock" message
> to at most once every 10s.
>
> I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> discuss the appropriate interval and adjust it if needed. The value is
> currently hard-coded, as making it configurable does not seem necessary.

I think that 10 seconds is good.

Yours,
Laurenz Albe
On Wed, 18 Feb 2026 at 10:03, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Wed, 2026-02-18 at 14:30 +0900, Fujii Masao wrote:
> > On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > The issue is that backends blocked in ProcSleep() are woken up every
> > > > client_connection_check_interval and may emit a "still waiting" message
> > > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > > to add a flag to track whether the "still waiting" message has already been
> > > > emitted during a call to ProcSleep(), and suppress further messages
> > > > once it has been logged.
> > >
> > > Independently of what's the default, it seems like it'd be valuable to
> > > make that interaction better.  I think it is reasonable to keep on
> > > emitting "still waiting" every so often, but we could probably
> > > rate-limit that to a lot less than every 2 seconds.
> >
> > Attached is a patch that rate-limits the "still waiting on lock" message
> > to at most once every 10s.
> >
> > I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> > discuss the appropriate interval and adjust it if needed. The value is
> > currently hard-coded, as making it configurable does not seem necessary.
>
> I think that 10 seconds is good.

I think 10 seconds is way too small. Having one long locker blocking a couple hundred backends is something that happens somewhat regularly. A 10s interval would result in tens of "still waiting" messages per second. It will just make it harder to sift out what is actually going on between all the waiters squawking "are we there yet?" in a loop.

I think something above 5 minutes would be more appropriate.

Regards,
Ants Aasma
> On Feb 18, 2026, at 13:30, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > The issue is that backends blocked in ProcSleep() are woken up every
> > > client_connection_check_interval and may emit a "still waiting" message
> > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > to add a flag to track whether the "still waiting" message has already been
> > > emitted during a call to ProcSleep(), and suppress further messages
> > > once it has been logged.
> >
> > Independently of what's the default, it seems like it'd be valuable to
> > make that interaction better.  I think it is reasonable to keep on
> > emitting "still waiting" every so often, but we could probably
> > rate-limit that to a lot less than every 2 seconds.
>
> Attached is a patch that rate-limits the "still waiting on lock" message
> to at most once every 10s.
>
> I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> discuss the appropriate interval and adjust it if needed. The value is
> currently hard-coded, as making it configurable does not seem necessary.
>
> Thoughts?
>
> --
> Fujii Masao
> <v1-0001-Rate-limit-repeated-still-waiting-on-lock-log-mes.patch>

I feel 10 seconds is good.

The other thinking is that the message will only be printed after the first deadlock check is fired. So, if someone sets deadlock_timeout to a large value, say 30 or 60 seconds, then any waiting log would already be very delayed. In that case, the user might not want to log more often than deadlock checks anyway. From this perspective, the rate limit timeout could be max(10s, deadlock_timeout). Anyway, this is not a strong opinion.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
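The suggestion above amounts to a one-line rule; a sketch (illustrative names, not from the actual patch):

```python
def rate_limit_interval_ms(deadlock_timeout_ms: int, floor_ms: int = 10_000) -> int:
    """Sketch of Chao Li's suggestion: never log 'still waiting' more often
    than the slower of the 10s floor and deadlock_timeout, since the first
    message can only appear after the deadlock check fires anyway."""
    return max(floor_ms, deadlock_timeout_ms)
```

So with the default deadlock_timeout of 1s the floor applies, while a 60s deadlock_timeout would also space the repeats 60s apart.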
On Wed, Feb 18, 2026 at 6:00 PM Ants Aasma <ants.aasma@cybertec.at> wrote:
> I think 10 seconds is way too small. Having one long locker blocking a
> couple hundred backends is something that happens somewhat regularly.
> A 10s interval would result in tens of "still waiting" per second. It
> will just make it harder to sift out what is actually going on between
> all the waiters squawking "are we there yet?" in a loop.
>
> I think something above 5 minutes would be more appropriate.

For that scenario, wouldn't it be better to emit the "still waiting" message only once per lock wait (i.e., the existing behavior) regardless of the client_connection_check_interval setting, rather than repeating it every several minutes? Also, logging it again after a long delay like 5min could be confusing to users.

I used 10s in the patch; on the other hand, I'm also ok with preserving the existing behavior (emit once per wait) whether client_connection_check_interval is set or not. Even without periodic log messages, ongoing lock waits can still be monitored via pg_locks.

Regards,
--
Fujii Masao
Hi Fujii,

Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:

1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as the rate limit interval is worth adopting. If someone has set deadlock_timeout to, say, 30s or 60s, they've already signaled they don't need frequent lock-wait feedback. Logging every 10s after a 60s deadlock_timeout feels inconsistent with that intent.

2) The hardcoded 10s constant — could we define it as a named constant in a header file? That way it's easier to find and reason about if it ever needs to change.

3) Would it make sense to add a regression test for this? Something that verifies the rate limiting actually suppresses the repeated messages at the expected interval.

Regards,
On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
>
> Hi Fujii,
>
> Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
>
> 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as
> the rate limit interval is worth adopting. If someone has set
> deadlock_timeout to, say, 30s or 60s, they've already signaled they
> don't need frequent lock-wait feedback. Logging every 10s after a 60s
> deadlock_timeout feels inconsistent with that intent.

Or perhaps they expect the log message to be emitted only once, just after deadlock_timeout, similar to the current behavior when client_connection_check_interval is not set, I guess.

I'm now starting to think it might be better to preserve the existing behavior (emitting the message once per wait) regardless of whether client_connection_check_interval is set, and implement that first.

If there is a need to emit the message periodically, we could add that as a separate feature later so that it works independently of the client_connection_check_interval setting.

Thoughts?

Regards,
--
Fujii Masao
Fujii Masao <masao.fujii@gmail.com>, 9 Mar 2026 Pzt, 15:12 tarihinde şunu yazdı:
Or perhaps they expect the log message to be emitted only once,
just after deadlock_timeout, similar to the current behavior when
client_connection_check_interval is not set, I guess.
I'm now starting to think it might be better to preserve the existing
behavior (emitting the message once per wait) regardless of whether
client_connection_check_interval is set, and implement that first.
If there is a need to emit the message periodically, we could add that
as a separate feature later so that it works independently of
the client_connection_check_interval setting.
+1 to this idea. It seems like the better approach going forward, if we later need to change how these log messages are emitted.
I do see the trade-off. Put simply: with only one message, we can lose visibility into long lock waits. But I think that's a separate concern. If there's a real need for periodic "still waiting" messages in the future, we could introduce a dedicated GUC (something like log_lock_waits_interval) or even a simple constant to control that independently of client_connection_check_interval. That way deadlock detection, connection checking, and lock-wait logging each have their own rules and don't interfere with each other.
Regards.
> On Mar 9, 2026, at 22:12, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
> >
> > Hi Fujii,
> >
> > Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
> >
> > 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as
> > the rate limit interval is worth adopting. If someone has set
> > deadlock_timeout to, say, 30s or 60s, they've already signaled they
> > don't need frequent lock-wait feedback. Logging every 10s after a 60s
> > deadlock_timeout feels inconsistent with that intent.
>
> Or perhaps they expect the log message to be emitted only once,
> just after deadlock_timeout, similar to the current behavior when
> client_connection_check_interval is not set, I guess.
>
> I'm now starting to think it might be better to preserve the existing
> behavior (emitting the message once per wait) regardless of whether
> client_connection_check_interval is set, and implement that first.
>
> If there is a need to emit the message periodically, we could add that
> as a separate feature later so that it works independently of
> the client_connection_check_interval setting.
>
> Thoughts?

Yeah, IMHO, preserving the existing behavior is preferable. Logically, client_connection_check_interval and log_lock_waits belong to two different departments. Even though they cross paths at the implementation level today, having the behavior of log_lock_waits change just because client_connection_check_interval is adjusted seems surprising.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Tue, Mar 10, 2026 at 10:42 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> > On Mar 9, 2026, at 22:12, Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
> > >
> > > Hi Fujii,
> > >
> > > Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
> > >
> > > 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as
> > > the rate limit interval is worth adopting. If someone has set
> > > deadlock_timeout to, say, 30s or 60s, they've already signaled they
> > > don't need frequent lock-wait feedback. Logging every 10s after a 60s
> > > deadlock_timeout feels inconsistent with that intent.
> >
> > Or perhaps they expect the log message to be emitted only once,
> > just after deadlock_timeout, similar to the current behavior when
> > client_connection_check_interval is not set, I guess.
> >
> > I'm now starting to think it might be better to preserve the existing
> > behavior (emitting the message once per wait) regardless of whether
> > client_connection_check_interval is set, and implement that first.
> >
> > If there is a need to emit the message periodically, we could add that
> > as a separate feature later so that it works independently of
> > the client_connection_check_interval setting.
> >
> > Thoughts?
>
> Yeah, IMHO, preserving the existing behavior is preferable. Logically,
> client_connection_check_interval and log_lock_waits belong to two
> different departments. Even though they cross paths at the implementation
> level today, having the behavior of log_lock_waits change just because
> client_connection_check_interval is adjusted seems surprising.

So, attached is a patch that ensures the "still waiting on lock" message is reported at most once during a lock wait, even if the wait is interrupted.

Regards,
--
Fujii Masao
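The intent of the final patch can be sketched as a toy model (hypothetical Python, not the actual ProcSleep() change): one flag per lock wait, so extra wakeups from client_connection_check_interval can never produce extra messages.

```python
class LockWait:
    """Toy model of 'emit once per wait': the 'still waiting on lock'
    message is reported at most once per lock wait, no matter how many
    times the sleep is interrupted. A new LockWait corresponds to a new
    lock acquisition attempt (a new call to ProcSleep())."""

    def __init__(self, log_lock_waits: bool = True):
        self.log_lock_waits = log_lock_waits
        self._logged = False  # the flag tracking whether we already logged

    def on_wakeup(self):
        """Called whenever the waiting backend is woken (deadlock check,
        dead-client check, etc.); returns the message to log, or None."""
        if self.log_lock_waits and not self._logged:
            self._logged = True
            return "still waiting for lock"
        return None
```

Starting a fresh wait resets the flag, so each distinct lock wait still gets its one message.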