Discussion: client_connection_check_interval default value
What would people here think about changing the default value of client_connection_check_interval to 2000 ms? Right now this is disabled by default.

The background is that I recently saw an incident where a blocking-lock brownout escalated from a row-level problem to a complete system outage, due to a combination of factors including a bug in golang's pgx postgres client (PR 2481 has now been merged with a fix) and a pgbouncer setup that was missing peers configuration. As a result, cancel messages were getting dropped while postgres connections were waiting on a blocked lock, golang aggressively timed out on context deadlines and retried, and once the database reached max_connections the whole system ground to a halt.

At the time I thought it was weird that postgres wasn't checking for dead connections while those conns were waiting for locks; I spent a bunch of time investigating this, reproduced it, and wrote up what I was able to figure out. Then, yesterday, I saw a LinkedIn post from Marat at Data Egret who mentioned that client_connection_check_interval exists. Plugged this into my repro and confirmed it can prevent postgres from escalating the blocking-lock brownout into a complete outage due to connection exhaustion.

While a fix has been merged in pgx for the most direct root cause of the incident I saw, this setting just seems like a good behavior to make Postgres more robust in general. 2000 ms seemed like a fairly safe/conservative starting point for discussion.

Thoughts?

-Jeremy

PS. Some more details and graphs are at https://ardentperf.com/2026/02/04/postgres-client_connection_check_interval/

--
To know the thoughts and deeds that have marked man's progress is to feel the great heart throbs of humanity through the centuries; and if one does not feel in these pulsations a heavenward striving, one must indeed be deaf to the harmonies of life.

Helen Keller, The Story Of My Life, 1902, 1903, 1905, introduction by Ralph Barton Perry (Garden City, NY: Doubleday & Company, 1954), p90.
Attachments
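For reference, the setting under discussion would look like this in postgresql.conf (a sketch only; 2000 is the value proposed in this thread, and 0 is today's default):

```
# postgresql.conf fragment -- sketch of the proposed change
# Interval between checks that the client is still connected, while running
# queries. Takes milliseconds by default; 0 (the current default) disables it.
client_connection_check_interval = 2000
```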
On Wed, 4 Feb 2026 21:30:32 -0800 Jeremy Schneider <schneider@ardentperf.com> wrote:
> What would people here think about changing the default value of
> client_connection_check_interval to 2000 ms? Right now this is
> disabled by default.

Forgot the doc update in the attached patch. Updated

-Jeremy
Attachments
I'm a weak -1 on this. Certainly not 2s! That's a lot of context switching for a busy system for no real reason. Also see this past discussion:
--
Cheers,
Greg
On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider <schneider@ardentperf.com> wrote:
> While a fix has been merged in pgx for the most direct root cause of
> the incident I saw, this setting just seems like a good behavior to
> make Postgres more robust in general.

At the risk of making perfect the enemy of better, the protocol-level heartbeat mentioned in the original thread [1] would cover more use cases, which might give it a better chance of eventually becoming default behavior. It might also be a lot of work, though.

--Jacob

[1] https://postgr.es/m/CA%2BhUKGLyj5Aqt6ojYfSc%2BqSeB1x%3D3RbU61hnus5sL0BKqEBsLw%40mail.gmail.com
One interesting thing to me - it seems like all of the past mail threads were focused on a situation different from mine. Lots of discussion about freeing resources like CPU.

In the outage I saw, the system was idle and we completely ran out of max_connections because all sessions were waiting on a row lock.

Importantly, the app was closing these conns but we had sockets stacking up on the server in CLOSE-WAIT state - and postgres simply never cleaned them up until we had an outage. The processes were completely idle waiting for a row lock that was not going to be released.

Impact could have been isolated to sessions hitting that row (with this GUC), but it escalated to a system outage. It's pretty simple to reproduce this:
https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion

On Thu, 5 Feb 2026 09:26:34 -0800 Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> On Wed, Feb 4, 2026 at 9:30 PM Jeremy Schneider
> <schneider@ardentperf.com> wrote:
> > While a fix has been merged in pgx for the most direct root cause of
> > the incident I saw, this setting just seems like a good behavior to
> > make Postgres more robust in general.
>
> At the risk of making perfect the enemy of better, the protocol-level
> heartbeat mentioned in the original thread [1] would cover more use
> cases, which might give it a better chance of eventually becoming
> default behavior. It might also be a lot of work, though.

It seems like a fair bit of discussion is around OS coverage - even Thomas' message there references keepalive working as expected on Linux. Tom objects in 2023 that "the default behavior would then be platform-dependent and that's a documentation problem we could do without."

But it's been five years - has there been further work on implementing a postgres-level heartbeat? And I see other places in the docs where we note platform differences; is it really such a big problem to change the default here?
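The escalation pattern described above can be caricatured in a few lines. This is a hypothetical toy model (not the linked repro, and not real postgres behavior): each client timeout-and-retry adds a new backend blocked on the same row lock, and without a dead-client check the abandoned backends linger in CLOSE-WAIT forever.

```python
MAX_CONNECTIONS = 100  # stand-in for the server's max_connections limit

def simulate(retries: int, dead_client_check: bool) -> str:
    """Toy model of the escalation: each client timeout+retry opens a new
    server backend that then blocks on the same row lock. Without a dead-
    client check the abandoned backends linger (CLOSE-WAIT sockets), so
    the count only grows; with the check they are reaped between retries."""
    waiting = 0
    for _ in range(retries):
        if dead_client_check:
            waiting = 0   # backends whose clients went away were terminated
        waiting += 1      # the retry adds one newly blocked backend
        if waiting >= MAX_CONNECTIONS:
            return "outage"    # connection slots exhausted, system-wide impact
    return "brownout"          # impact stays limited to sessions hitting the row
```

The point of the sketch is only that the check converts unbounded growth into a bounded steady state; the real dynamics depend on client retry rates and the check interval.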
On Thu, 5 Feb 2026 10:00:29 -0500 Greg Sabino Mullane <htamfids@gmail.com> wrote:
> I'm a weak -1 on this. Certainly not 2s! That's a lot of context
> switching for a busy system for no real reason. Also see this past
> discussion:

In the other thread, it looks like the larger perf concerns were with some early implementations, before they refactored the patch? Konstantin's message on 2019-08-02 said he didn't see much difference, and the value of the timeout didn't seem to matter; if anything the marginal effect was simply from the presence of any timer (same effect as setting statement_timeout) - and later on the thread it seems like Thomas also saw minimal performance concern here.

I did see a real system outage that could have been prevented by an appropriate default value here, since I didn't yet know to change it.

-Jeremy
Attachments
On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider <schneider@ardentperf.com> wrote:
>
> One interesting thing to me - it seems like all of the past mail
> threads were focused on a situation different from mine. Lots of
> discussion about freeing resources like CPU.
>
> In the outage I saw, the system was idle and we completely ran out of
> max_connections because all sessions were waiting on a row lock.
>
> Importantly, the app was closing these conns but we had sockets stacking
> up on the server in CLOSE-WAIT state - and postgres simply never
> cleaned them up until we had an outage. The processes were completely
> idle waiting for a row lock that was not going to be released.
>
> Impact could have been isolated to sessions hitting that row (with this
> GUC), but it escalated to a system outage. It's pretty simple to
> reproduce this:
> https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion
>
> It seems like a fair bit of discussion is around OS coverage - even
> Thomas' message there references keepalive working as expected on
> Linux. Tom objects in 2023 that "the default behavior would then be
> platform-dependent and that's a documentation problem we could do
> without."
>
> But it's been five years - has there been further work on implementing
> a postgres-level heartbeat? And I see other places in the docs where we
> note platform differences, is it really such a big problem to change
> the default here?
>
> In the other thread I see larger perf concerns with some early
> implementations before they refactored the patch? Konstantin's message
> on 2019-08-02 said he didn't see much difference, and the value of the
> timeout didn't seem to matter, and if anything the marginal effect was
> simply from the presence of any timer (same effect as setting
> statement_timeout) - and later on the thread it seems like Thomas also
> saw minimal performance concern here.
>
> I did see a real system outage that could have been prevented by an
> appropriate default value here, since I didn't yet know to change it.

I'm not sure that client_connection_check_interval needs to be enabled by default.

However, if we do agree to change the default and apply it, I think we should first address the related issue: with log_lock_waits enabled by default, setting client_connection_check_interval to 2s would cause "still waiting" messages to be logged every 2 seconds while waiting on the lock. That could result in a lot of noisy logging under default settings.

The issue is that backends blocked in ProcSleep() are woken up every client_connection_check_interval and may emit a "still waiting" message each time if log_lock_waits is enabled. To mitigate this, just one idea is to add a flag to track whether the "still waiting" message has already been emitted during a call to ProcSleep(), and suppress further messages once it has been logged.

Regards,

--
Fujii Masao
Fujii Masao <masao.fujii@gmail.com> writes:
> On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
> <schneider@ardentperf.com> wrote:
>> I did see a real system outage that could have been prevented by an
>> appropriate default value here, since I didn't yet know to change it.
> I'm not sure that client_connection_check_interval needs to be enabled
> by default.
I think enabling it by default is a nonstarter, because it changes
behavior in a significant way. Specifically, it's always been the
case that if the client disconnects during a non-SELECT query (or
anything that doesn't produce output), the backend would complete that
query before ending the session. I think it's very likely that there
are users depending on that behavior. Jeremy is describing an
application that evidently was built on the assumption that
disconnecting early would abort a wait, but that assumption was not
based on any testing. I think it's good that we now have an option
to make such an assumption hold, but that does not translate to
"we should force that behavior on everybody". Whether or not the
overhead is insignificant, there's a good chance that the change
would make more people unhappy than happy.
> The issue is that backends blocked in ProcSleep() are woken up every
> client_connection_check_interval and may emit a "still waiting" message
> each time if log_lock_waits is enabled. To mitigate this, just one idea is
> to add a flag to track whether the "still waiting" message has already been
> emitted during a call to ProcSleep(), and suppress further messages
> once it has been logged.
Independently of what's the default, it seems like it'd be valuable to
make that interaction better. I think it is reasonable to keep on
emitting "still waiting" every so often, but we could probably
rate-limit that to a lot less than every 2 seconds.
regards, tom lane
On Thu, Feb 5, 2026 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way. Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.
Ha, I hadn't even thought about the possibility of fire-and-forget...
> I think it's very likely that there
> are users depending on that behavior.
From a quick search, yeah, probably:
https://dba.stackexchange.com/questions/247206/will-query-continue-to-run-even-after-network-times-out
--Jacob
On Thu, 05 Feb 2026 19:01:52 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.  I think it's very likely that there
> are users depending on that behavior.  Jeremy is describing an
> application that evidently was built on the assumption that
> disconnecting early would abort a wait, but that assumption was not
> based on any testing.  I think it's good that we now have an option
> to make such an assumption hold, but that does not translate to
> "we should force that behavior on everybody".  Whether or not the
> overhead is insignificant, there's a good chance that the change
> would make more people unhappy than happy.

This application was trying to kill the connection: the postgres client attempted to send a cancel message, but the cancel message was lost and the client naturally falls back to just hard-killing the network connection.

I totally understand the cancel messages should not have gotten lost. But I'm surprised that we'd assume most users who kill -9 their hung client want their 10 hour query to keep running forever? Yes, we always try to gracefully stop things first and let cancel messages get through, but I'd think if the server received a FIN over the network then we could interpret that similarly to a cancel message.

This would be a change in a new major version; we wouldn't backpatch a GUC default. I see the concern around the behavior change, but I don't feel like this difference between CTRL-C and kill -9 should be sacrosanct, if having them behave similarly seems generally reasonable. Postgres has made more significant changes than this before in major versions.

-Jeremy
Some space cadet thoughts I recall wondering about back then:

If one process had all the sockets, I suppose something could watch for POLLRDHUP/KEV_EOF from all of them and send an interrupt to the owner to handle in its next CFI, so that we don't have to have every backend polling its socket. Once we have a multithreaded server we could try that, but it might become more reasonable to consider with hypothetical models of socket management where there are executor threads that are not tied to a session and socket, which implies a new bit of architecture handling network events on behalf of sessions.

I suppose in theory it could be the postmaster today (just don't close the client sockets after fork, and teach the postmaster's main loop to do that), and I vaguely recall something along those lines being shot down in some other thread... it might cause some extra contention in the kernel depending how it's implemented (IDK), but it'd obviously eat a lot of fd table entries, which is probably what killed the notion... though we'll need that large socket table eventually (and perhaps we'll pay for it by sharing file fds somehow...).

But either way, heartbeats might indeed make more sense, whatever the architecture.
On Thu, 2026-02-05 at 19:01 -0500, Tom Lane wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
> > On Fri, Feb 6, 2026 at 8:05 AM Jeremy Schneider
> > <schneider@ardentperf.com> wrote:
> > > I did see a real system outage that could have been prevented by an
> > > appropriate default value here, since I didn't yet know to change it.
>
> > I'm not sure that client_connection_check_interval needs to be enabled
> > by default.
>
> I think enabling it by default is a nonstarter, because it changes
> behavior in a significant way.  Specifically, it's always been the
> case that if the client disconnects during a non-SELECT query (or
> anything that doesn't produce output), the backend would complete that
> query before ending the session.  I think it's very likely that there
> are users depending on that behavior.

*Perhaps* there are some users who depend on the current behavior, but my experience is that the vast majority of users don't want statements started by a connection that went dead to keep running. I mean, it would be a change in behavior, but that is normal during a major upgrade, and users who actively want the current behavior can disable client_connection_check_interval. I think that enabling client_connection_check_interval would be a net win, as far as the core functionality is concerned.

Fujii Masao's concern that log_lock_waits would issue a message every client_connection_check_interval is much more serious in my opinion, now that log_lock_waits is enabled by default (at - erm - my insistence). Why does the deadlock detector kick in every client_connection_check_interval?

Yours,
Laurenz Albe
On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The issue is that backends blocked in ProcSleep() are woken up every
> > client_connection_check_interval and may emit a "still waiting" message
> > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > to add a flag to track whether the "still waiting" message has already been
> > emitted during a call to ProcSleep(), and suppress further messages
> > once it has been logged.
>
> Independently of what's the default, it seems like it'd be valuable to
> make that interaction better.  I think it is reasonable to keep on
> emitting "still waiting" every so often, but we could probably
> rate-limit that to a lot less than every 2 seconds.

Attached is a patch that rate-limits the "still waiting on lock" message to at most once every 10s.

I chose 10s instead of the suggested 2s, since 2s felt too short. But we can discuss the appropriate interval and adjust it if needed. The value is currently hard-coded, as making it configurable does not seem necessary.

Thoughts?

--
Fujii Masao
Attachments
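The rate-limiting behavior being proposed can be sketched as a toy model (hypothetical Python, not the actual C patch): however often the sleeping backend is woken up, the message is emitted at most once per interval.

```python
RATE_LIMIT_S = 10.0  # the hard-coded interval from the patch under discussion

class StillWaitingLogger:
    """Toy model of rate-limiting the 'still waiting on lock' message.
    Wakeups may arrive every client_connection_check_interval (e.g. 2s),
    but the message is emitted at most once per RATE_LIMIT_S."""

    def __init__(self):
        self._last_emitted = None  # timestamp of the last emitted message

    def should_log(self, now: float) -> bool:
        """Called on each wakeup; returns True only when enough time has
        passed since the last emitted 'still waiting' message."""
        if self._last_emitted is None or now - self._last_emitted >= RATE_LIMIT_S:
            self._last_emitted = now
            return True
        return False
```

With 2-second wakeups, only every fifth wakeup would log; the intermediate ones are suppressed.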
On Wed, 2026-02-18 at 14:30 +0900, Fujii Masao wrote:
> On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > The issue is that backends blocked in ProcSleep() are woken up every
> > > client_connection_check_interval and may emit a "still waiting" message
> > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > to add a flag to track whether the "still waiting" message has already been
> > > emitted during a call to ProcSleep(), and suppress further messages
> > > once it has been logged.
> >
> > Independently of what's the default, it seems like it'd be valuable to
> > make that interaction better.  I think it is reasonable to keep on
> > emitting "still waiting" every so often, but we could probably
> > rate-limit that to a lot less than every 2 seconds.
>
> Attached is a patch that rate-limits the "still waiting on lock" message
> to at most once every 10s.
>
> I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> discuss the appropriate interval and adjust it if needed. The value is
> currently hard-coded, as making it configurable does not seem necessary.

I think that 10 seconds is good.

Yours,
Laurenz Albe
On Wed, 18 Feb 2026 at 10:03, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Wed, 2026-02-18 at 14:30 +0900, Fujii Masao wrote:
> > On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > The issue is that backends blocked in ProcSleep() are woken up every
> > > > client_connection_check_interval and may emit a "still waiting" message
> > > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > > to add a flag to track whether the "still waiting" message has already been
> > > > emitted during a call to ProcSleep(), and suppress further messages
> > > > once it has been logged.
> > >
> > > Independently of what's the default, it seems like it'd be valuable to
> > > make that interaction better.  I think it is reasonable to keep on
> > > emitting "still waiting" every so often, but we could probably
> > > rate-limit that to a lot less than every 2 seconds.
> >
> > Attached is a patch that rate-limits the "still waiting on lock" message
> > to at most once every 10s.
> >
> > I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> > discuss the appropriate interval and adjust it if needed. The value is
> > currently hard-coded, as making it configurable does not seem necessary.
>
> I think that 10 seconds is good.

I think 10 seconds is way too small. Having one long locker blocking a couple hundred backends is something that happens somewhat regularly. A 10s interval would result in tens of "still waiting" messages per second. It will just make it harder to sift out what is actually going on between all the waiters squawking "are we there yet?" in a loop.

I think something above 5 minutes would be more appropriate.

Regards,
Ants Aasma
> On Feb 18, 2026, at 13:30, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Fri, Feb 6, 2026 at 9:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > The issue is that backends blocked in ProcSleep() are woken up every
> > > client_connection_check_interval and may emit a "still waiting" message
> > > each time if log_lock_waits is enabled. To mitigate this, just one idea is
> > > to add a flag to track whether the "still waiting" message has already been
> > > emitted during a call to ProcSleep(), and suppress further messages
> > > once it has been logged.
> >
> > Independently of what's the default, it seems like it'd be valuable to
> > make that interaction better.  I think it is reasonable to keep on
> > emitting "still waiting" every so often, but we could probably
> > rate-limit that to a lot less than every 2 seconds.
>
> Attached is a patch that rate-limits the "still waiting on lock" message
> to at most once every 10s.
>
> I chose 10s instead of the suggested 2s, since 2s felt too short. But we can
> discuss the appropriate interval and adjust it if needed. The value is
> currently hard-coded, as making it configurable does not seem necessary.
>
> Thoughts?
>
> --
> Fujii Masao
> <v1-0001-Rate-limit-repeated-still-waiting-on-lock-log-mes.patch>

I feel 10 seconds is good.

The other thinking is that the message will only be printed after the first deadlock check is fired. So, if someone sets deadlock_timeout to a large value, say 30 or 60 seconds, then any waiting log would already be very delayed. In that case, the user might not want to log more often than deadlock checks anyway. From this perspective, the rate limit timeout could be max(10s, deadlock_timeout). Anyway, this is not a strong opinion.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
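The suggestion above amounts to a one-line rule; a sketch (illustrative names, not from the actual patch):

```python
def rate_limit_interval_ms(deadlock_timeout_ms: int, floor_ms: int = 10_000) -> int:
    """Sketch of Chao Li's suggestion: never log 'still waiting' more often
    than the slower of the 10s floor and deadlock_timeout, since the first
    message can only appear after the deadlock check fires anyway."""
    return max(floor_ms, deadlock_timeout_ms)
```

So with the default deadlock_timeout of 1s the floor applies, while a 60s deadlock_timeout would also space the repeats 60s apart.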
On Wed, Feb 18, 2026 at 6:00 PM Ants Aasma <ants.aasma@cybertec.at> wrote:
> I think 10 seconds is way too small. Having one long locker blocking a
> couple hundred backends is something that happens somewhat regularly.
> A 10s interval would result in tens of "still waiting" per second. It
> will just make it harder to sift out what is actually going on between
> all the waiters squawking "are we there yet?" in a loop.
>
> I think something above 5 minutes would be more appropriate.

For that scenario, wouldn't it be better to emit the "still waiting" message only once per lock wait (i.e., the existing behavior) regardless of the client_connection_check_interval setting, rather than repeating it every several minutes? Also, logging it again after a long delay like 5min could be confusing to users.

I used 10s in the patch; on the other hand, I'm also ok with preserving the existing behavior (emit once per wait) whether client_connection_check_interval is set or not. Even without periodic log messages, ongoing lock waits can still be monitored via pg_locks.

Regards,
--
Fujii Masao
Hi Fujii,

Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:

1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as the rate limit interval is worth adopting. If someone has set deadlock_timeout to, say, 30s or 60s, they've already signaled they don't need frequent lock-wait feedback. Logging every 10s after a 60s deadlock_timeout feels inconsistent with that intent.

2) The hardcoded 10s constant — could we define it as a named constant in a header file? That way it's easier to find and reason about if it ever needs to change.

3) Would it make sense to add a regression test for this? Something that verifies the rate limiting actually suppresses the repeated messages at the expected interval.

Regards,
On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
>
> Hi Fujii,
>
> Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
>
> 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as
> the rate limit interval is worth adopting. If someone has set
> deadlock_timeout to, say, 30s or 60s, they've already signaled they
> don't need frequent lock-wait feedback. Logging every 10s after a 60s
> deadlock_timeout feels inconsistent with that intent.

Or perhaps they expect the log message to be emitted only once, just after deadlock_timeout, similar to the current behavior when client_connection_check_interval is not set, I guess.

I'm now starting to think it might be better to preserve the existing behavior (emitting the message once per wait) regardless of whether client_connection_check_interval is set, and implement that first.

If there is a need to emit the message periodically, we could add that as a separate feature later so that it works independently of the client_connection_check_interval setting.

Thoughts?

Regards,
--
Fujii Masao
Fujii Masao <masao.fujii@gmail.com>, 9 Mar 2026 Pzt, 15:12 tarihinde şunu yazdı:
Or perhaps they expect the log message to be emitted only once,
just after deadlock_timeout, similar to the current behavior when
client_connection_check_interval is not set, I guess.
I'm now starting to think it might be better to preserve the existing
behavior (emitting the message once per wait) regardless of whether
client_connection_check_interval is set, and implement that first.
If there is a need to emit the message periodically, we could add that
as a separate feature later so that it works independently of
the client_connection_check_interval setting.
+1 to this idea. It seems like the better approach going forward, if we later need to change how these log messages are emitted.
I do see the trade-off. Put simply: with only one message, we can lose visibility into long lock waits. But I think that's a separate concern. If there's a real need for periodic "still waiting" messages in the future, we could introduce a dedicated GUC (something like log_lock_waits_interval) or even a simple constant to control that independently of client_connection_check_interval. That way deadlock detection, connection checking, and lock-wait logging each have their own rules and don't interfere with each other.
Regards.
> On Mar 9, 2026, at 22:12, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
> >
> > Hi Fujii,
> >
> > Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
> >
> > 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as
> > the rate limit interval is worth adopting. If someone has set
> > deadlock_timeout to, say, 30s or 60s, they've already signaled they
> > don't need frequent lock-wait feedback. Logging every 10s after a 60s
> > deadlock_timeout feels inconsistent with that intent.
>
> Or perhaps they expect the log message to be emitted only once,
> just after deadlock_timeout, similar to the current behavior when
> client_connection_check_interval is not set, I guess.
>
> I'm now starting to think it might be better to preserve the existing
> behavior (emitting the message once per wait) regardless of whether
> client_connection_check_interval is set, and implement that first.
>
> If there is a need to emit the message periodically, we could add that
> as a separate feature later so that it works independently of
> the client_connection_check_interval setting.
>
> Thoughts?

Yeah, IMHO, preserving the existing behavior is preferable. Logically, client_connection_check_interval and log_lock_waits belong to two different departments. Even though they cross paths at the implementation level today, having the behavior of log_lock_waits change just because client_connection_check_interval is adjusted seems surprising.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Tue, Mar 10, 2026 at 10:42 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> > On Mar 9, 2026, at 22:12, Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Mon, Mar 9, 2026 at 6:03 PM Hüseyin Demir <huseyin.d3r@gmail.com> wrote:
> > >
> > > Hi Fujii,
> > >
> > > Thanks for the patch. The rate-limiting approach makes sense to me. A couple of thoughts:
> > >
> > > 1) I think Chao Li's suggestion of using max(10s, deadlock_timeout) as
> > > the rate limit interval is worth adopting. If someone has set
> > > deadlock_timeout to, say, 30s or 60s, they've already signaled they
> > > don't need frequent lock-wait feedback. Logging every 10s after a 60s
> > > deadlock_timeout feels inconsistent with that intent.
> >
> > Or perhaps they expect the log message to be emitted only once,
> > just after deadlock_timeout, similar to the current behavior when
> > client_connection_check_interval is not set, I guess.
> >
> > I'm now starting to think it might be better to preserve the existing
> > behavior (emitting the message once per wait) regardless of whether
> > client_connection_check_interval is set, and implement that first.
> >
> > If there is a need to emit the message periodically, we could add that
> > as a separate feature later so that it works independently of
> > the client_connection_check_interval setting.
> >
> > Thoughts?
>
> Yeah, IMHO, preserving the existing behavior is preferable. Logically,
> client_connection_check_interval and log_lock_waits belong to two
> different departments. Even though they cross paths at the implementation
> level today, having the behavior of log_lock_waits change just because
> client_connection_check_interval is adjusted seems surprising.

So, attached is a patch that ensures the "still waiting on lock" message is reported at most once during a lock wait, even if the wait is interrupted.

Regards,
--
Fujii Masao
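The intent of the final patch can be sketched as a toy model (hypothetical Python, not the actual ProcSleep() change): one flag per lock wait, so extra wakeups from client_connection_check_interval can never produce extra messages.

```python
class LockWait:
    """Toy model of 'emit once per wait': the 'still waiting on lock'
    message is reported at most once per lock wait, no matter how many
    times the sleep is interrupted. A new LockWait corresponds to a new
    lock acquisition attempt (a new call to ProcSleep())."""

    def __init__(self, log_lock_waits: bool = True):
        self.log_lock_waits = log_lock_waits
        self._logged = False  # the flag tracking whether we already logged

    def on_wakeup(self):
        """Called whenever the waiting backend is woken (deadlock check,
        dead-client check, etc.); returns the message to log, or None."""
        if self.log_lock_waits and not self._logged:
            self._logged = True
            return "still waiting for lock"
        return None
```

Starting a fresh wait resets the flag, so each distinct lock wait still gets its one message.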