Обсуждение: statement_timeout has no effect if sync standby is unavailable
Hi Team,
I am writing a monitoring tool to find if the database is hung.
One of the scenarios is to stop the standby but keep 'synchronous_standby_names=standby1' and 'synchronous_commit=remote_apply' unchanged on primary.
This way the DML queries on primary will be hung. I tried to use 'statement_timeout=5s' to timeout the query but it does not work.
Is there another way to timeout query and report an error ? and why is statement_timeout not working?
Thanks,
Nikhil
Hi Team,
We have also seen this scenario when we froze the data mount point and run a DML
Freeze mountpoint
date;fsfreeze --freeze /postgres
The statement is just stuck even thought statement_timeout is set
datid | 14175 datname | postgres pid | 5789 leader_pid | usesysid | 10 usename | postgres application_name | hang_monitor client_addr | 127.0.0.1 client_hostname | client_port | 37440 backend_start| 2023-11-01 10:05:51.090023+00 xact_start | 2023-11-01 10:05:51.167816+00 query_start | 2023-11-01 10:05:51.167816+00 state_change | 2023-11-01 10:05:51.167817+00 wait_event_type | IO wait_event | WALWrite state | active backend_xid | 43242183 backend_xmin | query | update HEARTBEAT set last_updated_time = timezone('UTC', now()) where ID = 1; backend_type | client backend
Thanks,
Nikhil
On Fri, Oct 27, 2023 at 3:43 PM Nikhil Shetty <nikhil.dba04@gmail.com> wrote:
Hi Team,I am writing a monitoring tool to find if the database is hung.One of the scenarios is to stop the standby but keep 'synchronous_standby_names=standby1' and 'synchronous_commit=remote_apply' unchanged on primary.This way the DML queries on primary will be hung. I tried to use 'statement_timeout=5s' to timeout the query but it does not work.Is there another way to timeout query and report an error ? and why is statement_timeout not working?Thanks,Nikhil
On Wed, 2023-11-01 at 15:44 +0530, Nikhil Shetty wrote: > We have also seen this scenario when we froze the data mount point and run a DML > > Freeze mountpoint > date;fsfreeze --freeze /postgres > The statement is just stuck even thought statement_timeout is set > > datid | 14175 > datname | postgres > pid | 5789 > leader_pid | > usesysid | 10 > usename | postgres > application_name | hang_monitor > client_addr | 127.0.0.1 > client_hostname | > client_port | 37440 > backend_start| 2023-11-01 10:05:51.090023+00 > xact_start | 2023-11-01 10:05:51.167816+00 > query_start | 2023-11-01 10:05:51.167816+00 > state_change | 2023-11-01 10:05:51.167817+00 > wait_event_type | IO > wait_event | WALWrite > state | active > backend_xid | 43242183 > backend_xmin | > query | update HEARTBEAT set last_updated_time = timezone('UTC', now()) where ID = 1; > backend_type | client backend Check with "ps": perhaps the backend process is in uninterruptible sleep, waiting for I/O. Yours, Laurenz Albe
Hi Laurenz
I need to check the process state but statement_timeout should timeout such queries, no?
Thanks,
Nikhil
On Wed, 1 Nov 2023 at 22:09, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2023-11-01 at 15:44 +0530, Nikhil Shetty wrote:
> We have also seen this scenario when we froze the data mount point and run a DML
>
> Freeze mountpoint
> date;fsfreeze --freeze /postgres
> The statement is just stuck even thought statement_timeout is set
>
> datid | 14175
> datname | postgres
> pid | 5789
> leader_pid |
> usesysid | 10
> usename | postgres
> application_name | hang_monitor
> client_addr | 127.0.0.1
> client_hostname |
> client_port | 37440
> backend_start| 2023-11-01 10:05:51.090023+00
> xact_start | 2023-11-01 10:05:51.167816+00
> query_start | 2023-11-01 10:05:51.167816+00
> state_change | 2023-11-01 10:05:51.167817+00
> wait_event_type | IO
> wait_event | WALWrite
> state | active
> backend_xid | 43242183
> backend_xmin |
> query | update HEARTBEAT set last_updated_time = timezone('UTC', now()) where ID = 1;
> backend_type | client backend
Check with "ps": perhaps the backend process is in uninterruptible sleep, waiting
for I/O.
Yours,
Laurenz Albe
Nikhil Shetty <nikhil.dba04@gmail.com> writes: > I need to check the process state but statement_timeout should timeout such > queries, no? No, I don't think that should be the policy, and if it doesn't do so now I'm content to leave it like that. Once we have committed locally and started to wait for a sync standby, we are between a rock and a hard place: we can't back out the commit. If we were to allow a timeout error to occur, we'd have a choice of reporting that the commit failed (a lie) or that it succeeded (also a lie, given that the promise of sync commit is that we don't report commit until it's persisted on the standby too). Neither of these are preferable to ignoring the timeout. tl;dr: if your standby is not 100% reliable, enabling sync standby is a poor choice. regards, tom lane
Got it but in my second scenario there was no sync standby
We froze the data mount point and ran queries and all of them were hung. It is waiting for WalWrite - in this case, can it not timeout without committing?
We used statement_timeout of 2s for testing
Thanks,
Nikhil
On Thu, 2 Nov 2023 at 21:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nikhil Shetty <nikhil.dba04@gmail.com> writes:
> I need to check the process state but statement_timeout should timeout such
> queries, no?
No, I don't think that should be the policy, and if it doesn't do so
now I'm content to leave it like that. Once we have committed locally
and started to wait for a sync standby, we are between a rock and a
hard place: we can't back out the commit. If we were to allow a
timeout error to occur, we'd have a choice of reporting that the
commit failed (a lie) or that it succeeded (also a lie, given that
the promise of sync commit is that we don't report commit until it's
persisted on the standby too). Neither of these are preferable to
ignoring the timeout.
tl;dr: if your standby is not 100% reliable, enabling sync standby
is a poor choice.
regards, tom lane