We found the following errors in the syslog on ETCD1:
Feb 1 16:12:01 etcd1 etcd: got unexpected response error (etcdserver: request timed out)
Feb 1 16:12:02 etcd1 etcd: got unexpected response error (etcdserver: request timed out) [merged 1 repeated lines in 1.21s]
Feb 1 16:12:03 etcd1 etcd: got unexpected response error (etcdserver: request timed out) [merged 1 repeated lines in 1s]
Feb 1 16:12:20 etcd1 etcd: sync duration of 29.69857369s, expected less than 1s
Feb 1 16:26:55 etcd1 etcd: got unexpected response error (etcdserver: request timed out)
Feb 1 16:27:03 etcd1 etcd: got unexpected response error (etcdserver: request timed out)
Feb 1 16:27:17 etcd1 etcd: sync duration of 1m0.745329542s, expected less than 1s
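To line the etcd stalls up against Patroni's log, the slow-sync warnings can be pulled out of the syslog and converted to seconds. This is a minimal Python sketch, assuming the line format shown above and that etcd prints durations in Go's `time.Duration` style (as in `1m0.745329542s`); `go_duration_to_seconds` and `slow_syncs` are hypothetical helper names, not part of any etcd tooling.

```python
import re
from typing import Iterable, Iterator, Tuple

# etcd's slow-sync warning, in the form shown in the syslog excerpt above.
SYNC_RE = re.compile(r"sync duration of (\S+), expected less than (\S+)")

# One component of a Go duration string, e.g. "1m" or "0.745329542s".
# "ms", "us", "µs" and "ns" are listed before "m" and "s" so they match first.
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)")
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}


def go_duration_to_seconds(text: str) -> float:
    """Convert a Go duration string such as "1m0.745329542s" to seconds."""
    total, pos = 0.0, 0
    for part in PART_RE.finditer(text):
        if part.start() != pos:  # reject garbage between components
            raise ValueError(f"unparseable duration: {text!r}")
        total += float(part.group(1)) * UNIT_SECONDS[part.group(2)]
        pos = part.end()
    if pos == 0 or pos != len(text):  # empty input or trailing garbage
        raise ValueError(f"unparseable duration: {text!r}")
    return total


def slow_syncs(lines: Iterable[str]) -> Iterator[Tuple[float, float]]:
    """Yield (actual, limit) in seconds for every slow-sync warning."""
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            yield (go_duration_to_seconds(match.group(1)),
                   go_duration_to_seconds(match.group(2)))


log = [
    "Feb 1 16:12:20 etcd1 etcd: sync duration of 29.69857369s, expected less than 1s",
    "Feb 1 16:26:55 etcd1 etcd: got unexpected response error (etcdserver: request timed out)",
    "Feb 1 16:27:17 etcd1 etcd: sync duration of 1m0.745329542s, expected less than 1s",
]
for actual, limit in slow_syncs(log):
    print(f"sync took {actual:.1f}s, limit {limit:g}s")
# sync took 29.7s, limit 1s
# sync took 60.7s, limit 1s
```

Both warnings are far above the 1 s limit, which fits the picture of storage-level I/O freezes rather than ordinary etcd load.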
So, this problem is related to the SAN: an I/O freeze hit many VMs, including the DB and etcd nodes => long I/O delays on etcd =>
Patroni reacted by demoting its own PostgreSQL instance.
Sergey, thank you very much for your support!
Best regards,
Ulaev Alexander
-----Original Message-----
From: Sergei Kornilov [mailto:sk@zsrv.org]
Sent: Thursday, February 3, 2022 1:06 PM
To: Улаев Александр Сергеевич <alexander.ulaev@rtlabs.ru>
Cc: pgsql-bugs@lists.postgresql.org; PG Bug reporting form <noreply@postgresql.org>
Subject: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request
Hello
> 2022-02-01 16:12:24,928 ERROR: failed to update leader lock
> 2022-02-01 16:12:27,063 INFO: demoted self because failed to update leader lock in DCS
Between these two messages, an immediate shutdown is called:
https://github.com/zalando/patroni/blob/v2.1.2/patroni/ha.py#L1045
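The sequence in the log can be modelled with a toy HA loop. This is a hypothetical Python sketch, not Patroni's implementation (the real logic is in `ha.py` at the link above and also handles retries and TTLs); `make_dcs_writer`, `ToyLeader` and the 10 s request timeout are invented for illustration. It only shows why a stalled etcd write ends with the primary being stopped in immediate mode.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def make_dcs_writer(fsync_seconds: float, request_timeout: float) -> Callable[[], bool]:
    """Hypothetical DCS client: a key update succeeds only if etcd's disk sync
    finishes before the request times out."""
    return lambda: fsync_seconds < request_timeout


@dataclass
class ToyLeader:
    """Toy model of one HA-loop cycle on the primary (not Patroni's real code)."""
    update_leader_lock: Callable[[], bool]
    role: str = "primary"
    events: List[str] = field(default_factory=list)

    def run_cycle(self) -> str:
        if self.role != "primary":
            return "not the leader, nothing to renew"
        if self.update_leader_lock():
            return "leader lock renewed"
        self.events.append("ERROR: failed to update leader lock")
        # The lock may expire and another member may be promoted, so the old
        # primary stops at once rather than risk two writable primaries.
        self.stop_postgres(mode="immediate")
        self.role = "demoted"
        return "demoted self because failed to update leader lock in DCS"

    def stop_postgres(self, mode: str) -> None:
        # Stand-in for an immediate-mode stop, roughly `pg_ctl stop -m immediate`.
        self.events.append(f"stopping PostgreSQL with mode={mode}")


healthy = ToyLeader(make_dcs_writer(fsync_seconds=0.01, request_timeout=10.0))
print(healthy.run_cycle())   # leader lock renewed

stalled = ToyLeader(make_dcs_writer(fsync_seconds=29.7, request_timeout=10.0))
print(stalled.run_cycle())   # demoted self because failed to update leader lock in DCS
print(stalled.events)
```

The immediate stop is the conservative choice in this model: once the leader cannot prove it still holds the lock, continuing to accept writes is riskier than crashing PostgreSQL, which is why the archiver and other children exit abruptly.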
Regards, Sergei