At Wed, 26 Jan 2022 18:45:53 -0500, Andrew Dunstan <andrew@dunslane.net> wrote in
> It's very unlikely any of this is your fault. In any case, intermittent
> failures are very hard to nail down.
The primary starts at 2022-01-26 16:30:06.613 the accepted a connectin
from the standby at 2022-01-26 16:30:09.911.
P: 2022-01-26 16:30:06.613 UTC [74668:1] LOG: starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe
(Rev2,Built by MSYS2 project) 10.3.0, 64-bit
S: 2022-01-26 16:30:09.637 UTC [72728:1] LOG: starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe
(Rev2,Built by MSYS2 project) 10.3.0, 64-bit
P: 2022-01-26 16:30:09.911 UTC [74096:3] [unknown] LOG: replication connection authorized: user=pgrunner
application_name=standby
S: 2022-01-26 16:30:09.912 UTC [73932:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
After this the primary restarts.
P: 2022-01-26 16:30:11.832 UTC [74668:7] LOG: database system is shut down
P: 2022-01-26 16:30:12.010 UTC [72140:1] LOG: starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe
(Rev2,Built by MSYS2 project) 10.3.0, 64-bit
But the standby doesn't notice the disconnection and continue with the
poll_query_until() on the stale connection. But the query should have
executed after reconnection finally. The following log lines are not
thinned out.
S: 2022-01-26 16:30:09.912 UTC [73932:1] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
S: 2022-01-26 16:30:12.825 UTC [72760:1] [unknown] LOG: connection received: host=127.0.0.1 port=60769
S: 2022-01-26 16:30:12.830 UTC [72760:2] [unknown] LOG: connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner"
method=sspi
(C:/tools/msys64/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/modules/commit_ts/tmp_check/t_003_standby_2_standby_data/pgdata/pg_hba.conf:2)
S: 2022-01-26 16:30:12.830 UTC [72760:3] [unknown] LOG: connection authorized: user=pgrunner database=postgres
application_name=003_standby_2.pl
S: 2022-01-26 16:30:12.838 UTC [72760:4] 003_standby_2.pl LOG: statement: SELECT '0/303C7D0'::pg_lsn <=
pg_last_wal_replay_lsn()
Since the standby dones't receive WAL from the restarted server so the
poll_query_until() doesn't return.
I'm not sure why walsender of the standby continued running not knowing the primary has been once dead for such a long
time.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center