Michael Paquier wrote on 2/8/22 5:35 PM:
> On Tue, Feb 08, 2022 at 01:23:34PM -0800, Peter Geoghegan wrote:
>> I find the idea that we'd fail to WAL-log information that is needed
>> during Hot Standby (to prevent this race condition) plausible.
>> Michael?
> Yeah, REINDEX relies on some existing index definition, so it feels
> like we are missing a piece related to invalid indexes in all that. A
> main difference is the lock level, as exclusive locks are getting
> logged so the standby can react and wait on that. The 30-minute mark
> is interesting. Ben, did you change any replication-related GUCs that
> could influence that? Say, wal_receiver_timeout, hot_standby_feedback
> or max_standby_streaming_delay?
Oh, to be clear, the 30-minute mark is more of an upper bound: the loop
has always failed by that point. Sometimes it fails after 5 minutes and
sometimes later, but I've never seen it take longer than 20-something
minutes. I had assumed the variation was just the race condition, but
to answer your question: yes, we have tuned some replication
parameters. Here are the ones you asked about; do you want to see the
values of any others?
=# show wal_receiver_timeout ;
wal_receiver_timeout
──────────────────────
1min
(1 row)
04:26:27 db: postgres@postgres, pid:29507
=# show hot_standby_feedback ;
hot_standby_feedback
──────────────────────
on
(1 row)
04:26:40 db: postgres@postgres, pid:29507
=# show max_standby_streaming_delay ;
max_standby_streaming_delay
─────────────────────────────
10min
(1 row)
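
If it is easier to compare them side by side, the same three settings
can also be read in one query against the standard pg_settings view.
This is only a sketch: the column selection is an assumption about what
is useful here, and no output is shown because it was not captured on
this standby.

```sql
-- Show the three replication-related GUCs asked about in this thread,
-- along with their units and where each value was set from.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('wal_receiver_timeout',
               'hot_standby_feedback',
               'max_standby_streaming_delay')
ORDER BY name;
```

Note that pg_settings reports "setting" in the base unit given in the
"unit" column (for example milliseconds), whereas SHOW prints the
human-readable form such as 1min or 10min.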