Hello
It seems I have a test case for this report:
a primary plus a streaming replica with these settings:
log_lock_waits = 'on'
max_standby_streaming_delay = '-1'
fsync = off
synchronous_commit = off
(this seems to be a rare race condition, and I can't reproduce it with slow fsync)
Create a table for the queries:
create table tablename as select generate_series(1,100) as i;
Run on primary:
pgbench -f primary.sql -c 1 -t 100000 --port 5555 postgres
primary.sql is:
vacuum full pg_statistic;
vacuum full tablename;
(any activity that takes an AccessExclusiveLock)
On replica:
pgbench -f ro.sql --time=300 -n -c 20 --port 5556 postgres
Script content:
\set i random(1,100)
select * from tablename where i = :i;
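
For convenience, the steps above can be collected into one script. This is only a sketch: it assumes the primary and the replica are already running on ports 5555 and 5556 as in the commands above, and that psql and pgbench are on PATH. The RUN_BENCH variable is a hypothetical switch added here so the two script files can be generated without starting the load.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Ports 5555 (primary) and 5556 (replica)
# come from the commands above; RUN_BENCH is a hypothetical switch.
set -u

# Script for the primary: repeated activity that takes AccessExclusiveLock.
cat > primary.sql <<'EOF'
vacuum full pg_statistic;
vacuum full tablename;
EOF

# Read-only script for the replica.
cat > ro.sql <<'EOF'
\set i random(1,100)
select * from tablename where i = :i;
EOF

if [ "${RUN_BENCH:-0}" = "1" ]; then
    # Create the test table on the primary; it is replicated to the standby.
    psql --port 5555 -c "create table tablename as select generate_series(1,100) as i;" postgres
    # Lock-heavy load on the primary, in the background.
    pgbench -f primary.sql -c 1 -t 100000 --port 5555 postgres &
    # Read-only load on the replica; normally this stops on "deadlock detected".
    pgbench -f ro.sql --time=300 -n -c 20 --port 5556 postgres
    wait
fi
```

Run it as RUN_BENCH=1 sh repro.sh (the file name is arbitrary); without RUN_BENCH it only writes primary.sql and ro.sql.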
Usually the replica kills the queries with ERROR: deadlock detected and pgbench stops. But sometimes (usually in less
than 5-10 attempts on my host) both the startup process and a backend running a query start waiting on something. Also,
new connections may stay in "startup waiting" status indefinitely.
With a different max_standby_streaming_delay value the queries are killed, but only after this timeout (as reported in
this bug report). I think this should be detected as a deadlock, but for some reason that does not happen.
regards, Sergei