BUG #17690: Nonresponsive client on replica can halt replication indefinitely

From: PG Bug reporting form
Subject: BUG #17690: Nonresponsive client on replica can halt replication indefinitely
Date:
Msg-id: 17690-d4f1a1944550b801@postgresql.org
Replies: Re: BUG #17690: Nonresponsive client on replica can halt replication indefinitely
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17690
Logged by:          Jacob Baskin
Email address:      jacob.baskin@gmail.com
PostgreSQL version: 13.0
Operating system:   Linux (CentOS 7)
Description:

We have discovered that a badly behaved client connected to a hot standby
replica can block replication from progressing indefinitely. The client's
backend gets into a state where it does not stop when the recovery process
tries to cancel conflicting queries, for as long as the backend still has
pending data to write to the client.

To trigger this, the client needs to be:
- Actively running a query that conflicts with recovery
- Not reading the query results from its socket (e.g., run "select *
from large_table" in psql and then press Ctrl-Z while results are being
streamed)

We believe this failure mode is deterministic. This bug definitely affects
PostgreSQL 13, and we believe it is still present in HEAD. We are running
Linux (CentOS 7).

The sequence of events is as follows:

1. The startup (recovery) process tries to cancel a connection that
conflicts with recovery (standby.c:393)
2. The connection process gets SIGUSR1.
3. This invokes RecoveryConflictInterrupt, which sets QueryCancelPending,
but NOT (generally) ProcDiePending (postgres.c:3039)
4. The connection process repeatedly calls ProcessClientWriteInterrupt,
which handles interrupts if ProcDiePending is set but not otherwise
(postgres.c:526)

We believe the appropriate fix is to check for RecoveryConflictPending in
addition to ProcDiePending at postgres.c:526.
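
To illustrate the reasoning, here is a toy model of the check in C. It is
not PostgreSQL source code: the struct, the function names, and the wakeup
loop are all hypothetical, and it assumes that the conflict signal sets
RecoveryConflictPending together with QueryCancelPending. It only shows why
a backend blocked on a client write never leaves the wait when only
ProcDiePending is tested, and how the proposed extra condition would change
that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the backend's pending-interrupt flags (hypothetical). */
typedef struct
{
    bool proc_die_pending;
    bool query_cancel_pending;
    bool recovery_conflict_pending;
} BackendFlags;

/*
 * Flags after the recovery-conflict signal, per step 3 of the report:
 * the query cancel is pending, but the "die" flag is (generally) not set.
 * Setting recovery_conflict_pending here is an assumption of this model.
 */
BackendFlags
recovery_conflict_signal(void)
{
    BackendFlags f = {false, true, true};

    return f;
}

/* Behaviour described in the report: only a pending "die" is acted on. */
bool
write_wait_interrupted_current(const BackendFlags *f)
{
    return f->proc_die_pending;
}

/* Proposed behaviour: a pending recovery conflict is acted on as well. */
bool
write_wait_interrupted_proposed(const BackendFlags *f)
{
    return f->proc_die_pending || f->recovery_conflict_pending;
}

/*
 * Model a backend blocked on a socket write that the client never drains.
 * Each wakeup re-runs the check; returns true if the backend leaves the
 * wait within max_wakeups iterations, false if it is still stuck.
 */
bool
backend_escapes(bool (*check) (const BackendFlags *),
                const BackendFlags *f, int max_wakeups)
{
    for (int i = 0; i < max_wakeups; i++)
    {
        if (check(f))
            return true;
    }
    return false;
}
```

In this model, the flags returned by recovery_conflict_signal() leave the
backend stuck under write_wait_interrupted_current for any number of
wakeups, while write_wait_interrupted_proposed lets it exit on the first
one. Whether the real patch needs additional conditions is exactly the
question we would like confirmed.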

This fix would be a one-line patch, which we are happy to submit, but we
first want to confirm that this is the correct approach.

Thanks!