pgsql: Prevent excessive delays before launching new logrep workers.
From | Tom Lane |
---|---|
Subject | pgsql: Prevent excessive delays before launching new logrep workers. |
Date | |
Msg-id | E1uU8A9-003A9y-03@gemulon.postgresql.org |
List | pgsql-committers |
Prevent excessive delays before launching new logrep workers.

The logical replication launcher process would sometimes sleep for as much as 3 minutes before noticing that it is supposed to launch a new worker. This could happen if (1) WaitForReplicationWorkerAttach absorbed a process latch wakeup that was meant to cause ApplyLauncherMain to do work, or (2) logicalrep_worker_launch reported failure, either because of resource limits or because the new worker terminated immediately. In case (2), the expected behavior is that we retry the launch after wal_retrieve_retry_interval, but that didn't reliably happen.

It's not clear how often such conditions would occur in the field, but in our subscription test suite they are somewhat common, especially in tests that exercise cases that cause quick worker failure. That causes the tests to take substantially longer than they ought to do on typical setups.

To fix (1), make WaitForReplicationWorkerAttach re-set the latch before returning if it cleared it while looping. To fix (2), ensure that we reduce wait_time to no more than wal_retrieve_retry_interval when logicalrep_worker_launch reports failure. In passing, fix a couple of perhaps-hypothetical race conditions, e.g. examining worker->in_use without a lock.

Backpatch to v16. Problem (2) didn't exist before commit 5a3a95385 because the previous code always set wait_time to wal_retrieve_retry_interval when launching a worker, regardless of success or failure of the launch. That behavior also greatly mitigated problem (1), so I'm not excited about adapting the remainder of the patch to the substantially-different code in older branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/817604.1750723007@sss.pgh.pa.us
Backpatch-through: 16

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/9f33300e69b8a926b5168cd9be3cd7cd11396bbd

Modified Files
--------------
src/backend/replication/logical/launcher.c  | 40 +++++++++++++++++++++++------
src/backend/replication/logical/tablesync.c | 19 +++++++++-----
2 files changed, 44 insertions(+), 15 deletions(-)
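For readers who want a concrete picture of fixes (1) and (2) from the commit message, the following is a minimal standalone sketch, not the committed patch. Every name in it (SketchLatch, sketch_wait_for_attach, sketch_wait_time_after_launch) is hypothetical, and the latch is modelled as a plain boolean flag rather than PostgreSQL's real latch machinery. It only illustrates the two ideas: put the latch back if the wait loop consumed it, and cap the sleep at the retry interval when a launch fails.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a process latch: just a flag (an assumption,
 * not PostgreSQL's Latch type). */
typedef struct SketchLatch
{
    bool is_set;
} SketchLatch;

/*
 * Idea behind fix (1): a helper that polls while waiting for a worker to
 * attach may clear the latch on each wakeup.  If it did so, it sets the
 * latch again before returning, so the caller's main loop still sees the
 * wakeup that was meant for it.
 */
bool
sketch_wait_for_attach(SketchLatch *latch, int polls_until_attached)
{
    bool dropped_latch = false;

    while (polls_until_attached-- > 0)
    {
        if (latch->is_set)
        {
            latch->is_set = false;  /* consume the wakeup while looping */
            dropped_latch = true;
        }
    }

    if (dropped_latch)
        latch->is_set = true;       /* restore it for the caller */

    return true;                    /* sketch: the worker always attaches */
}

/*
 * Idea behind fix (2): after a failed launch, never sleep longer than the
 * retry interval.  A successful launch leaves the wait time unchanged.
 */
long
sketch_wait_time_after_launch(long wait_time_ms, bool launch_ok,
                              long retry_interval_ms)
{
    if (!launch_ok && wait_time_ms > retry_interval_ms)
        wait_time_ms = retry_interval_ms;
    return wait_time_ms;
}
```

As an illustration, taking the 3-minute sleep mentioned above (180000 ms) and assuming a retry interval of 5000 ms, a failed launch would shorten the next sleep to 5000 ms, while a successful launch would leave it at 180000 ms.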