Some thoughts about the TAP tests' wait_for_catchup()

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Some thoughts about the TAP tests' wait_for_catchup()
Дата	29 сентября 2021 г. 01:17:08
Msg-id	2919102.1632867428@sss.pgh.pa.us обсуждение исходный текст
Ответы	Re: Some thoughts about the TAP tests' wait_for_catchup() (Amit Kapila <amit.kapila16@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

I noticed that some test scripts, instead of using wait_for_catchup
to wait for replication catchup, use ad-hoc code like

my $primary_lsn =
  $primary->safe_psql('postgres', 'select pg_current_wal_lsn()');
$standby->poll_query_until('postgres',
    qq{SELECT '$primary_lsn'::pg_lsn <= pg_last_wal_replay_lsn()})
  or die "standby never caught up";

This does not look much like what wait_for_catchup does, which
typically ends up issuing queries like

SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming'
FROM pg_catalog.pg_stat_replication WHERE application_name = 'standby';

It seems to me that for most purposes wait_for_catchup's approach is
strictly worse, for two reasons:

1. It continually recomputes the primary's pg_current_wal_lsn().
Thus, if anything is happening on the primary (e.g. autovacuum),
we're moving the goalposts as to how much the standby is required
to replay before we continue.  That slows down the tests, makes
them less reproducible, and could help to mask actual bugs of the
form this-hasn't-been-done-when-it-should-have.

2. It's querying the primary's view of the standby's state, which
introduces a reporting delay.  This has exactly the same three
problems as the other point.

So I think we ought to change wait_for_catchup to do things more
like the first way, where feasible.  This seems pretty easy for
the call sites that provide the standby's PostgresNode; but it's
not so easy for the ones that just provide a subscription name.

So my first question is: to what extent are these tests specifically
intended to exercise the replay reporting mechanisms?  Would we lose
important coverage if we changed *all* the call sites to use the
query-the-standby approach?  If so, which ones might be okay to
change?

The second question is what do we want the API to look like?  This
is simple if we can change all wait_for_catchup callers to pass
the standby node.  Otherwise, the choices are to split wait_for_catchup
into two implementations, or to keep its current API and have it
do significantly different things depending on isa("PostgresNode").

Thoughts?

            regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Daniel Fone
Дата: 29 сентября 2021 г., 00:58:49
Сообщение: Re: pgcrypto support for bcrypt $2b$ hashes

Следующее

От: Michael Paquier
Дата: 29 сентября 2021 г., 01:25:38
Сообщение: Re: Incorrect fd handling in syslogger.c for Win64 under EXEC_BACKEND

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Some thoughts about the TAP tests' wait_for_catchup()

Предыдущее

Следующее