Re: Implement waiting for wal lsn replay: reloaded

From: Alexander Korotkov
Msg-id: CAPpHfdtw3_6JtR4S5E1J6RUirqZaJJRr1f63AKNKAjvVJMzffA@mail.gmail.com
In reply to: Re: Implement waiting for wal lsn replay: reloaded (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Jan 7, 2026, 02:32 Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
>> Could this be causing the recent flapping failures on CI/macOS in
>> recovery/031_recovery_conflict?  I didn't have time to dig personally
>> but f30848cb looks relevant:
>>
>> Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on primary
>> error running SQL: 'psql:<stdin>:1: ERROR:  canceling statement due to
>> conflict with recovery
>> DETAIL:  User was or might have been using tablespace that must be dropped.'
>> while running 'psql --no-psqlrc --no-align --tuples-only --quiet
>> --dbname port=25195
>> host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
>> dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
>> FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
>> no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
>> line 2300.
>>
>> https://cirrus-ci.com/task/5771274900733952
>>
>> The master branch in time-descending order, macOS tasks only:
>>
>>      task_id      | substring |  status
>> ------------------+-----------+-----------
>>  6460882231754752 | c970bdc0  | FAILED
>>  5771274900733952 | 6ca8506e  | FAILED
>>  6217757068361728 | 63ed3bc7  | FAILED
>>  5980650261446656 | ae283736  | FAILED
>>  6585898394976256 | 5f13999a  | COMPLETED
>>  4527474786172928 | 7f9acc9b  | COMPLETED
>>  4826100842364928 | e8d4e94a  | COMPLETED
>>  4540563027918848 | b9ee5f2d  | FAILED
>>  6358528648019968 | c5af141c  | FAILED
>>  5998005284765696 | e212a0f8  | COMPLETED
>>  6488580526178304 | b85d5dc0  | FAILED
>>  5034091344560128 | 7dc95cc3  | ABORTED
>>  5688692477526016 | bb048e31  | COMPLETED
>>  5481187977723904 | d351063e  | COMPLETED
>>  5101831568752640 | f30848cb  | COMPLETED <-- the change
>>  6395317408497664 | 3f33b63d  | COMPLETED
>>  6741325208354816 | 877ae5db  | COMPLETED
>>  4594007789010944 | de746e0d  | COMPLETED
>>  6497208998035456 | 461b8cc9  | COMPLETED
>
> The failure rates of this are very high - the majority of the CI runs on the
> postgres/postgres repos failed since the change went in. Which then also means
> cfbot has a very high spurious failure rate. I think we need to revert this
> change until the problem has been verified as fixed.

This is fair. I will revert the commit causing the failures in the next few hours.

------
Regards,
Alexander Korotkov
