RE: [PoC] pg_upgrade: allow to upgrade publisher node

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: [PoC] pg_upgrade: allow to upgrade publisher node
Дата
Msg-id OS3PR01MB9882FED1F0060468FB01B9DAF583A@OS3PR01MB9882.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
RE: [PoC] pg_upgrade: allow to upgrade publisher node  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers
Dear hackers,

> > >
> > > Pushed!
> >
> > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > the last patch committed. Is there further work that needs to be
> > re-attached and/or rebased?
> >
> 
> No. I have marked it as committed.
>

I found another failure related with the commit [1]. I think it is caused by the
autovacuum. I want to propose a patch which disables the feature for old publisher.

More detail, please see below.

# Analysis of the failure

Summary: this failure occurs when the autovacuum starts after the subscription
is disabled but before doing pg_upgrade.

According to the regress file, it unexpectedly failed the pg_upgrade [2]. There are
no possibilities for slots are invalidated, so some WALs seemed to be generated
after disabling the subscriber.

Also, server log caused by oldpub said that autovacuum worker was terminated when
it stopped. This was occurred after walsender released the logical slots. WAL records
caused by autovacuum workers could not be consumed by the slots, so that upgrading
function returned false.

# How to reproduce

I made a small file for reproducing the failure. Please see reproduce.txt. This contains
changes for launching autovacuum worker very often and for ensuring actual works are
done. After applying it, I could reproduce the same failure every time.

# How to fix

I think it is sufficient to fix only the test code.
The easiest way is to disable the autovacuum on old publisher. PSA the patch file.

How do you think?


[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-11-27%2020%3A52%3A10
[2]:
```
...
Checking for contrib/isn with bigint-passing mismatch         ok
Checking for valid logical replication slots                  fatal

Your installation contains logical replication slots that can't be upgraded.
You can remove invalid slots and/or consume the pending WAL for other slots,
and then restart the upgrade.
A list of the problematic slots is in the file:

/home/bf/bf-build/skink-master/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231127T220024.480/invalid_logical_slots.txt
Failure, exiting
[22:01:20.362](86.645s) not ok 10 - run of pg_upgrade of old cluster
...
```
[3]:
```
...
2023-11-27 22:00:23.546 UTC [3567962][walsender][4/0:0] LOG:  released logical replication slot "regress_sub"
2023-11-27 22:00:23.549 UTC [3559042][postmaster][:0] LOG:  received fast shutdown request
2023-11-27 22:00:23.552 UTC [3559042][postmaster][:0] LOG:  aborting any active transactions
*2023-11-27 22:00:23.663 UTC [3568793][autovacuum worker][5/3:738] FATAL:  terminating autovacuum process due to
administratorcommand*
 
2023-11-27 22:00:23.775 UTC [3559042][postmaster][:0] LOG:  background worker "logical replication launcher" (PID
3560674)exited with exit code 1
 
...
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: pg_upgrade and logical replication