RE: Stronger safeguard for archive recovery not to miss data

From: osumi.takamichi@fujitsu.com
Subject: RE: Stronger safeguard for archive recovery not to miss data
Msg-id: OSBPR01MB48889E884ABE377AE505F543EDCD0@OSBPR01MB4888.jpnprd01.prod.outlook.com
In reply to: Re: Stronger safeguard for archive recovery not to miss data  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Replies: Re: Stronger safeguard for archive recovery not to miss data  (Laurenz Albe <laurenz.albe@cybertec.at>)
List: pgsql-hackers

Hi


On Thursday, November 26, 2020 4:29 PM
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Thu, 26 Nov 2020 07:18:39 +0000, "osumi.takamichi@fujitsu.com"
> <osumi.takamichi@fujitsu.com> wrote in
> > The attached patch is intended to prevent a scenario that archive
> > recovery hits WALs which come from wal_level=minimal and the server
> > continues to work, which was discussed in the thread of [1].
> > The motivation is to protect that user ends up with both getting
> > replica that could miss data and getting the server to miss data in
> > targeted recovery mode.
> >
> > About how to modify this, we reached the consensus in the thread.
> > It is by changing the ereport's level from WARNING to FATAL in
> > CheckRequiredParameterValues().
> >
> > In order to test this fix, what I did is
> > 1 - get a base backup during wal_level is replica
> > 2 - stop the server and change the wal_level from replica to minimal
> > 3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
> > 4 - stop the server and make the wal_level back to replica
> > 5 - start the server again
> > 6 - execute archive recoveries in both cases
> >     (1) by editing the postgresql.conf and
> >     touching recovery.signal in the base backup from 1th step
> >     (2) by making a replica with standby.signal
> > * During wal_level is replica, I enabled archive_mode in this test.
> >
> > First of all, I confirmed the server started up without this patch.
> > After applying this safeguard patch, I checked that the server cannot
> > start up any more in the scenario case.
> > I checked the log and got the result below with this patch.
> >
> > 2020-11-26 06:49:46.003 UTC [19715] FATAL:  WAL was generated with
> > wal_level=minimal, data may be missing
> > 2020-11-26 06:49:46.003 UTC [19715] HINT:  This happens if you
> > temporarily set wal_level=minimal without taking a new base backup.
> >
> > Lastly, this should be backpatched.
> > Any comments ?
>
> Perhaps we need the TAP test that conducts the above steps.
I added a TAP test that reproduces case 6-(1) described above,
and I share the result below.
I created a new file for it because none of the existing files in
src/test/recovery matched the purpose of my test well.
If you are dubious about this idea, please have a look at the comments
in each file.
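
For anyone who wants to try case 6-(1) by hand, the steps quoted above can be
sketched roughly as the shell script below. This is only an illustration, not
the attached TAP test. The directories, the port, and the run/conf/repro
helpers are made up for this sketch, and the pg_switch_wal() call is an extra
step assumed here to make sure the WAL reaches the archive. By default the
script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the manual reproduction, case 6-(1).
# Paths, port and helper names are hypothetical; adjust before real use.
PRIMARY=${PRIMARY:-/tmp/repro/primary}
BACKUP=${BACKUP:-/tmp/repro/backup}
ARCHIVE=${ARCHIVE:-/tmp/repro/archive}
PORT=${PORT:-5440}

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Append one setting to postgresql.conf; the last occurrence wins.
conf() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ append to $1/postgresql.conf: $2"
    else
        echo "$2" >> "$1/postgresql.conf"
    fi
}

repro() {
    run mkdir -p "$ARCHIVE"
    run initdb -D "$PRIMARY"
    conf "$PRIMARY" "port = $PORT"
    conf "$PRIMARY" "wal_level = replica"
    conf "$PRIMARY" "archive_mode = on"
    conf "$PRIMARY" "archive_command = 'cp %p $ARCHIVE/%f'"
    run pg_ctl -D "$PRIMARY" -w start

    # Step 1: base backup while wal_level is replica.
    run pg_basebackup -D "$BACKUP" -p "$PORT"

    # Steps 2-3: switch to minimal and restart (XLOG_PARAMETER_CHANGE).
    # wal_level=minimal also needs archiving and WAL senders turned off.
    conf "$PRIMARY" "wal_level = minimal"
    conf "$PRIMARY" "archive_mode = off"
    conf "$PRIMARY" "max_wal_senders = 0"
    run pg_ctl -D "$PRIMARY" -w restart

    # Steps 4-5: back to replica and start again.
    conf "$PRIMARY" "wal_level = replica"
    conf "$PRIMARY" "archive_mode = on"
    conf "$PRIMARY" "max_wal_senders = 10"
    run pg_ctl -D "$PRIMARY" -w restart
    # Assumption: force a segment switch so the WAL gets archived.
    run psql -p "$PORT" -d postgres -c "SELECT pg_switch_wal()"
    run pg_ctl -D "$PRIMARY" -w stop

    # Step 6-(1): archive recovery from the base backup of step 1.
    conf "$BACKUP" "restore_command = 'cp $ARCHIVE/%f %p'"
    run touch "$BACKUP/recovery.signal"
    run pg_ctl -D "$BACKUP" -w start
}

repro
```

With DRY_RUN=0 and a patched server, the last pg_ctl start is expected to
fail, and the FATAL message quoted above should appear in the server log.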

With the attached patch applied,
my TAP test runs together with the existing ones, as shown below.

t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok
t/021_row_visibility.pl .............. ok
t/022_archive_recovery.pl ............ ok
All tests successful.

Also, I confirmed with make check-world that there is no regression.
Any comments?

Best,
    Takamichi Osumi

