Re: pg_rewind WAL segments deletion pitfall
From | torikoshia |
---|---|
Subject | Re: pg_rewind WAL segments deletion pitfall |
Date | |
Msg-id | 8b385bb6d5f87e54c1c6333fece0444a@oss.nttdata.com |
In response to | Re: pg_rewind WAL segments deletion pitfall (torikoshia <torikoshia@oss.nttdata.com>) |
Responses | Re: pg_rewind WAL segments deletion pitfall |
List | pgsql-hackers |
On 2022-09-29 17:18, Polina Bungina wrote:
> I agree with your suggestions, so here is the updated version of
> patch. Hope I haven't missed anything.

Thanks for the patch, I've marked this as ready-for-committer.

BTW, this issue can be considered a bug, right?
I think it would be appropriate to provide backpatch.

On 2023-06-29 18:42, torikoshia wrote:
> On 2023-06-29 10:25, Kyotaro Horiguchi wrote:
>
> Thanks for the comment!
>
>> At Wed, 28 Jun 2023 22:28:13 +0900, torikoshia
>> <torikoshia@oss.nttdata.com> wrote in
>>>
>>> On 2022-09-29 17:18, Polina Bungina wrote:
>>> > I agree with your suggestions, so here is the updated version of
>>> > patch. Hope I haven't missed anything.
>>> > Regards,
>>> > Polina Bungina
>>>
>>> Thanks for working on this!
>>> It seems like we are also facing the same issue.
>>
>> Thanks for looking at this.
>>
>>> I tested the v3 patch under our condition, old primary has succeeded
>>> to become new standby.
>>>
>>> BTW when I used pg_rewind-removes-wal-segments-reproduce.sh attached
>>> in [1], old primary also failed to become standby:
>>>
>>> FATAL: could not receive data from WAL stream: ERROR: requested WAL
>>> segment 000000020000000000000007 has already been removed
>>>
>>> However, I think this is not a problem: just adding restore_command
>>> like below fixed the situation.
>>>
>>> echo "restore_command = '/bin/cp `pwd`/newarch/%f %p'" >>
>>> oldprim/postgresql.conf
>>
>> I thought on the same line at first, but that's not the point
>> here.
>
> Yes. I don't think adding restore_command solves the problem and
> modification to prevent deleting necessary WAL like proposed
> patch is necessary.
>
> I added restore_command since
> pg_rewind-removes-wal-segments-reproduce.sh failed to catch up
> even after applying the v3 patch and preventing pg_rewind from
> deleting WALs(*), because some necessary WALs were archived.
>
> It's not a problem we are discussing here, but I wanted to get
> the script to work to the point where old primary could
> successfully catch up to new primary.
>
> (*)Specifically, running the script without applying the patch,
> recovery failed because 000000010000000000000003 had
> already been removed. This file was deleted by pg_rewind as
> we know.
> OTOH without the restore_command, recovery failed because
> 000000020000000000000007 had already been removed even after
> applying the patch.
>
>> The problem we want to address is that pg_rewind ultimately
>> removes certain crucial WAL files required for the new primary to
>> start, despite them being present previously.
>
> I thought it's not "new primary", but "old primary".
>
>> In other words, that
>> restore_command works, but it only undoes what pg_rewind wrongly did,
>> resulting in unnecessary consumption of I/O and/or network bandwidth
>> that essentially serves no purpose.
>
> As far as I tested using the script and the situation we are facing,
> after promoting newprim necessary WAL(000000010000000000000003..) were
> not available and just adding restore_command did not solve the
> problem.
>
>> pg_rewind already has a feature that determines how each file should
>> be handled, but it is currently making wrong decisions for WAL
>> files. The goal here is to rectify this behavior and ensure that
>> pg_rewind makes the right decisions.
>
> +1

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
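For readers following the segment names quoted above (000000010000000000000003, 000000020000000000000007), here is a minimal Python sketch of how a WAL file name splits into a timeline ID and a segment number, together with a toy "keep or delete" rule. It assumes the default 16 MB WAL segment size. The helper names `parse_wal_name` and `segments_to_keep` are hypothetical, and the rule is only an illustration of a per-file decision; it is not the logic of the proposed patch or of pg_rewind itself.

```python
import re

# A WAL segment file name is 24 hex digits: timeline (8), log (8), seg (8).
WAL_NAME = re.compile(r"[0-9A-F]{24}")

# Assumption: default 16 MB segments, so 256 segments per "log" unit.
SEGMENTS_PER_XLOGID = 0x100000000 // (16 * 1024 * 1024)


def parse_wal_name(name):
    """Return (timeline, segment_number) for a WAL segment file name."""
    if not WAL_NAME.fullmatch(name):
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_XLOGID + seg


def segments_to_keep(names, first_needed):
    """Hypothetical rule: keep segments at or after first_needed's segment number.

    Non-WAL entries (history files, etc.) are ignored in this sketch.
    """
    _, needed_segno = parse_wal_name(first_needed)
    return sorted(
        n for n in names
        if WAL_NAME.fullmatch(n) and parse_wal_name(n)[1] >= needed_segno
    )
```

With this sketch, `parse_wal_name("000000020000000000000007")` yields timeline 2 and segment number 7, which matches the file named in the FATAL message quoted above.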