RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER

Поиск
Список
Период
Сортировка
От B Ganesh Kishan
Тема RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER
Дата
Msg-id SA1PR19MB5088936CE8AE66E9AD09429FB7219@SA1PR19MB5088.namprd19.prod.outlook.com
обсуждение исходный текст
Ответ на RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER  (B Ganesh Kishan <bkishan@commvault.com>)
Список pgsql-bugs
Adding Peter/Leif , who worked on this change.

Hi Peter/Lief,

Could you please help here?

Server start ended with FATAL error and specified the last completed transaction time.

2022-01-27 15:35:33.962 IST [29209] LOG:  starting point-in-time recovery to 2022-01-27 12:56:34+05:30
2022-01-27 15:35:33.966 IST [29209] LOG:  restored log file "00000009.history" from archive
2022-01-27 15:35:33.981 IST [29209] LOG:  restored log file "000000090000000000000061" from archive
2022-01-27 15:35:42.121 IST [29209] LOG:  redo starts at 0/61000060
2022-01-27 15:35:42.122 IST [29209] LOG:  consistent recovery state reached at 0/61000170
2022-01-27 15:35:42.123 IST [29207] LOG:  database system is ready to accept read only connections
2022-01-27 15:35:42.139 IST [29209] LOG:  restored log file "000000090000000000000062" from archive
2022-01-27 15:35:42.456 IST [29209] LOG:  restored log file "000000090000000000000063" from archive
2022-01-27 15:35:44.923 IST [29209] LOG:  restored log file "000000090000000000000064" from archive
2022-01-27 15:35:45.897 IST [29209] LOG:  restored log file "000000090000000000000065" from archive
cp: cannot stat '/var/lib/pgsql/13/wal/000000090000000000000066': No such file or directory
2022-01-27 15:35:46.280 IST [29209] LOG:  redo done at 0/65000148
2022-01-27 15:35:46.280 IST [29209] LOG:  last completed transaction was at log time 2022-01-25 15:05:34.132546+05:30
2022-01-27 15:35:46.280 IST [29209] FATAL:  recovery ended before configured recovery target was reached
2022-01-27 15:35:46.281 IST [29207] LOG:  startup process (PID 29209) exited with exit code 1
2022-01-27 15:35:46.281 IST [29207] LOG:  terminating any other active server processes
2022-01-27 15:35:46.326 IST [29207] LOG:  database system is shut down

Provided the recovery target time as the last completed transaction time specified in the above log. But the server
startis still failing with same error. 

2022-01-27 19:23:30.148 IST [7598] LOG:  starting PostgreSQL 13.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623(Red Hat 4.8.5-44), 64-bit 
2022-01-27 19:23:30.148 IST [7598] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-01-27 19:23:30.148 IST [7598] LOG:  listening on IPv6 address "::", port 5432
2022-01-27 19:23:30.149 IST [7598] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-01-27 19:23:30.150 IST [7598] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-01-27 19:23:30.153 IST [7600] LOG:  database system was interrupted while in recovery at log time 2022-01-25
14:59:53IST 
2022-01-27 19:23:30.153 IST [7600] HINT:  If this has occurred more than once some data might be corrupted and you
mightneed to choose an earlier recovery target. 
cp: cannot stat '/var/lib/pgsql/13/wal/0000000A.history': No such file or directory
2022-01-27 19:23:30.206 IST [7600] LOG:  starting point-in-time recovery to 2022-01-25 15:05:34.132546+05:30
2022-01-27 19:23:30.209 IST [7600] LOG:  restored log file "00000009.history" from archive
2022-01-27 19:23:30.247 IST [7600] LOG:  restored log file "000000090000000000000061" from archive
2022-01-27 19:23:30.502 IST [7600] LOG:  redo starts at 0/61000060
2022-01-27 19:23:30.528 IST [7600] LOG:  restored log file "000000090000000000000062" from archive
2022-01-27 19:23:31.222 IST [7600] LOG:  restored log file "000000090000000000000063" from archive
2022-01-27 19:23:31.942 IST [7600] LOG:  restored log file "000000090000000000000064" from archive
2022-01-27 19:23:33.007 IST [7600] LOG:  consistent recovery state reached at 0/640016A0
2022-01-27 19:23:33.008 IST [7598] LOG:  database system is ready to accept read only connections
2022-01-27 19:23:33.025 IST [7600] LOG:  restored log file "000000090000000000000065" from archive
cp: cannot stat '/var/lib/pgsql/13/wal/000000090000000000000066': No such file or directory
2022-01-27 19:23:33.409 IST [7600] LOG:  redo done at 0/65000148
2022-01-27 19:23:33.409 IST [7600] LOG:  last completed transaction was at log time 2022-01-25 15:05:34.132546+05:30
2022-01-27 19:23:33.409 IST [7600] FATAL:  recovery ended before configured recovery target was reached
2022-01-27 19:23:33.411 IST [7598] LOG:  startup process (PID 7600) exited with exit code 1
2022-01-27 19:23:33.411 IST [7598] LOG:  terminating any other active server processes
2022-01-27 19:23:33.415 IST [7598] LOG:  database system is shut down

Thanks and Regards,
Ganesh Kishan

-----Original Message-----
From: B Ganesh Kishan
Sent: 24 January 2022 11:25
To: Tom Lane <tgl@sss.pgh.pa.us>; David G. Johnston <david.g.johnston@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org; Meera Nair <mnair@commvault.com>
Subject: RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER

Hi Tom/David,

I understand the situation now.

The problem here for us is as below:
1) We run a Full backup using pg_start_backup()--> backup data directory --> pg_stop_backup() --> backup all archive
logfiles. 
Call this as FULL_BACKUP1

2) After full, log only backups are run periodically --> Using pg_switch_wal() and backup only archive log files.
During the log backups, there is a chance that no transaction is run in-between, but we there is no way to know about
thatat this point and we backup the switched logs. 
Example: In below 3 log only backups, say we do not have any transactions run in between.
    a. LOG_BACKUP_1
    b. LOG_BACKUP_2
    c. LOG_BACKUP_3

3) Now in our case if the user wants to restore to the time when LOG_BACKUP_2 is run, we specify the
recovery_target_timeas time when LOG_BACKUP_2 was run and try to start the server. But it fails because in PG13 we
cannotspecify time later than the last available transaction and we need to specify the time when the last transaction
isavailable(Is there any way to get this time?). 

To handle this, we would need any of these resolution/Workarounds:
1) During restore, is there any way we can get the last available transaction near to the specified retore time?
2) Is there any way we can make sure archive logs are generated only when there is a transaction available?
3) It would be very helpful if there are any settings to override old feature(Stop and bring up the server with latest
availablelogs) over the new one.  

Thanks and Regards,
Ganesh Kishan

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 21 January 2022 20:51
To: David G. Johnston <david.g.johnston@gmail.com>
Cc: B Ganesh Kishan <bkishan@commvault.com>; pgsql-bugs@lists.postgresql.org; Meera Nair <mnair@commvault.com>
Subject: Re: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER

External email. Inspect before opening.



"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Fri, Jan 21, 2022 at 4:20 AM B Ganesh Kishan
> <bkishan@commvault.com>
> wrote:
>> The problem is that we are providing a time target that Postgres does
>> not know how to reach. This is because there are no transactions in
>> between the backups.

> I don't quite follow the overall situation but given your observation
> and apparent acceptance of the pre-v13 behavior just don't specify a
> restore point and let WAL replay everything.

Yeah.  If I'm understanding the situation, when you specify a target time that is later than the last transaction
availablefrom WAL, older versions silently assumed that stopping with the last available transaction is OK. 
Newer ones complain because it's not clear whether that's OK --- in particular, there's no good way to be sure that no
WALis missing. 

On the whole I think that's a good change.  I can sympathize with the complaint that it creates additional complexity
forrestore scripts, but I'm a little dubious that this is something you'd be wanting to script anyway. 

                        regards, tom lane



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Christophe Courtois
Дата:
Сообщение: No access to TOAST tables shown in EXPLAIN ( ANALYZE, BUFFERS )
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER