Re: recovery_target_time ignored ?
От | David Steele |
---|---|
Тема | Re: recovery_target_time ignored ? |
Дата | |
Msg-id | 55117496.1010501@pgmasters.net обсуждение исходный текст |
Ответ на | Re: recovery_target_time ignored ? (Venkata Balaji N <nag1010@gmail.com>) |
Ответы |
Re: recovery_target_time ignored ?
(Venkata Balaji N <nag1010@gmail.com>)
|
Список | pgsql-hackers |
On 3/24/15 6:12 AM, Venkata Balaji N wrote: > > On Tue, Mar 24, 2015 at 9:54 AM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > On 3/23/15 12:42 AM, Venkata Balaji N wrote: > > Hi, > > > > Assuming that this might require a patch, i am posting this in > > pgsql-hackers. Apologies, if this is not the appropriate mailing list to > > start this discussion. > > > > I performed a PITR and saw the below message in the log file is a bit > > confusing. > > > > 2015-03-23 13:49:09.816 GMT-10 DB= PID=4707 LOG: *database system was > > interrupted; last known up at 2015-03-23 10:30:26 GMT-10* > > 2015-03-23 13:49:09.817 GMT-10 DB= PID=4707 LOG: *starting > point-in-time > > recovery to 2015-03-23 10:00:26+10* > > 2015-03-23 13:49:09.827 GMT-10 DB= PID=4707 LOG: restored log file > > "000000010000000B00000020" from archive > > 2015-03-23 13:49:09.888 GMT-10 DB= PID=4707 LOG: redo starts at B/20000090 > > 2015-03-23 13:49:09.937 GMT-10 DB= PID=4707 LOG: consistent recovery > > state reached at B/200000B8 > > 2015-03-23 13:49:09.947 GMT-10 DB= PID=4707 LOG: restored log file > > "000000010000000B00000021" from archive > > 2015-03-23 13:49:09.950 GMT-10 DB= PID=4707 LOG: *recovery stopping > > before commit of transaction 16267, time 2015-03-23 13:22:37.53007+10* > > > > > > By mistake i gave "recovery_target_time" as 10:00 GMT which is 25/30 > > minutes behind the backup start/end time registered in the backup_label. > > > > The parameter recovery_target_time is ignored and recovery proceeds > > further applying all the available WAL Archive files finally ends up > > bringing up the database. > > > > I think it would make sense if the recovery does not proceed any further > > and error out with a message like "recovery_target_time is behind the > > backup time.. please consider using the backup taken prior to the > > recovery_target_time" > > I just tried it with 9.3.5 and I do get an error: > > LOG: starting point-in-time recovery to 2015-03-23 17:26:02.721307-04 > LOG: restored log file "000000010000000000000003" from archive > LOG: redo starts at 0/30000C8 > LOG: recovery stopping before commit of transaction 1001, time > 2015-03-23 18:26:01.012593-04 > LOG: redo done at 0/3000228 > FATAL: requested recovery stop point is before consistent recovery > point > > > That makes more sense. This is what i was expecting to happen. Then, i > think it is something to do with the timestamp format. > > > Here's my recovery.conf file: > > restore_command = '/usr/bin/pg_backrest.pl <http://pg_backrest.pl> > --stanza=db archive-get %f "%p"' > recovery_target_time = '2015-03-23 17:26:02.721307 EDT' > > > recovery.conf file is as follows : > > > > restore_command='cp /data/pgdata9400backup/pgwalarch9400backup/%f %p ' > > recovery_target_time='2015-03-23 10:00:26 GMT-10' > > recovery_target_inclusive='true' > > You have '2015-03-23 10:00:26 GMT-10' in recovery.conf but the log says > 'starting point-in-time recovery to 2015-03-23 10:00:26+10'. Note the - > vs +. > > > This is my confusion too. I picked up the time format from the backup label. > > Could you check your log and recovery.conf and make sure the timezone > offsets are actually different? > > > I am not sure why the timestamp is taken as "2015-03-23 10:00:26+10" > for "2015-03-23 10:00:26 GMT-10'". > > My another system's timestamp format is also AEDT. I did another test > and I get the same problem. > > Below is the text from the log file. I gave a recovery_target_time > almost a day behind the consistent recovery point. Still, the > recovery_target_time is taken as "*2015-03-23 11:07:10+11*" for > "*2015-03-23 11:07:10 AEDT*" > > 2015-03-24 18:42:44.608 AEDT LOG: database system was interrupted; > last known up at *2015-03-24 18:20:53 AEDT* > > 2015-03-24 18:42:44.608 AEDT LOG: starting point-in-time recovery > to *2015-03-23 11:07:10+11* > > cp: /disk3/pgwalarch9401/0000000100000003000000FE: No such file or > directory > > 2015-03-24 18:42:44.626 AEDT LOG: record with zero length at 3/FE000090 > > Below is my recovery.conf file > > restore_command='cp /disk3/pgwalarch9401/%f %p' > > recovery_target_time='*2015-03-23 11:07:10 AEDT*' > > recovery_target_inclusive=true > > I am checking if this has something to do with my system timestamp format. > > Not sure what am i missing. Do i need to give any special time format ? This is a weird one. I can reproduce the problem by setting timezone=timezone='Australia/Sydney'. I also tried setting log_timezone=timezone='Australia/Sydney' and my system clock to 'Australia/Sydney' but I still saw the same issue. I'm testing this using unit tests for some backup software I'm working on. 1) Postgres is inited and started: initdb -D /test/db/common -A trust;pg_ctl start -o "-c port=6543 -c checkpoint_segments=1 -c log_timezone='Australia/Sydney' -c timezone='Australia/Sydney' -c wal_level=hot_standby -c archive_mode=on -c archive_command='/test/bin/pg_backrest.pl --stanza=db archive-push %p' -c unix_socket_directories='/test/db'" -D /test/db/common -l /test/db/common/postgresql.log -w -s 2) The backup/restore unit tests run 2015-03-24 09:56:19 T00 INFO: run 001 - rmt 0, arc_async 0, cmp 0 2015-03-24 09:56:23 T00 INFO: full backup (insert during backup) 2015-03-24 09:56:26 T00 INFO: time target is 2015-03-25 00:56:26.783117 EST 2015-03-24 09:56:26 T00 INFO: incr backup (update during backup) 2015-03-24 09:56:30 T00 INFO: xid target is 1007 2015-03-24 09:56:30 T00 INFO: name target is backrest 2015-03-24 09:56:30 T00 INFO: testing recovery type = default 2015-03-24 09:56:30 T00 INFO: restore, backup '20150324-095626F', type 'default', expect exit 113 (postmaster running) 2015-03-24 09:56:33 T00 INFO: restore, backup '20150324-095626F', type 'default', expect exit 115 (path not empty) 2015-03-24 09:56:33 T00 INFO: restore, backup '20150324-095626F', type 'default' 2015-03-24 09:56:40 T00 INFO: testing recovery type = xid 2015-03-24 09:56:41 T00 INFO: restore, force, backup '20150324-095626F', type 'xid', target '1007', resume 2015-03-24 09:56:47 T00 INFO: testing recovery type = preserve 2015-03-24 09:56:48 T00 INFO: restore delta, backup '20150324-095626F', type 'preserve' 2015-03-24 09:56:54 T00 INFO: testing recovery type = time 2015-03-24 09:56:55 T00 INFO: restore delta, backup '20150324-095626F', type 'time', target '2015-03-25 00:56:26.783117 EST' expected value 'time' from 'select message from test' but actual was 'name' at lib/BackRestTest/BackupTest.pm line 178. 3) The "recovery type = time" test is failing because it's running to the end of the WAL stream rather than stopping at the appropriate time. Here's the recovery.conf: restore_command = '/test/bin/pg_backrest.pl --stanza=db archive-get %f "%p"' recovery_target_time = '2015-03-25 00:56:26.783117 EST' And the log file: LOG: database system was interrupted; last known up at 2015-03-25 00:56:24 EST LOG: creating missing WAL directory "pg_xlog/archive_status" LOG: starting point-in-time recovery to 2015-03-25 16:56:26.783117+11 LOG: restored log file "000000010000000000000003" from archive LOG: redo starts at 0/30000C8 LOG: consistent recovery state reached at 0/3000280 LOG: restored log file "000000010000000000000004" from archive LOG: restored log file "000000010000000000000005" from archive FATAL: the database system is starting up LOG: restored log file "000000010000000000000006" from archive LOG: restored log file "000000010000000000000007" from archive LOG: restored log file "000000010000000000000008" from archive LOG: restored log file "000000010000000000000009" from archive LOG: restored log file "00000001000000000000000A" from archive FATAL: the database system is starting up LOG: restored log file "00000001000000000000000B" from archive LOG: restored log file "00000001000000000000000C" from archive LOG: redo done at 0/C000130 LOG: last completed transaction was at log time 2015-03-25 00:56:30.658752+11 LOG: restored log file "00000001000000000000000C" from archive FATAL: the database system is starting up LOG: restored log file "00000002.history" from archive LOG: restored log file "00000003.history" from archive LOG: restored log file "00000004.history" from archive LOG: selected new timeline ID: 5 LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started There's definitely some kind of multiple TZ conversion going on here, but I'm not see why. However, I did get the test to work again by removing the TZ from the timestamp in recovery.conf: restore_command = '/test/bin/pg_backrest.pl --stanza=db archive-get %f "%p"' recovery_target_time = '2015-03-25 01:08:45.765919' Here's the log: LOG: database system was interrupted; last known up at 2015-03-25 01:08:43 EST LOG: creating missing WAL directory "pg_xlog/archive_status" LOG: starting point-in-time recovery to 2015-03-25 01:08:45.765919+11 LOG: restored log file "000000010000000000000003" from archive LOG: redo starts at 0/30000C8 LOG: consistent recovery state reached at 0/3000280 LOG: restored log file "000000010000000000000004" from archive LOG: restored log file "000000010000000000000005" from archive LOG: recovery stopping before commit of transaction 1003, time 2015-03-25 01:08:45.770163+11 LOG: redo done at 0/5013900 LOG: last completed transaction was at log time 2015-03-25 01:08:45.765465+11 LOG: restored log file "00000002.history" from archive FATAL: the database system is starting up LOG: restored log file "00000003.history" from archive LOG: restored log file "00000004.history" from archive LOG: selected new timeline ID: 5 LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started You can see that the timestamps now match up and the test was successful. I know this doesn't completely solve the problem, but hopefully it will get you going again and might provide a clue to the underlying behavior. There may also be some confusion because EST is used in both Australia and the U.S (though it is EDT here now). I wasn't able to set the timezone on Postgres to AEDT or get any output that said AEDT. I'd be curious to know how you did that. These tests were run on my OSX laptop which might introduce even more variables. In production I recommend running the OS and DB on UTC to avoid such confusion (though it does introduce a different set of problems). -- - David Steele david@pgmasters.net
В списке pgsql-hackers по дате отправления: