Re: recovery_target_time ignored ?

Поиск
Список
Период
Сортировка
От Venkata Balaji N
Тема Re: recovery_target_time ignored ?
Дата
Msg-id CAEyp7J-T8hP0H9nDWwyWR2dcugiHCo-GA66L7uZd7F+1im9ZHA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: recovery_target_time ignored ?  (David Steele <david@pgmasters.net>)
Список pgsql-hackers

On Wed, Mar 25, 2015 at 1:28 AM, David Steele <david@pgmasters.net> wrote:
On 3/24/15 6:12 AM, Venkata Balaji N wrote:
>
> On Tue, Mar 24, 2015 at 9:54 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     On 3/23/15 12:42 AM, Venkata Balaji N wrote:
>     > Hi,
>     >
>     > Assuming that this might require a patch, i am posting this in
>     > pgsql-hackers. Apologies, if this is not the appropriate mailing list to
>     > start this discussion.
>     >
>     > I performed a PITR and saw the below message in the log file is a bit
>     > confusing.
>     >
>     > 2015-03-23 13:49:09.816 GMT-10 DB= PID=4707 LOG: *database system was
>     > interrupted; last known up at 2015-03-23 10:30:26 GMT-10*
>     > 2015-03-23 13:49:09.817 GMT-10 DB= PID=4707 LOG: *starting
>     point-in-time
>     > recovery to 2015-03-23 10:00:26+10*
>     > 2015-03-23 13:49:09.827 GMT-10 DB= PID=4707 LOG:  restored log file
>     > "000000010000000B00000020" from archive
>     > 2015-03-23 13:49:09.888 GMT-10 DB= PID=4707 LOG:  redo starts at B/20000090
>     > 2015-03-23 13:49:09.937 GMT-10 DB= PID=4707 LOG:  consistent recovery
>     > state reached at B/200000B8
>     > 2015-03-23 13:49:09.947 GMT-10 DB= PID=4707 LOG:  restored log file
>     > "000000010000000B00000021" from archive
>     > 2015-03-23 13:49:09.950 GMT-10 DB= PID=4707 LOG:  *recovery stopping
>     > before commit of transaction 16267, time 2015-03-23 13:22:37.53007+10*
>     >
>     >
>     > By mistake i gave "recovery_target_time" as 10:00 GMT which is 25/30
>     > minutes behind the backup start/end time registered in the backup_label.
>     >
>     > The parameter recovery_target_time is ignored and recovery proceeds
>     > further applying all the available WAL Archive files finally ends up
>     > bringing up the database.
>     >
>     > I think it would make sense if the recovery does not proceed any further
>     > and error out with a message like "recovery_target_time is behind the
>     > backup time.. please consider using the backup taken prior to the
>     > recovery_target_time"
>
>     I just tried it with 9.3.5 and I do get an error:
>
>     LOG:  starting point-in-time recovery to 2015-03-23 17:26:02.721307-04
>     LOG:  restored log file "000000010000000000000003" from archive
>     LOG:  redo starts at 0/30000C8
>     LOG:  recovery stopping before commit of transaction 1001, time
>     2015-03-23 18:26:01.012593-04
>     LOG:  redo done at 0/3000228
>     FATAL:  requested recovery stop point is before consistent recovery
>     point
>
>
> That makes more sense. This is what i was expecting to happen. Then, i
> think it is something to do with the timestamp format.
>
>
>     Here's my recovery.conf file:
>
>     restore_command = '/usr/bin/pg_backrest.pl <http://pg_backrest.pl>
>     --stanza=db archive-get %f "%p"'
>     recovery_target_time = '2015-03-23 17:26:02.721307 EDT'
>
>     > recovery.conf file is as follows :
>     >
>     > restore_command='cp /data/pgdata9400backup/pgwalarch9400backup/%f %p '
>     > recovery_target_time='2015-03-23 10:00:26 GMT-10'
>     > recovery_target_inclusive='true'
>
>     You have '2015-03-23 10:00:26 GMT-10' in recovery.conf but the log says
>     'starting point-in-time recovery to 2015-03-23 10:00:26+10'.  Note the -
>     vs +.
>
>
> This is my confusion too. I picked up the time format from the backup label.
>
>     Could you check your log and recovery.conf and make sure the timezone
>     offsets are actually different?
>
>
> I am not sure why the timestamp is taken as  "2015-03-23 10:00:26+10"
> for "2015-03-23 10:00:26 GMT-10'".
>
> My another system's timestamp format is also AEDT. I did another test
> and I get the same problem.
>
> Below is the text from the log file. I gave a recovery_target_time
> almost a day behind the consistent recovery point. Still, the
> recovery_target_time is taken as "*2015-03-23 11:07:10+11*" for
> "*2015-03-23 11:07:10 AEDT*"
>
>     2015-03-24 18:42:44.608 AEDT LOG:  database system was interrupted;
>     last known up at *2015-03-24 18:20:53 AEDT*
>
>     2015-03-24 18:42:44.608 AEDT LOG:  starting point-in-time recovery
>     to *2015-03-23 11:07:10+11*
>
>     cp: /disk3/pgwalarch9401/0000000100000003000000FE: No such file or
>     directory
>
>     2015-03-24 18:42:44.626 AEDT LOG:  record with zero length at 3/FE000090
>
> Below is my recovery.conf file
>
>     restore_command='cp /disk3/pgwalarch9401/%f %p'
>
>     recovery_target_time='*2015-03-23 11:07:10 AEDT*'
>
>     recovery_target_inclusive=true
>
> I am checking if  this has something to do with my system timestamp format.
>
> Not sure what am i missing. Do i need to give any special time format ?

This is a weird one.  I can reproduce the problem by setting
timezone=timezone='Australia/Sydney'.  I also tried setting
log_timezone=timezone='Australia/Sydney' and my system clock to
'Australia/Sydney' but I still saw the same issue.

I'm testing this using unit tests for some backup software I'm working on.

1) Postgres is inited and started:

initdb -D /test/db/common -A trust;pg_ctl start -o "-c port=6543 -c
checkpoint_segments=1 -c log_timezone='Australia/Sydney' -c
timezone='Australia/Sydney' -c wal_level=hot_standby -c archive_mode=on
-c archive_command='/test/bin/pg_backrest.pl --stanza=db archive-push
%p' -c unix_socket_directories='/test/db'" -D /test/db/common -l
/test/db/common/postgresql.log -w -s

2) The backup/restore unit tests run

2015-03-24 09:56:19 T00   INFO: run 001 - rmt 0, arc_async 0, cmp 0
2015-03-24 09:56:23 T00   INFO:     full backup (insert during backup)
2015-03-24 09:56:26 T00   INFO:         time target is 2015-03-25
00:56:26.783117 EST
2015-03-24 09:56:26 T00   INFO:     incr backup (update during backup)
2015-03-24 09:56:30 T00   INFO:         xid target is 1007
2015-03-24 09:56:30 T00   INFO:         name target is backrest
2015-03-24 09:56:30 T00   INFO:     testing recovery type = default
2015-03-24 09:56:30 T00   INFO:         restore, backup
'20150324-095626F', type 'default', expect exit 113 (postmaster running)
2015-03-24 09:56:33 T00   INFO:         restore, backup
'20150324-095626F', type 'default', expect exit 115 (path not empty)
2015-03-24 09:56:33 T00   INFO:         restore, backup
'20150324-095626F', type 'default'
2015-03-24 09:56:40 T00   INFO:     testing recovery type = xid
2015-03-24 09:56:41 T00   INFO:         restore, force, backup
'20150324-095626F', type 'xid', target '1007', resume
2015-03-24 09:56:47 T00   INFO:     testing recovery type = preserve
2015-03-24 09:56:48 T00   INFO:         restore delta, backup
'20150324-095626F', type 'preserve'
2015-03-24 09:56:54 T00   INFO:     testing recovery type = time
2015-03-24 09:56:55 T00   INFO:         restore delta, backup
'20150324-095626F', type 'time', target '2015-03-25 00:56:26.783117 EST'
expected value 'time' from 'select message from test' but actual was
'name' at lib/BackRestTest/BackupTest.pm line 178.

3) The "recovery type = time" test is failing because it's running to
the end of the WAL stream rather than stopping at the appropriate time.

This is the problem i am facing.
 
Here's the recovery.conf:

restore_command = '/test/bin/pg_backrest.pl --stanza=db archive-get %f "%p"'
recovery_target_time = '2015-03-25 00:56:26.783117 EST'

And the log file:

LOG:  database system was interrupted; last known up at 2015-03-25
00:56:24 EST
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  starting point-in-time recovery to 2015-03-25 16:56:26.783117+11
LOG:  restored log file "000000010000000000000003" from archive
LOG:  redo starts at 0/30000C8
LOG:  consistent recovery state reached at 0/3000280
LOG:  restored log file "000000010000000000000004" from archive
LOG:  restored log file "000000010000000000000005" from archive
FATAL:  the database system is starting up
LOG:  restored log file "000000010000000000000006" from archive
LOG:  restored log file "000000010000000000000007" from archive
LOG:  restored log file "000000010000000000000008" from archive
LOG:  restored log file "000000010000000000000009" from archive
LOG:  restored log file "00000001000000000000000A" from archive
FATAL:  the database system is starting up
LOG:  restored log file "00000001000000000000000B" from archive
LOG:  restored log file "00000001000000000000000C" from archive
LOG:  redo done at 0/C000130
LOG:  last completed transaction was at log time 2015-03-25
00:56:30.658752+11
LOG:  restored log file "00000001000000000000000C" from archive
FATAL:  the database system is starting up
LOG:  restored log file "00000002.history" from archive
LOG:  restored log file "00000003.history" from archive
LOG:  restored log file "00000004.history" from archive
LOG:  selected new timeline ID: 5
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

There's definitely some kind of multiple TZ conversion going on here,
but I'm not see why.  However, I did get the test to work again by
removing the TZ from the timestamp in recovery.conf:
  
restore_command = '/test/bin/pg_backrest.pl --stanza=db archive-get %f "%p"'
recovery_target_time = '2015-03-25 01:08:45.765919'

Here's the log:

LOG:  database system was interrupted; last known up at 2015-03-25
01:08:43 EST
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  starting point-in-time recovery to 2015-03-25 01:08:45.765919+11
LOG:  restored log file "000000010000000000000003" from archive
LOG:  redo starts at 0/30000C8
LOG:  consistent recovery state reached at 0/3000280
LOG:  restored log file "000000010000000000000004" from archive
LOG:  restored log file "000000010000000000000005" from archive
LOG:  recovery stopping before commit of transaction 1003, time
2015-03-25 01:08:45.770163+11
LOG:  redo done at 0/5013900
LOG:  last completed transaction was at log time 2015-03-25
01:08:45.765465+11
LOG:  restored log file "00000002.history" from archive
FATAL:  the database system is starting up
LOG:  restored log file "00000003.history" from archive
LOG:  restored log file "00000004.history" from archive
LOG:  selected new timeline ID: 5
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

You can see that the timestamps now match up and the test was
successful.  I know this doesn't completely solve the problem, but
hopefully it will get you going again and might provide a clue to the
underlying behavior.

Your logs above also show that the timestamp is taken as recovery_target_time+11. 

Yeah, i am trying to understand the conversion mechanism here. Need to do a bit more testing. 
 
There may also be some confusion because EST is used in both Australia
and the U.S (though it is EDT here now).  I wasn't able to set the
timezone on Postgres to AEDT or get any output that said AEDT.  I'd be
curious to know how you did that.

I am testing in 9.4.1. You mentioned that, you are using PG Version 9.3.5. There are lot of timezone related fixes in 9.3.6. 
In my OSX laptop which is has AEDT timezone settings, version 9.3.6 by default picks up 'Australia/ACT' timezone at the time of initdb and the same does not happen in 9.3.5. That might be the reason you are unable to set Australian timezone.


Regards,
Venkata Balaji N

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: printing table in asciidoc with psql
Следующее
От: Shigeru HANADA
Дата:
Сообщение: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)