Обсуждение: recovery_target_time ignored or recovery always recovers to end of WAL

Поиск

Список

Период

Сортировка

recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

02 июля 2007 г., 04:49:37

I am trying to learn/practice the administrative steps that would need
to be taken in a 'fat finger' scenario, and I am running into problems.
I am trying to use 'recovery.conf' to set the database state to about 15
minutes ago in order to recover from accidentally deleting important
data. However, each time I restart the database in recovery mode, it
seems to always return me to the state it was in when I shut it down,
ignoring my 'recovery_target_time' setting.

For example:

1. I have a production 8.2.4 database running with WAL archiving enabled.
2. Thinking I am logged into a development database I  issue the commands:

start transaction;
delete from billing_info;
delete from customer_account;
commit;

3. I suddenly realize I was logged into the production database.
4. I fall out of my chair, then regain consciousness 10 minutes later.
5. I shutdown the database, and create a 'recovery.conf' file as follows:

# pretend that 2007-07-01 20:50:00 PDT was 15 minutes ago.
recovery_target_time = '2007-07-01 20:50:00 PDT'
restore_command = 'cp /pgdata/archive_logs/%f %p'
recovery_target_inclusive = 'false'

6. I start the database, and I see the following log messages:

LOG:  starting archive recovery
LOG:  recovery_target_time = 2007-07-01 20:50:00-07
LOG:  restore_command = "cp /pgdata/archive_logs/%f %p"
LOG:  recovery_target_inclusive = false
LOG:  checkpoint record is at F/7E0DD5A4
LOG:  redo record is at F/7E0DD5A4; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/693577; next OID: 35828734
LOG:  next MultiXactId: 28; next MultiXactOffset: 55
LOG:  automatic recovery in progress
LOG:  record with zero length at F/7E0DD5EC
LOG:  redo is not required
LOG:  archive recovery complete
LOG:  database system is ready

7. I log back in to the database, expecting to see all of my
billing_info an customer_account records in place. But instead, the
tables are empty - just as they were when the db was shutdown.

What have I don't wrong? Or is there some other procedure to use in
these situations?

Thanks,
jason

Re: recovery_target_time ignored or recovery alwaysrecovers to end of WAL

От

"Simon Riggs"

Дата:

02 июля 2007 г., 09:31:44

On Sun, 2007-07-01 at 21:41 -0700, Jason L. Buberel wrote:
> I am trying to learn/practice the administrative steps that would need
> to be taken in a 'fat finger' scenario, and I am running into problems.
> I am trying to use 'recovery.conf' to set the database state to about 15
> minutes ago in order to recover from accidentally deleting important
> data. However, each time I restart the database in recovery mode, it
> seems to always return me to the state it was in when I shut it down,
> ignoring my 'recovery_target_time' setting.
>
> For example:
>
> 1. I have a production 8.2.4 database running with WAL archiving enabled.
> 2. Thinking I am logged into a development database I  issue the commands:
>
> start transaction;
> delete from billing_info;
> delete from customer_account;
> commit;
>
> 3. I suddenly realize I was logged into the production database.
> 4. I fall out of my chair, then regain consciousness 10 minutes later.
> 5. I shutdown the database, and create a 'recovery.conf' file as follows:
>
> # pretend that 2007-07-01 20:50:00 PDT was 15 minutes ago.
> recovery_target_time = '2007-07-01 20:50:00 PDT'
> restore_command = 'cp /pgdata/archive_logs/%f %p'
> recovery_target_inclusive = 'false'
>
> 6. I start the database, and I see the following log messages:
>
> LOG:  starting archive recovery
> LOG:  recovery_target_time = 2007-07-01 20:50:00-07
> LOG:  restore_command = "cp /pgdata/archive_logs/%f %p"
> LOG:  recovery_target_inclusive = false
> LOG:  checkpoint record is at F/7E0DD5A4
> LOG:  redo record is at F/7E0DD5A4; undo record is at 0/0; shutdown TRUE
> LOG:  next transaction ID: 0/693577; next OID: 35828734
> LOG:  next MultiXactId: 28; next MultiXactOffset: 55
> LOG:  automatic recovery in progress
> LOG:  record with zero length at F/7E0DD5EC
> LOG:  redo is not required
> LOG:  archive recovery complete
> LOG:  database system is ready
>
> 7. I log back in to the database, expecting to see all of my
> billing_info an customer_account records in place. But instead, the
> tables are empty - just as they were when the db was shutdown.
>
> What have I don't wrong? Or is there some other procedure to use in
> these situations?

Your example transactions are so large that going back 15 minutes is not
enough. You'll need to go back further.

recovery_target_time can only stop on a COMMIT or ABORT record. This is
because it makes no sense to recover half a transaction, only whole
transactions have meaning for recovery. So if the transactions are very
large, you need to go back further.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: recovery_target_time ignored or recovery alwaysrecovers to end of WAL

От

"Jason L. Buberel"

Дата:

02 июля 2007 г., 13:20:37

Simon,

Thanks for the tip. I had assumed that so long as I set
'recovery_target_time' to a value that occurred before the 'fatal
commit' and set the 'inclusive' flag to false that I would be able to
return to just before the deletion occurred.

I'll play with it a bit more and see. I just want to know what to do in
the future should a real emergency like this occur.

Thanks,
jason

Simon Riggs wrote:
> On Sun, 2007-07-01 at 21:41 -0700, Jason L. Buberel wrote:
>
> Your example transactions are so large that going back 15 minutes is not
> enough. You'll need to go back further.
>
> recovery_target_time can only stop on a COMMIT or ABORT record. This is
> because it makes no sense to recover half a transaction, only whole
> transactions have meaning for recovery. So if the transactions are very
> large, you need to go back further.
>
>

Re: recovery_target_time ignored or recovery alwaysrecovers to end of WAL

От

Tom Lane

Дата:

02 июля 2007 г., 15:06:46

"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Sun, 2007-07-01 at 21:41 -0700, Jason L. Buberel wrote:
>> I am trying to learn/practice the administrative steps that would need
>> to be taken in a 'fat finger' scenario, and I am running into problems.

> Your example transactions are so large that going back 15 minutes is not
> enough. You'll need to go back further.
> recovery_target_time can only stop on a COMMIT or ABORT record. This is
> because it makes no sense to recover half a transaction, only whole
> transactions have meaning for recovery. So if the transactions are very
> large, you need to go back further.

No, that doesn't explain it.  As long as he set the stop time before the
commit of the unwanted transaction, it should do what he's expecting.
It might uselessly replay a lot of the actions of a long-running
transaction, but it will stop before the COMMIT xlog record when it
reaches it, and thus the transaction will not be committed.

What's actually happening according to the log output is that it's
running all the way to the end of WAL.  I can't really think of an
explanation for that other than a mistake in choosing the stop time,
ie, it's later than the commit of the unwanted transaction.  Or maybe
the WAL file is a stale copy that doesn't even contain the unwanted
commit?

Jason, if you can't figure it out you might grab xlogviewer
http://pgfoundry.org/projects/xlogviewer/
and see what it says the timestamps of the commit records in your WAL
files are.

            regards, tom lane

Re: recovery_target_time ignored or recovery alwaysrecovers to end of WAL

От

"Jason L. Buberel"

Дата:

02 июля 2007 г., 16:21:20

Harrumph -

I downloaded the latest xlogdump source, and built/installed it against
my 8.2.4 source tree. When I execute it however, I am informed that all
of my WAL files (either the 'active' copies in pg_xlog or the 'archived'
copies in my /pgdata/archive_logs dir) appear to be malformed:

$ /opt/postgres-8.2.4/bin/xlogdump --port 54824 --host 127.0.0.1 --user
postgres  ../../../archive_logs/*
../../../archive_logs/000000010000000F0000007C:
Bogus page magic number D05E at offset 0
invalid record length at F/7C00001C
../../../archive_logs/000000010000000F0000007C.00550700.backup:
Partial page of 241 bytes ignored
../../../archive_logs/000000010000000F0000007D:
Bogus page magic number D05E at offset 0
invalid record length at F/7D00001C
../../../archive_logs/000000010000000F0000007D.0006C01C.backup:
Partial page of 241 bytes ignored

Which does not help particularly much :)

I'll keep plugging away at this - perhaps my problem in setting the
database state to a PITR is related to timezones or timestamp formatting?

-jason

Tom Lane wrote:
> Jason, if you can't figure it out you might grab xlogviewer
> http://pgfoundry.org/projects/xlogviewer/
> and see what it says the timestamps of the commit records in your WAL
> files are.
>

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

02 июля 2007 г., 17:19:55

Some more bits on this:

And playing with the date format does not seem to change the outcome
(the db is always recovered to the most current state). In this case, I
removed the timezone designation 'PDT' from my timestamp, and the db
properly figured out that it is running in GMT-7 (pacific) time (see
syslog ouptput below).

What worries me is the 'record with zero length', combined with my
issues (in previous email) with the xlogdump not finding the right magic
bits. Perhaps that (or problems related to it) are causing the recovery
process to ignore my PITR information leading it to simply recover the
database to the most recent state?

LOG:  database system was shut down at 2007-07-02 10:12:06 PDT
LOG:  starting archive recovery
LOG:  recovery_target_time = 2007-06-29 00:00:00-07
LOG:  restore_command = "cp /pgdata/archive_logs/%f %p"
LOG:  recovery_target_inclusive = false
LOG:  checkpoint record is at F/7E0DDA60
LOG:  redo record is at F/7E0DDA60; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/695227; next OID: 35828734
LOG:  next MultiXactId: 28; next MultiXactOffset: 55
LOG:  automatic recovery in progress
LOG:  record with zero length at F/7E0DDAA8
LOG:  redo is not required
LOG:  archive recovery complete
LOG:  database system is ready

-jason

Jason L. Buberel wrote:
> Harrumph -
>
> I downloaded the latest xlogdump source, and built/installed it
> against my 8.2.4 source tree. When I execute it however, I am informed
> that all of my WAL files (either the 'active' copies in pg_xlog or the
> 'archived' copies in my /pgdata/archive_logs dir) appear to be malformed:
>
> $ /opt/postgres-8.2.4/bin/xlogdump --port 54824 --host 127.0.0.1
> --user postgres  ../../../archive_logs/*
> ../../../archive_logs/000000010000000F0000007C:
> Bogus page magic number D05E at offset 0
> invalid record length at F/7C00001C
> ../../../archive_logs/000000010000000F0000007C.00550700.backup:
> Partial page of 241 bytes ignored
> ../../../archive_logs/000000010000000F0000007D:
> Bogus page magic number D05E at offset 0
> invalid record length at F/7D00001C
> ../../../archive_logs/000000010000000F0000007D.0006C01C.backup:
> Partial page of 241 bytes ignored
>
> Which does not help particularly much :)
>
> I'll keep plugging away at this - perhaps my problem in setting the
> database state to a PITR is related to timezones or timestamp formatting?
>
> -jason
>
> Tom Lane wrote:
>> Jason, if you can't figure it out you might grab xlogviewer
>> http://pgfoundry.org/projects/xlogviewer/
>> and see what it says the timestamps of the commit records in your WAL
>> files are.
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org/

Re: recovery_target_time ignored or recoveryalwaysrecovers to end of WAL

От

"Simon Riggs"

Дата:

02 июля 2007 г., 18:29:02

On Mon, 2007-07-02 at 09:21 -0700, Jason L. Buberel wrote:

> I downloaded the latest xlogdump source, and built/installed it against
> my 8.2.4 source tree. When I execute it however, I am informed that all
> of my WAL files (either the 'active' copies in pg_xlog or the 'archived'
> copies in my /pgdata/archive_logs dir) appear to be malformed:
>
> $ /opt/postgres-8.2.4/bin/xlogdump --port 54824 --host 127.0.0.1 --user
> postgres  ../../../archive_logs/*
> ../../../archive_logs/000000010000000F0000007C:
> Bogus page magic number D05E at offset 0
> invalid record length at F/7C00001C
> ../../../archive_logs/000000010000000F0000007C.00550700.backup:
> Partial page of 241 bytes ignored
> ../../../archive_logs/000000010000000F0000007D:
> Bogus page magic number D05E at offset 0
> invalid record length at F/7D00001C
> ../../../archive_logs/000000010000000F0000007D.0006C01C.backup:
> Partial page of 241 bytes ignored
>
> Which does not help particularly much :)
>
> I'll keep plugging away at this - perhaps my problem in setting the
> database state to a PITR is related to timezones or timestamp formatting?

For now, remove these lines from xlogdump.c, l.82-86
  if (((XLogPageHeader) pageBuffer)->xlp_magic != XLOG_PAGE_MAGIC)
  {
    printf("Bogus page magic number %04X at offset %X\n",
       ((XLogPageHeader) pageBuffer)->xlp_magic, logPageOff);
  }

The program is unfortunately release specific, which is not very useful
for you now. D05E is the correct magic number for 8.2.4.

I'll update that program once we have the main software done for 8.3. I
was hoping that Diogo would continue to support it.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: recovery_target_time ignored or recoveryalwaysrecovers to end of WAL

От

"Simon Riggs"

Дата:

02 июля 2007 г., 18:54:19

On Mon, 2007-07-02 at 11:06 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > On Sun, 2007-07-01 at 21:41 -0700, Jason L. Buberel wrote:
> >> I am trying to learn/practice the administrative steps that would need
> >> to be taken in a 'fat finger' scenario, and I am running into problems.
>
> > Your example transactions are so large that going back 15 minutes is not
> > enough. You'll need to go back further.
> > recovery_target_time can only stop on a COMMIT or ABORT record. This is
> > because it makes no sense to recover half a transaction, only whole
> > transactions have meaning for recovery. So if the transactions are very
> > large, you need to go back further.
>
> No, that doesn't explain it.  As long as he set the stop time before the
> commit of the unwanted transaction, it should do what he's expecting.
> It might uselessly replay a lot of the actions of a long-running
> transaction, but it will stop before the COMMIT xlog record when it
> reaches it, and thus the transaction will not be committed.
>
> What's actually happening according to the log output is that it's
> running all the way to the end of WAL.  I can't really think of an
> explanation for that other than a mistake in choosing the stop time,
> ie, it's later than the commit of the unwanted transaction.  Or maybe
> the WAL file is a stale copy that doesn't even contain the unwanted
> commit?
>
> Jason, if you can't figure it out you might grab xlogviewer
> http://pgfoundry.org/projects/xlogviewer/
> and see what it says the timestamps of the commit records in your WAL
> files are.

There's a patch awaiting review that adds the time of the last committed
transaction into the LOG output. That should help in cases like this.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: recovery_target_time ignored or recoveryalwaysrecovers to end of WAL

От

Tom Lane

Дата:

02 июля 2007 г., 20:32:31

"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Mon, 2007-07-02 at 09:21 -0700, Jason L. Buberel wrote:
>> I downloaded the latest xlogdump source, and built/installed it against
>> my 8.2.4 source tree. When I execute it however, I am informed that all
>> of my WAL files (either the 'active' copies in pg_xlog or the 'archived'
>> copies in my /pgdata/archive_logs dir) appear to be malformed:
>> Bogus page magic number D05E at offset 0

> For now, remove these lines from xlogdump.c, l.82-86
>   if (((XLogPageHeader) pageBuffer)->xlp_magic != XLOG_PAGE_MAGIC)

I don't think that's a very good solution; the reason the magic number
changed is that some of the record formats changed.  Jason needs a copy
that's actually appropriate to 8.2.

Howver there is something odd here, because the value of XLOG_PAGE_MAGIC
comes from src/include/access/xlog_internal.h and not from the text of
xlogdump.c itself.  What it looks like to me is that Jason compiled it
against the wrong set of Postgres header files.

            regards, tom lane

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

Tom Lane

Дата:

02 июля 2007 г., 20:33:31

"Jason L. Buberel" <jason@buberel.org> writes:
> What worries me is the 'record with zero length',

That's just the normal way of detecting end of WAL.

            regards, tom lane

Re: recovery_target_time ignored orrecoveryalwaysrecovers to end of WAL

От

"Simon Riggs"

Дата:

02 июля 2007 г., 21:01:27

On Mon, 2007-07-02 at 16:32 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > On Mon, 2007-07-02 at 09:21 -0700, Jason L. Buberel wrote:
> >> I downloaded the latest xlogdump source, and built/installed it against
> >> my 8.2.4 source tree. When I execute it however, I am informed that all
> >> of my WAL files (either the 'active' copies in pg_xlog or the 'archived'
> >> copies in my /pgdata/archive_logs dir) appear to be malformed:
> >> Bogus page magic number D05E at offset 0
>
> > For now, remove these lines from xlogdump.c, l.82-86
> >   if (((XLogPageHeader) pageBuffer)->xlp_magic != XLOG_PAGE_MAGIC)
>
> I don't think that's a very good solution; the reason the magic number
> changed is that some of the record formats changed.  Jason needs a copy
> that's actually appropriate to 8.2.

That was the hack for Jason, not the longterm solution. I've said I'll
work on that once other core software is done.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

03 июля 2007 г., 04:06:48

I now have a working xlogdump, which has allowed me to put together the
following steps which I believe demonstrate that the recovery process
insists on recovering to the most recent state.

Here is the sequence of events shown below:

1. Display contents of 'account_note' table
2. Update note field to be 'foobar'.
3. Dump trxnlog, note the new xid
4. Update note fied to be 'foobaz'.
5. Dump trxnlog, note the new xid
6. Update note field to be 'foobing'
7. Dump trxnlog, note the new xid
8. Create recovery.conf file with xid corresponding to 'foobar' update.
9. Stop/start database.
10. Display contents of 'account_note' table
11. Gasp in surpise at seeing 'note' field with value 'foobing'.
12. Expected to see original value 'foobar' (xid 696493, inclusive = true)
13. Scratch head in puzzlement.

altos_research=# select * from account_note;
 account_note_id | customer_account_id | user_id_of_author |
creation_date |             note
-----------------+---------------------+-------------------+---------------+-------------------------------
          410805 |              410795 |            258460 |
2006-02-03    | Ratel Investments
          441835 |                 552 |            258459 |
2006-02-16    | testing new account note fix.
         2540171 |             2540085 |            258460 |
2006-09-16    | requested to be removed
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me

altos_research=# begin transaction; update account_note set note =
'foobar'; commit transaction;

##xlogdump of most recent file in pg_xlog:
xid: 695073 total length: 466 status: COMMITED
xid: 695081 total length: 131 status: COMMITED
xid: 695082 total length: 131 status: COMMITED
xid: 695083 total length: 131 status: COMMITED
xid: 695084 total length: 131 status: COMMITED
xid: 696493 total length: 1520 status: COMMITED - foobar trxn.

altos_research=# begin transaction; update account_note set note =
'foobaz'; commit transaction;

##xlogdump of most recent file in pg_xlog:
xid: 695073 total length: 466 status: COMMITED
xid: 695081 total length: 131 status: COMMITED
xid: 695082 total length: 131 status: COMMITED
xid: 695083 total length: 131 status: COMMITED
xid: 695084 total length: 131 status: COMMITED
xid: 696493 total length: 1520 status: COMMITED - foobar trxn.
xid: 696498 total length: 824 status: COMMITED - foobaz trxn

altos_research=# begin transaction; update account_note set note =
'foobing'; commit transaction;

##xlogdump of most recent file in pg_xlog:
xid: 695073 total length: 466 status: COMMITED
xid: 695081 total length: 131 status: COMMITED
xid: 695082 total length: 131 status: COMMITED
xid: 695083 total length: 131 status: COMMITED
xid: 695084 total length: 131 status: COMMITED
xid: 696493 total length: 1520 status: COMMITED - foobar trxn.
xid: 696498 total length: 824 status: COMMITED - foobaz trxn
xid: 696502 total length: 2672 status: COMMITED - foobing trxn

## created recovery.conf file:
recovery_target_xid = '696493'
restore_command = 'cp /pgdata/archive_logs/%f %p'
recovery_target_inclusive = 'true'

## stopped and started postgres, following syslog output:
Jul  2 20:51:10 localhost postgres-8.2[9125]: [3-1] LOG:  starting
archive recovery
Jul  2 20:51:10 localhost postgres-8.2[9125]: [4-1] LOG:
recovery_target_xid = 696493
Jul  2 20:51:10 localhost postgres-8.2[9125]: [5-1] LOG:
restore_command = "cp /pgdata/archive_logs/%f %p"
Jul  2 20:51:10 localhost postgres-8.2[9125]: [6-1] LOG:
recovery_target_inclusive = true
Jul  2 20:51:10 localhost postgres-8.2[9125]: [7-1] LOG:  checkpoint
record is at F/7E0DF258
Jul  2 20:51:10 localhost postgres-8.2[9125]: [8-1] LOG:  redo record is
at F/7E0DF258; undo record is at 0/0; shutdown TRUE
Jul  2 20:51:10 localhost postgres-8.2[9125]: [9-1] LOG:  next
transaction ID: 0/696512; next OID: 35828734
Jul  2 20:51:10 localhost postgres-8.2[9125]: [10-1] LOG:  next
MultiXactId: 28; next MultiXactOffset: 55
Jul  2 20:51:10 localhost postgres-8.2[9125]: [11-1] LOG:  automatic
recovery in progress
Jul  2 20:51:10 localhost postgres-8.2[9125]: [12-1] LOG:  record with
zero length at F/7E0DF2A0
Jul  2 20:51:10 localhost postgres-8.2[9125]: [13-1] LOG:  redo is not
required
Jul  2 20:51:10 localhost postgres-8.2[9125]: [14-1] LOG:  archive
recovery complete
Jul  2 20:51:10 localhost postgres-8.2[9125]: [15-1] LOG:  database
system is ready

altos_research=# select * from account_note;
 account_note_id | customer_account_id | user_id_of_author |
creation_date |             note
-----------------+---------------------+-------------------+---------------+-------------------------------
          410805 |              410795 |            258460 |
2006-02-03    | Ratel Investments
          441835 |                 552 |            258459 |
2006-02-16    | testing new account note fix.
         2540171 |             2540085 |            258460 |
2006-09-16    | requested to be removed
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me
         9999999 |                 552 |            258460 |
2007-06-29    | help me

So now can someone tell me what I'm doing incorrectly :) ?

-jason

Simon Riggs wrote:
> On Mon, 2007-07-02 at 16:32 -0400, Tom Lane wrote:
>
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>
>>> On Mon, 2007-07-02 at 09:21 -0700, Jason L. Buberel wrote:
>>>
>>>> I downloaded the latest xlogdump source, and built/installed it against
>>>> my 8.2.4 source tree. When I execute it however, I am informed that all
>>>> of my WAL files (either the 'active' copies in pg_xlog or the 'archived'
>>>> copies in my /pgdata/archive_logs dir) appear to be malformed:
>>>> Bogus page magic number D05E at offset 0
>>>>
>>> For now, remove these lines from xlogdump.c, l.82-86
>>>   if (((XLogPageHeader) pageBuffer)->xlp_magic != XLOG_PAGE_MAGIC)
>>>
>> I don't think that's a very good solution; the reason the magic number
>> changed is that some of the record formats changed.  Jason needs a copy
>> that's actually appropriate to 8.2.
>>
>
> That was the hack for Jason, not the longterm solution. I've said I'll
> work on that once other core software is done.
>
>

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

03 июля 2007 г., 04:17:46

Minor correction to the output below - the final table dump actually contained the following - my apologies for the copy/paste error:

altos_research=# select * from account_note;
account_note_id | customer_account_id | user_id_of_author | creation_date | note
-----------------+---------------------+-------------------+---------------+---------
          410805 |              410795 |            258460 | 2006-02-03    | foobing
          441835 |                 552 |            258459 | 2006-02-16    | foobing
         2540171 |             2540085 |            258460 | 2006-09-16    | foobing
         9999999 |                 552 |            258460 | 2007-06-29    | foobing
         9999999 |                 552 |            258460 | 2007-06-29    | foobing
         9999999 |                 552 |            258460 | 2007-06-29    | foobing
         9999999 |                 552 |            258460 | 2007-06-29    | foobing
         9999999 |                 552 |            258460 | 2007-06-29    | foobing

Which is the most recent transaction update.

-jason

Jason L. Buberel wrote:

I now have a working xlogdump, which has allowed me to put together the following steps which I believe demonstrate that the recovery process insists on recovering to the most recent state.

Here is the sequence of events shown below:

1. Display contents of 'account_note' table
2. Update note field to be 'foobar'.
3. Dump trxnlog, note the new xid
4. Update note fied to be 'foobaz'.
5. Dump trxnlog, note the new xid
6. Update note field to be 'foobing'
7. Dump trxnlog, note the new xid
8. Create recovery.conf file with xid corresponding to 'foobar' update.
9. Stop/start database.
10. Display contents of 'account_note' table
11. Gasp in surpise at seeing 'note' field with value 'foobing'.
12. Expected to see original value 'foobar' (xid 696493, inclusive = true)
13. Scratch head in puzzlement.

altos_research=# select * from account_note;
account_note_id | customer_account_id | user_id_of_author | creation_date |             note
-----------------+---------------------+-------------------+---------------+-------------------------------
         410805 |              410795 |            258460 | 2006-02-03    | Ratel Investments
         441835 |                 552 |            258459 | 2006-02-16    | testing new account note fix.
        2540171 |             2540085 |            258460 | 2006-09-16    | requested to be removed
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me

altos_research=# begin transaction; update account_note set note = 'foobar'; commit transaction;

##xlogdump of most recent file in pg_xlog:
xid: 695073 total length: 466 status: COMMITED
xid: 695081 total length: 131 status: COMMITED
xid: 695082 total length: 131 status: COMMITED
xid: 695083 total length: 131 status: COMMITED
xid: 695084 total length: 131 status: COMMITED
xid: 696493 total length: 1520 status: COMMITED - foobar trxn.

altos_research=# begin transaction; update account_note set note = 'foobaz'; commit transaction;

##xlogdump of most recent file in pg_xlog:
xid: 695073 total length: 466 status: COMMITED
xid: 695081 total length: 131 status: COMMITED
xid: 695082 total length: 131 status: COMMITED
xid: 695083 total length: 131 status: COMMITED
xid: 695084 total length: 131 status: COMMITED
xid: 696493 total length: 1520 status: COMMITED - foobar trxn.
xid: 696498 total length: 824 status: COMMITED - foobaz trxn

altos_research=# begin transaction; update account_note set note = 'foobing'; commit transaction;

##xlogdump of most recent file in pg_xlog:
xid: 695073 total length: 466 status: COMMITED
xid: 695081 total length: 131 status: COMMITED
xid: 695082 total length: 131 status: COMMITED
xid: 695083 total length: 131 status: COMMITED
xid: 695084 total length: 131 status: COMMITED
xid: 696493 total length: 1520 status: COMMITED - foobar trxn.
xid: 696498 total length: 824 status: COMMITED - foobaz trxn
xid: 696502 total length: 2672 status: COMMITED - foobing trxn

## created recovery.conf file:
recovery_target_xid = '696493'
restore_command = 'cp /pgdata/archive_logs/%f %p'
recovery_target_inclusive = 'true'

## stopped and started postgres, following syslog output:
Jul 2 20:51:10 localhost postgres-8.2[9125]: [3-1] LOG: starting archive recovery
Jul 2 20:51:10 localhost postgres-8.2[9125]: [4-1] LOG: recovery_target_xid = 696493
Jul 2 20:51:10 localhost postgres-8.2[9125]: [5-1] LOG: restore_command = "cp /pgdata/archive_logs/%f %p"
Jul 2 20:51:10 localhost postgres-8.2[9125]: [6-1] LOG: recovery_target_inclusive = true
Jul 2 20:51:10 localhost postgres-8.2[9125]: [7-1] LOG: checkpoint record is at F/7E0DF258
Jul 2 20:51:10 localhost postgres-8.2[9125]: [8-1] LOG: redo record is at F/7E0DF258; undo record is at 0/0; shutdown TRUE
Jul 2 20:51:10 localhost postgres-8.2[9125]: [9-1] LOG: next transaction ID: 0/696512; next OID: 35828734
Jul 2 20:51:10 localhost postgres-8.2[9125]: [10-1] LOG: next MultiXactId: 28; next MultiXactOffset: 55
Jul 2 20:51:10 localhost postgres-8.2[9125]: [11-1] LOG: automatic recovery in progress
Jul 2 20:51:10 localhost postgres-8.2[9125]: [12-1] LOG: record with zero length at F/7E0DF2A0
Jul 2 20:51:10 localhost postgres-8.2[9125]: [13-1] LOG: redo is not required
Jul 2 20:51:10 localhost postgres-8.2[9125]: [14-1] LOG: archive recovery complete
Jul 2 20:51:10 localhost postgres-8.2[9125]: [15-1] LOG: database system is ready

altos_research=# select * from account_note;
account_note_id | customer_account_id | user_id_of_author | creation_date |             note
-----------------+---------------------+-------------------+---------------+-------------------------------
         410805 |              410795 |            258460 | 2006-02-03    | Ratel Investments
         441835 |                 552 |            258459 | 2006-02-16    | testing new account note fix.
        2540171 |             2540085 |            258460 | 2006-09-16    | requested to be removed
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me
        9999999 |                 552 |            258460 | 2007-06-29    | help me

So now can someone tell me what I'm doing incorrectly :) ?

-jason

Simon Riggs wrote:
On Mon, 2007-07-02 at 16:32 -0400, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Mon, 2007-07-02 at 09:21 -0700, Jason L. Buberel wrote:

I downloaded the latest xlogdump source, and built/installed it against my 8.2.4 source tree. When I execute it however, I am informed that all of my WAL files (either the 'active' copies in pg_xlog or the 'archived' copies in my /pgdata/archive_logs dir) appear to be malformed:
Bogus page magic number D05E at offset 0

For now, remove these lines from xlogdump.c, l.82-86
if (((XLogPageHeader) pageBuffer)->xlp_magic != XLOG_PAGE_MAGIC)

I don't think that's a very good solution; the reason the magic number
changed is that some of the record formats changed. Jason needs a copy
that's actually appropriate to 8.2.


That was the hack for Jason, not the longterm solution. I've said I'll
work on that once other core software is done.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

Tom Lane

Дата:

03 июля 2007 г., 04:41:20

"Jason L. Buberel" <jason@buberel.org> writes:
> ## stopped and started postgres, following syslog output:

You seem to have omitted all the interesting details about what you did
here; but "stopping and starting" postgres is certainly not intended to
cause it to discard data.  There would need to have been some steps
involving restoring a previous base backup and rolling forward through
archived xlog files.

            regards, tom lane

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

03 июля 2007 г., 04:58:27

I am now learning that fact, but recall the original scenario that I am trying to mimic:

1. Person accidentally deletes contents of important table.
2. Admin (me) wants to roll back db state to just prior to that deletion.
3. (Me) Assumes that by creating a recovery.conf file and setting the target to a an earlier trxn id and restarting the database would simply do the trick.

So now I think that my scenario should look more like:

1. Person accidentally deletes contents of important table.
2. Admin (me) wants to roll db state back to just prior to that delete.
3. (Me) Performs steps, in addition to creating the recovery.conf with the selected xid, that will cause the DB to restart at that PITR.

Now all I need to lock down are those 'additional steps needed to force the recovery process to only recover up to the specified xid and no further'.

Such as:

- Remove from pg_xlog all of the log files containing transactions that come after the selected xid?
- Other?

-jason

Tom Lane wrote:

"Jason L. Buberel" <jason@buberel.org> writes:

## stopped and started postgres, following syslog output:

You seem to have omitted all the interesting details about what you did
here; but "stopping and starting" postgres is certainly not intended to
cause it to discard data.  There would need to have been some steps
involving restoring a previous base backup and rolling forward through
archived xlog files.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's datatypes do not      match

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

03 июля 2007 г., 05:49:40

I think that I now see the error of my ways.

When I shutdown my server, the files under the ./data/ directory still all point to 'now' and not 'a week ago when the backups were taken'. So the recover process insists on bringing the database to a PITR equal to 'now'.

Instead, in order to achieve my goal I would have to restore to that backup, and rely on the contents of the archive_logs to have the recovery process return me to the selected xid PITR.

So is there any way to 'trick' or force the server to forget what it thinks 'now' is and instead to step back to the selected xid and make that the new version of 'now'?

-jason

Jason L. Buberel wrote:

I am now learning that fact, but recall the original scenario that I am trying to mimic:

1. Person accidentally deletes contents of important table.
2. Admin (me) wants to roll back db state to just prior to that deletion.
3. (Me) Assumes that by creating a recovery.conf file and setting the target to a an earlier trxn id and restarting the database would simply do the trick.

So now I think that my scenario should look more like:

1. Person accidentally deletes contents of important table.
2. Admin (me) wants to roll db state back to just prior to that delete.
3. (Me) Performs steps, in addition to creating the recovery.conf with the selected xid, that will cause the DB to restart at that PITR.

Now all I need to lock down are those 'additional steps needed to force the recovery process to only recover up to the specified xid and no further'.

Such as:

- Remove from pg_xlog all of the log files containing transactions that come after the selected xid?
- Other?

-jason

Tom Lane wrote:
"Jason L. Buberel" <jason@buberel.org> writes: 
## stopped and started postgres, following syslog output:   
You seem to have omitted all the interesting details about what you did
here; but "stopping and starting" postgres is certainly not intended to
cause it to discard data.  There would need to have been some steps
involving restoring a previous base backup and rolling forward through
archived xlog files.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's datatypes do not      match 

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

Tom Lane

Дата:

03 июля 2007 г., 15:01:02

"Jason L. Buberel" <jason@buberel.org> writes:
> Instead, in order to achieve my goal I would have to restore to that
> backup, and rely on the contents of the archive_logs to have the
> recovery process return me to the selected xid PITR.

Correct.

> So is there any way to 'trick' or force the server to forget what it
> thinks 'now' is and instead to step back to the selected xid and make
> that the new version of 'now'?

No.  As an example, would you expect such a trick to reverse the effects
of a DROP TABLE?  Or even just a single DELETE?  That would mean we
could *never* free any storage.

            regards, tom lane

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

Erik Jones

Дата:

03 июля 2007 г., 15:05:29

On Jul 2, 2007, at 11:58 PM, Jason L. Buberel wrote:

> I am now learning that fact, but recall the original scenario that
> I am trying to mimic:
>
> 1. Person accidentally deletes contents of important table.
> 2. Admin (me) wants to roll back db state to just prior to that
> deletion.
> 3. (Me) Assumes that by creating a recovery.conf file and setting
> the target to a an earlier trxn id and restarting the database
> would simply do the trick.
>
> So now I think that my scenario should look more like:
>
> 1. Person accidentally deletes contents of important table.
> 2. Admin (me) wants to roll db state back to just prior to that
> delete.
> 3. (Me) Performs steps, in addition to creating the recovery.conf
> with the selected xid, that will cause the DB to restart at that PITR.
>
> Now all I need to lock down are those 'additional steps needed to
> force the recovery process to only recover up to the specified xid
> and no further'.
>
> Such as:
>
> - Remove from pg_xlog all of the log files containing transactions
> that come after the selected xid?
> - Other?
>
> -jason

Whoa.  If what you're asking is for a database level rollback or undo
type of function, I'm pretty sure that's not doable in the way you're
asking.  Once a postgres cluster (data/*) has committed or rolled
back transactions, you can not roll that particular cluster instance
back, i.e. you can not stop the database and have it start from some
previous transaction state and stop at an arbitrary point.  The only
way to do this is if you have a filesystem level backup from a point
in time previous to the point to which you wish to return along with
all of the transaction logs from just before the point where the
backup was taken up to the point to which you wish to return, which
you can then bring up in recovery mode and have it play up until a
transaction id you specify.

The main point here is that PITR requires and, is run on, a base backup.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: recovery_target_time ignored or recovery always recovers to end of WAL

От

"Jason L. Buberel"

Дата:

03 июля 2007 г., 15:18:48

Understood all around - thanks to everyone for helping me clarify (in my head/understanding) how these pieces fit together.

One last request - can you glance over the plan below and let me know if it sounds sane? The goal again is to be able to recover to a PITR record in the very recent past after a mistake (accidental deletion) occurs. This is not meant to be a substitute for a real backup strategy:

1. On a daily basis, run a hot backup more/less as described in this Wiki article on postgresqlforums.com: http://shorl.com/naprofamynone

2. But instead of creating a .tar.gz file, simply 'rsync' the entire $PGDATA dir to another directory stored on a local disk.

3. When an accidental deletion occurs, shutdown the database, restore the $PGDATA dir from yesterday's rsync snapshot (after saving any unarchived pg_xlogs), and use the normal recovery process to effectively 'roll back' the database state to what it was just prior to the accident.

Thanks again for all the helpful comments and clarifications. I am now a more clueful person as a result ;)
-jason

Erik Jones wrote:

On Jul 2, 2007, at 11:58 PM, Jason L. Buberel wrote:

I am now learning that fact, but recall the original scenario that I am trying to mimic:

1. Person accidentally deletes contents of important table.
2. Admin (me) wants to roll back db state to just prior to that deletion.
3. (Me) Assumes that by creating a recovery.conf file and setting the target to a an earlier trxn id and restarting the database would simply do the trick.

So now I think that my scenario should look more like:

1. Person accidentally deletes contents of important table.
2. Admin (me) wants to roll db state back to just prior to that delete.
3. (Me) Performs steps, in addition to creating the recovery.conf with the selected xid, that will cause the DB to restart at that PITR.

Now all I need to lock down are those 'additional steps needed to force the recovery process to only recover up to the specified xid and no further'.

Such as:

- Remove from pg_xlog all of the log files containing transactions that come after the selected xid?
- Other?

-jason

Whoa. If what you're asking is for a database level rollback or undo type of function, I'm pretty sure that's not doable in the way you're asking. Once a postgres cluster (data/*) has committed or rolled back transactions, you can not roll that particular cluster instance back, i.e. you can not stop the database and have it start from some previous transaction state and stop at an arbitrary point. The only way to do this is if you have a filesystem level backup from a point in time previous to the point to which you wish to return along with all of the transaction logs from just before the point where the backup was taken up to the point to which you wish to return, which you can then bring up in recovery mode and have it play up until a transaction id you specify.

The main point here is that PITR requires and, is run on, a base backup.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: recovery_target_time ignored or recovery always recovers to end of WAL