Обсуждение: Postgres base Backup fails to recover all logs fails from archive

Поиск
Список
Период
Сортировка

Postgres base Backup fails to recover all logs fails from archive

От
Lars Aksel Opsahl
Дата:
Hi


This is postgreSQL 9.2.1


By accident I was running the the following command

«delete from org_oks.oks_kartobjekt_flate where id = 53»

on the wrong database.


- I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by
using«24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file
systembackup and then SELECT pg_stop_backup(); 


The row I deleted was added after the base backup was created.

- Her is the cat from 0000000100000023000000F4.00000020.backup

START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4)

STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4)

CHECKPOINT LOCATION: 23/F4000020

BACKUP METHOD: pg_start_backup

BACKUP FROM: master

START TIME: 2012-11-22 09:57:03 CET

LABEL: /backup/db01/base_backup_2012_11_22

STOP TIME: 2012-11-22 10:38:44 CET

- I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the
samebehavior) 

restore_command ='cp /data01/new_archive_files/%f %p'

recovery_target_time = '2013-01-04 04:00:00'

- Here is the log from startup

LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET

LOG: starting archive recovery

LOG: restored log file "0000000100000023000000F4" from archive

LOG: redo starts at 23/F4000080

LOG: consistent recovery state reached at 23/F5000000

LOG: restored log file "0000000100000023000000F5" from archive

LOG: restored log file "0000000100000023000000F6" from archive

LOG: restored log file "0000000100000023000000F7" from archive

LOG: contrecord is requested by 23/F7000020

LOG: redo done at 23/F6000080

LOG: restored log file "0000000100000023000000F6" from archive

cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory

LOG: selected new timeline ID: 2

cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory

LOG: archive recovery complete

LOG: database system is ready to accept connections

LOG: autovacuum launcher started


- Here is the list some the archive files.

-rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008

-rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007

-rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006

-rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005

-rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004

-rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003

-rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002

-rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001

-rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000

-rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE

-rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD

-rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC

-rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB

-rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA

-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9

-rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8

-rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7

-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6

-rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5

-rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4

-rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup


I have two questions

  1.  Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before without
anyproblems.) 

  2.  Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the row
Ideleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files) 

Thanks in advance

Lars



Re: Postgres base Backup fails to recover all logs fails from archive

От
Magnus Hagander
Дата:
On Sat, Jan 5, 2013 at 10:30 PM, Lars Aksel Opsahl
<lop@skogoglandskap.no> wrote:
> Hi
>
>
> This is postgreSQL 9.2.1
>
>
> By accident I was running the the following command
>
> «delete from org_oks.oks_kartobjekt_flate where id = 53»
>
> on the wrong database.
>
>
> - I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by
using«24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file
systembackup and then SELECT pg_stop_backup(); 
>
>
> The row I deleted was added after the base backup was created.
>
> - Her is the cat from 0000000100000023000000F4.00000020.backup
>
> START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4)
>
> STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4)
>
> CHECKPOINT LOCATION: 23/F4000020
>
> BACKUP METHOD: pg_start_backup
>
> BACKUP FROM: master
>
> START TIME: 2012-11-22 09:57:03 CET
>
> LABEL: /backup/db01/base_backup_2012_11_22
>
> STOP TIME: 2012-11-22 10:38:44 CET
>
> - I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the
samebehavior) 
>
> restore_command ='cp /data01/new_archive_files/%f %p'
>
> recovery_target_time = '2013-01-04 04:00:00'
>
> - Here is the log from startup
>
> LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET
>
> LOG: starting archive recovery
>
> LOG: restored log file "0000000100000023000000F4" from archive
>
> LOG: redo starts at 23/F4000080
>
> LOG: consistent recovery state reached at 23/F5000000
>
> LOG: restored log file "0000000100000023000000F5" from archive
>
> LOG: restored log file "0000000100000023000000F6" from archive
>
> LOG: restored log file "0000000100000023000000F7" from archive
>
> LOG: contrecord is requested by 23/F7000020
>
> LOG: redo done at 23/F6000080
>
> LOG: restored log file "0000000100000023000000F6" from archive
>
> cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory
>
> LOG: selected new timeline ID: 2
>
> cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory
>
> LOG: archive recovery complete
>
> LOG: database system is ready to accept connections
>
> LOG: autovacuum launcher started
>
>
> - Here is the list some the archive files.
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7
>
> -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6
>
> -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5
>
> -rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4
>
> -rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup
>
>
> I have two questions
>
>   1.  Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before
withoutany problems.) 

What actually went wrong? Because the log looks like a perfectly
normal restore? (Though you should really add timestamps to your log)


>   2.  Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the
rowI deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files) 

No, the archive files only contain the binary changes to the disk
blocks, not the SQL statements or data.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Postgres base Backup fails to recover all logs fails from archive

От
Lars Aksel Opsahl
Дата:
Hi

Thanks for your help.

The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7.

The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files.

I don't under stand why it stops to read the rest of the log files.

Lars
________________________________________
Fra: Magnus Hagander [magnus@hagander.net]
Sendt: 5. januar 2013 22:45
Til: Lars Aksel Opsahl
Kopi: pgsql-admin@postgresql.org
Emne: Re: [ADMIN] Postgres base Backup fails to recover all logs fails from archive

On Sat, Jan 5, 2013 at 10:30 PM, Lars Aksel Opsahl
<lop@skogoglandskap.no> wrote:
> Hi
>
>
> This is postgreSQL 9.2.1
>
>
> By accident I was running the the following command
>
> «delete from org_oks.oks_kartobjekt_flate where id = 53»
>
> on the wrong database.
>
>
> - I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by
using«24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file
systembackup and then SELECT pg_stop_backup(); 
>
>
> The row I deleted was added after the base backup was created.
>
> - Her is the cat from 0000000100000023000000F4.00000020.backup
>
> START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4)
>
> STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4)
>
> CHECKPOINT LOCATION: 23/F4000020
>
> BACKUP METHOD: pg_start_backup
>
> BACKUP FROM: master
>
> START TIME: 2012-11-22 09:57:03 CET
>
> LABEL: /backup/db01/base_backup_2012_11_22
>
> STOP TIME: 2012-11-22 10:38:44 CET
>
> - I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the
samebehavior) 
>
> restore_command ='cp /data01/new_archive_files/%f %p'
>
> recovery_target_time = '2013-01-04 04:00:00'
>
> - Here is the log from startup
>
> LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET
>
> LOG: starting archive recovery
>
> LOG: restored log file "0000000100000023000000F4" from archive
>
> LOG: redo starts at 23/F4000080
>
> LOG: consistent recovery state reached at 23/F5000000
>
> LOG: restored log file "0000000100000023000000F5" from archive
>
> LOG: restored log file "0000000100000023000000F6" from archive
>
> LOG: restored log file "0000000100000023000000F7" from archive
>
> LOG: contrecord is requested by 23/F7000020
>
> LOG: redo done at 23/F6000080
>
> LOG: restored log file "0000000100000023000000F6" from archive
>
> cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory
>
> LOG: selected new timeline ID: 2
>
> cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory
>
> LOG: archive recovery complete
>
> LOG: database system is ready to accept connections
>
> LOG: autovacuum launcher started
>
>
> - Here is the list some the archive files.
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8
>
> -rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7
>
> -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6
>
> -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5
>
> -rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4
>
> -rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup
>
>
> I have two questions
>
>   1.  Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before
withoutany problems.) 

What actually went wrong? Because the log looks like a perfectly
normal restore? (Though you should really add timestamps to your log)


>   2.  Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the
rowI deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files) 

No, the archive files only contain the binary changes to the disk
blocks, not the SQL statements or data.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Postgres base Backup fails to recover all logs fails from archive

От
Magnus Hagander
Дата:
On Sat, Jan 5, 2013 at 11:30 PM, Lars Aksel Opsahl
<lop@skogoglandskap.no> wrote:
> Hi
>
> Thanks for your help.
>
> The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7.
>
> The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files.
>
> I don't under stand why it stops to read the rest of the log files.

Ah, I see that now.

Well, it's claiming that the F7 logfile is corrupt.

There's also an interesting jump in timestamps - the F7 logfile is
from *much* later than F6. Almost two whole days. Is that actually
normal, or did something possibly go wrong with your archiving?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Postgres base Backup fails to recover all logs fails from archive

От
Lars Aksel Opsahl
Дата:
Hi again

I have tried to check different /var/log/messages on the servers but they only go back 1 month. I also had quick glance
at Nagios and Munin logs but I could not find anything there. 

This database is mainly used for reading data as it is now, but it's a couple of applications which also store data.
The2 days with out activity in the archive logs is strange when I checked back on the activity reports for
applications.I will set the archive_timeout, but the problem  is that we have small updates and the archive files has
thesame size and is not depending on the amount of changed data. 

We use NFS to copy archive  files to the backup server because that is simple, but maybe we should have used rsyn or
somethingelse, but independent off how we do it, we need a method to check that the archive file is OK after it's copy.


If it's not OK the files must be transfers once more from the database server. I assume the archive always will be OK
whenit's created on the database server. 

Is the the way off doing this to make a script that copies the archive file to backup server cheeks that the files are
equalbefore before it is deleted on the database server ? 

Is there any simple utility program that can check if the archive file is OK or do I have to do a restore to check if
thefile is OK  ? 

Thanks Lars
________________________________________
Fra: Magnus Hagander [magnus@hagander.net]
Sendt: 7. januar 2013 21:47
Til: Lars Aksel Opsahl
Kopi: pgsql-admin@postgresql.org
Emne: Re: [ADMIN] Postgres base Backup fails to recover all logs fails from archive

On Sat, Jan 5, 2013 at 11:30 PM, Lars Aksel Opsahl
<lop@skogoglandskap.no> wrote:
> Hi
>
> Thanks for your help.
>
> The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7.
>
> The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files.
>
> I don't under stand why it stops to read the rest of the log files.

Ah, I see that now.

Well, it's claiming that the F7 logfile is corrupt.

There's also an interesting jump in timestamps - the F7 logfile is
from *much* later than F6. Almost two whole days. Is that actually
normal, or did something possibly go wrong with your archiving?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Postgres base Backup fails to recover all logs fails from archive

От
Magnus Hagander
Дата:
On Tue, Jan 8, 2013 at 11:00 AM, Lars Aksel Opsahl
<lop@skogoglandskap.no> wrote:
> Hi again
>
> I have tried to check different /var/log/messages on the servers but they only go back 1 month. I also had quick
glanceat  Nagios and Munin logs but I could not find anything there. 
>
> This database is mainly used for reading data as it is now, but it's a couple of applications which also store data.
The2 days with out activity in the archive logs is strange when I checked back on the activity reports for
applications.I will set the archive_timeout, but the problem  is that we have small updates and the archive files has
thesame size and is not depending on the amount of changed data. 

Yeah, that is indeed a problem. pg_receivexlog can help you arund that
issue, but then your receiving end needs to be a "real box" and not
just an NFS share.


> We use NFS to copy archive  files to the backup server because that is simple, but maybe we should have used rsyn or
somethingelse, but independent off how we do it, we need a method to check that the archive file is OK after it's copy. 

I would generally recommend using somehting that writes the file
atomically. For example, rsync. But you can use rsync with your target
being the NFS share - you don't need an rsync server. Or just a manual
copy+rename. As long as your NFS isn't mounted asynchronously, that
should be safe.


> If it's not OK the files must be transfers once more from the database server. I assume the archive always will be OK
whenit's created on the database server. 
>
> Is the the way off doing this to make a script that copies the archive file to backup server cheeks that the files
areequal before before it is deleted on the database server ? 

Copy+rename should take care of it for you - I doubt it's an actual
failure in contents wrong, it's more that something crashed in the
middle of a file and therefor it ended up being corrupt. If that was
written to a temp file and then renamed into place, that should not
happen.

> Is there any simple utility program that can check if the archive file is OK or do I have to do a restore to check if
thefile is OK  ? 

I don't believe there is such a tool. You could make some very most
basic verification manually, but to verify the full contents you
really need to fully parse them - and the tool to do that is
"postgres".

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/