Обсуждение: Unable to recovery due missing wal_file

Поиск
Список
Период
Сортировка

Unable to recovery due missing wal_file

От
Patrick B
Дата:
Hi guys,

I'm currently using PostgreSQL 9.2.

One of my backup servers went down, and I had to re-sync the all the DB.
I used pg_basebackup and, of course, at the same time wal_archive.

Steps:

1 - Confirm the wal_files are being copied into the new server.
2 - Delete /var/lib/pgsql/9.2/data/*
3 - Double check the wal_files are ok into the server
4 - Start pg_basebackup
5 - pg_basebackup took 10 days, as our DB is 2.0TB.
6 - Start postgres with the recovery.conf.restore_comand
7 - Wait for the wal_files being processed

I got an error here: 00000002000013B40000001A` not found
How is that possible? If I double checked!! :(
I can't find that file anywhere.. and now I'll have to re-do all the work =(

Is there any tips you guys could give me on this situation?

Should I check the backup_label file when the pg_basebackup starts, and check if the wal_file inside the backup_label file exists?

Thanks
Patrick
 

Re: Unable to recovery due missing wal_file

От
Patrick B
Дата:
One more question:

Could I use pg_basebackup (or another tool like RSYNC) and re-sync the data folder only with the missing data? for example... incremental? So I wouldn't need to copy 2TB again?

Re: Unable to recovery due missing wal_file

От
Adrian Klaver
Дата:
On 07/03/2016 02:48 PM, Patrick B wrote:
> Hi guys,
>
> I'm currently using PostgreSQL 9.2.
>
> One of my backup servers went down, and I had to re-sync the all the DB.
> I used pg_basebackup and, of course, at the same time wal_archive.
>
> Steps:
>
> 1 - Confirm the wal_files are being copied into the new server.
> 2 - Delete /var/lib/pgsql/9.2/data/*
> 3 - Double check the wal_files are ok into the server
> 4 - Start pg_basebackup
> 5 - pg_basebackup took 10 days, as our DB is 2.0TB.
> 6 - Start postgres with the recovery.conf.restore_comand
> 7 - Wait for the wal_files being processed
>
> I got an error here: 00000002000013B40000001A` not found
> How is that possible? If I double checked!! :(

Did you read this?:
https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

-X method
--xlog-method=method

     Includes the required transaction log files (WAL files) in the
backup. This will include all transaction logs generated during the
backup. If this option is specified, it is possible to start a
postmaster directly in the extracted directory without the need to
consult the log archive, thus making this a completely standalone backup.

     The following methods for collecting the transaction logs are
supported:

     f
     fetch

         The transaction log files are collected at the end of the
backup. Therefore, it is necessary for the wal_keep_segments parameter
to be set high enough that the log is not removed before the end of the
backup. If the log has been rotated when it's time to transfer it, the
backup will fail and be unusable.
     s
     stream

         Stream the transaction log while the backup is created. This
will open a second connection to the server and start streaming the
transaction log in parallel while running the backup. Therefore, it will
use up two connections configured by the max_wal_senders parameter. As
long as the client can keep up with transaction log received, using this
mode requires no extra transaction logs to be saved on the master.




> I can't find that file anywhere.. and now I'll have to re-do all the work =(
>
> Is there any tips you guys could give me on this situation?
>
> Should I check the backup_label file when the pg_basebackup starts, and
> check if the wal_file inside the backup_label file exists?
>
> Thanks
> Patrick
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Unable to recovery due missing wal_file

От
Adrian Klaver
Дата:
On 07/03/2016 03:17 PM, Patrick B wrote:
> One more question:
>
> Could I use pg_basebackup (or another tool like RSYNC) and re-sync the
> data folder only with the missing data? for example... incremental? So I
> wouldn't need to copy 2TB again?

That assumes the needed WAL files are still on the original server,
which seemed to be not the case per your previous post.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Unable to recovery due missing wal_file

От
Patrick B
Дата:
Yes, I read that!

However, I store the wal_files manually into three different servers. I've double checked the files! I got over 500GB of wal_files when the pg_basebackup finished, and still, wasn't enough.

I'll re-do the steps but now using the STREAM option.

Re: Unable to recovery due missing wal_file

От
Adrian Klaver
Дата:
On 07/03/2016 05:23 PM, Patrick B wrote:
> Yes, I read that!
>
> However, I store the wal_files manually into three different servers.
> I've double checked the files! I got over 500GB of wal_files when the
> pg_basebackup finished, and still, wasn't enough.

Yes, but did you have the 16MB that are 00000002000013B40000001A?

>
> I'll re-do the steps but now using the STREAM option.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Unable to recovery due missing wal_file

От
Patrick B
Дата:
I don't have it now!

But I didn't know that postgres would need that file! If I knew it, I'd have checked just after pg_basebackup started....

Re: Unable to recovery due missing wal_file

От
Adrian Klaver
Дата:
On 07/03/2016 05:36 PM, Patrick B wrote:
> I don't have it now!
>
> But I didn't know that postgres would need that file! If I knew it, I'd
> have checked just after pg_basebackup started....

Not sure that would have mattered for the reasons below.

You might want to take a look at the below:

https://www.postgresql.org/docs/9.5/static/wal.html

In particular:

https://www.postgresql.org/docs/9.5/static/wal-intro.html

Short version WAL files are essential to restoring and on the
originating server are recycled, subject to the configuration parameters
explained here:

https://www.postgresql.org/docs/9.5/static/wal-configuration.html

So the wal archiving you have set up is not storing everything, it
removes older files over time?

--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Unable to recovery due missing wal_file

От
Patrick B
Дата:


Not sure that would have mattered for the reasons below.

You might want to take a look at the below:

https://www.postgresql.org/docs/9.5/static/wal.html

In particular:

https://www.postgresql.org/docs/9.5/static/wal-intro.html

Short version WAL files are essential to restoring and on the originating server are recycled, subject to the configuration parameters explained here:

https://www.postgresql.org/docs/9.5/static/wal-configuration.html

So the wal archiving you have set up is not storing everything, it removes older files over time?

Yes... it removes... I've changed to store them for up to 72h.

I'll restart the process now, by using stream instead fletch.

Let's see.... =\ 

Re: Unable to recovery due missing wal_file

От
Patrick B
Дата:
pg_basebackup --pgdata=- --format=tar --label=bkp_server --progress --host=localhost --port=5432 --username=replicator --xlog-method=stream

Is that right? Once is finished, just need to restart postgres and set the recovery.conf.restored.command?

Cheers

Re: Unable to recovery due missing wal_file

От
Patrick B
Дата:
slave_new: server that needs a new copy of the DB
slave01: streaming replication slave

My steps are:

1. ssh slave_new
2. Stop postgres
3. rm -rf /var/lib/pgsql/9.2/data/*
4. ssh postgres@slave01 'pg_basebackup --pgdata=- --format=tar --label=slave_new --progress --host=localhost --port=5432 --username=replicator --xlog-method=stream' | tar -x --no-same-owner
5. Once the STEP 4 is done, copy the original postgresql.conf pg_hba.conf recovery.conf into /var/lib/pgsql/9.2/data
6. Set recovery,conf with restored command.
7. Start Postgres
I should be able to see the DB recovering itself from the wal_files via LOGS

Is that right?

Re: Unable to recovery due missing wal_file

От
Adrian Klaver
Дата:
On 07/03/2016 06:21 PM, Patrick B wrote:
>
>
>     Not sure that would have mattered for the reasons below.
>
>     You might want to take a look at the below:
>
>     https://www.postgresql.org/docs/9.5/static/wal.html
>
>     In particular:
>
>     https://www.postgresql.org/docs/9.5/static/wal-intro.html
>
>     Short version WAL files are essential to restoring and on the
>     originating server are recycled, subject to the configuration
>     parameters explained here:
>
>     https://www.postgresql.org/docs/9.5/static/wal-configuration.html
>
>     So the wal archiving you have set up is not storing everything, it
>     removes older files over time?
>
>
> Yes... it removes... I've changed to store them for up to 72h.

You say it took 10 days to run the pg_basebackup, so I am not keeping
the last 72 hrs is going to help.

>
> I'll restart the process now, by using stream instead fletch.

Remember:
https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

--xlog-method=method

"This will include all transaction logs generated during the backup."

Given that you WAL archiving for a period less then 72 hrs generated
over 500GB of files, do you have the space to store 10 days worth?

>
> Let's see.... =\
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Unable to recovery due missing wal_file

От
Martín Marqués
Дата:
El 04/07/16 a las 01:06, Adrian Klaver escribió:
> On 07/03/2016 06:21 PM, Patrick B wrote:
>>
>>
>>     Not sure that would have mattered for the reasons below.
>>
>>     You might want to take a look at the below:
>>
>>     https://www.postgresql.org/docs/9.5/static/wal.html
>>
>>     In particular:
>>
>>     https://www.postgresql.org/docs/9.5/static/wal-intro.html
>>
>>     Short version WAL files are essential to restoring and on the
>>     originating server are recycled, subject to the configuration
>>     parameters explained here:
>>
>>     https://www.postgresql.org/docs/9.5/static/wal-configuration.html
>>
>>     So the wal archiving you have set up is not storing everything, it
>>     removes older files over time?
>>
>>
>> Yes... it removes... I've changed to store them for up to 72h.
>
> You say it took 10 days to run the pg_basebackup, so I am not keeping
> the last 72 hrs is going to help.

Not only that, if you add up another 500GB to transfer over the WALs,
that might mean another 2 to 3 days to finish the transfer (and that if
it's only 500GB of WALs)

BTW, 2TB in 10 days means an avg speed of 2.3Mb/s. I guess this must be
some standby in a DR site without a dedicated network bandwidth.

Wouldn't it be faster to clone the disk locally, detach it and send it
over with a Courier? ;)

Regards,

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services