Thread: corrupted indexes when using base backups generated from hot standby

corrupted indexes when using base backups generated from hot standby

From
Lonni J Friedman
Date:
Greetings,
I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and
several hot standby servers.  Since upgrading to 9.2.2 from 9.1.x a
few months ago, I switched from generating a base backup on the
master, to generating it on a dedicated slave/standby (to reduce the
load on the master).  The command that I've always used to generate
the base backup is:
pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres

However, I've noticed that whenever I use the base backup generated
from the standby to create a new standby server, many of the indexes
are corrupted.  This was never the case when I was generating the
basebackup directly from the master.  Now, I see errors similar to the
following when running queries against the tables that own the
indexes:
INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
HINT:  Please REINDEX it.
INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero
page at block 111
HINT:  Please REINDEX it.
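
For illustration, one way to check by hand whether a given block of a relation file really is all zeros is sketched below. It assumes the default 8192-byte block size and that the block lives in the first 1GB segment file of the index; `is_zero_block` is a hypothetical helper, not a PostgreSQL tool.

```shell
# Succeed if block number $2 of file $1 consists entirely of zero bytes.
# $3 is the block size in bytes (assumed default: 8192).
is_zero_block() {
    file=$1
    blk=$2
    bs=${3:-8192}
    # Number of bytes actually read for that block (0 if past end of file).
    total=$(dd if="$file" bs="$bs" skip="$blk" count=1 2>/dev/null | wc -c | tr -d ' ')
    # Number of bytes left after deleting every NUL byte.
    nonzero=$(dd if="$file" bs="$bs" skip="$blk" count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
    [ "$total" -eq "$bs" ] && [ "$nonzero" -eq 0 ]
}
```

The file path for an index can be looked up with `SELECT pg_relation_filepath('debugger_2013_01_dacode_idx');` and is relative to $PGDATA; the check would then be `is_zero_block "$PGDATA/<that path>" 12`. Running it on both the source standby and the newly created one would show whether the zero page is already present on disk at the source.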

I've confirmed that the corruption doesn't exist on the server
that is generating the base backup (the same SQL query that fails
on the new standby runs successfully there).  So it seems that I'm
potentially misunderstanding some part of the process.  My setup
process is to simply untar the basebackup in the $PGDATA directory,
and copy over all the WAL logs into $PGDATA/pg_xlog.
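
The untar step described above can be sketched roughly as follows. `restore_base_backup` is a hypothetical helper written for illustration; it assumes a single tar file (with `-Ft -D /tmp/bb0` that would be /tmp/bb0/base.tar) and a cluster with no extra tablespaces.

```shell
# Unpack a tar-format base backup ($1) into an empty data directory ($2).
restore_base_backup() {
    tarfile=$1
    pgdata=$2
    # Refuse to unpack on top of existing files.
    if [ -d "$pgdata" ] && [ -n "$(ls -A "$pgdata")" ]; then
        echo "refusing to restore: $pgdata is not empty" >&2
        return 1
    fi
    mkdir -p "$pgdata" || return 1
    # PostgreSQL expects the data directory to be mode 0700.
    chmod 700 "$pgdata" || return 1
    tar -xf "$tarfile" -C "$pgdata"
}
```

Usage would be `restore_base_backup /tmp/bb0/base.tar "$PGDATA"`. On 9.2 a new standby additionally needs a recovery.conf (with `standby_mode = 'on'` and a `primary_conninfo` setting) in $PGDATA before the server is started; that part is not shown here.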

thanks for any pointers.


Re: corrupted indexes when using base backups generated from hot standby

From
Heikki Linnakangas
Date:
On 09.01.2013 20:28, Lonni J Friedman wrote:
> Greetings,
> I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and
> several hot standby servers.  Since upgrading to 9.2.2 from 9.1.x a
> few months ago, I switched from generating a base backup on the
> master, to generating it on a dedicated slave/standby (to reduce the
> load on the master).  The command that I've always used to generate
> the base backup is:
> pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres
>
> However, I've noticed that whenever I use the base backup generated
> from the standby to create a new standby server, many of the indexes
> are corrupted.  This was never the case when I was generating the
> basebackup directly from the master.  Now, I see errors similar to the
> following when running queries against the tables that own the
> indexes:
> INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
> HINT:  Please REINDEX it.
> INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero
> page at block 111
> HINT:  Please REINDEX it.
>
> I've confirmed that the errors/corruption doesn't exist on the server
> that is generating the base backup (I can run the same SQL query which
> fails on the new standby, successfully).  So it seems that I'm
> potentially misunderstanding some part of the process.  My setup
> process is to simply untar the basebackup in the $PGDATA directory,
> and copy over all the WAL logs into $PGDATA/pg_xlog.

That process sounds correct. Since you're using the pg_basebackup -x
option, you don't even need to copy the WAL logs, although it shouldn't
do any harm either. The tar file should contain everything needed to
restore the backup.
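
As a quick sanity check of that point, the tar listing can be searched for WAL segment files. This is only a sketch: `backup_has_wal` is a hypothetical helper, and it assumes WAL segments appear in the archive under pg_xlog/ with the usual 24-character hexadecimal file names.

```shell
# Succeed if tar file $1 lists at least one WAL segment under pg_xlog/.
backup_has_wal() {
    tar -tf "$1" | grep -E '(^|/)pg_xlog/[0-9A-F]{24}$' >/dev/null
}
```

For example, `backup_has_wal /tmp/bb0/base.tar && echo "WAL included"`.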

Can you provide more information? The log output would be nice. How
large is the database? What kind of activity is there in the master
while the backup is taken?

- Heikki


Re: corrupted indexes when using base backups generated from hot standby

From
Lonni J Friedman
Date:
On Tue, Jan 15, 2013 at 2:57 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 09.01.2013 20:28, Lonni J Friedman wrote:
>>
>> Greetings,
>> I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and
>> several hot standby servers.  Since upgrading to 9.2.2 from 9.1.x a
>> few months ago, I switched from generating a base backup on the
>> master, to generating it on a dedicated slave/standby (to reduce the
>> load on the master).  The command that I've always used to generate
>> the base backup is:
>> pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres
>>
>> However, I've noticed that whenever I use the base backup generated
>> from the standby to create a new standby server, many of the indexes
>> are corrupted.  This was never the case when I was generating the
>> basebackup directly from the master.  Now, I see errors similar to the
>> following when running queries against the tables that own the
>> indexes:
>> INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
>> HINT:  Please REINDEX it.
>> INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero
>> page at block 111
>> HINT:  Please REINDEX it.
>>
>> I've confirmed that the errors/corruption doesn't exist on the server
>> that is generating the base backup (I can run the same SQL query which
>> fails on the new standby, successfully).  So it seems that I'm
>> potentially misunderstanding some part of the process.  My setup
>> process is to simply untar the basebackup in the $PGDATA directory,
>> and copy over all the WAL logs into $PGDATA/pg_xlog.
>
>
> That process sounds correct. Since you're using pg_basebackup -x option, you
> don't even need to copy the WAL logs, although it shouldn't do any harm
> either . The tar file should contain everything needed to restore the
> backup.
>
> Can you provide more information? The log output would be nice. How large is
> the database? What kind of activity is there in the master while the backup
> is taken?

Sorry for the delayed reply, I was out of the office.

The database is about 530GB uncompressed.  The master is quite busy
all the time, with inserts, updates & deletes.

I've attached all the recent errors.  I could send you the entire log
if you'd prefer; it's about 800KB compressed.

Attachments

Re: corrupted indexes when using base backups generated from hot standby

From
Heikki Linnakangas
Date:
On 26.01.2013 01:28, Lonni J Friedman wrote:
> On Tue, Jan 15, 2013 at 2:57 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> That process sounds correct. Since you're using pg_basebackup -x option, you
>> don't even need to copy the WAL logs, although it shouldn't do any harm
>> either . The tar file should contain everything needed to restore the
>> backup.
>>
>> Can you provide more information? The log output would be nice. How large is
>> the database? What kind of activity is there in the master while the backup
>> is taken?
>
> Sorry for the delayed reply, I was out of the office.
>
> The database is about 530GB uncompressed.  The master is quite busy
> all the time, with inserts, updates & deletes.
>
> I've attached all the recent errors.  I could send you the entire log
> if you'd prefer, its about 800KB compressed.

Thanks. I'm afraid I didn't get any wiser from the log output. Since
this is a test system, could you reduce the test case into something
smaller and self-contained?

- Heikki


Re: corrupted indexes when using base backups generated from hot standby

From
Lonni J Friedman
Date:
On Tue, Jan 29, 2013 at 8:32 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 26.01.2013 01:28, Lonni J Friedman wrote:
>>
>> On Tue, Jan 15, 2013 at 2:57 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com>  wrote:
>>>
>>> That process sounds correct. Since you're using pg_basebackup -x option,
>>> you don't even need to copy the WAL logs, although it shouldn't do any harm
>>> either. The tar file should contain everything needed to restore the
>>> backup.
>>>
>>> Can you provide more information? The log output would be nice. How large
>>> is the database? What kind of activity is there in the master while the
>>> backup is taken?
>>
>>
>> Sorry for the delayed reply, I was out of the office.
>>
>> The database is about 530GB uncompressed.  The master is quite busy
>> all the time, with inserts, updates & deletes.
>>
>>
>> I've attached all the recent errors.  I could send you the entire log
>> if you'd prefer, its about 800KB compressed.
>
>
> Thanks. I'm afraid I didn't get any wiser from the log output. Since this is
> a test system, could you reduce the test case into something smaller and
> self-contained?

Sorry, I don't understand what you're requesting.  How can I reduce a
test case when all I'm doing to generate the corrupted data is running
pg_basebackup on the standby?  Do you mean using a smaller database?


Re: corrupted indexes when using base backups generated from hot standby

From
Heikki Linnakangas
Date:
On 29.01.2013 18:36, Lonni J Friedman wrote:
> On Tue, Jan 29, 2013 at 8:32 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> Thanks. I'm afraid I didn't get any wiser from the log output. Since this is
>> a test system, could you reduce the test case into something smaller and
>> self-contained?
>
> Sorry, I don't understand what you're requesting.  How can I reduce a
> test case when all I'm doing to generate the corrupted data is running
> pg_basebackup on the standby?  Do you mean using a smaller database?

Yes, a smaller database, and a simpler schema with e.g. only one table and
index. And only do inserts while the backup is running, for example.

- Heikki
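
A reduced test case along those lines might look like the sketch below. The table name `repro_t`, the index name `repro_t_val_idx`, and both helper functions are made up for illustration; the helpers only print SQL, so they can be piped into psql against whatever small test database is used.

```shell
# Print SQL for a minimal schema: exactly one table and one index.
repro_schema_sql() {
    cat <<'EOF'
CREATE TABLE repro_t (id integer, val integer NOT NULL);
CREATE INDEX repro_t_val_idx ON repro_t (val);
EOF
}

# Print one INSERT statement that adds $1 rows to the test table.
repro_insert_sql() {
    printf 'INSERT INTO repro_t (val) SELECT g FROM generate_series(1, %d) AS g;\n' "$1"
}
```

On the master, something like `repro_schema_sql | psql -U postgres repro_db` would create the schema (the database name `repro_db` is an assumption), and `repro_insert_sql 10000 | psql -U postgres repro_db` could be run in a loop while pg_basebackup is taken on the standby, so that inserts are the only activity during the backup.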


Re: corrupted indexes when using base backups generated from hot standby

From
Lonni J Friedman
Date:
On Tue, Jan 29, 2013 at 8:38 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 29.01.2013 18:36, Lonni J Friedman wrote:
>>
>> On Tue, Jan 29, 2013 at 8:32 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com>  wrote:
>>>
>>> Thanks. I'm afraid I didn't get any wiser from the log output. Since this
>>> is a test system, could you reduce the test case into something smaller and
>>> self-contained?
>>
>>
>> Sorry, I don't understand what you're requesting.  How can I reduce a
>> test case when all I'm doing to generate the corrupted data is running
>> pg_basebackup on the standby?  Do you mean using a smaller database?
>
>
> Yes, smaller database, and a simpler schema with e.g only one table and
> index. And only do inserts while the backup is running, for example.

OK, I'll give that a try.