Discussion: corrupted indexes when using base backups generated from hot standby
Greetings,
I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and several hot standby servers. Since upgrading to 9.2.2 from 9.1.x a few months ago, I switched from generating a base backup on the master, to generating it on a dedicated slave/standby (to reduce the load on the master). The command that I've always used to generate the base backup is:

pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres

However, I've noticed that whenever I use the base backup generated from the standby to create a new standby server, many of the indexes are corrupted. This was never the case when I was generating the basebackup directly from the master. Now, I see errors similar to the following when running queries against the tables that own the indexes:

INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
HINT: Please REINDEX it.
INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero page at block 111
HINT: Please REINDEX it.

I've confirmed that the errors/corruption doesn't exist on the server that is generating the base backup (I can run the same SQL query which fails on the new standby, successfully). So it seems that I'm potentially misunderstanding some part of the process. My setup process is to simply untar the basebackup in the $PGDATA directory, and copy over all the WAL logs into $PGDATA/pg_xlog.

thanks for any pointers.
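When many indexes are affected, it may help to reduce the server log to the distinct index/block pairs before comparing standbys. The following is only a sketch: it assumes the log lines look like the two errors quoted above (matching "index" case-insensitively), and the function name `summarize_zero_pages` is made up for illustration.

```shell
# Read server log text on stdin and print one "index_name block_number"
# line per distinct zero-page error. Lines that do not match (for example
# the HINT lines) are ignored.
summarize_zero_pages() {
  sed -n 's/.*[Ii][Nn][Dd][Ee][Xx] "\([^"]*\)" contains unexpected zero page at block \([0-9][0-9]*\).*/\1 \2/p' |
    sort -u
}
```

It could be used as `summarize_zero_pages < postgresql.log`, where `postgresql.log` stands in for whatever log file the new standby writes.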
Re: corrupted indexes when using base backups generated from hot standby
From
Heikki Linnakangas
Date:
On 09.01.2013 20:28, Lonni J Friedman wrote:
> Greetings,
> I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and
> several hot standby servers. Since upgrading to 9.2.2 from 9.1.x a
> few months ago, I switched from generating a base backup on the
> master, to generating it on a dedicated slave/standby (to reduce the
> load on the master). The command that I've always used to generate
> the base backup is:
> pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres
>
> However, I've noticed that whenever I use the base backup generated
> from the standby to create a new standby server, many of the indexes
> are corrupted. This was never the case when I was generating the
> basebackup directly from the master. Now, I see errors similar to the
> following when running queries against the tables that own the
> indexes:
> INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
> HINT: Please REINDEX it.
> INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero page at block 111
> HINT: Please REINDEX it.
>
> I've confirmed that the errors/corruption doesn't exist on the server
> that is generating the base backup (I can run the same SQL query which
> fails on the new standby, successfully). So it seems that I'm
> potentially misunderstanding some part of the process. My setup
> process is to simply untar the basebackup in the $PGDATA directory,
> and copy over all the WAL logs into $PGDATA/pg_xlog.

That process sounds correct. Since you're using pg_basebackup -x option, you don't even need to copy the WAL logs, although it shouldn't do any harm either. The tar file should contain everything needed to restore the backup.

Can you provide more information? The log output would be nice. How large is the database? What kind of activity is there in the master while the backup is taken?

- Heikki
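The restore step described in this exchange (unpack the tar into the data directory, with no separate WAL copy because -x was used) might be sketched as follows. This is an illustration under stated assumptions, not the poster's actual script: the function name `restore_base_backup` is hypothetical, and it assumes the backup directory holds the `base.tar` that pg_basebackup -Ft writes and that there are no extra tablespaces.

```shell
# restore_base_backup BACKUP_DIR PGDATA
# Unpack base.tar from BACKUP_DIR into an empty (or not yet existing) PGDATA.
restore_base_backup() {
  backup_dir=$1
  pgdata=$2

  # pg_basebackup -Ft writes the main data directory as base.tar.
  if [ ! -f "$backup_dir/base.tar" ]; then
    echo "no base.tar in $backup_dir" >&2
    return 1
  fi

  # Refuse to unpack over an existing cluster.
  if [ -d "$pgdata" ] && [ -n "$(ls -A "$pgdata")" ]; then
    echo "$pgdata is not empty" >&2
    return 1
  fi

  mkdir -p "$pgdata" &&
    tar -xf "$backup_dir/base.tar" -C "$pgdata" &&
    chmod 700 "$pgdata"   # the server rejects a data directory with group/world access
}
```

With the command from the original post this would be called as `restore_base_backup /tmp/bb0 "$PGDATA"`. A new standby on 9.2 would presumably also need a recovery.conf with standby_mode and primary_conninfo before it is started; the thread does not say how that part is handled.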
On Tue, Jan 15, 2013 at 2:57 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 09.01.2013 20:28, Lonni J Friedman wrote:
>> Greetings,
>> I'm running postgres-9.2.2 in a Linux-x86_64 cluster with 1 master and
>> several hot standby servers. Since upgrading to 9.2.2 from 9.1.x a
>> few months ago, I switched from generating a base backup on the
>> master, to generating it on a dedicated slave/standby (to reduce the
>> load on the master). The command that I've always used to generate
>> the base backup is:
>> pg_basebackup -v -D /tmp/bb0 -x -Ft -U postgres
>>
>> However, I've noticed that whenever I use the base backup generated
>> from the standby to create a new standby server, many of the indexes
>> are corrupted. This was never the case when I was generating the
>> basebackup directly from the master. Now, I see errors similar to the
>> following when running queries against the tables that own the
>> indexes:
>> INDEX "debugger_2013_01_dacode_idx" contains unexpected zero page at block 12
>> HINT: Please REINDEX it.
>> INDEX "smoke32on64tests_2013_01_suiteid_idx" contains unexpected zero page at block 111
>> HINT: Please REINDEX it.
>>
>> I've confirmed that the errors/corruption doesn't exist on the server
>> that is generating the base backup (I can run the same SQL query which
>> fails on the new standby, successfully). So it seems that I'm
>> potentially misunderstanding some part of the process. My setup
>> process is to simply untar the basebackup in the $PGDATA directory,
>> and copy over all the WAL logs into $PGDATA/pg_xlog.
>
> That process sounds correct. Since you're using pg_basebackup -x option, you
> don't even need to copy the WAL logs, although it shouldn't do any harm
> either. The tar file should contain everything needed to restore the
> backup.
>
> Can you provide more information? The log output would be nice. How large is
> the database? What kind of activity is there in the master while the backup
> is taken?
Sorry for the delayed reply, I was out of the office.

The database is about 530GB uncompressed. The master is quite busy all the time, with inserts, updates & deletes.

I've attached all the recent errors. I could send you the entire log if you'd prefer, it's about 800KB compressed.
Attachments
Re: corrupted indexes when using base backups generated from hot standby
From
Heikki Linnakangas
Date:
On 26.01.2013 01:28, Lonni J Friedman wrote:
> On Tue, Jan 15, 2013 at 2:57 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> That process sounds correct. Since you're using pg_basebackup -x option, you
>> don't even need to copy the WAL logs, although it shouldn't do any harm
>> either. The tar file should contain everything needed to restore the
>> backup.
>>
>> Can you provide more information? The log output would be nice. How large is
>> the database? What kind of activity is there in the master while the backup
>> is taken?
>
> Sorry for the delayed reply, I was out of the office.
>
> The database is about 530GB uncompressed. The master is quite busy
> all the time, with inserts, updates & deletes.
>
> I've attached all the recent errors. I could send you the entire log
> if you'd prefer, it's about 800KB compressed.

Thanks. I'm afraid I didn't get any wiser from the log output. Since this is a test system, could you reduce the test case into something smaller and self-contained?

- Heikki
On Tue, Jan 29, 2013 at 8:32 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 26.01.2013 01:28, Lonni J Friedman wrote:
>> On Tue, Jan 15, 2013 at 2:57 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>> That process sounds correct. Since you're using pg_basebackup -x option, you
>>> don't even need to copy the WAL logs, although it shouldn't do any harm
>>> either. The tar file should contain everything needed to restore the
>>> backup.
>>>
>>> Can you provide more information? The log output would be nice. How large is
>>> the database? What kind of activity is there in the master while the backup
>>> is taken?
>>
>> Sorry for the delayed reply, I was out of the office.
>>
>> The database is about 530GB uncompressed. The master is quite busy
>> all the time, with inserts, updates & deletes.
>>
>> I've attached all the recent errors. I could send you the entire log
>> if you'd prefer, it's about 800KB compressed.
>
> Thanks. I'm afraid I didn't get any wiser from the log output. Since this is
> a test system, could you reduce the test case into something smaller and
> self-contained?

Sorry, I don't understand what you're requesting. How can I reduce a test case when all I'm doing to generate the corrupted data is running pg_basebackup on the standby? Do you mean using a smaller database?
Re: corrupted indexes when using base backups generated from hot standby
From
Heikki Linnakangas
Date:
On 29.01.2013 18:36, Lonni J Friedman wrote:
> On Tue, Jan 29, 2013 at 8:32 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> Thanks. I'm afraid I didn't get any wiser from the log output. Since this is
>> a test system, could you reduce the test case into something smaller and
>> self-contained?
>
> Sorry, I don't understand what you're requesting. How can I reduce a
> test case when all I'm doing to generate the corrupted data is running
> pg_basebackup on the standby? Do you mean using a smaller database?

Yes, smaller database, and a simpler schema with e.g. only one table and index. And only do inserts while the backup is running, for example.

- Heikki
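A reduced test case along the lines suggested above (one table, one index, inserts only) could look roughly like this. It is only a sketch: the table name `bb_test`, the index name `bb_test_val_idx`, and the two helper functions are invented for illustration and do not come from the thread. The helpers just print SQL, so nothing touches a server until the output is piped into psql.

```shell
# Print the SQL for a minimal schema: one table and one index.
# Names are hypothetical.
schema_sql() {
  cat <<'EOF'
CREATE TABLE bb_test (val integer);
CREATE INDEX bb_test_val_idx ON bb_test (val);
EOF
}

# Print one INSERT statement that adds $1 rows (default 10000),
# so that the index keeps growing while the backup runs.
insert_sql() {
  rows=${1:-10000}
  printf 'INSERT INTO bb_test (val) SELECT g FROM generate_series(1, %d) AS g;\n' "$rows"
}
```

One way to use it, assuming a scratch database called `testdb` on the master: run `schema_sql | psql -U postgres testdb` once, then keep `while :; do insert_sql 10000 | psql -U postgres testdb; done` going on the master while pg_basebackup runs against the standby, and finally query `bb_test` through the index on a standby built from that backup.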
On Tue, Jan 29, 2013 at 8:38 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 29.01.2013 18:36, Lonni J Friedman wrote:
>> On Tue, Jan 29, 2013 at 8:32 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>> Thanks. I'm afraid I didn't get any wiser from the log output. Since this is
>>> a test system, could you reduce the test case into something smaller and
>>> self-contained?
>>
>> Sorry, I don't understand what you're requesting. How can I reduce a
>> test case when all I'm doing to generate the corrupted data is running
>> pg_basebackup on the standby? Do you mean using a smaller database?
>
> Yes, smaller database, and a simpler schema with e.g. only one table and
> index. And only do inserts while the backup is running, for example.

OK, I'll give that a try.