Обсуждение: Corrupt indexes on slave when using pg_bulkload on master

Поиск
Список
Период
Сортировка

Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
I'm using pg_bulkload to load large amounts of CSV data into a postgres database hourly.
 
This database is replicated to a second node.
 
Whenever a bulk load happens the indexes on the updated tables on the secondary node corrupt and are unusable until a reindex is run on the primary node.  I get the error below on node 2:
 
ERROR: index "tablename" contains unexpected zero page at block 0
SQL state: XX002
Hint: Please REINDEX it.
 
I'm assuming that this is because of the way pg_bulkload builds the index on the primary, and possibly has something to do with the way pg_bulkload overwrites rows in the event of a constraint violation, but at the same time if something works on the primary shouldn't the replicated node be able to process the WAL log?
 
I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build 3.1.1-1.pg91.rhel6 and it happens every time.
 
Does anyone have any experience in this area or advice they could give?  If you can point out something stupid I'm doing that would be very welcome :)
 
Thanks,
 
James

Re: Corrupt indexes on slave when using pg_bulkload on master

От
Jeff Janes
Дата:
On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.

pg_bulkload is fundamentally incompatible with PITR, streaming, and
forms of replication that depend on those things.

Cheers,

Jeff


Re: Corrupt indexes on slave when using pg_bulkload on master

От
"James Cowell"
Дата:
Hi Jeff,

Thanks for your reply.

I can understand that it would be incompatible if you bypassed the WAL logs,
but I've got it configured to use the buffered loader so that WAL logs are
written.  The data replicates fine, the indexes don't, yet a reindex on the
primary fixes the index on the secondary.  I'd like to understand why if
possible.

I've seen mention of bug(s) around corrupt indexes on the secondary node,
there was a fix in 9.1.6, but it doesn't appear to be the same problem I'm
having.

I've also seen other people with similar errors that aren't using
pg_bulkload, for example:
http://postgresql.1045698.n5.nabble.com/BUG-7562-could-not-read-block-0-in-f
ile-quot-base-16385-16585-quot-read-only-0-of-8192-bytes-td5724738.html

Cheers,

James

-----Original Message-----
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: 05 December 2012 18:02
To: James Cowell
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on
master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a
> postgres database hourly.
>
> This database is replicated to a second node.

pg_bulkload is fundamentally incompatible with PITR, streaming, and forms of
replication that depend on those things.

Cheers,

Jeff



Re: Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
Hello,
 
Does anyone else have any insight or information around this issue?  I can't find anything out there on the web.
 
If it's a simple incompatability then fine, but I'd still like to understand why the indexes on the secondary node corrupt when the data seems to be replicated without issue.
 
Thanks,
 
James
 
From: James Cowell <jcowell@btinternet.com>
To: 'Jeff Janes' <jeff.janes@gmail.com>
Cc: pgsql-general@postgresql.org
Sent: Wednesday, 5 December 2012, 18:56
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,

Thanks for your reply.

I can understand that it would be incompatible if you bypassed the WAL logs,
but I've got it configured to use the buffered loader so that WAL logs are
written.  The data replicates fine, the indexes don't, yet a reindex on the
primary fixes the index on the secondary.  I'd like to understand why if
possible.

I've seen mention of bug(s) around corrupt indexes on the secondary node,
there was a fix in 9.1.6, but it doesn't appear to be the same problem I'm
having.

I've also seen other people with similar errors that aren't using
pg_bulkload, for example:
http://postgresql.1045698.n5.nabble.com/BUG-7562-could-not-read-block-0-in-f
ile-quot-base-16385-16585-quot-read-only-0-of-8192-bytes-td5724738.html

Cheers,

James

-----Original Message-----
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: 05 December 2012 18:02
To: James Cowell
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on
master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a
> postgres database hourly.
>
> This database is replicated to a second node.

pg_bulkload is fundamentally incompatible with PITR, streaming, and forms of
replication that depend on those things.

Cheers,

Jeff



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
 
 
I'm using pg_bulkload to load large amounts of CSV data into a postgres database hourly.
This database is replicated to a second node.
Whenever a bulk load happens the indexes on the updated tables on the secondary node corrupt and are unusable until a reindex is run on the primary node. I get the error below on node 2:
ERROR: index "tablename" contains unexpected zero page at block 0
SQL state: XX002
Hint: Please REINDEX it.
I'm assuming that this is because of the way pg_bulkload builds the index on the primary, and possibly has something to do with the way pg_bulkload overwrites rows in the event of a constraint violation, but at the same time if something works on the primary shouldn't the replicated node be able to process the WAL log?
I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build 3.1.1-1.pg91.rhel6 and it happens every time.
Does anyone have any experience in this area or advice they could give? If you can point out something stupid I'm doing that would be very welcome :)
Thanks,
James


 

Re: Corrupt indexes on slave when using pg_bulkload on master

От
Jeff Janes
Дата:
On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.
>
> Whenever a bulk load happens the indexes on the updated tables on the
> secondary node corrupt and are unusable until a reindex is run on the
> primary node.  I get the error below on node 2:
>
> ERROR: index "tablename" contains unexpected zero page at block 0
> SQL state: XX002
> Hint: Please REINDEX it.
>
> I'm assuming that this is because of the way pg_bulkload builds the index on
> the primary, and possibly has something to do with the way pg_bulkload
> overwrites rows in the event of a constraint violation,

If there are no constraint violations, do you still see the problem?

> but at the same time
> if something works on the primary shouldn't the replicated node be able to
> process the WAL log?
>
> I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build
> 3.1.1-1.pg91.rhel6 and it happens every time.

Were there any older version on which it worked?  Can you post a
minimal schema and control file to reproduce the problem?

Cheers,

Jeff


Re: Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
Hi Jeff,
 
Thanks again for your reply.
 
>If there are no constraint violations, do you still see the problem?
 
Yes, I've stripped it down to an empty table with a 10 row load and the pk index on the secondary node still corrupts.
 
> Were there any older version on which it worked?
 
I'm afraid I started on 9.1.5, I upgraded to 9.1.6 when I had the initial problem due to the bugfix in the changelog to do with corrupt indexes on the secondary node but it hasn't resolved the issue.
 
> Can you post a minimal schema and control file to reproduce the problem?
 
I've attached a text file with details for table, load config file etc, is that everything you would need?
 
Cheers,
 
James

From: Jeff Janes <jeff.janes@gmail.com>
To: James Cowell <jcowell@btinternet.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 10 December 2012, 16:53
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.
>
> Whenever a bulk load happens the indexes on the updated tables on the
> secondary node corrupt and are unusable until a reindex is run on the
> primary node.  I get the error below on node 2:
>
> ERROR: index "tablename" contains unexpected zero page at block 0
> SQL state: XX002
> Hint: Please REINDEX it.
>
> I'm assuming that this is because of the way pg_bulkload builds the index on
> the primary, and possibly has something to do with the way pg_bulkload
> overwrites rows in the event of a constraint violation,

If there are no constraint violations, do you still see the problem?

> but at the same time
> if something works on the primary shouldn't the replicated node be able to
> process the WAL log?
>
> I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build
> 3.1.1-1.pg91.rhel6 and it happens every time.

Were there any older version on which it worked?  Can you post a
minimal schema and control file to reproduce the problem?

Cheers,

Jeff


Вложения

Re: Corrupt indexes on slave when using pg_bulkload on master

От
wd
Дата:
We encounter the same problem, and have to change to use copy command


On Wed, Dec 5, 2012 at 9:17 PM, James Cowell <jcowell@btinternet.com> wrote:
I'm using pg_bulkload to load large amounts of CSV data into a postgres database hourly.
 
This database is replicated to a second node.
 
Whenever a bulk load happens the indexes on the updated tables on the secondary node corrupt and are unusable until a reindex is run on the primary node.  I get the error below on node 2:
 
ERROR: index "tablename" contains unexpected zero page at block 0
SQL state: XX002
Hint: Please REINDEX it.
 
I'm assuming that this is because of the way pg_bulkload builds the index on the primary, and possibly has something to do with the way pg_bulkload overwrites rows in the event of a constraint violation, but at the same time if something works on the primary shouldn't the replicated node be able to process the WAL log?
 
I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build 3.1.1-1.pg91.rhel6 and it happens every time.
 
Does anyone have any experience in this area or advice they could give?  If you can point out something stupid I'm doing that would be very welcome :)
 
Thanks,
 
James

Re: Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
I saw that 9.1.7 was out and it had another bugfix in for WAL playback.
 
I installed it and ran my reproduction case and it no longer corrupts the index.
 
I reindexed the database I load into and did a full data load and the indexes still corrupt on the slave.
 
It does not appear to be related to constraint violation as one table is new rows only.
 
I will try and put together a new reproduction case this week.
 
Cheers,
 
James
 
 
From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Thursday, 13 December 2012, 12:26
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,
 
Thanks again for your reply.
 
>If there are no constraint violations, do you still see the problem?
 
Yes, I've stripped it down to an empty table with a 10 row load and the pk index on the secondary node still corrupts.
 
> Were there any older version on which it worked?
 
I'm afraid I started on 9.1.5, I upgraded to 9.1.6 when I had the initial problem due to the bugfix in the changelog to do with corrupt indexes on the secondary node but it hasn't resolved the issue.
 
> Can you post a minimal schema and control file to reproduce the problem?
 
I've attached a text file with details for table, load config file etc, is that everything you would need?
 
Cheers,
 
James

From: Jeff Janes <jeff.janes@gmail.com>
To: James Cowell <jcowell@btinternet.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 10 December 2012, 16:53
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.
>
> Whenever a bulk load happens the indexes on the updated tables on the
> secondary node corrupt and are unusable until a reindex is run on the
> primary node.  I get the error below on node 2:
>
> ERROR: index "tablename" contains unexpected zero page at block 0
> SQL state: XX002
> Hint: Please REINDEX it.
>
> I'm assuming that this is because of the way pg_bulkload builds the index on
> the primary, and possibly has something to do with the way pg_bulkload
> overwrites rows in the event of a constraint violation,

If there are no constraint violations, do you still see the problem?

> but at the same time
> if something works on the primary shouldn't the replicated node be able to
> process the WAL log?
>
> I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build
> 3.1.1-1.pg91.rhel6 and it happens every time.

Were there any older version on which it worked?  Can you post a
minimal schema and control file to reproduce the problem?

Cheers,

Jeff




Re: Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
Actually, scratch that.  The difference in behaviour seems to be on the optimiser which now table scans the 10 row table (which I guess it should always have done really) rather than use the index as it was in 9.1.6.
 
The same index corruption occurs, so the same reproduction case stands, it just needs a "set enable_seqscan=false" prior to running the selects on the slave.
 
Cheers,
 
James

From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 17 December 2012, 20:42
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

I saw that 9.1.7 was out and it had another bugfix in for WAL playback.
 
I installed it and ran my reproduction case and it no longer corrupts the index.
 
I reindexed the database I load into and did a full data load and the indexes still corrupt on the slave.
 
It does not appear to be related to constraint violation as one table is new rows only.
 
I will try and put together a new reproduction case this week.
 
Cheers,
 
James
 
 
From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Thursday, 13 December 2012, 12:26
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,
 
Thanks again for your reply.
 
>If there are no constraint violations, do you still see the problem?
 
Yes, I've stripped it down to an empty table with a 10 row load and the pk index on the secondary node still corrupts.
 
> Were there any older version on which it worked?
 
I'm afraid I started on 9.1.5, I upgraded to 9.1.6 when I had the initial problem due to the bugfix in the changelog to do with corrupt indexes on the secondary node but it hasn't resolved the issue.
 
> Can you post a minimal schema and control file to reproduce the problem?
 
I've attached a text file with details for table, load config file etc, is that everything you would need?
 
Cheers,
 
James

From: Jeff Janes <jeff.janes@gmail.com>
To: James Cowell <jcowell@btinternet.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 10 December 2012, 16:53
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.
>
> Whenever a bulk load happens the indexes on the updated tables on the
> secondary node corrupt and are unusable until a reindex is run on the
> primary node.  I get the error below on node 2:
>
> ERROR: index "tablename" contains unexpected zero page at block 0
> SQL state: XX002
> Hint: Please REINDEX it.
>
> I'm assuming that this is because of the way pg_bulkload builds the index on
> the primary, and possibly has something to do with the way pg_bulkload
> overwrites rows in the event of a constraint violation,

If there are no constraint violations, do you still see the problem?

> but at the same time
> if something works on the primary shouldn't the replicated node be able to
> process the WAL log?
>
> I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build
> 3.1.1-1.pg91.rhel6 and it happens every time.

Were there any older version on which it worked?  Can you post a
minimal schema and control file to reproduce the problem?

Cheers,

Jeff






Re: Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
Hi Jeff (and group)
 
Was the reproduction information sufficient?
 
Do I need to submit this officially as a bug or something?
 
At the moment I'm considering rebuilding my cluster with 9.0 to see if that works and if not then reverting back to 9.1 but loading each DB seperately.  I would really like to understand why a load of 10 sequential rows with pg_bulkload produces a corrupt index on node 2 though, it just doesn't make sense to me.
 
Thanks and Happy New Year!
 
James

From: James Cowell <jcowell@btinternet.com>
To: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Cc: Jeff Janes <jeff.janes@gmail.com>
Sent: Wednesday, 19 December 2012, 13:11
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Actually, scratch that.  The difference in behaviour seems to be on the optimiser which now table scans the 10 row table (which I guess it should always have done really) rather than use the index as it was in 9.1.6.
 
The same index corruption occurs, so the same reproduction case stands, it just needs a "set enable_seqscan=false" prior to running the selects on the slave.
 
Cheers,
 
James

From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 17 December 2012, 20:42
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

I saw that 9.1.7 was out and it had another bugfix in for WAL playback.
 
I installed it and ran my reproduction case and it no longer corrupts the index.
 
I reindexed the database I load into and did a full data load and the indexes still corrupt on the slave.
 
It does not appear to be related to constraint violation as one table is new rows only.
 
I will try and put together a new reproduction case this week.
 
Cheers,
 
James
 
 
From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Thursday, 13 December 2012, 12:26
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,
 
Thanks again for your reply.
 
>If there are no constraint violations, do you still see the problem?
 
Yes, I've stripped it down to an empty table with a 10 row load and the pk index on the secondary node still corrupts.
 
> Were there any older version on which it worked?
 
I'm afraid I started on 9.1.5, I upgraded to 9.1.6 when I had the initial problem due to the bugfix in the changelog to do with corrupt indexes on the secondary node but it hasn't resolved the issue.
 
> Can you post a minimal schema and control file to reproduce the problem?
 
I've attached a text file with details for table, load config file etc, is that everything you would need?
 
Cheers,
 
James

From: Jeff Janes <jeff.janes@gmail.com>
To: James Cowell <jcowell@btinternet.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 10 December 2012, 16:53
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.
>
> Whenever a bulk load happens the indexes on the updated tables on the
> secondary node corrupt and are unusable until a reindex is run on the
> primary node.  I get the error below on node 2:
>
> ERROR: index "tablename" contains unexpected zero page at block 0
> SQL state: XX002
> Hint: Please REINDEX it.
>
> I'm assuming that this is because of the way pg_bulkload builds the index on
> the primary, and possibly has something to do with the way pg_bulkload
> overwrites rows in the event of a constraint violation,

If there are no constraint violations, do you still see the problem?

> but at the same time
> if something works on the primary shouldn't the replicated node be able to
> process the WAL log?
>
> I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build
> 3.1.1-1.pg91.rhel6 and it happens every time.

Were there any older version on which it worked?  Can you post a
minimal schema and control file to reproduce the problem?

Cheers,

Jeff








Re: [Solved] Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
I seem to have solved my problem, out of frustration I downloaded the source to see what I could work out (although it's a good while since I did C) and I found these comments in nbtsort-9.1.c:
 
 * Formerly the index pages being built were kept in shared buffers, but
 * that is of no value (since other backends have no interest in them yet)
 * and it created locking problems for CHECKPOINT, because the upper-level
 * pages were held exclusive-locked for long periods.  Now we just build
 * the pages in local memory and smgrwrite or smgrextend them as we finish
 * them.  They will need to be re-read into shared buffers on first use after
 * the build finishes.
 *
 * Since the index will never be used unless it is completely built,
 * from a crash-recovery point of view there is no need to WAL-log the
 * steps of the build. After completing the index build, we can just sync
 * the whole file to disk using smgrimmedsync() before exiting this module.
 * This can be seen to be sufficient for crash recovery by considering that
 * it's effectively equivalent to what would happen if a CHECKPOINT occurred
 * just after the index build. However, it is clearly not sufficient if the
 * DBA is using the WAL log for PITR or replication purposes, since another
 * machine would not be able to reconstruct the index from WAL.  Therefore,
 * we log the completed index pages to WAL if and only if WAL archiving is
 * active.
 
I enabled archive mode (which I didn't care about before as the database only holds 36 hours of data) and the indexes seem to replicate over fine.  I suppose the problem here is lack of documentation, but at least the code is well commented :)
It looks like pg_bulkload works just fine with replication so long as it's set up right.
 
Cheers,
 
James
 
From: James Cowell <jcowell@btinternet.com>
To: James Cowell <jcowell@btinternet.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Cc: Jeff Janes <jeff.janes@gmail.com>
Sent: Thursday, 3 January 2013, 16:12
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff (and group)
 
Was the reproduction information sufficient?
 
Do I need to submit this officially as a bug or something?
 
At the moment I'm considering rebuilding my cluster with 9.0 to see if that works and if not then reverting back to 9.1 but loading each DB seperately.  I would really like to understand why a load of 10 sequential rows with pg_bulkload produces a corrupt index on node 2 though, it just doesn't make sense to me.
 
Thanks and Happy New Year!
 
James

From: James Cowell <jcowell@btinternet.com>
To: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Cc: Jeff Janes <jeff.janes@gmail.com>
Sent: Wednesday, 19 December 2012, 13:11
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Actually, scratch that.  The difference in behaviour seems to be on the optimiser which now table scans the 10 row table (which I guess it should always have done really) rather than use the index as it was in 9.1.6.
 
The same index corruption occurs, so the same reproduction case stands, it just needs a "set enable_seqscan=false" prior to running the selects on the slave.
 
Cheers,
 
James

From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 17 December 2012, 20:42
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

I saw that 9.1.7 was out and it had another bugfix in for WAL playback.
 
I installed it and ran my reproduction case and it no longer corrupts the index.
 
I reindexed the database I load into and did a full data load and the indexes still corrupt on the slave.
 
It does not appear to be related to constraint violation as one table is new rows only.
 
I will try and put together a new reproduction case this week.
 
Cheers,
 
James
 
 
From: James Cowell <jcowell@btinternet.com>
To: Jeff Janes <jeff.janes@gmail.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Thursday, 13 December 2012, 12:26
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,
 
Thanks again for your reply.
 
>If there are no constraint violations, do you still see the problem?
 
Yes, I've stripped it down to an empty table with a 10 row load and the pk index on the secondary node still corrupts.
 
> Were there any older version on which it worked?
 
I'm afraid I started on 9.1.5, I upgraded to 9.1.6 when I had the initial problem due to the bugfix in the changelog to do with corrupt indexes on the secondary node but it hasn't resolved the issue.
 
> Can you post a minimal schema and control file to reproduce the problem?
 
I've attached a text file with details for table, load config file etc, is that everything you would need?
 
Cheers,
 
James

From: Jeff Janes <jeff.janes@gmail.com>
To: James Cowell <jcowell@btinternet.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, 10 December 2012, 16:53
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a postgres
> database hourly.
>
> This database is replicated to a second node.
>
> Whenever a bulk load happens the indexes on the updated tables on the
> secondary node corrupt and are unusable until a reindex is run on the
> primary node.  I get the error below on node 2:
>
> ERROR: index "tablename" contains unexpected zero page at block 0
> SQL state: XX002
> Hint: Please REINDEX it.
>
> I'm assuming that this is because of the way pg_bulkload builds the index on
> the primary, and possibly has something to do with the way pg_bulkload
> overwrites rows in the event of a constraint violation,

If there are no constraint violations, do you still see the problem?

> but at the same time
> if something works on the primary shouldn't the replicated node be able to
> process the WAL log?
>
> I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build
> 3.1.1-1.pg91.rhel6 and it happens every time.

Were there any older version on which it worked?  Can you post a
minimal schema and control file to reproduce the problem?

Cheers,

Jeff










Re: [Solved] Corrupt indexes on slave when using pg_bulkload on master

От
Tom Lane
Дата:
James Cowell <jcowell@btinternet.com> writes:
> I enabled archive mode (which I didn't care about before as the database only holds 36 hours of data) and the indexes
seemto replicate over fine.� I suppose the problem here is lack of documentation, but at least the code is well
commented:) 

> It looks like pg_bulkload works just fine with replication so long as it's set up right.

Hm.  I had thought we had interlocks in there to prevent turning on
replication unless the WAL level was sufficiently high.  It sounds like
you managed to dodge that sanity check.  Could you be more specific about
what your replication configuration looks like?

            regards, tom lane


Re: [Solved] Corrupt indexes on slave when using pg_bulkload on master

От
James Cowell
Дата:
Hi Tom,
 
I had replication enabled and it was working fine but I hadn't turned archive mode on yet (it was on the todo list but needed a log location).
 
I had WAL settings:
 
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
wal_buffers = 32MB
 
And checkpoint settings:
 
checkpoint_segments = 64
checkpoint_completion_target = 0.7
 
But pg_bulkload only puts the index updates into WAL if you also have
 
archive_mode = on
 
I guess it needs to test wal_level rather than archive mode now?  It looks like changes to the project have been minimal for some time, which is a shame because it's a very useful tool. 
 
Cheers,
 
James
 
 
From: Tom Lane <tgl@sss.pgh.pa.us>
To: James Cowell <jcowell@btinternet.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>; Jeff Janes <jeff.janes@gmail.com>; "wd@wdicc.com" <wd@wdicc.com>
Sent: Tuesday, 8 January 2013, 18:02
Subject: Re: [GENERAL] [Solved] Corrupt indexes on slave when using pg_bulkload on master

James Cowell <jcowell@btinternet.com> writes:
> I enabled archive mode (which I didn't care about before as the database only holds 36 hours of data) and the indexes seem to replicate over fine.  I suppose the problem here is lack of documentation, but at least the code is well commented :)

> It looks like pg_bulkload works just fine with replication so long as it's set up right.

Hm.  I had thought we had interlocks in there to prevent turning on
replication unless the WAL level was sufficiently high.  It sounds like
you managed to dodge that sanity check.  Could you be more specific about
what your replication configuration looks like?

            regards, tom lane


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [Solved] Corrupt indexes on slave when using pg_bulkload on master

От
Tom Lane
Дата:
James Cowell <jcowell@btinternet.com> writes:
> But pg_bulkload only puts the index updates into WAL if you also have
>
> archive_mode = on
>
> I guess it needs to test wal_level rather than archive mode now?  It looks like changes to the project have been
minimalfor some time, which is a shame because it's a very useful tool. 

Oh, the code in question is in pg_bulkload not the backend?  Yeah, it
sounds like it hasn't tracked some core-code changes.  In particular
you might point the author at
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=40f908bdcdc726fc11912cd95dfd2f89603d1f37#patch10
which shows how it needs to be done in 9.0 and later.

            regards, tom lane


Re: [Solved] Corrupt indexes on slave when using pg_bulkload on master

От
Fujii Masao
Дата:
On Thu, Jan 10, 2013 at 6:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> James Cowell <jcowell@btinternet.com> writes:
>> But pg_bulkload only puts the index updates into WAL if you also have
>>
>> archive_mode = on
>>
>> I guess it needs to test wal_level rather than archive mode now?  It looks like changes to the project have been
minimalfor some time, which is a shame because it's a very useful tool. 
>
> Oh, the code in question is in pg_bulkload not the backend?  Yeah, it
> sounds like it hasn't tracked some core-code changes.  In particular
> you might point the author at
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=40f908bdcdc726fc11912cd95dfd2f89603d1f37#patch10
> which shows how it needs to be done in 9.0 and later.

I reported this problem to the author in offlist. Thanks for the bug report!

Regards,

--
Fujii Masao


Re: [Solved] Corrupt indexes on slave when using pg_bulkload on master

От
"Takashi Ohnishi"
Дата:
Hi, James.

I'm a member of pg_bulkload developer community.
Thank you for the bug report.

Sorry to become late...
We have released the new version of pg_bulkload , 3.1.3 where the reported
bug is  fixed.

I hope this new version will help you.

Regards.
++++++++++++
Takashi Ohnishi <onishi_takashi_d5@lab.ntt.co.jp>

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Fujii Masao
Sent: Thursday, January 10, 2013 11:51 PM
To: Tom Lane
Cc: James Cowell; pgsql-general@postgresql.org
Subject: Re: [GENERAL] [Solved] Corrupt indexes on slave when using
pg_bulkload on master

On Thu, Jan 10, 2013 at 6:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> James Cowell <jcowell@btinternet.com> writes:
>> But pg_bulkload only puts the index updates into WAL if you also have
>>
>> archive_mode = on
>>
>> I guess it needs to test wal_level rather than archive mode now?  It
looks like changes to the project have been minimal for some time, which is
a shame because it's a very useful tool.
>
> Oh, the code in question is in pg_bulkload not the backend?  Yeah, it
> sounds like it hasn't tracked some core-code changes.  In particular
> you might point the author at
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=40f9
> 08bdcdc726fc11912cd95dfd2f89603d1f37#patch10
> which shows how it needs to be done in 9.0 and later.

I reported this problem to the author in offlist. Thanks for the bug report!

Regards,

--
Fujii Masao


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general