Обсуждение: [BUGS] Missing PRIMARY KEYs and duplicated rows

Поиск
Список
Период
Сортировка

[BUGS] Missing PRIMARY KEYs and duplicated rows

От
Chris Pacejo
Дата:
Hi all, we're encountering a very strange issue lately on a 9.4.5 server.

We recently performed a dump & restore of the entire cluster to
migrate to a new server.  All but one of the databases seem fine.
That one is encountering a very strange issue.

It seems that periodically, the primary keys (and underlying indexes)
of all tables in the database disappear, and every row in each table
gets duplicated.  This appears to occur at the storage level, as the
duplicated rows get their own ctids (so it's not just a query
artifact).  I've seen this happen twice in the past week.

I can delete the duplicate rows (by their ctid).  In the first
instance, I was unable to re-add the primary key, as it had reappeared
by the time I issued the statement.  In the most recent instance, I
did re-add the primary key to a table, and the other primary keys have
not yet re-appeared.

All indexes are b-tree indexes.  They were not created using
CONCURRENTLY.  They are in the default tablespace.  The installation
is on an Intel CentOS machine using the PGDG RPMs.  I've checked the
release notes for all 9.4 bugfixes and do not see anything which could
relate to this issue.

Any insight into this issue?  We will probably just drop and recreate
the database from the recent dump, as it is on a production server and
we'd rather not take chances tinkering around with it.  But it would
be nice to know why/how we've encountered this to ensure it won't
happen again, because frankly it's quite a scary issue.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Missing PRIMARY KEYs and duplicated rows

От
Peter Geoghegan
Дата:
On Wed, Apr 12, 2017 at 1:46 PM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> All indexes are b-tree indexes.

Can you show us the definition of all affected indexes? Any
discernible pattern to them?


-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Missing PRIMARY KEYs and duplicated rows

От
Chris Pacejo
Дата:
On Wed, Apr 12, 2017 at 4:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Apr 12, 2017 at 1:46 PM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
>> All indexes are b-tree indexes.
>
> Can you show us the definition of all affected indexes? Any
> discernible pattern to them?

They are bog-standard b-trees created on behalf of a primary key,
almost always an integer or bigint.  E.g., from one of the unaffected
databases:

Indexes:   "pk_databasechangeloglock" PRIMARY KEY, btree (id)

The primary keys disappear from pg_class as well.

The only pattern is that they're all in the same database.  Which
seems very strange to me; I'm having trouble thinking of what part of
Postgres would affect both schema AND data in all tables of ONE
database.  If I remember correctly each table is stored in a separate
file, and the WAL and server processes are shared across all
databases.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Missing PRIMARY KEYs and duplicated rows

От
Mark Kirkwood
Дата:
On 13/04/17 08:46, Chris Pacejo wrote:

> Hi all, we're encountering a very strange issue lately on a 9.4.5 server.
>
> We recently performed a dump & restore of the entire cluster to
> migrate to a new server.  All but one of the databases seem fine.
> That one is encountering a very strange issue.
>
> It seems that periodically, the primary keys (and underlying indexes)
> of all tables in the database disappear, and every row in each table
> gets duplicated.  This appears to occur at the storage level, as the
> duplicated rows get their own ctids (so it's not just a query
> artifact).  I've seen this happen twice in the past week.
>
> I can delete the duplicate rows (by their ctid).  In the first
> instance, I was unable to re-add the primary key, as it had reappeared
> by the time I issued the statement.  In the most recent instance, I
> did re-add the primary key to a table, and the other primary keys have
> not yet re-appeared.
>
> All indexes are b-tree indexes.  They were not created using
> CONCURRENTLY.  They are in the default tablespace.  The installation
> is on an Intel CentOS machine using the PGDG RPMs.  I've checked the
> release notes for all 9.4 bugfixes and do not see anything which could
> relate to this issue.
>
> Any insight into this issue?  We will probably just drop and recreate
> the database from the recent dump, as it is on a production server and
> we'd rather not take chances tinkering around with it.  But it would
> be nice to know why/how we've encountered this to ensure it won't
> happen again, because frankly it's quite a scary issue.
>
>

I'd be inclined to suspect the hardware with weird behaviour like that 
(memory and disk errors in particular).

Cheers

Mark



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Missing PRIMARY KEYs and duplicated rows

От
Chris Pacejo
Дата:
On Apr 12, 2017 6:13 PM, "Mark Kirkwood" <mark.kirkwood@catalyst.net.nz> wrote:

I'd be inclined to suspect the hardware with weird behaviour like that (memory and disk errors in particular).


Me too, but how would that explain the observation that it's occurring only within one database (and on all tables within that database)? The only thing all tables within a database have in common on disk is that they reside in the same directory.

And the ctids of the duplicate rows indicate that they're allocated sequentially in the same page as the legitimate rows. (The tables are mostly all less than a dozen rows.) How could this be explained by disk corruption?

I suppose memory corruption is possible, but it still seems oddly specific.

Re: [BUGS] Missing PRIMARY KEYs and duplicated rows

От
Mark Kirkwood
Дата:
On 13/04/17 10:22, Chris Pacejo wrote:

> On Apr 12, 2017 6:13 PM, "Mark Kirkwood" 
> <mark.kirkwood@catalyst.net.nz <mailto:mark.kirkwood@catalyst.net.nz>> 
> wrote:
>
>
>     I'd be inclined to suspect the hardware with weird behaviour like
>     that (memory and disk errors in particular).
>
>
>
> Me too, but how would that explain the observation that it's occurring 
> only within one database (and on all tables within that database)? The 
> only thing all tables within a database have in common on disk is that 
> they reside in the same directory.
>
> And the ctids of the duplicate rows indicate that they're allocated 
> sequentially in the same page as the legitimate rows. (The tables are 
> mostly all less than a dozen rows.) How could this be explained by 
> disk corruption?
>
> I suppose memory corruption is possible, but it still seems oddly 
> specific.

Things disappearing and coming back sounds more like bad memory to as 
well. I'd recommend taking that server down and running memcheck.

regards

Mark



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Missing PRIMARY KEYs and duplicated rows

От
Chris Pacejo
Дата:
Aha.  In fact it is none of the above.  This turns out to be a
mundane, if insidious, combination of DNS weirdness and doubly-run
restore script.  Thank you all for your time, sorry to bother the
list.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs