Discussion: Possible data corruption with Postgres 7.4.8

Possible data corruption with Postgres 7.4.8

From
"Eric B. Ridge"
Date:
First off, let me make clear that I blame the strange data corruption
problem we encountered today on our hardware RAID controller -- some
versions of its firmware are known to be buggy and can cause the RAID
set to "drop" offline, and we've encountered this problem in the past on
this particular server.  I'm also inclined to blame the corruption on
pg7.4.8, since later versions seem to have fixes for (unspecified)
"corruption" issues.  But hardware seems much more likely.

The environment for this server is:
    - pg7.4.8
    - 2x2.4GHz Xeons, HT
    - 4 GB RAM
    - Linux kernel version 2.6.15
    - Dell PowerEdge Expandable RAID Controller 3/Di (rev 01) which
supposedly contains the latest firmware patches (though I don't know how
they relate to "rev 01").

Issue 1
-------

A single record silently disappeared in the middle of the day.  We have
no daytime vacuum processes (we only vacuum at night), stats collection
is turned off, and our application has no facility for actually
DELETE-ing records (at least not from the containing table).  It's pure
UPDATE, INSERT, and SELECT.  There are no foreign key references that
might cause cascading deletes, and no manual intervention via psql (that
anyone will fess up to!).  From an application standpoint, there's
simply no explanation for a record disappearing.

Anyway, we were playing around with pg_filedump on the affected table
and discovered that the missing record had 212 old versions, all with
exactly this infomask:

    infomask: 0x2513 (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|
              XMAX_COMMITTED|UPDATED)

I don't claim to fully understand what that mask actually means, but
from what I can gather by looking at the masks of other known-to-be-good
records, each of the 212 versions is "dead" in terms of MVCC rules, and
the expiration occurred as the result of an update to that version while
it was "alive".  Is this correct (or kinda close)?  If I'm guessing
correctly, it seems we should have had at least one entry with the
XMAX_INVALID flag set... that entry being the most-current version of
the record.  Am I even providing enough information here for someone
else to guess?

At first we feared transaction ID wraparound, but the database cluster
is vacuumed every night, and age(datfrozenxid) was nowhere near the
2-billion mark -- we were roughly 73 million past 1 billion.  We've
never seen wraparound before, though, so we don't really know what
age(datfrozenxid) would look like in that case.  Still, it's damn near
impossible that we ran more than a billion transactions in the 9 hours
since the last VACUUM.
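
The check itself was nothing fancier than the usual per-database age
query against pg_database, something like:

    -- age(datfrozenxid) per database; anything approaching ~2 billion
    -- would mean real wraparound danger, and 7.4's VACUUM is supposed
    -- to warn somewhere past the 1-billion mark.
    SELECT datname, age(datfrozenxid) AS xid_age
      FROM pg_database
     ORDER BY xid_age DESC;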

Also, the pg7.4 docs indicate that VACUUM will output a warning message
when nearing the 1-billion-transaction mark.  We've never seen this
warning in any of our VACUUM logs.

Which leads to...

Issue 2
-------

Last night's VACUUM process output this for a number of indexes:
    WARNING:  index "foo_pkey" contains 1348636 row versions, but
    table contains 1348635 row versions

However, none of these indexes was on the table with the missing
record.  Every index on the table that owns "foo_pkey" produced the same
warning, with the same index and table row-version counts -- all off by
one.  We couldn't decide whether the indexes were corrupt, the table was
corrupt, or both.

And we were seeing all the really great things you'd expect to see when
indexes don't jibe with their tables: SELECTs using index scans were
returning the wrong records, while sequential scans would return the
correct record (or no record at all, which was expected).
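
The way we compared the two access paths was roughly the following;
"foo" and "id" stand in for the real table and key column, and the
planner settings just force one plan or the other:

    -- Prefer the (possibly corrupt) index:
    SET enable_seqscan = off;
    SELECT * FROM foo WHERE id = 12345;

    -- Now force a sequential scan of the heap instead:
    RESET enable_seqscan;
    SET enable_indexscan = off;
    SELECT * FROM foo WHERE id = 12345;
    RESET enable_indexscan;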

Finally, after running some consistency checks against previous backups,
all we could decide to do was upgrade to pg7.4.12, REINDEX every index,
vacuum the thing again, restore the one missing record, and move on.
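
The repair itself was nothing exotic -- roughly the following, with the
table name made up here and repeated for each suspect table:

    -- after upgrading the binaries to 7.4.12:
    REINDEX TABLE foo;      -- rebuilds every index on the table
    VACUUM ANALYZE foo;
    -- ...then re-INSERT the one missing record from a known-good backup.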

We did save a binary copy of the entire installation tree, including
$PGDATA, in case further investigation is necessary.

Does anyone here have any kind of explanation other than bad hardware?

eric



Re: Possible data corruption with Postgres 7.4.8

From
Tom Lane
Date:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> Does anyone here have any kind of explanation other than bad hardware?

Well, there are several data-corruption bugs fixed between 7.4.8 and
7.4.12, though whether any of them explains your symptoms is difficult
to say:

2005-11-02 19:23  tgl

    * src/backend/access/transam/slru.c (REL7_4_STABLE): Fix
    longstanding race condition in transaction log management: there
    was a very narrow window in which SimpleLruReadPage or
    SimpleLruWritePage could think that I/O was needed when it wasn't
    (and indeed the buffer had already been assigned to another page).
    This would result in an Assert failure if Asserts were enabled, and
    probably in silent data corruption if not.  Reported independently
    by Jim Nasby and Robert Creager.

    I intend a more extensive fix when 8.2 development starts, but this
    is a reasonably low-impact patch for the existing branches.

2005-08-25 18:07  tgl

    * src/: backend/access/heap/heapam.c, backend/commands/async.c,
    backend/commands/trigger.c, backend/commands/vacuum.c,
    backend/executor/execMain.c, backend/utils/time/tqual.c,
    include/access/heapam.h, include/executor/executor.h
    (REL7_4_STABLE): Back-patch fixes for problems with VACUUM
    destroying t_ctid chains too soon, and with insufficient paranoia
    in code that follows t_ctid links.  This patch covers the 7.4
    branch.


2005-05-07 17:33  tgl

    * src/backend/: access/heap/hio.c, access/nbtree/nbtpage.c,
    access/nbtree/nbtree.c, commands/vacuumlazy.c (REL7_4_STABLE):
    Repair very-low-probability race condition between relation
    extension and VACUUM: in the interval between adding a new page to
    the relation and formatting it, it was possible for VACUUM to come
    along and decide it should format the page too.  Though not harmful
    in itself, this would cause data loss if a third transaction were
    able to insert tuples into the vacuumed page before the original
    extender got control back.

2005-05-07 17:23  tgl

    * src/backend/utils/time/tqual.c (REL7_4_STABLE): Adjust time qual
    checking code so that we always check TransactionIdIsInProgress
    before we check commit/abort status.  Formerly this was done in
    some paths but not all, with the result that a transaction might be
    considered committed for some purposes before it became committed
    for others.  Per example found by Jan Wieck.


The relation-extension race condition could explain recently-added
tuples simply disappearing, though if it happened in more than one table
you'd have to assume that the race condition window got hit more than
once.  The slru race condition is even narrower, but if it hit then it
could cause tuples inserted by the same transaction into different
tables to become lost.  Either of these seem to match your symptoms?

            regards, tom lane

Re: Possible data corruption with Postgres 7.4.8

From
"Eric B. Ridge"
Date:
On Mar 13, 2006, at 11:12 PM, Tom Lane wrote:

> The relation-extension race condition could explain recently-added
> tuples simply disappearing, though if it happened in more than one
> table
> you'd have to assume that the race condition window got hit more than
> once.  The slru race condition is even narrower, but if it hit then it
> could cause tuples inserted by the same transaction into different
> tables to become lost.  Either of these seem to match your symptoms?

If "tuples inserted" can mean "tuples inserted by means of an UPDATE
statement", then I think the latter applies.

In digging further, we found that the record that disappeared should
have had corresponding records in three other tables, each of which was
also missing.  The update across all four tables would have been part of
the same transaction (they're updated at the same time as part of an ON
UPDATE RULE against a VIEW).
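
To make the fan-out concrete, the rule is along these lines (the names
are made up, and the real one touches four tables): a single UPDATE
against the view expands into several UPDATEs that all run in the same
transaction.

    CREATE RULE my_view_upd AS
        ON UPDATE TO my_view DO INSTEAD (
            UPDATE base_a SET val = NEW.val_a WHERE id = OLD.id;
            UPDATE base_b SET val = NEW.val_b WHERE id = OLD.id;
            UPDATE base_c SET val = NEW.val_c WHERE id = OLD.id;
            UPDATE base_d SET val = NEW.val_d WHERE id = OLD.id
        );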

eric