Обсуждение: BUG #4929: Corrupted pg_class, possibly truncate/rollback related

Поиск
Список
Период
Сортировка

BUG #4929: Corrupted pg_class, possibly truncate/rollback related

От
"Robert Treat"
Дата:
The following bug has been logged online:

Bug reference:      4929
Logged by:          Robert Treat
Email address:      xzilla@users.sourceforge.net
PostgreSQL version: 8.3.1
Operating system:   Linux 2.6.18-53.1.14.el5 #1 SMP x86_64 GNU/Linux
(Centos)
Description:        Corrupted pg_class, possibly truncate/rollback related
Details:

Howdy,

I wanted to see if a situation I've encountered is a known issue or perhaps
worth more investigation. I'm working on a Postgres 8.3.1 system, where I've
uncovered some catalog corruption. The problems started with a failing
pg_dump...

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  more than one row returned by a
subquery used as an expression
pg_dump: The command was: SELECT tableoid, oid, typname, typnamespace,
(SELECT rolname FROM pg_catalog.pg_roles WHERE oid = typowner) as rolname,
typinput::oid as typinput, typoutput::oid as typoutput
, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT
relkind FROM pg_class WHERE oid = typrelid) END as typrelkind, typtype,
typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM
pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type


I tracked this down to this problem in pg_class

pe2=# select xmin, xmax, cmin, cmax, ctid, oid, relnamespace, relname,
reltype, relowner, relfilenode, relpages, reltuples from pg_class  where oid
= 23708;
    xmin    |    xmax    | cmin | cmax |   ctid   |  oid  | relnamespace |
relname  | reltype | relowner | relfilenode | relpages |  reltuples
------------+------------+------+------+----------+-------+--------------+--
--------+---------+----------+-------------+----------+-------------
 3291061347 |          0 |    6 |    6 | (1118,2) | 23708 |        23681 |
prooln_m |   23710 |       10 |      654963 |   114055 | 7.42746e+06
          2 | 3291061347 |    6 |    6 | (23,39)  | 23708 |        23681 |
prooln_m |   23710 |       10 |      181519 |   104401 |  6.5017e+06


So, clearly this is bad. It seems we've gotten some level of corruption on
disk. The most perculiar bits of information around this system are that we
tend to have long running vacuum jobs (multiple days), and we recently did a
truncate + rollback within a transaction on the table in question. I've also
noticed that the file on disk for the 181519 row is not actually there. Oh,
and this does run on lvm, though we haven't used the lvm feature set for a
long time. So, worth investigating?

Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related

От
Andrew Gierth
Дата:
>>>>> "Robert" == "Robert Treat" <xzilla@users.sourceforge.net> writes:

 Robert> Bug reference:      4929
 Robert> Logged by:          Robert Treat
 Robert> Email address:      xzilla@users.sourceforge.net
 Robert> PostgreSQL version: 8.3.1

pe2=# select xmin, xmax, cmin, cmax, ctid, oid, relnamespace, relname,
 reltype, relowner, relfilenode, relpages, reltuples from pg_class  where oid = 23708;
     xmin    |    xmax    | cmin | cmax |   ctid   |  oid  | relnamespace | relname  | reltype | relowner | relfilenode
|relpages |  reltuples   

------------+------------+------+------+----------+-------+--------------+----------+---------+----------+-------------+----------+-------------
  3291061347 |          0 |    6 |    6 | (1118,2) | 23708 |        23681 | prooln_m |   23710 |       10 |      654963
|  114055 | 7.42746e+06 
           2 | 3291061347 |    6 |    6 | (23,39)  | 23708 |        23681 | prooln_m |   23710 |       10 |      181519
|  104401 |  6.5017e+06 

 Robert> So, clearly this is bad. It seems we've gotten some level of
 Robert> corruption on disk. The most perculiar bits of information
 Robert> around this system are that we tend to have long running
 Robert> vacuum jobs (multiple days), and we recently did a truncate +
 Robert> rollback within a transaction on the table in question. I've
 Robert> also noticed that the file on disk for the 181519 row is not
 Robert> actually there. Oh, and this does run on lvm, though we
 Robert> haven't used the lvm feature set for a long time. So, worth
 Robert> investigating?

I did some analysis on this at Robert's request on IRC. Here are hexdumps
of the offending tuples:

(1118,2):

00001e60  63 9c 29 c4 00 00 00 00  06 00 00 00 00 00 5e 04  |c.)...........^.|
00001e70  02 00 1b 00 0b 29 20 ff  ff ff 03 00 9c 5c 00 00  |.....) ......\..|
00001e80  70 72 6f 6f 6c 6e 5f 6d  00 00 00 00 00 00 00 00  |prooln_m........|
00001e90  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001ec0  81 5c 00 00 9e 5c 00 00  0a 00 00 00 00 00 00 00  |.\...\..........|
00001ed0  73 fe 09 00 00 00 00 00  87 bd 01 00 02 ab e2 4a  |s..............J|
00001ee0  00 00 00 00 00 00 00 00  01 00 72 00 0f 00 00 00  |..........r.....|
00001ef0  00 00 00 00 00 00 00 00  00 01 00 00 42 9c 29 c4  |............B.).|
00001f00  5b 01 00 00 00 00 00 00  00 09 04 00 00 02 00 00  |[...............|
00001f10  00 01 00 00 00 0a 00 00  00 0a 00 00 00 6f 00 00  |.............o..|
00001f20  00 15 5a 02 00 0a 00 00  00 02 00 00 00 00 00 00  |..Z.............|

Everything above looks about as we expect.

(1118,1): this is an earlier version of the tuple, correctly marked dead,
presumably resulting from the truncate/rollback referred to above:

00001f30  19 55 23 c4 00 00 00 00  06 00 00 00 00 00 5e 04  |.U#...........^.|
00001f40  01 00 1b 00 0b 2a 20 ff  ff ff 03 00 9c 5c 00 00  |.....* ......\..|
00001f50  70 72 6f 6f 6c 6e 5f 6d  00 00 00 00 00 00 00 00  |prooln_m........|
00001f60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001f90  81 5c 00 00 9e 5c 00 00  0a 00 00 00 00 00 00 00  |.\...\..........|
00001fa0  72 fe 09 00 00 00 00 00  00 00 00 00 00 00 00 00  |r...............|
00001fb0  00 00 00 00 00 00 00 00  01 00 72 00 0f 00 00 00  |..........r.....|
00001fc0  00 00 00 00 00 00 00 00  00 01 00 00 19 55 23 c4  |.............U#.|
00001fd0  5b 01 00 00 00 00 00 00  00 09 04 00 00 02 00 00  |[...............|
00001fe0  00 01 00 00 00 0a 00 00  00 0a 00 00 00 6f 00 00  |.............o..|
00001ff0  00 15 5a 02 00 0a 00 00  00 02 00 00 00 00 00 00  |..Z.............|

Again, no obvious surprises.

This is (23,39) which is the broken one; note HEAP_XMAX_INVALID is
set, despite the fact that this is a transaction that committed (as
evidenced by the removal of the old relfilenode) so the row is showing
up incorrectly to queries. Note also that HEAP_ONLY_TUPLE is set (but
there are no other versions of this tuple on page 23).

00000a20  02 00 00 00 63 9c 29 c4  06 00 00 00 00 00 5e 04  |....c.).......^.|
00000a30  02 00 1b 80 0b 29 20 ff  ff ff 03 00 9c 5c 00 00  |.....) ......\..|
00000a40  70 72 6f 6f 6c 6e 5f 6d  00 00 00 00 00 00 00 00  |prooln_m........|
00000a50  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000a80  81 5c 00 00 9e 5c 00 00  0a 00 00 00 00 00 00 00  |.\...\..........|
00000a90  0f c5 02 00 00 00 00 00  d1 97 01 00 8a 6a c6 4a  |.............j.J|
00000aa0  00 00 00 00 00 00 00 00  01 00 72 00 0f 00 00 00  |..........r.....|
00000ab0  00 00 00 00 00 00 00 00  00 01 00 00 c5 52 c5 b9  |.............R..|
00000ac0  5b 01 00 00 00 00 00 00  00 09 04 00 00 02 00 00  |[...............|
00000ad0  00 01 00 00 00 0a 00 00  00 0a 00 00 00 6f 00 00  |.............o..|
00000ae0  00 15 5a 02 00 0a 00 00  00 02 00 00 00 00 00 00  |..Z.............|


--
Andrew (irc:RhodiumToad)

Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related

От
Tom Lane
Дата:
"Robert Treat" <xzilla@users.sourceforge.net> writes:
> PostgreSQL version: 8.3.1
> Description:        Corrupted pg_class, possibly truncate/rollback related

FWIW, there is a bug fix in 8.3.4 addressing possible HOT-chain
corruption after a REINDEX of pg_class.  Not sure if that could've been
the issue here.

            regards, tom lane

Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 > "Robert Treat" <xzilla@users.sourceforge.net> writes:
 >> PostgreSQL version: 8.3.1
 >> Description:        Corrupted pg_class, possibly truncate/rollback related

 Tom> FWIW, there is a bug fix in 8.3.4 addressing possible HOT-chain
 Tom> corruption after a REINDEX of pg_class.  Not sure if that
 Tom> could've been the issue here.

I asked Robert about that when we were discussing it on IRC, and he
said that no such REINDEX had been done (and nor had ALTER TABLE
RENAME and ALTER TABLE SET SCHEMA been done on the affected table).

However I'm wondering if another 8.3.4 fix, the RecentGlobalXmin one,
could be relevant here?
http://archives.postgresql.org/pgsql-committers/2008-09/msg00105.php
(I'm not seeing how it would be, but... note that the xids have got
to the point that they'd appear to be in the past from the point of
view of FirstNormalTransactionId)

--
Andrew (irc:RhodiumToad)

Re: BUG #4929: Corrupted pg_class, possibly truncate/rollback related

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> However I'm wondering if another 8.3.4 fix, the RecentGlobalXmin one,
> could be relevant here?
> http://archives.postgresql.org/pgsql-committers/2008-09/msg00105.php
> (I'm not seeing how it would be, but... note that the xids have got
> to the point that they'd appear to be in the past from the point of
> view of FirstNormalTransactionId)

Hmmm ... I think you're onto something.  My original speculation at
http://archives.postgresql.org/message-id/29544.1221061979@sss.pgh.pa.us
included a worry that the HOT patch could try to use RecentGlobalXmin
before it had gotten set.  In particular, before the patch you mention
above, the InitPostgres transaction would be running around and doing
quite a lot of system catalog access with an unset RecentGlobalXmin.
If it happened to try to execute heap_page_prune on a pg_class page,
there would be a risk of deciding that some valid tuple wasn't valid.
I don't have time to try to trace through the logic and see if this
explains Robert's problem, though.

            regards, tom lane