Tracking down cause of duplicated oid in table.

Поиск
Список
Период
Сортировка
От Jared Carr
Тема Tracking down cause of duplicated oid in table.
Дата
Msg-id 421CE965.4030101@rbsinteractive.com
обсуждение исходный текст
Список pgsql-general
We have a table that we use to store aggregated data, nightly
we add a row for the previous day, and update all the rows in
the table with the new information.  The table is then vacuumed
full.

Our setup is:
Linux Kernel 2.6.4
SCSI Drives Hardware Raid 10 (4 Drives)
Reiserfs
PostgreSQL 7.4.5

The table looks like:
qualitysmith=# \d admin_report_materialized_view
                                       Table "public.admin_report_materialized_view"
             Column            |  Type   |                                   Modifiers

------------------------------+---------+--------------------------------------------------------------------------------
  id                           | integer | not null default
nextval('public.admin_report_materialized_view_id_seq'::text)
  new_order_count              | integer | not null default 0
  out_of_area_count            | integer | not null default 0
  fake_order_count             | integer | not null default 0
  fake_zip_count               | integer | not null default 0
  in_area_zip_count            | integer | not null default 0
  customer_estimates           | integer | not null default 0
  total_estimate_count         | integer | not null default 0
  cancelled_appointment_count  | integer | not null default 0
  gross_estimate_count         | integer | not null default 0
  total_repair_estimate_count  | integer | not null default 0
  total_replace_estimate_count | integer | not null default 0
  active_contractors_count     | integer | not null default 0
  estimate_percentage          | numeric | not null default 0
  trades_id                    | integer | not null
  master_day                   | date    | not null
Indexes:
     "admin_report_materialized_view_pkey" primary key, btree (id)
     "admin_report_materialized_view_unique_day_trades_id" unique, btree (master_day, trades_id)

Yesterday the following popped up in our log files:
Feb 22 06:12:34 bigbrother postgres[9832]: [162-1] WARNING:  index "admin_report_materialized_view_pkey" contains 18677
rowversions, but table contains 18674 row versions 
Feb 22 06:12:34 bigbrother postgres[9832]: [162-2] HINT:  Rebuild the index with REINDEX.
Feb 22 06:12:34 bigbrother postgres[9832]: [163-1] WARNING:  index
"admin_report_materialized_view_unique_day_trades_id"contains 18677 row versions, but table contains 18674 row 
Feb 22 06:12:34 bigbrother postgres[9832]: [163-2]  versions
Feb 22 06:12:34 bigbrother postgres[9832]: [163-3] HINT:  Rebuild the index with REINDEX.

Reindex then gives:
qualitysmith=# reindex index admin_report_materialized_view_pkey;
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.

So digging through the table I found the following rows:
qualitysmith=# select tableoid, oid, ctid, xmin, xmax, cmin, cmax, id from admin_report_materialized_view where oid =
'104649735';
  tableoid  |    oid    |  ctid   |   xmin    |   xmax    |   cmin    |   cmax    |  id
-----------+-----------+---------+-----------+-----------+-----------+-----------+------
  104471713 | 104649735 | (13,42) | 704184382 | 704967810 | 704967810 |        15 | 2284
  104471713 | 104649735 | (14,68) | 704184382 |        15 |        15 | 707367279 | 2284

And pg_filedump -i -f -R 13 $PGDATA/base/29539533/104471713 gives (partial):

  Item  42 -- Length:   96  Offset: 4160 (0x1040)  Flags: USED
   XID: min (704184382)  CMIN|XMAX: 704967810  CMAX|XVAC: 15
   Block Id: 470  linp Index: 65   Attributes: 16   Size: 28
   infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

   1040: 3e00f929 82f4042a 0f000000 0000d601  >..)...*........
   1050: 41001000 12291c00 07d43c06 ec080000  A....)....<.....
   1060: 00000000 00000000 00000000 00000000  ................
   1070: 00000000 00000000 00000000 00000000  ................
   1080: 00000000 00000000 00000000 00000000  ................
   1090: 08000000 00000000 0a000000 e0060000  ................

pg_filedump -i -f -R 14 $PGDATA/base/29539533/104471713 gives (partial):

  Item  68 -- Length:   96  Offset: 1440 (0x05a0)  Flags: USED
   XID: min (704184382)  CMIN|XMAX: 15  CMAX|XVAC: 707367279
   Block Id: 14  linp Index: 68   Attributes: 16   Size: 28
   infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

   05a0: 3e00f929 0f000000 6f91292a 00000e00  >..)....o.)*....
   05b0: 44001000 12291c00 07d43c06 ec080000  D....)....<.....
   05c0: 00000000 00000000 00000000 00000000  ................
   05d0: 00000000 00000000 00000000 00000000  ................
   05e0: 00000000 00000000 00000000 00000000  ................
   05f0: 08000000 00000000 0a000000 e0060000  ................

I am not positive my calculations here are correct.  Finding the appropriate
place in the CLOG for the XMIN I come up with file 029F byte offset 2400F which
is 10.  For XMAX (704967810) I come up with file 02A0 offset 13D20 which has
value 20. (I don't know what these mean so they are included for anyone who
may.)

Ok, so trying to hazard a best guess as to what happened. At this point it seems
as though the duplicated row popped up as the result of the vacuum full. Reasoning
on that would be that vacuum would be the only thing that would legitimately move
tuples around in a table without changing their xmin. In this particular case the
vacuum full is run out of crontab at 00:33 via:

/usr/local/bin/vacuumdb -z -q -f -d qualitysmith -t admin_report_materialized_view

Unfortunately I have not found any indication in our log files that the postgres
server underwent anything unusual during this timeframe. No machine reboots, no
postgres server restarts. For reference:

Feb 22 00:02:59 bigbrother postgres[30550]: [162-1] ERROR:  column "job_type_id" of relation "promo_requests" does not
exist
Feb 22 00:04:13 bigbrother postgres[30759]: [162-1] LOG:  recycled transaction log file "0000004400000060"
Feb 22 00:04:13 bigbrother postgres[30759]: [163-1] LOG:  recycled transaction log file "0000004400000061"
Feb 22 00:04:13 bigbrother postgres[30759]: [164-1] LOG:  recycled transaction log file "000000440000005F"
Feb 22 00:05:42 bigbrother postgres[31051]: [162-1] ERROR:  column "job_type_id" of relation "promo_requests" does not
exist
Feb 22 00:06:50 bigbrother postgres[31240]: [162-1] ERROR:  duplicate key violates unique constraint
"visitor_referal_code_unique_btree_idx"
Feb 22 00:09:20 bigbrother postgres[31670]: [162-1] LOG:  recycled transaction log file "0000004400000062"
Feb 22 00:09:20 bigbrother postgres[31670]: [163-1] LOG:  recycled transaction log file "0000004400000063"
Feb 22 00:09:20 bigbrother postgres[31670]: [164-1] LOG:  recycled transaction log file "0000004400000064"
Feb 22 00:24:41 bigbrother postgres[1837]: [162-1] ERROR:  duplicate key violates unique constraint
"visitor_referal_code_unique_btree_idx"
Feb 22 00:28:45 bigbrother postgres[2520]: [162-1] ERROR:  duplicate key violates unique constraint
"visitor_referal_code_unique_btree_idx"
Feb 22 00:38:19 bigbrother postgres[4224]: [162-1] ERROR:  duplicate key violates unique constraint
"visitor_referal_code_unique_btree_idx"
Feb 22 00:39:37 bigbrother postgres[4429]: [162-1] LOG:  recycled transaction log file "0000004400000065"
Feb 22 00:52:23 bigbrother postgres[6811]: [162-1] ERROR:  duplicate key violates unique constraint
"visitor_referal_code_unique_btree_idx"

So now I am back to wondering if this problem originated when I think that it did
(between 2005/02/22 00:00:00 and 2005/02/22 01:00:00) or if this could have been
hiding under the surface for longer than that.

I am suspecting a hardware/fs driver/kernel bug here, and I am hoping to be able
to pinpoint a time in hopes of being able to find any indicator at all that could
tell me where this all started.

Thanks for your time.

--
Jared Carr
RBS Interactive - 89 Glass - QualitySmith
jared@89glass.com

В списке pgsql-general по дате отправления:

Предыдущее
От: "Cristian Prieto"
Дата:
Сообщение: Seguimos buscando webdeveloper...
Следующее
От: "Don Doumakes"
Дата:
Сообщение: Re: Recovering db from cracked server