Re: Partially corrupted table

From: Tom Lane
Subject: Re: Partially corrupted table
Msg-id: 19402.1156894425@sss.pgh.pa.us
In reply to: Partially corrupted table  ("Filip Hrbek" <filip.hrbek@plz.comstar.cz>)
Responses: Re: Partially corrupted table  (Alvaro Herrera <alvherre@commandprompt.com>)
List: pgsql-bugs

Well, it's a corrupt-data problem all right.  The tuple that's
causing the problem is on page 1208, item 27:

 Item  27 -- Length:  240  Offset: 1400 (0x0578)  Flags: USED
  XMIN: 5213  CMIN: 140502  XMAX: 0  CMAX|XVAC: 0
  Block Id: 1208  linp Index: 27   Attributes: 29   Size: 28
  infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)

  0578: 5d140000 d6240200 00000000 00000000  ]....$..........
  0588: 0000b804 1b001d00 02091c00 0e000000  ................
  0598: 02000000 42020000 23040000 6b000000  ....B...#...k...
  05a8: 02000000 6a010000 0d000000 42020000  ....j.......B...
  05b8: 02000000 10000000 08000000 00000400  ................
  05c8: 08000000 00000400 0a000000 ffff0400  ................
  05d8: 78050000 0a000000 00000200 03000000  x...............
  05e8: 08000000 00000300 08000000 00000400  ................
  05f8: 08000000 00000400 08000000 00000400  ................
  0608: 08000000 00000200 08000000 00000300  ................
  0618: 08800000 00000400 08000000 00000400  ................
        ^^^^^^^^
  0628: 08000000 00000400 08000000 00000200  ................
  0638: 08000000 00000300 08000000 00000400  ................
  0648: 08000000 00000400 18000000 494e565f  ............INV_
  0658: 41534153 5f323030 36303130 31202020  ASAS_20060101

The underlined word is a field length word that evidently should contain
8, but contains hex 8008.  This causes the tuple-data decoder to step
way past the end of the tuple and off into never-never land.  Since the
results will depend on which shared buffer the page happens to be in and
what happens to be at the address where the step lands, the inconsistent
results from one attempt to the next are not so surprising.
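
As a rough illustration of the arithmetic above, here is a small Python
sketch that decodes the underlined word and compares it with the expected
value of 8. The offsets and lengths are read off the dump; the 8192-byte
page size is an assumption (the PostgreSQL default), as is the simplification
that the decoder advances by the stored length counted from the start of the
length word. The little-endian byte order is inferred from the dump itself
(XMIN 5213 appears as 5d140000).

```python
import struct

PAGE_SIZE = 8192        # assumption: default PostgreSQL block size
TUPLE_OFFSET = 0x0578   # item 27 offset on the page, from the dump (1400)
TUPLE_LENGTH = 240      # item 27 length, from the dump
WORD_OFFSET = 0x0618    # page offset of the underlined length word

# The four bytes exactly as they appear in the dump, read little-endian.
raw = bytes.fromhex("08800000")
(stored,) = struct.unpack("<I", raw)
expected = 8

# XOR exposes the bits that differ between the stored and expected values.
diff = stored ^ expected
flipped_bits = [i for i in range(32) if (diff >> i) & 1]

# Where a decoder that trusts the length word would look for the next field
# (simplified: advance by the stored length from the start of the word).
next_field = WORD_OFFSET + stored
tuple_end = TUPLE_OFFSET + TUPLE_LENGTH

print(f"stored length : {stored:#x} ({stored})")
print(f"flipped bits  : {flipped_bits}")
print(f"tuple ends at : {tuple_end}")
print(f"next field at : {next_field} (page size {PAGE_SIZE})")
```

Under those assumptions the computed next-field position lies far beyond both
the end of the tuple and the end of the page, which matches the behavior
described above.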

The next question is how did it get that way.  In my experience a
single-bit flip like that is most likely to be due to flaky memory,
though bad motherboards or cables are not out of the question either.
I'd recommend some thorough hardware testing on the original machine.

It seems there's only the one bad bit; I did

dwhdb=# delete from dwhdata_salemc.fct where ctid = '(1208,27)';
DELETE 1

and then was able to copy the table repeatedly without crash.  I'd
suggest doing that and then reconstructing the deleted tuple from
the above dump.
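
For anyone who wants to look at the raw bytes again while reconstructing the
row, the following Python sketch converts the ctid used above into byte
offsets within the table's data file. It is only a sketch: the 8192-byte
block size is an assumption (the PostgreSQL default), the helper names are
made up for illustration, and it assumes the block lies in the first segment
file of the relation.

```python
import re

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size


def parse_ctid(ctid):
    """Split a ctid string such as '(1208,27)' into (block, item)."""
    match = re.fullmatch(r"\((\d+),(\d+)\)", ctid.strip())
    if match is None:
        raise ValueError(f"not a ctid: {ctid!r}")
    return int(match.group(1)), int(match.group(2))


def block_file_offset(block, block_size=BLOCK_SIZE):
    """Byte offset of the start of a block within the relation file."""
    return block * block_size


block, item = parse_ctid("(1208,27)")
page_start = block_file_offset(block)

# In-page offsets taken from the dump: the tuple starts at 0x0578 and the
# damaged length word sits at 0x0618.
tuple_start = page_start + 0x0578
bad_word = page_start + 0x0618

print(f"block {block}, item {item}")
print(f"page starts at byte  {page_start}")
print(f"tuple starts at byte {tuple_start}")
print(f"bad word at byte     {bad_word}")
```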

            regards, tom lane
