Обсуждение: database corruption help

Поиск
Список
Период
Сортировка

database corruption help

От
"John Lister"
Дата:
Hi, my wal archiving broke and postgresql filled up the local disk with transaction logs, which i foolishly deleted in a moment of madness, after resetting the transaction log a few of my tables are damaged but repairable. However the system tables also seemed to have suffered. My main problem is duplicate rows which violated the primary key and if i do a reindex system i get similar errors, for example:
 
ERROR:  could not create unique index "pg_class_oid_index"
a quick inspection of the pg_class table doesn't show any duplicates, is there anyway i can find out which row(s) are duplicated and remove them without a full db restore?
 
also doing something like this doesn't return anything which worked for my other tables
 
select oid from pg_class where oid in(select oid from pg_class group by oid having count(oid)>1 )
 
 
Many thanks
--
 
Got needs? Get Goblin'! - http://www.pricegoblin.co.uk/

Re: database corruption help

От
Tom Lane
Дата:
"John Lister" <john.lister-ps@kickstone.com> writes:
> ERROR:  could not create unique index "pg_class_oid_index"

> a quick inspection of the pg_class table doesn't show any duplicates, is there anyway i can find out which row(s) are
duplicatedand remove them without a full db restore? 

> also doing something like this doesn't return anything which worked for my other tables

> select oid from pg_class where oid in(select oid from pg_class group by oid having count(oid)>1 )

Try doing it with enable_indexscan = off and enable_bitmapscan = off.
The system might be relying on the old, bogus index to do the group by
oid.

            regards, tom lane

Re: database corruption help

От
"John Lister"
Дата:
Cheers tom, that did it - i've removed the duplicates and seeing what else
is broken.

.


> "John Lister" <john.lister-ps@kickstone.com> writes:
>> ERROR:  could not create unique index "pg_class_oid_index"
>
>> a quick inspection of the pg_class table doesn't show any duplicates, is
>> there anyway i can find out which row(s) are duplicated and remove them
>> without a full db restore?
>
>> also doing something like this doesn't return anything which worked for
>> my other tables
>
>> select oid from pg_class where oid in(select oid from pg_class group by
>> oid having count(oid)>1 )
>
> Try doing it with enable_indexscan = off and enable_bitmapscan = off.
> The system might be relying on the old, bogus index to do the group by
> oid.
>
> regards, tom lane
>


Re: database corruption help

От
Tom Lane
Дата:
John Lister <john.lister@kickstone.com> writes:
> Although saying that..
> reindex now works, but doing a vacuum verbose complains that the index
> is out of step with the table and i should reindex..
> would i be better shutting the db down, restarting in standalone mode
> (and also using  the -P option) before reindexing?

Yeah, if you have reason not to trust the system indexes then -P is
a good idea until they are fixed.  Standalone mode per se isn't that
important --- you could do this from a regular session with -P specified
via PGOPTIONS.

            regards, tom lane

Re: database corruption help

От
John Lister
Дата:
 >Yeah, if you have reason not to trust the system indexes then -P is
 >a good idea until they are fixed. Standalone mode per se isn't that
 >important --- you could do this from a regular session with -P specified
 >via PGOPTIONS.

still getting the same problem.


 > PGOPTIONS="-P"
 > psql backend
 >> reindex system backend;
 >> NOTICE:  table "pg_class" was reindexed
 >> NOTICE:  table "pg_statistic" was reindexed
 >> NOTICE:  table "pg_ts_config" was reindexed ...

 >> vacuum full verbose pg_class

 >> INFO:  vacuuming "pg_catalog.pg_class"
 >> INFO:  "pg_class": found 7 removable, 1839 nonremovable row versions
in 18812 pages
 >> DETAIL:  27 dead row versions cannot be removed yet.
 >> Nonremovable row versions range from 160 to 229 bytes long.
 >> There were 929452 unused item pointers.
 >> Total free space (including removable row versions) is 149628172 bytes.
 >> 18749 pages are or will become empty, including 0 at the end of the
table.
 >> 0 pages containing 0 free bytes are potential move destinations.
 >> CPU 0.01s/0.07u sec elapsed 0.17 sec.
 >> INFO:  index "pg_class_oid_index" now contains 1812 row versions in
7 pages
 >> DETAIL:  0 index row versions were removed.
 >> 0 index pages have been deleted, 0 are currently reusable.
 >> CPU 0.00s/0.01u sec elapsed 0.09 sec.
 >> WARNING:  index "pg_class_oid_index" contains 1812 row versions, but
table contains 1807 row versions
 >> HINT:  Rebuild the index with REINDEX.
 >> INFO:  index "pg_class_relname_nsp_index" now contains 1812 row
versions in 23 pages
 >> DETAIL:  0 index row versions were removed.
 >> 0 index pages have been deleted, 0 are currently reusable.
 >> CPU 0.00s/0.01u sec elapsed 0.09 sec.
 >> WARNING:  index "pg_class_relname_nsp_index" contains 1812 row
versions, but table contains 1807 row versions
 >> HINT:  Rebuild the index with REINDEX.


Just  noticed the 7 dead rows above, are they likely to be causing a
problem?
BTW this is postgresql 8.3.5 if that makes a difference...

I'm guessing it is fairly important to get rid of these warnings....

Thanks


Re: database corruption help

От
Tom Lane
Дата:
John Lister <john.lister-ps@kickstone.com> writes:
> still getting the same problem.

>>> PGOPTIONS="-P"
>>> psql backend

I think you need export PGOPTIONS="-P" to make that work.  Whether
it's related to your problem isn't clear though.

            regards, tom lane

Re: database corruption help

От
John Lister
Дата:
Sorry, had exported it - bad cut/pasting...
I even tried it in single-user mode passing -P directly but got the same
result
Also confused/concerned by the 7 dead rows in pg_class. as i've
restarted the server all transaction should have finished so i'd like to
reclaim these - especially as vacuum reports the table is using 80Mb+

Thanks


>> still getting the same problem.
>>
>
>
>>>> PGOPTIONS="-P"
>>>> psql backend
>>>>
>
> I think you need export PGOPTIONS="-P" to make that work.  Whether
> it's related to your problem isn't clear though.
>
>             regards, tom lane
>

Re: database corruption help

От
John Lister
Дата:
I seem to have more dead rows now..

doing a vacuum full on pg_class gives me

INFO:  vacuuming "pg_catalog.pg_class"INFO:  "pg_class": found 37
removable, 1845 nonremovable row versions in 18905 pages
DETAIL:  27 dead row versions cannot be removed yet.
Nonremovable row versions range from 160 to 229 bytes long.
There were 933834 unused item pointers.
Total free space (including removable row versions) is 150368692 bytes.
18839 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.01s/0.05u sec elapsed 0.17 sec.INFO:  index "pg_class_oid_index"
now contains 1813 row versions in 7 pages
DETAIL:  56 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index
"pg_class_relname_nsp_index" now contains 1818 row versions in 24 pages
DETAIL:  63 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.WARNING:  index
"pg_class_relname_nsp_index" contains 1818 row versions, but table
contains 1813 row versions
HINT:  Rebuild the index with REINDEX.INFO:  analyzing
"pg_catalog.pg_class"INFO:  "pg_class": scanned 3000 of 18905 pages,
containing 227 live rows and 6 dead rows; 227 rows in sample, 1430
estimated total rows
Total query runtime: 4469 ms.


I've restarted the server which should have got rid of any
outstanding/long running transactions  - which it though was the major
cause of lingering dead rows.

Any ideas how to fix this as i'd like to reduce this from 150+Mb which
can't be good for performance....

Thanks



Re: database corruption help

От
Tom Lane
Дата:
John Lister <john.lister-ps@kickstone.com> writes:
> I seem to have more dead rows now..
> doing a vacuum full on pg_class gives me

> INFO:  vacuuming "pg_catalog.pg_class"INFO:  "pg_class": found 37
> removable, 1845 nonremovable row versions in 18905 pages
> DETAIL:  27 dead row versions cannot be removed yet.
> Nonremovable row versions range from 160 to 229 bytes long.
> There were 933834 unused item pointers.
> Total free space (including removable row versions) is 150368692 bytes.
> 18839 pages are or will become empty, including 0 at the end of the table.
> 0 pages containing 0 free bytes are potential move destinations.

Hmm, that last seems to indicate that do_shrinking has been reset.
It looks like the cases where that happens are reported with elog(LOG)
which means they probably only went to the postmaster log (maybe we
should fix things so that vacuum verbose reports those).  What have
you got in the postmaster log?

            regards, tom lane

Re: database corruption help

От
John Lister
Дата:
I'm running ubuntu and can see references to only one log file which i'm
assuming is the postmaster log..

Anyway the only relevant bits are:

GMT LOG:  relation "pg_class" TID 15538/4: dead HOT-updated tuple ---
cannot shrink relation
2009-02-12 21:06:40 GMT STATEMENT:  VACUUM FULL VERBOSE ANALYZE pg_class
2009-02-12 21:06:40 GMT WARNING:  index "pg_class_relname_nsp_index"
contains 1818 row versions, but table contains 1813 row versions

If that helps...

Thanks
> John Lister <john.lister-ps@kickstone.com> writes:
>
>> I seem to have more dead rows now..
>> doing a vacuum full on pg_class gives me
>>
>
>
>> INFO:  vacuuming "pg_catalog.pg_class"INFO:  "pg_class": found 37
>> removable, 1845 nonremovable row versions in 18905 pages
>> DETAIL:  27 dead row versions cannot be removed yet.
>> Nonremovable row versions range from 160 to 229 bytes long.
>> There were 933834 unused item pointers.
>> Total free space (including removable row versions) is 150368692 bytes.
>> 18839 pages are or will become empty, including 0 at the end of the table.
>> 0 pages containing 0 free bytes are potential move destinations.
>>
>
> Hmm, that last seems to indicate that do_shrinking has been reset.
> It looks like the cases where that happens are reported with elog(LOG)
> which means they probably only went to the postmaster log (maybe we
> should fix things so that vacuum verbose reports those).  What have
> you got in the postmaster log?
>
>             regards, tom lane
>