Re: Database corruption in 7.0.3

Поиск
Список
Период
Сортировка
От Denis Perchine
Тема Re: Database corruption in 7.0.3
Дата
Msg-id 200103150952.PAA27478@technoart.net
обсуждение исходный текст
Ответ на Database corruption in 7.0.3  (Tim Allen <tim@proximity.com.au>)
Список pgsql-hackers
Can confirm this. Get this just yesterday time ago...

Messages:

NOTICE:  Rel acm: TID 1697/217: OID IS INVALID. TUPGONE 1.

And lots of such lines...
And

pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 


In the end :-((( I lost a library of our institute... :-((( But I have a 
backup!!! :-)))) This table even have NO indices!!!

Program received signal SIGSEGV, Segmentation fault.
0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311
311                             alignedSize = MAXALIGN((*lp).lp_len);
(gdb) bt
#0  0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311
#1  0x80a9b07 in vc_scanheap (vacrelstats=0x82675b0, onerel=0x8273428, 
vacuum_pages=0xbfffe928, fraged_pages=0xbfffe918) at vacuum.c:1022
#2  0x80a8e8b in vc_vacone (relid=27296, analyze=0 '\000', va_cols=0x0) at 
vacuum.c:599
#3  0x80a8217 in vc_vacuum (VacRelP=0xbfffe9b4, analyze=0 '\000', 
va_cols=0x0) at vacuum.c:299
#4  0x80a818b in vacuum (vacrel=0x8267400 "", verbose=1 '\001', analyze=0 
'\000', va_spec=0x0) at vacuum.c:223
#5  0x813fba5 in ProcessUtility (parsetree=0x8267418, dest=Remote) at 
utility.c:694
#6  0x813c16e in pg_exec_query_dest (query_string=0x820aaa0 "vacuum verbose 
acm;", dest=Remote, aclOverride=0 '\000') at postgres.c:617
#7  0x813c08e in pg_exec_query (query_string=0x820aaa0 "vacuum verbose acm;") 
at postgres.c:562
#8  0x813d4c3 in PostgresMain (argc=9, argv=0xbffff068, real_argc=9, 
real_argv=0xbffffa3c) at postgres.c:1588
#9  0x811ace5 in DoBackend (port=0x8223068) at postmaster.c:2009
#10 0x811a639 in BackendStartup (port=0x8223068) at postmaster.c:1776
#11 0x811932f in ServerLoop () at postmaster.c:1037
#12 0x8118b0e in PostmasterMain (argc=9, argv=0xbffffa3c) at postmaster.c:725
#13 0x80d5e5e in main (argc=9, argv=0xbffffa3c) at main.c:93
#14 0x40111fee in __libc_start_main () from /lib/libc.so.6

This is plain 7.0.3.

On Thursday 15 March 2001 14:52, Tim Allen wrote:
> We have an application that we were running quite happily using pg6.5.3
> in various customer sites. Now we are about to roll out a new version of
> our application, and we are going to use pg7.0.3. However, in testing
> we've come across a couple of isolated incidents of database
> corruption. They are sufficiently rare that I can't reproduce the problem,
> nor can I put my finger on just what application behaviour causes the
> problems.
>
> The symptoms most often involve some sort of index corruption, which is
> reported by vacuum and it seems that vacuum can fix it. On occasion vacuum
> reports "invalid OID" or similar (sorry, don't have exact wording of
> message). On one occasion the database has been corrupted to the point of
> unusability (ie vacuum admitted that it couldn't fix the problem), and a
> dump/restore was required (thankfully that at least worked). The index
> corruption also occasionally manifests itself in the form of spurious
> uniqueness constraint violation errors.
>
> The previous version of our app using 6.5.3 has never shown the slightest
> symptom of database misbehaviour, to the best of my knowledge, despite
> fairly extensive use. So our expectations are fairly high :-).
>
> One thing that is different about the new version of our app is that we
> now use multiple connections to the database (previously we only had
> one). We can in practice have transactions in progress on several
> connections at once, and it is possible for some transactions to be rolled
> back under application control (ie explicit ROLLBACK; statement).
>
> I realise I haven't really provided an awful lot of information that would
> help identify the problem, so I shall attempt to be understanding if
> no-one can offer any useful suggestions. But I hope someone can :-). Has
> anyone seen this sort of problem before? Are there any known
> database-corrupting bugs in 7.0.3? I don't recall anyone mentioning any in
> the mailing lists. Is using multiple connections likely to stimulate any
> known areas of risk?
>
> BTW we are using plain vanilla SQL, no triggers, no new types defined, no
> functions, no referential integrity checks, nothing more ambitious than a
> multi-column primary key.
>
> The platform is x86 Red Hat Linux 6.2. Curiously enough, on one of our
> testing boxes and on my development box we have never seen this, but we
> have seen it several times on our other test box and at least one customer
> site, so there is some possibility it's related to dodgy hardware. The
> customer box with the problem is a multi-processor box, all the other
> boxes we've tested on are single-processor.
>
> TIA for any help,
>
> Tim

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Paul
Дата:
Сообщение: Sheduling in SQL
Следующее
От: Karel Zak
Дата:
Сообщение: Re: AW: Re: Week number