Still crashing with latest 7.0.2 (Re: (forw) more crashes)

From: Alfred Perlstein
Subject: Still crashing with latest 7.0.2 (Re: (forw) more crashes)
Msg-id: 20001008034834.C272@fw.wintelcom.net
In reply to: Re: (forw) more crashes (Alfred Perlstein <bright@wintelcom.net>)
Responses: Re: Still crashing with latest 7.0.2 (Re: (forw) more crashes) (Alfred Perlstein <bright@wintelcom.net>)
List: pgsql-hackers
* Alfred Perlstein <bright@wintelcom.net> [001006 16:02] wrote:
> * Tom Lane <tgl@sss.pgh.pa.us> [001004 09:56] wrote:
> > Alfred Perlstein <bright@wintelcom.net> writes:
> > > I have a reliable way to make postgresql crash after a
> > > couple of hours over here and a backtrace that looks like a good
> > > catch.
> > 
> > I'm interested in pursuing this, but the backtrace doesn't give enough
> > info to debug it.  It looks like the backend is crashing because of
> > a previously-corrupted tuple, so what we'll need to do is work backwards
> > to find where the data corruption is occurring.
> > 
> > Can you boil down the test sequence to something that could be
> > reproduced by other people?  The most convenient way to work on it
> > would be to see it happen here...
> 
> I just wanted to note on the list that these crashes seem to have
> stopped with the latest 7.0.2-patches (as of about 11:30 PM EST,
> Oct 4th). It's been over 24 hours since the upgrade; previously I
> couldn't go for more than 20 without a crash.
> 
> My only concern is that I didn't notice anything on the cvs list
> that referenced a fix for crashes.
> 
> Anyhow, I'll post an update in a couple of days on whether all is
> well.

Unfortunately I'm still getting crashes. This one appears to happen
during a VACUUM; previously I got a crash while doing an UPDATE, but
in exactly the same spot. It took quite a bit longer to provoke this
time:

-rw-------  1 pgsql  pgsql   277561344 Oct  8 02:56 postgres.core


#0  0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3, tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
537                             off = att_addlength(off, att[i]->attlen, tp + off);
(gdb) bt
#0  0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3, tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
#1  0x8075851 in GetIndexValue (tuple=0xbfbfe974, hTupDesc=0x84ca368, attOff=3, attrNums=0x8508240, fInfo=0x0, attNull=0xbfbfe7fb "") at indexam.c:445
#2  0x80903be in FormIndexDatum (numberOfAttributes=4, attributeNumber=0x8508240, heapTuple=0xbfbfe974, heapDescriptor=0x84ca368, datum=0x8508018, nullv=0x84ba170 "    ", fInfo=0x0) at index.c:1256
#3  0x80a05e6 in vc_repair_frag (vacrelstats=0x84ba290, onerel=0x84c6788, vacuum_pages=0xbfbfea1c, fraged_pages=0xbfbfea0c, nindices=1, Irel=0x84ba118) at vacuum.c:1634
#4  0x809e3b9 in vc_vacone (relid=1315147913, analyze=0, va_cols=0x0) at vacuum.c:640
#5  0x809d9ac in vc_vacuum (VacRelP=0xbfbfeaac, analyze=0 '\000', va_cols=0x0) at vacuum.c:299
#6  0x809d934 in vacuum (vacrel=0x84ba0e8 "\030", verbose=1, analyze=0 '\000', va_spec=0x0) at vacuum.c:223
#7  0x810ca8c in ProcessUtility (parsetree=0x84ba110, dest=Remote) at utility.c:694
#8  0x810a44e in pg_exec_query_dest (query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;", dest=Remote, aclOverride=0) at postgres.c:617
#9  0x810a3a9 in pg_exec_query (query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;") at postgres.c:562
#10 0x810b336 in PostgresMain (argc=7, argv=0xbfbff12c, real_argc=10, real_argv=0xbfbffb8c) at postgres.c:1588
#11 0x80f0742 in DoBackend (port=0x8464000) at postmaster.c:2009
#12 0x80f02d5 in BackendStartup (port=0x8464000) at postmaster.c:1776
#13 0x80ef4f9 in ServerLoop () at postmaster.c:1037
#14 0x80eeede in PostmasterMain (argc=10, argv=0xbfbffb8c) at postmaster.c:725
#15 0x80bf3eb in main (argc=10, argv=0xbfbffb8c) at main.c:93
#16 0x8063495 in _start ()
st
532     
533                                     if (usecache)
534                                             att[i]->attcacheoff = off;
535                             }
536     
537                             off = att_addlength(off, att[i]->attlen, tp + off);
538     
539                             if (usecache &&
540                                     att[i]->attlen == -1 && !VARLENA_FIXED_SIZE(att[i]))
541                                     usecache = false;
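For context on line 537: the slow path walks the tuple's data area one attribute at a time, and for a variable-length attribute (attlen == -1) the offset advances by the length stored in the value's own header. The sketch below is a simplified, hypothetical model of that walk, not the real 7.0 macros: SimpleAtt, align_int, varsize and attr_offset are illustrative names, every varlena is assumed to begin with a 4-byte length word that counts itself, every attribute is aligned to 4 bytes (attalign 'i', as for the first four columns of this table), and nulls are ignored. It shows how a single corrupted length word is enough to push `off` to a huge negative value like the one printed below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified attribute descriptor (not PostgreSQL's
 * pg_attribute row): attlen > 0 is fixed width, attlen == -1 is a
 * variable-length (varlena) value. */
typedef struct {
    int attlen;
} SimpleAtt;

/* Round an offset up to a 4-byte boundary (attalign 'i'). */
static int32_t align_int(int32_t off)
{
    return (off + 3) & ~3;
}

/* Read the length word assumed to start every varlena; in this model
 * the length includes the 4-byte word itself. */
static int32_t varsize(const char *p)
{
    int32_t len;
    memcpy(&len, p, sizeof len);    /* avoid an unaligned load */
    return len;
}

/* Offset of attribute `target` (0-based) in data area `tp`, found by
 * walking every preceding attribute: align, then add either the fixed
 * width or the varlena length read from the data itself. */
static int32_t attr_offset(const char *tp, const SimpleAtt *att, int target)
{
    int32_t off = 0;

    assert(target >= 0);
    for (int i = 0; i < target; i++) {
        off = align_int(off);
        if (att[i].attlen != -1)
            off += att[i].attlen;       /* fixed-width attribute */
        else
            off += varsize(tp + off);   /* trusts the stored header */
    }
    return align_int(off);
}
```

With an int4 followed by two varchars, a healthy row puts the third attribute at offset 12; overwrite the second attribute's length word with garbage and the same call returns a negative offset, which the caller would then add to `tp` and dereference.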

It looks like it's dying in the same place as the previous core dumps;
however, this time it's during a VACUUM rather than an UPDATE:

(gdb) print off
$1 = -838833616
(gdb) print att[i]
$2 = 0x84ca640
(gdb) print *(att[i])
$3 = {attrelid = 1315147913, attname = {data = "attr_name", '\000' <repeats 22 times>, alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36, attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
 
(gdb) print i            
$4 = 2
(gdb) print tp
$5 = 0x5808eba5 "Yj"
(gdb) print tp+off       
$6 = 0x260955d5 <Address 0x260955d5 out of bounds>
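The bad address is exactly what a 32-bit pointer add of these two values yields, so the base pointer is fine and `off` is the bad value: `tp` equals the `(char *) tup + tup->t_hoff` value printed a little further down. A small worked check, assuming 32-bit addresses as the addresses in this backtrace suggest (add_offset32 and check_crash_address are illustrative names):

```c
#include <assert.h>
#include <stdint.h>

/* Emulate `tp + off` on a machine with 32-bit addresses: the sum wraps
 * modulo 2^32, so a large negative offset lands far below tp. */
static uint32_t add_offset32(uint32_t base, int32_t off)
{
    return base + (uint32_t) off;
}

/* Values copied from the gdb session. */
static void check_crash_address(void)
{
    /* tp + off reproduces the address gdb reported as out of bounds. */
    assert(add_offset32(0x5808eba5u, -838833616) == 0x260955d5u);
    /* The same offset as a raw 32-bit word, which may be easier to
     * spot if those bytes turn up elsewhere in a memory dump. */
    assert((uint32_t) -838833616 == 0xce006a30u);
}
```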

ack!

(gdb) print usecache
$7 = 0 '\000'
(gdb) print attnum
$8 = 3
(gdb) print slow
$9 = 139159376
(gdb) print *slow
$10 = 139241024
(gdb) print (char *) tup + tup->t_hoff
$11 = 0x5808eba5 "Yj"
(gdb) print tup
$12 = 0x5808eba0
(gdb) print *tup
$13 = {t_oid = 0, t_cmin = 6969654, t_cmax = 6958161, t_xmin = 1742, t_xmax = 6955895, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 639}, ip_posid = 84}, t_natts = 737, t_infomask = 32846, t_hoff = 5 '\005', t_bits = "\000\002¥ "}
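Two fields of this header are implausible for a row of a table whose descriptor has five attributes (att[0] through att[4], printed below): t_natts is 737, and t_hoff is 5, which would put the user data inside the header, since t_oid, t_cmin, t_cmax, t_xmin and t_xmax alone take 20 bytes. A cheap consistency test of the kind one might add while hunting for where the corruption first appears could look like the hypothetical helper below; it is not PostgreSQL code, and the minimum header size is a caller-supplied assumption rather than a value taken from the 7.0 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags describing what looks wrong with a tuple header. */
enum {
    HDR_OK         = 0,
    HDR_TOO_MANY   = 1 << 0,  /* more attributes than the relation has */
    HDR_HOFF_SHORT = 1 << 1,  /* data would start inside the header    */
    HDR_HOFF_PAST  = 1 << 2   /* data would start past the tuple's end */
};

/* Compare a few header fields with the relation's descriptor and the
 * tuple length.  min_hoff is the smallest legal header size for the
 * build in question; it is an input here, not a PostgreSQL constant. */
static int check_tuple_header(int t_natts, uint8_t t_hoff, uint32_t t_len,
                              int desc_natts, uint8_t min_hoff)
{
    int problems = HDR_OK;

    assert(desc_natts > 0);
    if (t_natts > desc_natts)
        problems |= HDR_TOO_MANY;
    if (t_hoff < min_hoff)
        problems |= HDR_HOFF_SHORT;
    if (t_hoff > t_len)
        problems |= HDR_HOFF_PAST;
    return problems;
}
```

Fed the values from this session (t_natts = 737, t_hoff = 5, t_len = 80 from `print *tuple`, five descriptor attributes) with a 20-byte lower bound, it flags both the attribute count and the header offset.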
 
(gdb) print *tupleDesc 
$14 = {natts = 1358981721, attrs = 0xce006a2c, constr = 0x77000006}
(gdb) print *(att[0])
$15 = {attrelid = 1315147913, attname = {data = "counter_id", '\000' <repeats 21 times>, alignmentDummy = 1853189987}, atttypid = 23, attdisbursion = 0, attlen = 4, attnum = 1, attnelems = 0, attcacheoff = 0, atttypmod = -1, attbyval = 1 '\001', attstorage = 112 'p', attisset = 0 '\000', attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
 
(gdb) print *(att[1])
$16 = {attrelid = 1315147913, attname = {data = "attr_type", '\000' <repeats 22 times>, alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, attlen = -1, attnum = 2, attnelems = 0, attcacheoff = 4, atttypmod = 36, attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
 
(gdb) print *(att[2])
$17 = {attrelid = 1315147913, attname = {data = "attr_name", '\000' <repeats 22 times>, alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36, attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
 
(gdb) print *(att[3])
$18 = {attrelid = 1315147913, attname = {data = "attr_vers", '\000' <repeats 22 times>, alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, attlen = -1, attnum = 4, attnelems = 0, attcacheoff = -1, atttypmod = 36, attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
 
(gdb) print *(att[4])
$19 = {attrelid = 1315147913, attname = {data = "attr_hits", '\000' <repeats 22 times>, alignmentDummy = 1920234593}, atttypid = 20, attdisbursion = 0, attlen = 8, attnum = 5, attnelems = 0, attcacheoff = -1, atttypmod = -1, attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', attalign = 100 'd', attnotnull = 0 '\000', atthasdef = 1 '\001'}
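The attcacheoff values in these descriptors follow from lines 533-541 of the listing: an attribute's start offset is cached only while every earlier attribute has a fixed width, so counter_id (int4) is cached at 0, attr_type (the first varchar) at 4, and every later column stays at -1 because its position depends on attr_type's stored length. The hypothetical sketch below recomputes those values from attlen and an alignment in bytes; it ignores nulls and VARLENA_FIXED_SIZE, and it assumes 4 bytes for attalign 'i' and 8 for 'd'.

```c
#include <assert.h>

/* Round off up to a multiple of align (which must be a power of two). */
static int align_up(int off, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Fill cacheoff[i] with attribute i's start offset when it can be
 * known without reading the row, else -1.  attlen == -1 marks a
 * varlena; align[i] is attribute i's alignment in bytes. */
static void compute_cacheoff(const int *attlen, const int *align,
                             int natts, int *cacheoff)
{
    int off = 0;
    int usecache = 1;

    for (int i = 0; i < natts; i++) {
        if (!usecache) {
            cacheoff[i] = -1;          /* position depends on the data */
            continue;
        }
        off = align_up(off, align[i]);
        cacheoff[i] = off;             /* start of a varlena is still known */
        if (attlen[i] == -1)
            usecache = 0;              /* its length is not */
        else
            off += attlen[i];
    }
}
```

For this table (attlen 4, -1, -1, -1, 8) it yields 0, 4, -1, -1, -1, matching the descriptors above.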
 
(gdb) print *tuple
$20 = {t_len = 80, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 640}, ip_posid = 5}, t_datamcxt = 0x0, t_data = 0x5808eba0}
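With t_len = 80, the whole tuple is 80 bytes, so no attribute offset inside it can legitimately be negative or run past the end of the data area. As an illustration only (7.0 does not do this), a guarded version of the offset step could validate each stored varlena length against the remaining space before using it, turning a wild dereference such as 0x260955d5 into a reportable error. checked_advance is a hypothetical name, and the same simplified varlena model is assumed: a 4-byte length word that includes itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Advance *off past one attribute inside a data area of data_len bytes.
 * Returns 0 on success, or -1 (leaving *off unchanged) if the attribute
 * would not fit.  attlen > 0 is fixed width; attlen == -1 is a varlena
 * whose first four bytes hold its total length. */
static int checked_advance(const char *tp, uint32_t data_len,
                           int attlen, int32_t *off)
{
    int32_t len;

    assert(tp != NULL && off != NULL);
    if (*off < 0 || (uint32_t) *off > data_len)
        return -1;                      /* offset already out of range */
    if (attlen != -1) {
        len = attlen;
    } else {
        if (data_len - (uint32_t) *off < sizeof len)
            return -1;                  /* no room for the length word */
        memcpy(&len, tp + *off, sizeof len);
        if (len < (int32_t) sizeof len)
            return -1;                  /* shorter than its own header */
    }
    if ((uint32_t) len > data_len - (uint32_t) *off)
        return -1;                      /* runs past the end of the data */
    *off += len;
    return 0;
}
```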



thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

