Discussion: 7.1 on DEC/Alpha
Hi,

I saw the thread from a few days ago about Linux/Alpha and 7.1. I believe I'm seeing the same problems with DEC/Alpha (Tru64Unix 4.0D).

I noticed the following in the postmaster.log. As the Linux/Alpha bug report states, it occurs during the misc regression test.

DEBUG: copy: line 293, XLogWrite: had to create new log file - you probably should do checkpoints more often
Server process (pid 24954) exited with status 139 at Fri Dec 22 17:15:48 2000
Terminating any active server processes...
Server processes were terminated at Fri Dec 22 17:15:48 2000
Reinitializing shared memory and semaphores
DEBUG: starting up
DEBUG: database system was interrupted at 2000-12-22 17:15:47
DEBUG: CheckPoint record at (0, 316624)
DEBUG: Redo record at (0, 316624); Undo record at (0, 0); Shutdown TRUE

The full src/test/regress/log/postmaster.log can be snagged from http://www.rcfile.org/postmaster.log

In addition, compiling on DEC/Alpha with gcc does not work without some shameful hackery :) because __INTERLOCKED_TESTBITSS_QUAD() is a builtin that gcc does not know about. The DEC cc builds pg properly. Whichever compiler pg is built with, the test results are much the same, especially the FAILURE of the misc regression test.

If there is anything else I can do to help get this working, please let me know.

Brent Verner
On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:

Observation: commenting out the queries with 'FROM person* p' causes the misc regression test to pass.

SELECT p.name, p.hobbies.name FROM person* p;

Brent

| Hi,
| I saw the thread from a few days ago about Linux/Alpha and 7.1. I
| believe I'm seeing the same problems with DEC/Alpha (Tru64Unix 4.0D).
| [...]
On 22 Dec 2000 at 21:58 (-0500), Brent Verner wrote:
| On 22 Dec 2000 at 20:27 (-0500), Brent Verner wrote:
|
| observation:
|
| commenting out the queries with 'FROM person* p' causes the misc
| regression test to pass.

That's not what I meant to say. The misc test still FAILS, but it no longer causes pg to die.

b
here's a post-mortem.

#0  0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0, econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
#1  0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#2  0x1200cdd74 in ExecEvalFuncArgs (fcache=0x14016aa70, argList=0x14016a030, econtext=0x14016a030) at execQual.c:603
#3  0x1200cde54 in ExecMakeFunctionResult (fcache=0x14016aa70, arguments=0x1401616d0, econtext=0x14016a030, isNull=0x11fffdf88 "", isDone=0x0) at execQual.c:654
#4  0x1200ce224 in ExecEvalOper (opClause=0x1401615f0, econtext=0x14016a030, isNull=0x11fffdf88 "", isDone=0x0) at execQual.c:841
#5  0x1200cea24 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1204
#6  0x1200cec54 in ExecQual (qual=0x14016a1a0, econtext=0x14016a030) at execQual.c:1356
#7  0x1200cf2a8 in ExecScan (node=0x14016a1d0, accessMtd=0x1200d8320 <SeqNext>) at execScan.c:129
#8  0x1200d846c in ExecSeqScan (node=0x1401615f0) at nodeSeqscan.c:138
#9  0x1200cc280 in ExecProcNode (node=0x14016a1d0, parent=0x14016a1d0) at execProcnode.c:284
#10 0x1200ca8c0 in ExecutePlan (estate=0x14016a310, plan=0x14016a1d0, numberTuples=1, direction=ForwardScanDirection, destfunc=0x140020c20) at execMain.c:959
#11 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14016a310, count=0) at execMain.c:199
#12 0x1200d1140 in postquel_getnext (es=0x140160630) at functions.c:324
#13 0x1200d1300 in postquel_execute (es=0x140160630, fcinfo=0x1401604a0, fcache=0x140160590) at functions.c:417
#14 0x1200d14d8 in fmgr_sql (fcinfo=0x1401604a0) at functions.c:542
#15 0x1200ce09c in ExecMakeFunctionResult (fcache=0x140160480, arguments=0x14015e810, econtext=0x140119cd0, isNull=0x140160350 "", isDone=0x11fffe258) at execQual.c:712
#16 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0, isNull=0x140160350 "", isDone=0x11fffe258) at execQual.c:883
#17 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#18 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030, isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0) at execFlatten.c:56
#19 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#20 0x1200cdd74 in ExecEvalFuncArgs (fcache=0x140160290, argList=0x14016a030, econtext=0x140119cd0) at execQual.c:603
#21 0x1200cde54 in ExecMakeFunctionResult (fcache=0x140160290, arguments=0x14015e840, econtext=0x140119cd0, isNull=0x11fffe3a0 "", isDone=0x11fffe468) at execQual.c:654
#22 0x1200ce2c4 in ExecEvalFunc (funcClause=0x1401615f0, econtext=0x140119cd0, isNull=0x11fffe3a0 "", isDone=0x11fffe468) at execQual.c:883
#23 0x1200cea3c in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1208
#24 0x1200ce574 in ExecEvalFieldSelect (fselect=0x14015e720, econtext=0x14016a030, isNull=0x11fffe3a0 "", isDone=0x0) at execQual.c:1091
#25 0x1200ceafc in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1234
#26 0x1200c8e10 in ExecEvalIter (iterNode=0x1401615f0, econtext=0x14016a030, isNull=0x1 <Error reading address 0x1: Invalid argument>, isDone=0x0) at execFlatten.c:56
#27 0x1200ce9b0 in ExecEvalExpr (expression=0x1401615f0, econtext=0x0, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1183
#28 0x1200ceea4 in ExecTargetList (targetlist=0x14015e870, targettype=0x140160000, values=0x140160260, econtext=0x140119cd0, isDone=0x11fffe5a8) at execQual.c:1528
#29 0x1200cf1a8 in ExecProject (projInfo=0x0, isDone=0x1) at execQual.c:1751
#30 0x1200d8074 in ExecResult (node=0x14015e5b0) at nodeResult.c:167
#31 0x1200cc238 in ExecProcNode (node=0x14015e5b0, parent=0x14015e5b0) at execProcnode.c:272
#32 0x1200ca8c0 in ExecutePlan (estate=0x14015eab0, plan=0x14015e5b0, numberTuples=0, direction=ForwardScanDirection, destfunc=0x1401603a0) at execMain.c:959
#33 0x1200c9b50 in ExecutorRun (queryDesc=0x1401615f0, estate=0x14015eab0, count=0) at execMain.c:199
#34 0x12013e5c0 in ProcessQuery (parsetree=0x14015ea80, plan=0x140160000) at pquery.c:305
#35 0x12013c568 in pg_exec_query_string (query_string=0x140115310 "SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person* p;", parse_context=0x1400c5c60) at postgres.c:817
#36 0x12013dd10 in PostgresMain (argv=0x11fffe9a8, real_argv=0x11ffffae8, username=0x1400b72f9 "pgadmin") at postgres.c:1827
#37 0x12011aef0 in DoBackend (port=0x1400b7080) at postmaster.c:2021
#38 0x12011a888 in BackendStartup (port=0x1400b7080) at postmaster.c:1798
#39 0x12011938c in ServerLoop () at postmaster.c:957
#40 0x120118c10 in PostmasterMain (argv=0x11ffffae8) at postmaster.c:664
#41 0x1200e5980 in main (argv=0x11ffffae8) at main.c:138
Brent Verner <brent@rcfile.org> writes:
> here's a post-mortem.

> #0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
> econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096

Looks reasonable as far as it goes. Evidently the crash is in the heap_getattr macro call at line 1096 of src/backend/executor/execQual.c. We need to look at the data structures that macro uses. What do you get from

p *fselect
p *econtext
p *resSlot->val
p *resSlot->ttc_tupleDescriptor

BTW, if you didn't configure with --enable-cassert, it'd be a good idea to go back and try it that way...

regards, tom lane
On 24 Dec 2000 at 01:00 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > here's a post-mortem.
|
| > #0 0x1200ce58c in ExecEvalFieldSelect (fselect=0x1401615c0,
| > econtext=0x14016a030, isNull=0x14016ab31 "", isDone=0x0) at execQual.c:1096
|
| Looks reasonable as far as it goes. Evidently the crash is in the
| heap_getattr macro call at line 1096 of src/backend/executor/execQual.c.
| We need to look at the data structures that macro uses.
| What do you get from
|
| p *fselect

$1 = {type = T_FieldSelect, arg = 0x140169d40, fieldnum = 1, resulttype = 25, resulttypmod = -1}

| p *econtext

$2 = {type = T_ExprContext, ecxt_scantuple = 0x14016a568, ecxt_innertuple = 0x0, ecxt_outertuple = 0x0, ecxt_per_query_memory = 0x1400c5df0, ecxt_per_tuple_memory = 0x1400c6670, ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x140141760, ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}

| p *resSlot->val

Error accessing memory address 0x40141838: Invalid argument.

| p *resSlot->ttc_tupleDescriptor

Error accessing memory address 0x40141848: Invalid argument.

Additionally:

(gdb) p result
$4 = 1075058736
(gdb) p *resSlot
Error accessing memory address 0x40141830: Invalid argument.

| BTW, if you didn't configure with --enable-cassert, it'd be a good idea
| to go back and try it that way...

Will reconfig/rebuild shortly.

brent
Brent Verner <brent@rcfile.org> writes:
> (gdb) p *resSlot
> Error accessing memory address 0x40141830: Invalid argument.

Oooh. resSlot has been truncated to 32 bits --- judging by the other nearby pointer values, it almost certainly should have been 0x140141830. Now we have a lead.

I am guessing that the truncation happened somewhere in executor/functions.c, but don't see it right away...

regards, tom lane
On 24 Dec 2000 at 00:47 (-0500), Tom Lane wrote:
|
| > I'll send the patch that allows me to
| > cleanly build with gcc. right now, s_lock.h does the wrong thing
| > when compiling on Alpha/OSF with gcc.
|
| Roger, we want to build with either.

The attached patch _seems_ to do the right thing. Could someone who knows Alpha assembly check it out, please? For more info on Alpha assembly, this link may help:

http://tru64unix.compaq.com/faqs/publications/base_doc/DOCUMENTATION/V40D_HTML/APS31DTE/TITLE.HTM

brent 'who learned too much today'
Attachments
On 24 Dec 2000 at 01:19 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > (gdb) p *resSlot
| > Error accessing memory address 0x40141830: Invalid argument.
|
| Oooh. resSlot has been truncated to 32 bits --- judging by the other
| nearby pointer values, it almost certainly should have been 0x140141830.
| Now we have a lead.

FWIW, saying 'set econtext->ecxt_param_list_info->value 0x14014183' in gdb lets the process avoid the SEGV where it _was_ destined to crash, though it does SEGV on a later return to the function. I've tried to determine where this value originates and where it is subsequently modified, but have not been able to. Lost in gdb.

Q: I tried doing 'watch <address>', but this appeared to just hang. Is there some trick to using 'watch' on addresses that I might be overlooking?

| I am guessing that the truncation happened somewhere in
| executor/functions.c, but don't see it right away...

More observations WRT SQL that blows up postgres on Alpha.

works:

SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM ONLY person p;

breaks:

SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person p;
SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person* p;

Whatever it is that ONLY changes avoids the breakage.

I've spent the past two days in a gdb-hole, going in circles. I just don't think I know enough (about gdb or postgres) to make any further progress. Anyway, if someone could tell me what difference the ONLY keyword makes WRT pg internally, it might help me quit running in circles.

thanks.
brent
Brent Verner <brent@rcfile.org> writes:
> more observations WRT sql that blows up postgres on Alpha.
> works:
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM ONLY person p;
> breaks:
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM person p;
> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name
> FROM person* p;

OK, I see the problem. The breakage is actually present in 7.0.* and prior versions as well; it just doesn't happen to be exposed by the regress tests --- until now.

The trouble is the way that entire-tuple function arguments are handled. Tuple types are declared in pg_type as being the same size as Oid, ie, 4 bytes. This reflects situations where a tuple value is represented by an Oid reference to a row in a table. (I am not sure whether there is any code left that depends on that ... in any case I'm nervous about changing it during beta.) But the expression evaluator's implementation of a tuple argument is that the Datum value contains a pointer to a TupleTableSlot. This works fine as long as the Datum is just passed around as a Datum, but if anyone tries to form a tuple containing that Datum, only 4 bytes get stored into the tuple. Result: failure on machines where pointers are wider than 4 bytes.

The reason this shows up in this particular regression test now, and not before, is that 7.1 does the function evaluations at the top of the Append plan that implements inheritance union, whereas 7.0 did it at the bottom. That means that in 7.1, the TupleTableSlot Datum gets inserted into a tuple that becomes part of the Append output before it gets to the function execution. 7.0 would still show the bug under the right circumstances --- a join would do it, for example.

I think that there may still be cases where an Oid is the correct representation of a tuple type; anyway I'm afraid to foreclose that possibility.
What I'm thinking about doing is setting the typmod of an entire-tuple function argument to sizeof(Pointer), rather than the default -1, to indicate that a pointer representation is being used. Comments, hackers?

regards, tom lane
On 26 Dec 2000 at 14:41 (-0500), Tom Lane wrote:
| I wrote:
| > ... What I'm thinking about doing is setting typmod of
| > an entire-tuple function argument to sizeof(Pointer), rather than
| > the default -1, to indicate that a pointer representation is being
| > used. Comments, hackers?
|
| Here is a patch to current sources along this line. I have not
| committed it, since I'm not sure it does the job. It doesn't break
| the regress tests on my machine, but does it fix them on Alphas?
| Please apply it locally and let me know what you find.

What I'm seeing now is much the same. FWIW, it looks like we're picking up the cruft around functions.c:354

paramLI->value = fcinfo->arg[paramLI->id - 1];

(both of which are type Datum).

I've been going in circles trying to figure out where fcinfo->arg is filled. Can you point me toward that?

thanks for your help.
brent
Brent Verner <brent@rcfile.org> writes:
> | Please apply it locally and let me know what you find.
> what I'm seeing now is much the same.

Drat. More to do, then.

> i've been in circles trying to figure out where fcinfo->arg is filled.
> can you point me toward that?

See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c. But fmgr is probably only the carrier of disease, not the source...

regards, tom lane
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.

Sorry, I sent the previous email without the details of the different behavior. Inside ExecEvalFieldSelect(), result is now 303, instead of 110599844 (...or whatever it was). I'm not sure if this gives you any additional clues.

thanks.
brent
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.
|
| > i've been in circles trying to figure out where fcinfo->arg is filled.
| > can you point me toward that?
|
| See src/backend/utils/fmgr/README and src/backend/utils/fmgr/fmgr.c.
| But fmgr is probably only the carrier of disease, not the source...

OK, I've tracked this further (in the right direction, I hope :). These are the steps leading up to the assignment of the fscked fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually blow up ExecEvalFieldSelect.

Breakpoint 4, ExecMakeFunctionResult (fcache=0x14014e700, arguments=0x14014c850, econtext=0x140127ae0, isNull=0x14014e390 "", isDone=0x11fffde78) at execQual.c:652
652         if (fcache->fcinfo.nargs > 0 && !fcache->argsValid)
(gdb) print fcache->fcinfo
$56 = {flinfo = 0x14014e700, context = 0x0, resultinfo = 0x14014e7d0, isnull = 0 '\000', nargs = 1, arg = {0 <repeats 16 times>}, argnull = '\000' <repeats 15 times>}
(gdb) cont

Breakpoint 6, ExecEvalVar (variable=0x14014c820, econtext=0x140127ae0, isNull=0x14014e7c0 "") at execQual.c:298
298         switch (variable->varno)
(gdb) print *variable
$57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220, vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}
(gdb) print *econtext
$58 = {type = T_ExprContext, ecxt_scantuple = 0x14014cc58, ecxt_innertuple = 0x0, ecxt_outertuple = 0x14014cc58, ecxt_per_query_memory = 0x1400e6370, ecxt_per_tuple_memory = 0x1400e66a0, ecxt_param_exec_vals = 0x0, ecxt_param_list_info = 0x0, ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0}
(gdb) break 313
(gdb) cont
(gdb) print *slot
$60 = {type = T_TupleTableSlot, val = 0x14014e430, ttc_shouldFree = 0 '\000', ttc_descIsNew = 1 '\001', ttc_tupleDescriptor = 0x14014ded0, ttc_buffer = 0}
(gdb) break 353
(gdb) cont
(gdb) print *heapTuple
$73 = {t_len = 48, t_self = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, t_tableOid = 0, t_datamcxt = 0x1400e6370, t_data = 0x14014e450}
(gdb) print attnum
$74 = 1
(gdb) print *tuple_type
$75 = {natts = 2, attrs = 0x14014df00, constr = 0x0}
(gdb) print isNull
$76 = (bool *) 0x14014e7c0 ""
(gdb) break 359
(gdb) cont
# after heap_getattr, we have the smashed value.
(gdb) print result
$79 = 303

Is this nearing the problem, or am I still simply witnessing symptoms?

brent 'delirious from sleep dep.'
On 26 Dec 2000 at 23:41 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Please apply it locally and let me know what you find.
|
| > what I'm seeing now is much the same.
|
| Drat. More to do, then.

After hours in the gdb-hole, I see this... maybe a clue? :)

src/backend/access/common/heaptuple.c:

    {

        /*
         * Fix me when going to a machine with more than a four-byte
         * word!
         */
        off = att_align(off, att[j]->attlen, att[j]->attalign);

        att[j]->attcacheoff = off;

        off = att_addlength(off, att[j]->attlen, tp + off);
    }

I'm pretty sure I don't know how best to fix this, but I've got some randomly entered code compiling now :) If it passes the regression tests I'll send it along.

brent 'glad the coffee shop in the backyard is open now :)'
Brent Verner <brent@rcfile.org> writes:
> after hours in the gdb-hole, I see this... maybe a clue? :)

I don't think that comment means anything. Possibly it's a leftover from a time when there was something unportable there. But if att_align were broken on Alphas, you'd have a lot worse problems than what you're seeing.

regards, tom lane
Brent Verner <brent@rcfile.org> writes:
> these are the steps leading up the the assignment of the fscked
> fcache->fcinfo.arg[i] at execQual.c:603, which is what will eventually
> blow up ExecEvalFieldSelect.

That looks OK as far as it goes. Inside ExecEvalVar, you need to look at the tuple_type data structure in more detail, specifically

p *tuple_type->attrs[0]
p *tuple_type->attrs[1]

(I think the leading * is correct here; try omitting it if gdb gets unhappy.)

> (gdb) print *variable
> $57 = {type = T_Var, varno = 65001, varattno = 1, vartype = 21220,
> vartypmod = 8, varlevelsup = 0, varnoold = 1, varoattno = 0}

That part looks promising --- vartypmod is sizeof(Pointer), not -1, so the front-end part of my patch seems to be working. What I suspect we'll find is that the tupledesc doesn't show the size of the first field as 8 the way we want. That would imply that I missed a place (or multiple places :-() that needs to know about the convention for the typmod of a tuple datatype.

regards, tom lane
Brent Verner <brent@rcfile.org> writes:
> | Hm. I thought I'd fixed that. Are you up to date on
> | src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.
> yup. got that version -- 1.42 2000/12/22 21:36:09 tgl

You're right, it was still broken :-(. I think I've got it now, though.

Oliver Elphick was kind enough to arrange access to an Alpha running Debian Linux, and I find that current-as-of-this-moment sources pass all regression tests in either serial or parallel test mode on that system. Curiously, however, the system fails when you try to shut it down:

Smart Shutdown request at Thu Dec 28 02:41:49 2000
DEBUG: shutting down
FATAL 2: Checkpoint lock is busy while data base is shutting down
Shutdown failed - abort

I have no idea why this should be. Evidently there's something wrong with the TAS() macro --- yet it seems to work fine elsewhere. Ideas anyone?

regards, tom lane
On 27 Dec 2000 at 21:45 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > | Hm. I thought I'd fixed that. Are you up to date on
| > | src/backend/utils/adt/oid.c ? Current CVS has rev 1.42.
|
| > yup. got that version -- 1.42 2000/12/22 21:36:09 tgl
|
| You're right, it was still broken :-(. I think I've got it now, though.

I'll check it tomorrow.

| Oliver Elphick was kind enough to arrange access to an Alpha running
| Debian Linux, and I find that current-as-of-this-moment sources pass
| all regression tests in either serial or parallel test mode on that
| system. Curiously, however, the system fails when you try to shut
| it down:

Good. I'm glad you guys linked up :)

| Smart Shutdown request at Thu Dec 28 02:41:49 2000
| DEBUG: shutting down
| FATAL 2: Checkpoint lock is busy while data base is shutting down
| Shutdown failed - abort

I'm not seeing this with my latest revision of the TAS() asm.

Smart Shutdown request at Wed Dec 27 19:25:45 2000
DEBUG: shutting down
DEBUG: MoveOfflineLogs: remove 0000000000000000
DEBUG: database system is shut down

| I have no idea why this should be. Evidently there's something wrong
| with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
| anyone?

Re-evaluating the asm stuff now.

thanks.
brent
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
From:
"Oliver Elphick"
Date:
Tom Lane wrote:
...
>system. Curiously, however, the system fails when you try to shut
>it down:
>
>Smart Shutdown request at Thu Dec 28 02:41:49 2000
>DEBUG: shutting down
>FATAL 2: Checkpoint lock is busy while data base is shutting down
>Shutdown failed - abort
>
>I have no idea why this should be. Evidently there's something wrong
>with the TAS() macro --- yet it seems to work fine elsewhere. Ideas
>anyone?

It's not just on Alpha; I've seen that on my i386 Linux system.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47  6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
========================================
"For God shall bring every work into judgment, with every secret thing, whether it be good, or whether it be evil." Ecclesiastes 12:14
"Oliver Elphick" <olly@lfix.co.uk> writes:
>> Smart Shutdown request at Thu Dec 28 02:41:49 2000
>> DEBUG: shutting down
>> FATAL 2: Checkpoint lock is busy while data base is shutting down
>> Shutdown failed - abort

> It's not just on Alpha; I've seen that on my i386 Linux system.

Oooh, that's interesting. I was just blindly assuming that it was a problem with the Alpha spinlock code (we've sure heard plenty of discussion of same). But maybe there's an actual logic bug in the checkpoint code. I don't see one in a quick scan, though.

FWIW, I do *not* see this behavior on HPUX. It seems perfectly reproducible on the Debian Alpha box. Is it reproducible on your i386 box, or only sometimes?

Vadim, any ideas?

regards, tom lane
Re: [PATCHES] Re: Re: Tuple-valued datums on Alpha (was Re: 7.1 on DEC/Alpha)
From:
"Oliver Elphick"
Date:
Tom Lane wrote:
>"Oliver Elphick" <olly@lfix.co.uk> writes:
>>> FATAL 2: Checkpoint lock is busy while data base is shutting down
>> It's not just on Alpha; I've seen that on my i386 Linux system.
>FWIW, I do *not* see this behavior on HPUX. It seems perfectly
>reproducible on the Debian Alpha box. Is it reproducible on your
>i386 box, or only sometimes?

Hmm. I'm just waking up a bit more. Now that I'm thinking more clearly: I saw the problem yesterday when I was doing an Alpha build on faure.debian.org, so I think it was actually on Alpha, not i386 after all. Sorry for the red herring.