Discussion: Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
[postgresql lists added to Cc in hope of elucidation]
Adam Di Carlo wrote:
>
>[Background: PostgreSQL is causing extremely hard crashes on my Sun4u
>(Ultra5) Debian SPARC system. Anyone should be able to reproduce this
>by installing the postgresql-test environment, and running:
>
> # cd /usr/lib/postgresql/test/regress
> # chown -R postgres .
> # su - postgres
> $ cd /usr/lib/postgresql/test/regress
> $ make runtest
>
>BEWARE -- this hard crashes my system. You may crash hard; you may
>lose data.
>
>Note: I am running a mostly up-to-date 2.2.9 kernel (stock image from
>potato) with the newest postgresql package (6.5.1-3 I believe).
>]
>
>>That is very nasty -- and unexpected; I would like to report whatever
>>information is available to pgsql-ports@postgresql.org. However, they
>>will need to know exactly what was going on - logfile output, if available,
>>progress through the test, test output file, if it survived. It doesn't
>>seem at all like the problem that I thought I was asking you to look at.
>>We should investigate whether there is some entirely separate cause.
>
>Yes. On followup, I am getting intermittent hard crashes when running
>regress.sh or doing any operation with postgresql. Obviously, this is
>more on the level of a sparc64 kernel problem, even, than a purely
>postgres problem -- after all, no user process should be able to take
>out the system this way.
I regret that I have no experience with kernel debugging.
>
>My most recent crash has this output to 'make runtest':
>
>path .. ok
>polygon .. ok
>circle .. ok
>geometry .. failed
>timespan ..
>
>And in the postgres.log, with debugging at 4:
>
>plan:
>
>{ SEQSCAN :cost 43 :size 334 :width 16 :state <> :qptargetlist
>({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 705 :restypmod -1
>:resname "one" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
>:expr { CONST :consttype 705 :constlen -1 :constisnull false
>:constvalue 4 [ 0 0 0 4 ] :constbyval false }} { TARGETENTRY
>:resdom { RESDOM :resno 2 :restype 600 :restypmod -1 :resname "f1"
>:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR
>:varno 1 :varattno 1 :vartype 600 :vartypmod -1 :varlevelsup 0
>:varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16 :opType func
>:oper { FUNC :funcid 1532 :functype 16 :funcisindex false :funcsize 0
>:func_fcache @ 0x0 :func_tlist ({ TARGETENTRY :resdom { RESDOM :resno
>1 :restype 16 :restypmod -1 :resname "<noname>" :reskey 0 :reskeyop 0
>:resgroupref 0 :resjunk false } :expr { VAR :varno -1 :varattno 1
>:vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold -1 :varoattno 1}})
>:func_planlist <>} :args ({ VAR :varno 1 :varattno 1 :vartype 600
>:vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST
>:consttype 600 :constlen 16 :constisnull false :constvalue 16 [ 64
>20 102 102 102 102 102 102 64 65 64 0 0 0 0 0 ]
>:constbyval false })}) :lefttree <> :righttree <> :extprm () :locprm
>() :initplan <> :nprm 0 :scanrelid 1 }
>
>ProcessQuery
>CommitTransactionCommand
>StartTransactionCommand
>query: SELECT '' AS one, p1.f1
> FROM POINT_TBL p1
> WHERE p1.f1 ?| '(5.1,34.5)'::point;
>parser outputs:
>
>{ QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal
>false :isBinary false :isTemp false :unionall false :unique <>
>:sortClause <> :rtable ({ RTE :relname point_tbl :refname p1 :relid
>20864 :inh false :inFromCl true :skipAcl false}) :targetlist
>({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 705 :restypmod -1
>:resname "one" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
>:expr { CONST :consttype 705 :constlen -1 :constisnull false
>:constvalue 4 [ 0 0 0 4 ] :constbyval false }} { TARGETENTRY
>:resdom { RESDOM :resno 2 :restype 600 :restypmod -1
>:resname "f1" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
>:expr { VAR :varno 1 :varattno 1 :vartype 600 :vartypmod -1
>:varlevelsup 0 :varnoold 1 :varoattno 1}}) :qual { EXPR :typeOid 16
>:opType op :oper { OPER :opno 809
>:opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype
>
>------
>
>Output just stops there, with a hard crash to the system.
Not even a kernel oops output?
>
>--
>.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>
Can postgresql developers tell from this what routine we are in when the
crash occurs? I suppose that log output is buffered; where can we turn
off buffering so that all possible output is saved to disk before the
crash?
--
Vote against SPAM: http://www.politik-digital.de/spam/
========================================
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"And why call ye me, Lord, Lord, and do not the things
which I say?" Luke 6:46
"Oliver Elphick" <olly@lfix.co.uk> writes:
>> Yes. On followup, I am getting intermittent hard crashes when running
>> regress.sh or doing any operation with postgresql. Obviously, this is
>> more on the level of a sparc64 kernel problem, even, than a purely
>> postgres problem -- after all, no user process should be able to take
>> out the system this way.
Yipes...
> Can postgresql developers tell from this what routine we are in when the
> crash occurs? I suppose that log output is buffered; where can we turn
> off buffering so that all possible output is saved to disk before the
> crash?
The log is not nearly detailed enough to tell what routine we're in,
even if there weren't the buffering problem. Also, given that this is
a kernel crash, I'm not sure I'd assume that even fsync() after every
line of output would ensure that the last line made it to disk.
What you really want is a truss or strace log of kernel calls, anyhow,
but there's still the problem of getting it out to disk before the
crash. Better find a kernel-debugging expert to ask for advice...
regards, tom lane
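To illustrate the "fsync() after every line" idea mentioned above, here is a minimal C sketch. It is not PostgreSQL code: open_sync_log() and log_line_sync() are hypothetical helper names invented for this example, and the log path is whatever the caller passes in. As noted above, even this cannot guarantee that the last line survives a kernel crash; it only narrows the window compared with buffered output.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of PostgreSQL): open a log file for
 * appending. Returns a file descriptor, or -1 on error. */
int open_sync_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* Hypothetical helper: write one line plus a newline to fd, then call
 * fsync() so the kernel is asked to push it to disk before we return.
 * Returns 0 on success, -1 on any write or fsync error. */
int log_line_sync(int fd, const char *line)
{
    const char *p = line;
    size_t len = strlen(line);

    /* write() may write fewer bytes than requested, so loop. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    if (write(fd, "\n", 1) != 1)
        return -1;

    /* One synchronous flush per line: slow, but debugging-only. */
    return fsync(fd);
}
```

A caller would open the log once with open_sync_log() and call log_line_sync(fd, "...") after each step of interest. The cost is one disk flush per line, so this is only reasonable while chasing a crash like this one.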
Re: [HACKERS] Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
From
Michael Alan Dorman
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> What you really want is a truss or strace log of kernel calls, anyhow,
> but there's still the problem of getting it out to disk before the
> crash. Better find a kernel-debugging expert to ask for advice...
Serial terminal, or printer or some such hooked up to a serial port.
Mike.
Re: [HACKERS] Re: Bug#41277: postgresql 6.5.1-3 + sparc (sun4u) == nasty nasty crashes
From
Adam Di Carlo
Date:
>> Can postgresql developers tell from this what routine we are in when the
>> crash occurs? I suppose that log output is buffered; where can we turn
>> off buffering so that all possible output is saved to disk before the
>> crash?
>
>The log is not nearly detailed enough to tell what routine we're in,
>even if there weren't the buffering problem. Also, given that this is
>a kernel crash, I'm not sure I'd assume that even fsync() after every
>line of output would ensure that the last line made it to disk.
>
>What you really want is a truss or strace log of kernel calls, anyhow,
>but there's still the problem of getting it out to disk before the
>crash. Better find a kernel-debugging expert to ask for advice...
Hopefully someone from the sparc or sparc64 team at Debian can look into
this. I am going on business travel for 4 days so will be away from any
Debian/SPARC machines for a while.
These are the questions which need to be answered:
* Are other people running Debian SPARC seeing the problem, using the
  recipe I mentioned in my previous email?
* Is it 2.2.9 specific? Sun4u specific?
* Get strace output as Tom suggests.
* Shouldn't we notify the Sparc/Linux folks?
--
.....Adam Di Carlo....adam@onShore.com.....<URL:http://www.onShore.com/>