Re: Quite strange crash

From Denis Perchine
Subject Re: Quite strange crash
Msg-id 0101091213290B.00613@dyp.perchine.com
In response to Re: Quite strange crash  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Quite strange crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Monday 08 January 2001 23:21, Tom Lane wrote:
> Denis Perchine <dyp@perchine.com> writes:
> >>>>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>>>>
> >>>>> Were there any errors before that?
> >
> > Actually you can have a look on the logs yourself.
>
> Well, I found a smoking gun:
>
> Jan  7 04:27:51 mx postgres[2501]: FATAL 1:  The system is shutting down
>
> PID 2501 had been running:
>
> Jan  7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

Hmmm... actually this is a real problem with vacuum lazy. Sometimes it just 
does something for an enormous amount of time (I have mailed a sample database 
to Vadim, but have not received any response yet). It is possible that it was 
me who killed the backend.

> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked.  Roughly one spinlock timeout
> later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
> So that fits.
>
> The real question is what happened to 2501?  None of the other backends
> reported a SIGTERM signal, so the signal did not come from the
> postmaster.
>
> Another interesting datapoint: there is a second place in this logfile
> where one single backend reports SIGTERM while its brethren keep running:
>
> Jan  7 04:30:47 mx postgres[4269]: query: vacuum verbose;
> ...
> Jan  7 04:38:16 mx postgres[4269]: FATAL 1:  The system is shutting down

Hmmm... Maybe this was also me... But I am not sure here.

> There is something pretty fishy about this.  You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?

[postgres@mx postgres]$ ulimit -a
core file size (blocks)  1000000
data seg size (kbytes)   unlimited
file size (blocks)       unlimited
max memory size (kbytes) unlimited
stack size (kbytes)      8192
cpu time (seconds)       unlimited
max user processes       2048
pipe size (512 bytes)    8
open files               1024
virtual memory (kbytes)  2105343

No, there is no CPU time ulimit set.
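
The same check can be made from inside a process with getrlimit(2), which helps when the shell that ran `ulimit -a` may not match the environment the postmaster was started in. The following is only a minimal sketch, not PostgreSQL code; the helper names `format_limit` and `cpu_time_is_unlimited` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: render a limit value the way `ulimit -a` prints it. */
void format_limit(rlim_t value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (value == RLIM_INFINITY)
        snprintf(buf, size, "unlimited");
    else
        snprintf(buf, size, "%llu", (unsigned long long) value);
}

/*
 * Hypothetical helper: returns 1 if the calling process has no soft
 * CPU-time limit, 0 if it has one, and -1 if getrlimit() fails.
 */
int cpu_time_is_unlimited(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CPU, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY;
}
```

Limits are inherited across fork(), so a value read in a child process reflects what its parent was running under unless the child changed it.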

> What signal does a ulimit violation deliver on your machine, anyway?
        if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
                /* Send SIGXCPU every second.. */
                if (!(psecs % HZ))
                        send_sig(SIGXCPU, p, 1);
                /* and SIGKILL when we go over max.. */
                if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
                        send_sig(SIGKILL, p, 1);
        }
 

This part of the kernel shows the logic. It means that a process will get 
SIGXCPU every second while it is above the soft limit, and SIGKILL once it 
goes above the hard limit.
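
To make the per-tick decision easier to follow, here is a small user-space sketch of the same logic. It is not kernel code: the function `cpu_limit_signals` and the `SEND_*` flags are hypothetical names, and the limits are passed as plain arguments rather than read from a task structure.

```c
#include <assert.h>

#define SEND_SIGXCPU 1  /* soft limit exceeded, sent on whole-second ticks */
#define SEND_SIGKILL 2  /* hard limit exceeded */

/*
 * Hypothetical helper: returns a bitmask of the signals the kernel
 * fragment above would send on one timer tick.
 *   psecs     - CPU time consumed so far, in timer ticks
 *   hz        - timer ticks per second (HZ in the kernel)
 *   soft_secs - soft limit in seconds (rlim_cur)
 *   hard_secs - hard limit in seconds (rlim_max)
 */
int cpu_limit_signals(unsigned long psecs, unsigned long hz,
                      unsigned long soft_secs, unsigned long hard_secs)
{
    int signals = 0;

    assert(hz > 0);  /* guard against division by zero */
    if (psecs / hz > soft_secs) {
        if (psecs % hz == 0)
            signals |= SEND_SIGXCPU;
        if (psecs / hz > hard_secs)
            signals |= SEND_SIGKILL;
    }
    return signals;
}
```

With hz = 100, a soft limit of 2 seconds and a hard limit of 4 seconds, a call with psecs = 300 returns SEND_SIGXCPU, and a call with psecs = 500 returns both flags. Neither signal is SIGTERM.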

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

