Re: [HACKERS] Frustration

From Tom Lane
Subject Re: [HACKERS] Frustration
Date
Msg-id 5760.938183214@sss.pgh.pa.us
In reply to Re: [HACKERS] Frustration  (Michael Simms <grim@argh.demon.co.uk>)
Responses RE: [HACKERS] Frustration  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
Michael Simms <grim@argh.demon.co.uk> writes:
> Well, thanks to tom, I know what was wrong, and I have found the problem,
> or one of them at least...
> FATAL: s_lock(0c9ef824) at bufmgr.c:1106, stuck spinlock. Aborting.
> Okee, that segment of code is, well, it's some deep down internals that
> are as clear as mud to me.

Hmph.  Apparently, some backend was waiting for some other backend to
finish reading a page in or writing it out, and gave up after deciding
it had waited an unreasonable amount of time (~ 1 minute, which does
seem plenty long enough).  Probably, the I/O did in fact finish, but
the waiting backend didn't get the word for some reason.

Is it possible that there's something wrong with the spinlock code on
your hardware?  There are a bunch of different spinlock implementations
(assembly code for various hardware) in include/storage/s_lock.h and
backend/storage/buffer/s_lock.c.  Some of 'em might not be as well
tested as others.  But you're on PC hardware, right?  I would've thought
that flavor of the code would be pretty well wrung out.

Another likely explanation is that there's something wrong in
bufmgr.c's logic for setting and releasing the io_in_progress lock ---
but a quick look doesn't show any obvious error, and I would have
thought we'd have found out about any such problem long since.
Since we're not being buried in reports of stuck-spinlock errors,
I'm guessing there is some platform-specific problem on your machine.
No good ideas what it is if it isn't a spinlock failure.

(Finally, are you sure this is the *only* indication of trouble in
the logs?  If a backend crashed while holding the spinlock, the other
ones would eventually die with complaints like this, but that wouldn't
mean the spinlock code is at fault...)
        regards, tom lane
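
For readers following along, the failure mode discussed above (a waiter
spinning on a test-and-set lock and giving up after a bounded wait) can be
sketched roughly as below.  This is a minimal illustration using C11
atomics, not the code in s_lock.h or bufmgr.c; the names demo_slock_t,
demo_lock_bounded and demo_unlock are hypothetical, and the retry limit
stands in for the real timeout of about a minute.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not PostgreSQL's slock_t. */
typedef atomic_flag demo_slock_t;

/*
 * Try to acquire the lock, giving up after max_spins failed attempts.
 * Returns true if the lock was acquired, false if it looked "stuck".
 * A real implementation would sleep or back off between attempts and
 * would bound the wait by elapsed time rather than by a raw count.
 */
bool demo_lock_bounded(demo_slock_t *lock, long max_spins)
{
    for (long spins = 0; spins < max_spins; spins++) {
        /* test-and-set returns the previous state: false means we got it */
        if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            return true;
    }
    return false;   /* caller decides whether this is fatal */
}

/* Release the lock so that other waiters can proceed. */
void demo_unlock(demo_slock_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

Note that a waiter sees the same timeout whether the test-and-set primitive
is broken or the holder simply never calls demo_unlock (for example because
it exited while holding the lock), which is why the timeout alone does not
say which of the two happened.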

