Re: WAL recycling, ext3, Linux 2.4.18

From: Tom Lane
Subject: Re: WAL recycling, ext3, Linux 2.4.18
Msg-id: 18147.1026146435@sss.pgh.pa.us
In reply to: Re: WAL recycling, ext3, Linux 2.4.18  (Doug Fields <dfields-pg-general@pexicom.com>)
List: pgsql-general
Doug Fields <dfields-pg-general@pexicom.com> writes:
> Here is a stack trace. I did "where" about every second during the "pause"
> and received the same stack trace. This is on PID 3456 per the
> pg_stat_activity listing below. After things clear up, I also did a stack
> trace; it's blocked on recv, presumably waiting for more commands to come
> down the socket. (I tried a few other PIDs with similar stack traces, all
> stuck on the semop call.)

Hmm.  I don't think I entirely believe that stack trace --- at least
some of the claimed call paths are impossible.  Would it be too much
trouble to rebuild PG with --enable-debug and try again?

Also, could you do the checkpoint manually and get a stack trace from
that backend while others are hung up?

I am considering the possibility that the other backends are hung trying
to get ControlFileLock, which the checkpointer will acquire while
recycling xlog file segments --- but if your stack trace is accurate and
representative then that's not the problem because XLogInsert doesn't
directly try to acquire ControlFileLock.  In any case it's hard to
credit that the recycling process could take 90 seconds to rename a
dozen or so files.  If you have a gdb attached to a process doing a
manual checkpoint, it would be fairly easy to see how long
MoveOfflineLogs() runs.  (Set a breakpoint at its start; when control
reaches the breakpoint, issue "fin" (short for "finish") and see how long
it takes to come back.)
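The claim that renaming a dozen files should take nowhere near 90 seconds can also be checked directly on the filesystem in question. What follows is a minimal sketch, assuming Python 3 is available on the machine. It is not PostgreSQL's MoveOfflineLogs() and does not reproduce its exact behavior: the function name `time_renames`, the file names, the 8 kB file size and the lack of fsync are all illustrative assumptions. Pass a directory on the same ext3 partition as pg_xlog so that the renames hit that filesystem.

```python
import os
import sys
import tempfile
import time


def time_renames(n_files=12, base_dir=None):
    """Hypothetical benchmark (not PostgreSQL code): create n_files small
    files in a fresh directory under base_dir (default: the system temp
    dir), rename each one once, and return the seconds spent in the
    rename calls. The directory and its contents are removed afterwards."""
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        # Illustrative names only; these are not real WAL segment names.
        old_names = [os.path.join(d, "old_%04d" % i) for i in range(n_files)]
        new_names = [os.path.join(d, "new_%04d" % i) for i in range(n_files)]
        for path in old_names:
            with open(path, "wb") as f:
                f.write(b"\0" * 8192)
        start = time.perf_counter()
        for src, dst in zip(old_names, new_names):
            # rename() updates directory entries; it does not copy file data
            os.rename(src, dst)
        elapsed = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else None
    print("renamed 12 files in %.6f s" % time_renames(12, base))
```

If this finishes in milliseconds on the affected partition, the 90-second stall is unlikely to come from the renames alone. Timing MoveOfflineLogs() under gdb would then be the more direct way to check.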

            regards, tom lane


