Re: Checkpoint Tuning Question

From: Tom Lane
Subject: Re: Checkpoint Tuning Question
Msg-id: 2479.1247418610@sss.pgh.pa.us
In reply to: Re: Checkpoint Tuning Question  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-general

Simon Riggs <simon@2ndQuadrant.com> writes:
> This causes us to queue for the WALInsertLock twice at exactly the time
> when every caller needs to calculate the CRC for complete blocks. So we
> queue twice when the lock-hold-time is consistently high, causing queue
> lengths to go ballistic.

You keep saying that, and it keeps not being true, because the CRC
calculation is *not* done while holding the lock.

It is true that the very first XLogInsert call in each backend after
a checkpoint starts will have to go back and redo its CRC calculation,
but that's a one-time waste of CPU.  It's hard to see how it could have
continuing effects over several seconds, especially in a system that
has CPU to spare.

What I think might be the cause is that just after a checkpoint starts,
quite a large proportion of XLogInserts will include full-page buffer
copies, thus leading to an overall higher rate of WAL creation.  That
means longer hold times for WALInsertLock due to spending more time
copying data into the WAL buffers, and it also means more WAL that has
to be synced to disk before a transaction can commit.  I'm still
convinced that Dan's problem ultimately comes down to inadequate disk
bandwidth, so I think the latter point is probably the key.
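
The full-page-image effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a measurement of PostgreSQL: the 100-byte record size, the uniform random access pattern, and the function name wal_bytes_per_window are all made up for illustration, and real WAL records carry headers and other details that are ignored here. The one behavior it models is that the first change to a page after a checkpoint logs a whole page image, while later changes to the same page do not.

```python
import random

BLOCK_SIZE = 8192   # PostgreSQL's default page size in bytes
RECORD_SIZE = 100   # assumed size of an ordinary WAL record (illustrative only)


def wal_bytes_per_window(num_pages, updates_per_window, windows,
                         full_page_writes=True, seed=0):
    """Return the WAL bytes generated in each time window after a checkpoint.

    Toy model: with full_page_writes on, the first update to a page after
    the checkpoint logs a full-page image in addition to the record; later
    updates to the same page log only the record.
    """
    rng = random.Random(seed)
    touched = set()  # pages already modified since the checkpoint
    volumes = []
    for _ in range(windows):
        total = 0
        for _ in range(updates_per_window):
            page = rng.randrange(num_pages)
            total += RECORD_SIZE
            if full_page_writes and page not in touched:
                touched.add(page)
                total += BLOCK_SIZE
        volumes.append(total)
    return volumes


with_fpw = wal_bytes_per_window(10_000, 5_000, 6, full_page_writes=True)
without_fpw = wal_bytes_per_window(10_000, 5_000, 6, full_page_writes=False)
for i, (a, b) in enumerate(zip(with_fpw, without_fpw)):
    print(f"window {i}: {a / 1e6:6.2f} MB with full-page images, "
          f"{b / 1e6:6.2f} MB without")
```

In this model the WAL volume is highest in the first window after the checkpoint and falls off as more pages have already been touched, while the run without full-page images stays flat. More WAL per unit time means both more data copied under WALInsertLock and more data to sync before commit.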

So this thought leads to a couple of other things Dan could test.
First, see if turning off full_page_writes makes the hiccup go away.
If so, we know the problem is in this area (though still not exactly
which of the two effects is to blame); if not, we need another idea.
That's not a good permanent fix though, since it reduces crash safety.
The other knobs to
experiment with are synchronous_commit and wal_sync_method.  If the
stalls are due to commits waiting for additional xlog to get written,
then async commit should stop them.  I'm not sure if changing
wal_sync_method can help, but it'd be worth experimenting with.
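
The three experiments above could be written down as a postgresql.conf fragment like the following. This is a sketch for diagnosis only: the parameter names are the ones mentioned in the message, but the fdatasync value is just an example choice, the set of wal_sync_method values actually available depends on the platform, and it makes sense to change one setting at a time so the effect of each can be told apart.

```
# postgresql.conf, diagnostic settings only (change one at a time)

# Test 1: does the post-checkpoint hiccup go away without full-page images?
# This reduces crash safety, so it is not a permanent fix.
full_page_writes = off

# Test 2: do the stalls stop when commits no longer wait for the WAL flush?
synchronous_commit = off

# Test 3: try a different WAL sync method. The value here is only an
# example; which values are valid depends on the operating system.
wal_sync_method = fdatasync
```

As far as I know, synchronous_commit can also be changed per session with SET, whereas the other two take effect after a configuration reload.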

            regards, tom lane
