Re: Slow Query / Check Point Segments

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: Slow Query / Check Point Segments
Дата
Msg-id 4B594407.4050508@2ndquadrant.com
обсуждение исходный текст
Ответ на Slow Query / Check Point Segments  (Alex - <aintokyo@hotmail.com>)
Ответы Re: Slow Query / Check Point Segments
Re: Slow Query / Check Point Segments
Список pgsql-general
Alex - wrote:
checkpoint_segments = 32                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min              # range 30s-1h

These parameters are not so interesting on their own.  The important thing to check is how often checkpoints are happening, and how much work each of them do.  Here's a useful section from your logs to comment on:

2010-01-22 12:21:48 JSTLOG:  checkpoint complete: wrote 83874 buffers (16.0%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=138.040 s, sync=0.000 s, total=138.063 s
2010-01-22 12:23:32 JSTLOG:  checkpoint complete: wrote 82856 buffers (15.8%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=18.740 s, sync=0.000 s, total=18.783 s
2010-01-22 12:24:26 JSTLOG:  checkpoint complete: wrote 75145 buffers (14.3%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=12.129 s, sync=0.000 s, total=12.132 s
2010-01-22 12:25:30 JSTLOG:  checkpoint complete: wrote 82108 buffers (15.7%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=10.619 s, sync=0.000 s, total=10.621 s
2010-01-22 12:28:03 JSTLOG:  checkpoint complete: wrote 87349 buffers (16.7%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=82.190 s, sync=0.000 s, total=82.192 s

Here you're getting a checkpoint every minute or three, and each of them is writing out ~80000 buffers = 625MB.  That is crazy, and no wonder your queries are slow--the system is spending all of its time doing constant, extremely expensive checkpoints.

You should re-tune this system until the checkpoints show up no more often than every 5 minutes, and preferably closer to 10.  In your case, that means greatly increasing checkpoint_segments..  And you might as well decrease checkpoint_timeout, because right now the timeout is rarely ever being reached before the system runs out of working segments and executes a checkpoint for that reason.

I would suggest changing the parameters to these:

checkpoint_segments = 96                # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min              # range 30s-1h

And seeing how long the interval between checkpoints becomes under load afterwards.  That should make it in the just >5 minute range.  I'd expect that the number of buffers will only increase a bit, so instead of dumping out 625MB every minute or three you should see maybe 800MB every 5 minutes--big improvement.

Also:  just after making the change, save a copy of:

select now(),* from pg_stat_bgwriter

And then run that same query again a few hours after the change (and maybe the next day twoo).  Subtract the two values to see how much they changed, and then you'll find some useful info to compute the checkpoint interval without even looking at the logs.  There's a lot of theory and comments about this area at http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

2010-01-22 12:36:08 JSTLOG:  checkpoint complete: wrote 80819 buffers (15.4%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=51.476 s, sync=0.000 s, total=51.478 s
2010-01-22 13:01:54 JSTLOG:  checkpoint complete: wrote 4892 buffers (0.9%); 0 transaction log file(s) added, 0 removed, 32 recycled; write=494.868 s, sync=0.000 s, total=494.982 s


See that last one?  That's a checkpoint that's being driven by time expiring (30 minutes) rather than running out of segments.  That just suggests that the system was mostly idle during that period.  Long write times are perfectly fine here, the checkpoints are spread out over a long time if possible in order to reduce average I/O.  You're just not seeing that the rest of the time because checkpoints are happening so often.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com

В списке pgsql-general по дате отправления:

Предыдущее
От: Yan Cheng Cheok
Дата:
Сообщение: Re: Extremely Slow Cascade Delete Operation
Следующее
От: Yan Cheng Cheok
Дата:
Сообщение: Optimization on JOIN