Checkpoint distribution

Поиск

Список

Период

Сортировка

От	Borodin Vladimir
Тема	Checkpoint distribution
Дата	14 апреля 2014 г. 12:47:08
Msg-id	4AF2BEEF-5BB5-4155-BF70-D3E877914061@simply.name обсуждение исходный текст
Ответы	Re: Checkpoint distribution
Список	pgsql-performance

Дерево обсуждения

Hi all.

I’m running PostgreSQL 9.3.4 and doing stress test of the database with writing only load. The test plan does 1000 transactions per second (each of them does several updates/inserts). The problem is that checkpoint is not distributed over time well. When the checkpoint finishes, the db gets lots of I/O operations and response timings grows strongly.

My checkpoint settings looks like that:

postgres=# select name, setting from pg_catalog.pg_settings where name like 'checkpoint%' and boot_val != reset_val;

name | setting

------------------------------+---------

checkpoint_completion_target | 0.9

checkpoint_segments | 100500

checkpoint_timeout | 600

(3 rows)

postgres=#

But in the log I see that checkpoint continues less than 600*0.9 = 540 seconds:

2014-04-14 12:54:41.479 MSK,,,10517,,53468da6.2915,433,,2014-04-10 16:25:10 MSK,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""

2014-04-14 12:57:06.107 MSK,,,10517,,53468da6.2915,434,,2014-04-10 16:25:10 MSK,,0,LOG,00000,"checkpoint complete: wrote 65140 buffers (24.8%); 0 transaction log file(s) added, 0 removed, 327 recycled; write=134.217 s, sync=10.292 s, total=144.627 s; sync files=31, longest=3.332 s, average=0.331 s",,,,,,,,,»"

When the checkpoint starts (12:54:41.479) dstat says that I/O load increases:

----system---- -dsk/total- --io/total-

date/time | read writ| read writ

14-04 12:54:39| 0 15M| 0 2562

14-04 12:54:40| 0 13M| 0 2330

14-04 12:54:41| 0 97M| 0 5981

14-04 12:54:42| 0 95M| 0 8869

14-04 12:54:43| 0 147M| 0 8493

14-04 12:54:44| 0 144M| 0 8316

14-04 12:54:45| 0 176M| 0 8189

14-04 12:54:46| 0 141M| 0 8221

14-04 12:54:47| 0 143M| 0 8260

14-04 12:54:48| 0 141M| 0 7576

14-04 12:54:49| 0 173M| 0 8171

But when it finishes (12:57:06.107) the I/O load is much higher than the hardware can do:

----system---- -dsk/total- --io/total-

date/time | read writ| read writ

14-04 12:56:52| 0 33M| 0 5185

14-04 12:56:53| 0 64M| 0 5271

14-04 12:56:54| 0 65M| 0 5256

14-04 12:56:55| 0 153M| 0 15.8k

14-04 12:56:56| 0 758M| 0 18.6k

14-04 12:56:57| 0 823M| 0 4164

14-04 12:56:58| 0 843M| 0 8186

14-04 12:56:59| 0 794M| 0 15.0k

14-04 12:57:00| 0 880M| 0 5954

14-04 12:57:01| 0 862M| 0 4608

14-04 12:57:02| 0 804M| 0 7387

14-04 12:57:03| 0 849M| 0 4878

14-04 12:57:04| 0 788M| 0 20.0k

14-04 12:57:05| 0 805M| 0 6004

14-04 12:57:06| 0 143M| 0 6932

14-04 12:57:07| 0 108M| 0 6150

14-04 12:57:08| 0 42M| 0 6233

14-04 12:57:09| 0 73M| 0 6248

Response timings of the application at this moment look like that:

The hardware is quite good to handle this load (PGDATA lives on soft raid10 array of 8 ssd drives). I’ve done the same test with 3000 tps - the result was exactly the same. The only difference was that I/O spikes had been stronger.

So my question is why the checkpoint is not spread for 540 seconds? Is there any way to understand why I/O spike happens when the checkpoint finishes but does not happen during all of the checkpoint process? Any help would be really appropriate.

Vladimir

Вложения

2014-04-14 13-10-47 199509 - _write__stability_.png

В списке pgsql-performance по дате отправления:

Предыдущее

От: Jeff Janes
Дата: 14 апреля 2014 г., 03:02:10
Сообщение: Re: batch update query performance

Следующее

От: Ryan Johnson
Дата: 14 апреля 2014 г., 15:58:22
Сообщение: SSI slows down over time

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Checkpoint distribution

Вложения

Предыдущее

Следующее