Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

Поиск

Список

Период

Сортировка

От	digoal zhou
Тема	Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Дата	12 мая 2015 г. 07:11:29
Msg-id	CAKHd5Ce-bnD=gEEdtXiT2_AY7shquTKd0yHXXk5F4zVEKRPX-w@mail.gmail.com обсуждение исходный текст
Ответы	Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Список	pgsql-hackers

Дерево обсуждения

PostgreSQL (<=9.4) trend to smooth buffer write smooth in a checkpoint_completion_target (checkpoint_timeout or checkpoint_segments), but when we use synchronous_commit=off, there is a little problem for the checkpoint_segments target, because xlog write fast(for full page write which the first page write after checkpoint), so checkpointer cann't sleep and write buffer not smooth.

There is an test:

# stap -DMAXSKIPPED=100000 -v 11111 -e '

global s_var, e_var, stat_var;

/* probe smgr__md__read__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int); */

probe process("/opt/pgsql/bin/postgres").mark("smgr__md__read__start") {

s_var[pid(),1] = gettimeofday_us()

}

/* probe smgr__md__read__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int); */

probe process("/opt/pgsql/bin/postgres").mark("smgr__md__read__done") {

e_var[pid(),1] = gettimeofday_us()

if ( s_var[pid(),1] > 0 )

stat_var[pid(),1] <<< e_var[pid(),1] - s_var[pid(),1]

}

/* probe smgr__md__write__start(ForkNumber, BlockNumber, Oid, Oid, Oid, int); */

probe process("/opt/pgsql/bin/postgres").mark("smgr__md__write__start") {

s_var[pid(),2] = gettimeofday_us()

}

/* probe smgr__md__write__done(ForkNumber, BlockNumber, Oid, Oid, Oid, int, int, int); */

probe process("/opt/pgsql/bin/postgres").mark("smgr__md__write__done") {

e_var[pid(),2] = gettimeofday_us()

if ( s_var[pid(),2] > 0 )

stat_var[pid(),2] <<< e_var[pid(),2] - s_var[pid(),2]

}

probe process("/opt/pgsql/bin/postgres").mark("buffer__sync__start") {

printf("buffer__sync__start num_buffers: %d, dirty_buffers: %d\n", $NBuffers, $num_to_write)

}

probe process("/opt/pgsql/bin/postgres").mark("checkpoint__start") {

printf("checkpoint start\n")

}

probe process("/opt/pgsql/bin/postgres").mark("checkpoint__done") {

printf("checkpoint done\n")

}

probe timer.s(1) {

foreach ([v1,v2] in stat_var +) {

if ( @count(stat_var[v1,v2]) >0 ) {

printf("r1_or_w2 %d, pid: %d, min: %d, max: %d, avg: %d, sum: %d, count: %d\n", v2, v1, @min(stat_var[v1,v2]), @max(stat_var[v1,v2]), @avg(stat_var[v1,v2]), @sum(stat_var[v1,v2]), @count(stat_var[v1,v2]))

}

printf("----------------------------------end-----------------------------\n")

delete s_var

delete e_var

delete stat_var

Use the test table and data:

create table tbl(id primary key,info text,crt_time timestamp);

insert into tbl select generate_series(1,50000000),now(),now();

Use pgbench test it.

$ vi test.sql

\setrandom id 1 50000000

update tbl set crt_time=now() where id = :id ;

$ pgbench -M prepared -n -r -f ./test.sql -P 1 -c 28 -j 28 -T 100000000

When on schedule checkpoint occure , the tps:

progress: 255.0 s, 58152.2 tps, lat 0.462 ms stddev 0.504

progress: 256.0 s, 31382.8 tps, lat 0.844 ms stddev 2.331

progress: 257.0 s, 14615.5 tps, lat 1.863 ms stddev 4.554

progress: 258.0 s, 16258.4 tps, lat 1.652 ms stddev 4.139

progress: 259.0 s, 17814.7 tps, lat 1.526 ms stddev 4.035

progress: 260.0 s, 14573.8 tps, lat 1.825 ms stddev 5.592

progress: 261.0 s, 16736.6 tps, lat 1.600 ms stddev 5.018

progress: 262.0 s, 19060.5 tps, lat 1.448 ms stddev 4.818

progress: 263.0 s, 20553.2 tps, lat 1.290 ms stddev 4.146

progress: 264.0 s, 26223.0 tps, lat 1.042 ms stddev 3.711

progress: 265.0 s, 31953.0 tps, lat 0.836 ms stddev 2.837

progress: 266.0 s, 43396.1 tps, lat 0.627 ms stddev 1.615

progress: 267.0 s, 50487.8 tps, lat 0.533 ms stddev 0.647

progress: 268.0 s, 53537.7 tps, lat 0.502 ms stddev 0.598

progress: 269.0 s, 54259.3 tps, lat 0.496 ms stddev 0.624

progress: 270.0 s, 56139.8 tps, lat 0.479 ms stddev 0.524

The parameters for onschedule checkpoint:

checkpoint_segments = 512

checkpoint_timeout = 5min

checkpoint_completion_target = 0.9

stap's output :

there is 156467 dirty blocks, we can see the buffer write per second, write buffer is not smooth between time target.

but between xlog target.

156467/(4.5*60*0.9) = 579.5 write per second.

checkpoint start

buffer__sync__start num_buffers: 262144, dirty_buffers: 156467

r1_or_w2 2, pid: 19848, min: 41, max: 1471, avg: 49, sum: 425291, count: 8596

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 153, avg: 49, sum: 450597, count: 9078

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 643, avg: 51, sum: 429193, count: 8397

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 1042, avg: 55, sum: 449091, count: 8097

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 254, avg: 52, sum: 296668, count: 5617

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 39, max: 171, avg: 54, sum: 321027, count: 5851

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 138, avg: 60, sum: 300056, count: 4953

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 42, max: 1217, avg: 65, sum: 312859, count: 4748

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 1371, avg: 56, sum: 353905, count: 6304

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 41, max: 358, avg: 58, sum: 236254, count: 4038

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 34, max: 1239, avg: 63, sum: 296906, count: 4703

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 31, max: 17408, avg: 63, sum: 415234, count: 6534

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 31, max: 5486, avg: 57, sum: 190345, count: 3318

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 29, max: 510, avg: 53, sum: 136221, count: 2563

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 32, max: 733, avg: 52, sum: 108327, count: 2070

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 34, max: 382, avg: 53, sum: 96157, count: 1812

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 43, max: 327, avg: 53, sum: 83641, count: 1571

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 33, max: 102, avg: 54, sum: 79991, count: 1468

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 35, max: 88, avg: 53, sum: 74338, count: 1389

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 32, max: 86, avg: 52, sum: 65710, count: 1243

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 30, max: 347, avg: 52, sum: 66866, count: 1263

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 31, max: 93, avg: 54, sum: 75642, count: 1398

----------------------------------end-----------------------------

r1_or_w2 2, pid: 19848, min: 33, max: 100, avg: 51, sum: 62302, count: 1216

......

I think we can add an condition to the IsCheckpointOnSchedule,

if (synchronous_commit != SYNCHRONOUS_COMMIT_OFF)

{

recptr = GetInsertRecPtr();

elapsed_xlogs = (((double) (recptr - ckpt_start_recptr)) / XLogSegSize) / CheckPointSegments;

if (progress < elapsed_xlogs)

{

ckpt_cached_elapsed = elapsed_xlogs;

return false;

}

# vi src/backend/postmaster/checkpointer.c

#include "access/xact.h"

* IsCheckpointOnSchedule -- are we on schedule to finish this checkpoint

* in time?

* Compares the current progress against the time/segments elapsed since last

* checkpoint, and returns true if the progress we've made this far is greater

* than the elapsed time/segments.

static bool

IsCheckpointOnSchedule(double progress)

{

XLogRecPtr recptr;

struct timeval now;

double elapsed_xlogs,

elapsed_time;

Assert(ckpt_active);

/* Scale progress according to checkpoint_completion_target. */

progress *= CheckPointCompletionTarget;

* Check against the cached value first. Only do the more expensive

* calculations once we reach the target previously calculated. Since

* neither time or WAL insert pointer moves backwards, a freshly

* calculated value can only be greater than or equal to the cached value.

if (progress < ckpt_cached_elapsed)

return false;

* Check progress against WAL segments written and checkpoint_segments.

* We compare the current WAL insert location against the location

* computed before calling CreateCheckPoint. The code in XLogInsert that

* actually triggers a checkpoint when checkpoint_segments is exceeded

* compares against RedoRecptr, so this is not completely accurate.

* However, it's good enough for our purposes, we're only calculating an

* estimate anyway.

if (!RecoveryInProgress())

{

if (synchronous_commit != SYNCHRONOUS_COMMIT_OFF)

{

recptr = GetInsertRecPtr();

elapsed_xlogs = (((double) (recptr - ckpt_start_recptr)) / XLogSegSize) / CheckPointSegments;

if (progress < elapsed_xlogs)

{

ckpt_cached_elapsed = elapsed_xlogs;

return false;

}

* Check progress against time elapsed and checkpoint_timeout.

gettimeofday(&now, NULL);

elapsed_time = ((double) ((pg_time_t) now.tv_sec - ckpt_start_time) +

now.tv_usec / 1000000.0) / CheckPointTimeout;

if (progress < elapsed_time)

{

ckpt_cached_elapsed = elapsed_time;

return false;

}

/* It looks like we're on schedule. */

return true;

}

# gmake && gmake install

$ pg_ctl restart -m fast

Test again:

progress: 291.0 s, 63144.9 tps, lat 0.426 ms stddev 0.383

progress: 292.0 s, 55063.7 tps, lat 0.480 ms stddev 1.433

progress: 293.0 s, 12225.3 tps, lat 2.238 ms stddev 4.460

progress: 294.0 s, 16436.4 tps, lat 1.621 ms stddev 4.043

progress: 295.0 s, 18516.5 tps, lat 1.444 ms stddev 3.286

progress: 296.0 s, 21983.7 tps, lat 1.251 ms stddev 2.941

progress: 297.0 s, 25759.7 tps, lat 1.034 ms stddev 2.356

progress: 298.0 s, 33139.4 tps, lat 0.821 ms stddev 1.676

progress: 299.0 s, 41904.9 tps, lat 0.644 ms stddev 1.134

progress: 300.0 s, 52432.9 tps, lat 0.513 ms stddev 0.470

progress: 301.0 s, 57115.4 tps, lat 0.471 ms stddev 0.325

progress: 302.0 s, 59422.1 tps, lat 0.452 ms stddev 0.297

progress: 303.0 s, 59860.5 tps, lat 0.449 ms stddev 0.309

We can see checkpointer wiriter buffer smooth(spread time perid) this time.

checkpoint start

----------------------------------end-----------------------------

buffer__sync__start num_buffers: 262144, dirty_buffers: 156761

r1_or_w2 2, pid: 22334, min: 51, max: 137, avg: 60, sum: 52016, count: 860

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 51, max: 108, avg: 58, sum: 35526, count: 604

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 51, max: 145, avg: 71, sum: 39779, count: 559

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 52, max: 172, avg: 79, sum: 47279, count: 594

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 44, max: 160, avg: 63, sum: 36907, count: 581

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 51, max: 113, avg: 61, sum: 33895, count: 552

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 51, max: 116, avg: 61, sum: 38177, count: 617

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 51, max: 113, avg: 62, sum: 34199, count: 550

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 53, max: 109, avg: 65, sum: 39842, count: 606

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 50, max: 118, avg: 64, sum: 35099, count: 545

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 50, max: 107, avg: 64, sum: 39027, count: 606

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 51, max: 114, avg: 62, sum: 34054, count: 545

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 47, max: 106, avg: 63, sum: 38573, count: 605

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 48, max: 101, avg: 62, sum: 38051, count: 607

----------------------------------end-----------------------------

r1_or_w2 2, pid: 22334, min: 42, max: 103, avg: 61, sum: 33596, count: 545

But there is also a little problem, When PostgreSQL write xlog reach checkpoint_segments earlier then checkpoint_timeout, the next checkpoint will start soon, so we must tuning the checkpoint_segments larger when the checkpoint occure busy.

Regards,

Digoal

--
公益是一辈子的事,I'm Digoal,Just Do It.

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Alvaro Herrera
Дата: 12 мая 2015 г., 07:03:13
Сообщение: Re: Multi-xacts and our process problem

Следующее

От: Craig Ringer
Дата: 12 мая 2015 г., 07:27:02
Сообщение: RFC: Non-user-resettable SET SESSION AUTHORISATION

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

Предыдущее

Следующее