Re: Redesigning checkpoint_segments

Поиск
Список
Период
Сортировка
От Venkata Balaji N
Тема Re: Redesigning checkpoint_segments
Дата
Msg-id CAEyp7J-BeBcn4dO9judZmxcJrp8Fj23nuF-nuitpd5MTqu1zfg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Redesigning checkpoint_segments  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: Redesigning checkpoint_segments  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
Hi,

I really like the idea of tuning checkpoint segments based on disk space usage.

I performed series of tests for this patch and would like to share the results. My comments are in-line.

To start with, I applied this patch to the master successfully -


But ... do I understand things correctly that checkpoint wouldn't "kick
in" until you hit checkpoint_wal_size?  If that's the case, isn't real
disk space usage around 2X checkpoint_wal_size if spread checkpoint is
set to 0.9?  Or does checkpoint kick in sometime earlier?

It kicks in earlier, so that the checkpoint *completes* just when checkpoint_wal_size of WAL is used up. So the real disk usage is checkpoint_wal_size.

There is still an internal variable called CheckPointSegments that triggers the checkpoint, but it is now derived from checkpoint_wal_size (see CalculateCheckpointSegments function):

CheckPointSegments = (checkpoint_wal_size / 16 MB) / (2 + checkpoint_completion_target)

Yes, i see this happening.

This is the same formula we've always had in the manual for calculating the amount of WAL space used, but in reverse. I.e. we calculate CheckPointSegments based on the desired disk space usage, not the other way round.

As a note, pgBench would be a terrible test for this patch; we really
need something which creates uneven traffic.  I'll see if I can devise
something.

Attached is a rebased version of this patch. Everyone, please try this out on whatever workloads you have, and let me know:

a) How does the auto-tuning of the number of recycled segments work? Does pg_xlog reach a steady-state size, or does it fluctuate a lot?

I performed the tests by executing heavy INSERT operations (INSERTS only) using benchmarksql. I do see that pg_xlog size is increasing at times.

I have inserted about 6GB of data for testing.

Below are the test results.

Test 1 :

In this test, i see removed+recycled segments = 3 (except for the first 3 checkpoint cycles) and has been steady through out until the INSERT operation completed.

Actual calculation of CheckPointSegments = 3.2 ( is getting rounded up to 3 )

pg_xlog size is 128M and has increased to 160M max during the INSERT operation.

shared_buffers = 128M
checkpoint_wal_size = 128M
min_recycle_wal_size = 80M
checkpoint_timeout = 5min


TimeStamp=2015-01-27 09:39:14.325 GMT-10 DB=bsql SID=54c6cfd6.5e4 User=postgres LOG:  statement: update order_line set ol_amount = 0.01;
TimeStamp=2015-01-27 09:39:15.407 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint complete: wrote 5123 buffers (31.3%); 0 transaction log file(s) added, 1 removed, 0 recycled; write=0.593 s, sync=2.492 s, total=3.273 s; sync files=26, longest=0.399 s, average=0.095 s; distance=52653 KB, estimate=52653 KB
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoints are occurring too frequently (3 seconds apart)
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= HINT:  Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint complete: wrote 8145 buffers (49.7%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=0.913 s, sync=1.476 s, total=2.530 s; sync files=4, longest=0.534 s, average=0.369 s; distance=87446 KB, estimate=87446 KB
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoints are occurring too frequently (3 seconds apart)
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= HINT:  Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:23.169 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint complete: wrote 4598 buffers (28.1%); 0 transaction log file(s) added, 3 removed, 2 recycled; write=0.716 s, sync=1.083 s, total=1.957 s; sync files=4, longest=0.486 s, average=0.270 s; distance=47964 KB, estimate=83498 KB
TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoints are occurring too frequently (2 seconds apart)
TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= HINT:  Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:24.968 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint complete: wrote 3417 buffers (20.9%); 0 transaction log file(s) added, 1 removed, 2 recycled; write=0.539 s, sync=1.059 s, total=1.732 s; sync files=4, longest=0.535 s, average=0.264 s; distance=44814 KB, estimate=79629 KB
TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoints are occurring too frequently (2 seconds apart)
TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= HINT:  Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:26.879 GMT-10 DB= SID=54bee4a1.3002 User= LOG:  checkpoint complete: wrote 4721 buffers (28.8%); 0 transaction log file(s) added, 1 removed, 2 recycled; write=0.474 s, sync=1.166 s, total=1.761 s; sync files=4, longest=0.583 s, average=0.291 s; distance=49145 KB, estimate=76581 KB

Test 2 :

removed+recycled segments remained 3 even after i increased the checkpoint_wal_size = 144M. This is obviously due to the calculation in CalculateCheckpointSegments() functions.

checkpoint_wal_size = 144M
min_recycle_wal_size = 104M
checkpoint_timeout = 5min
shared_buffers = 1 GB

Actual calculation of CheckPointSegments = 3.6

TimeStamp=2015-01-27 13:54:38.469 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 13:54:42.831 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint complete: wrote 5419 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.408 s, sync=1.820 s, total=4.361 s; sync files=3, longest=1.432 s, average=0.606 s; distance=48175 KB, estimate=49972 KB
TimeStamp=2015-01-27 13:54:44.824 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 13:54:49.008 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint complete: wrote 5570 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.769 s, sync=1.268 s, total=4.184 s; sync files=3, longest=0.843 s, average=0.422 s; distance=51720 KB, estimate=51720 KB
TimeStamp=2015-01-27 13:54:50.754 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 13:54:55.127 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint complete: wrote 5155 buffers (3.9%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.977 s, sync=1.273 s, total=4.372 s; sync files=3, longest=0.848 s, average=0.424 s; distance=46133 KB, estimate=51161 KB
TimeStamp=2015-01-27 13:54:57.164 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 13:55:01.622 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint complete: wrote 5345 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.598 s, sync=1.290 s, total=4.458 s; sync files=3, longest=0.894 s, average=0.430 s; distance=49604 KB, estimate=51006 KB
TimeStamp=2015-01-27 13:55:03.501 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 13:55:07.390 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint complete: wrote 5482 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.549 s, sync=1.193 s, total=3.889 s; sync files=3, longest=0.837 s, average=0.397 s; distance=49963 KB, estimate=50901 KB
TimeStamp=2015-01-27 13:55:09.381 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 13:55:13.626 GMT-10 DB= SID=54c70b57.21a0 User= LOG:  checkpoint complete: wrote 5481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=2.778 s, sync=1.280 s, total=4.244 s; sync
 
Test 3 :

checkpoint_wal_size = 244M
min_recycle_wal_size = 204M
checkpoint_timeout = 5min
shared_buffers = 1 GB

removed+recycled segments remained 6.

Actual calculation of checkpointsegments = 6.1

TimeStamp=2015-01-27 14:02:01.936 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:10.638 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint complete: wrote 14111 buffers (10.8%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=5.527 s, sync=2.719 s, total=8.701 s; sync files=14, longest=1.789 s, average=0.194 s; distance=98617 KB, estimate=99036 KB
TimeStamp=2015-01-27 14:02:14.243 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:22.783 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint complete: wrote 16524 buffers (12.6%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=7.013 s, sync=1.394 s, total=8.540 s; sync files=3, longest=0.867 s, average=0.464 s; distance=98724 KB, estimate=99005 KB
TimeStamp=2015-01-27 14:02:28.066 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:36.946 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint complete: wrote 16541 buffers (12.6%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=4.899 s, sync=3.801 s, total=8.879 s; sync files=9, longest=2.800 s, average=0.422 s; distance=98719 KB, estimate=98976 KB
TimeStamp=2015-01-27 14:02:40.611 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:48.066 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint complete: wrote 10998 buffers (8.4%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=4.874 s, sync=1.998 s, total=7.455 s; sync files=3, longest=1.998 s, average=0.666 s; distance=98771 KB, estimate=98956 KB
TimeStamp=2015-01-27 14:02:53.327 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-27 14:03:00.872 GMT-10 DB= SID=54c70d58.22f4 User= LOG:  checkpoint complete: wrote 10640 buffers (8.1%); 0 transaction log file(s) added, 1 removed, 5 recycled; write=5.247 s, sync=2.097 s, total=7.544 s; sync files=3, longest=1.640 s, average=0.699 s; distance=98624 KB, estimate=98923 KB

Test 4 :

This time i tested with wal_keep_segments = 300 (4.8 G)

checkpoint_wal_size = 512MB
min_recycle_wal_size = 80M
wal_keep_segments = 300
checkpoint_timeout = 5min
shared_buffers = 1 GB

Actual calculation of checkpointsegments = 12.8

TimeStamp=2015-01-29 12:51:48.276 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 12:52:04.325 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 20965 buffers (16.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=11.676 s, sync=3.830 s, total=16.049 s; sync files=18, longest=2.991 s, average=0.212 s; distance=196705 KB, estimate=196705 KB
TimeStamp=2015-01-29 12:52:16.068 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 12:52:33.529 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 22009 buffers (16.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=12.705 s, sync=3.559 s, total=17.460 s; sync files=3, longest=3.002 s, average=1.186 s; distance=200401 KB, estimate=200401 KB
TimeStamp=2015-01-29 12:52:43.321 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog

Since the wal_keep_segments is 300, recycling or removing of the transactions logs begins after the required number of wal_keep_segments are retained. Which is 4.8G in this case.

removed+recycled has always been 12 except for the first 3 checkpoint cycles after pg_xlog size reached 4.8G.

TimeStamp=2015-01-29 13:03:29.167 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 13:03:58.401 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 20316 buffers (15.5%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=11.963 s, sync=16.840 s, total=29.233 s; sync files=16, longest=15.137 s, average=1.052 s; distance=197432 KB, estimate=197432 KB
TimeStamp=2015-01-29 13:04:05.451 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 13:04:52.416 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 20280 buffers (15.5%); 0 transaction log file(s) added, 5 removed, 8 recycled; write=10.989 s, sync=35.791 s, total=46.965 s; sync files=10, longest=17.927 s, average=3.579 s; distance=196668 KB, estimate=197356 KB
TimeStamp=2015-01-29 13:04:52.635 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 13:05:15.520 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 31394 buffers (24.0%); 0 transaction log file(s) added, 0 removed, 10 recycled; write=10.270 s, sync=12.404 s, total=22.884 s; sync files=17, longest=5.014 s, average=0.729 s; distance=197961 KB, estimate=197961 KB
TimeStamp=2015-01-29 13:05:20.356 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 13:05:35.060 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 32731 buffers (25.0%); 0 transaction log file(s) added, 0 removed, 10 recycled; write=11.433 s, sync=3.055 s, total=14.703 s; sync files=13, longest=1.300 s, average=0.235 s; distance=196510 KB, estimate=197816 KB
TimeStamp=2015-01-29 13:05:43.059 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 13:05:59.518 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 30264 buffers (23.1%); 0 transaction log file(s) added, 0 removed, 12 recycled; write=10.687 s, sync=5.624 s, total=16.459 s; sync files=12, longest=3.971 s, average=0.468 s; distance=193348 KB, estimate=197369 KB
TimeStamp=2015-01-29 13:06:07.371 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint starting: xlog
TimeStamp=2015-01-29 13:06:23.870 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:  checkpoint complete: wrote 30723 buffers (23.4%); 0 transaction log file(s) added, 0 removed, 12 recycled; write=10.132 s, sync=6.159 s, total=16.498 s; sync file


b) Are the two GUCs, checkpoint_wal_size, and min_recycle_wal_size, intuitive to set?


During my tests, I did not observe the significance of min_recycle_wal_size parameter yet. Ofcourse, i had sufficient disk space for pg_xlog.

I would like to understand more about "min_recycle_wal_size" parameter. In theory, i only understand from the note in the patch that if the disk space usage falls below certain threshold, min_recycle_wal_size number of WALs will be removed to accommodate future pg_xlog segments. I will try to test this out. Please let me know if there is any specific test to understand min_recycle_wal_size behaviour.

I will try to perform some more stress testing with different set of high workloads and will share the results.

I did not review the patch code completely. Will comment once done.

Please share your thoughts on this.

Regards,
Venkata B N

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Possible typo in create_policy.sgml
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Safe memory allocation functions