Re: Separating bgwriter and checkpointer

Поиск
Список
Период
Сортировка
От Dickson S. Guedes
Тема Re: Separating bgwriter and checkpointer
Дата
Msg-id CAHHcrepQNCN3HoXSWAOtsj3HJKp5NoepB2ky3dFgVA4JCXh-Qw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Separating bgwriter and checkpointer  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: Separating bgwriter and checkpointer
Список pgsql-hackers
2011/10/4 Simon Riggs <simon@2ndquadrant.com>:
> The problem is the *same* one I fixed in v2, yet now I see I managed
> to somehow exclude that fix from the earlier patch. Slap. Anyway,
> fixed again now.

Ah ok! I started reviewing the v4 patch version, this is my comments:

Submission review
===============

1. The patch applies cleanly to current master (fa56a0c3e) but isn't
in context diff format;

Feature test
==========

1. Since I patched and make installed it, I can see the expected
processes: writer and checkpointer;

2. I did the following tests with the following results:

2.1 Running a long time pgbench didn't emit any assertion failure or
crash and the checkpoint works as before patch:
 LOG:  checkpoint starting: xlog LOG:  checkpoint complete: wrote 300 buffers (9.8%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=26.103 s, sync=6.492
s, total=34.000 s; sync files=13, longest=4.684 s, average=0.499 s LOG:  checkpoint starting: time LOG:  checkpoint
complete:wrote 257 buffers (8.4%); 0 transaction 
log file(s) added, 0 removed, 3 recycled; write=21.819 s, sync=9.610
s, total=32.076 s; sync files=7, longest=6.452 s, average=1.372 s

2.2 Forcing a checkpoint when filesystem has enough free space works
fine while pgbench is running:
 LOG:  checkpoint starting: immediate force wait LOG:  checkpoint complete: wrote 1605 buffers (52.2%); 0 transaction
log file(s) added, 0 removed, 2 recycled; write=0.344 s, sync=22.750
s, total=23.700 s; sync files=10, longest=15.586 s, average=2.275 s

2.3 Forcing a checkpoint when filesystem are full, works as expected:
 LOG:  checkpoint starting: immediate force wait time ERROR:  could not write to file "pg_xlog/xlogtemp.4380": Não há
espaço disponível no dispositivo ERROR:  checkpoint request failed HINT:  Consult recent messages in the server log for
details.STATEMENT:  CHECKPOINT ; ... ERROR:  could not extend file "base/16384/16405": wrote only 4096 of 
8192 bytes at block 10 HINT:  Check free disk space. STATEMENT:  INSERT INTO pgbench_history (tid, bid, aid, delta,
mtime) VALUES (69, 3, 609672, -3063,  CURRENT_TIMESTAMP); PANIC:  could not write to file "pg_xlog/xlogtemp.4528": Não
há
espaço disponível no dispositivo STATEMENT:  END; LOG:  server process (PID 4528) was terminated by signal 6: Aborted

Then I freed some space and started it again and the server ran properly:
  LOG:  database system was shut down at 2011-10-05 00:46:33 BRT  LOG:  database system is ready to accept connections
LOG: autovacuum launcher started  ...  LOG:  checkpoint starting: immediate force wait  LOG:  checkpoint complete:
wrote0 buffers (0.0%); 0 transaction 
log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
total=0.181 s; sync files=0, longest=0.000 s, average=0.000 s

2.2 Running a pgbench and interrupting postmaster a few seconds later,
seems to work as expected, returning the output:

... cut ...
LOG:  statement: SELECT abalance FROM pgbench_accounts WHERE aid = 148253;
^CLOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance +
934 WHERE tid = 85;
DEBUG:  postmaster received signal 2
LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
... cut ...
LOG:  disconnection: session time: 0:00:14.917 user=guedes
database=bench host=[local]
LOG:  disconnection: session time: 0:00:14.773 user=guedes
database=bench host=[local]
DEBUG:  server process (PID 1910) exited with exit code 1
DEBUG:  server process (PID 1941) exited with exit code 1
LOG:  shutting down
LOG:  checkpoint starting: shutdown immediate
DEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0000
DEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0000
DEBUG:  checkpoint sync: number=1 file=base/16384/16398 time=1075.227 msec
DEBUG:  checkpoint sync: number=2 file=base/16384/16406 time=16.832 msec
DEBUG:  checkpoint sync: number=3 file=base/16384/16397 time=12306.204 msec
DEBUG:  checkpoint sync: number=4 file=base/16384/16397.1 time=2122.141 msec
DEBUG:  checkpoint sync: number=5 file=base/16384/16406_fsm time=32.278 msec
DEBUG:  checkpoint sync: number=6 file=base/16384/16385_fsm time=11.248 msec
DEBUG:  checkpoint sync: number=7 file=base/16384/16388 time=11.083 msec
DEBUG:  checkpoint sync: number=8 file=base/16384/11712 time=11.314 msec
DEBUG:  checkpoint sync: number=9 file=base/16384/16397_vm time=11.103 msec
DEBUG:  checkpoint sync: number=10 file=base/16384/16385 time=19.308 msec
DEBUG:  attempting to remove WAL segments older than log file
000000010000000000000000
DEBUG:  SlruScanDirectory invoking callback on pg_subtrans/0000
LOG:  checkpoint complete: wrote 1659 buffers (54.0%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=15.617
s, total=15.898 s; sync files=10, longest=12.306 s, average=1.561 s
LOG:  database system is shut down

Then I started the server again and it ran properly.


Well, all the tests was running with the default postgresql.conf in my
laptop but I'll setup a more "real world" environment to test for
performance regression. Until now I couldn't notice any significant
difference in TPS before and after patch in a small environment. I'll
post something soon.


Best regards,
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Seiko Ishida (MP Tech Consulting LLC)"
Дата:
Сообщение: Action requested - Application Softblock implemented | Issue report ID341057
Следующее
От: daveg
Дата:
Сообщение: timezone buglet?