Very slow checkpoints

Поиск

Список

Период

Сортировка

От	Steven Jones
Тема	Very slow checkpoints
Дата	18 марта 2015 г. 11:21:15
Msg-id	COL129-W88EAA235AC3967FF7F5D8792000@phx.gbl обсуждение исходный текст
Ответы	Re: Very slow checkpoints Re: Very slow checkpoints Re: Very slow checkpoints Re: Very slow checkpoints
Список	pgsql-performance

Дерево обсуждения

Hi,

We have a so far (to us) unexplainable issue on our production systems after we roughly doubled the amount of data we import daily. We should be ok on pure theoretical hardware performance, but we are seeing some weird IO counters when the actual throughput of the writes is very low. The use case is as follows:

- typical DW - relatively constant periodic data loads - i.e. heavy write

- we receive large CSV files ~ 5-10Gb every 15 minutes spread out across 5-7 minutes

- Custom ETL scripts process and filter files within < 30 seconds down to about 5Gb CSV ready to load

- 2 loader queues load the files, picking off a file one-by-one

- tables are partitioned daily, indexed on a primary key + timestamp

- system is HP blade; 128Gb RAM, 2x 8-core, 12x 10k RPM RAID1+0 (database) on first controller, 2x 15k RAID1 (xlog) on a different controller

- DB size is ~2.5Tb; rotating load of 30 days keeps the database stable

- filesystem: zfs with lz4 compression

- raw throughput of the database disk is > 700Mbytes/sec sequential and >150Mbytes random for read and roughly half for write in various benchmarks

- CPU load is minimal when copy loads are taking place (i.e. after ETL has finished)

The issue is that the system is constantly checkpointing regardless of various kernel and postgres settings. Having read through most of the history of this list and most of the recommendations on various blogs, we have been unable to find an answer why the checkpoints are being written so slowly. Even when we disable all import processes or if index is dropped, the checkpoint is still taking > 1hour. Stats are pointing to checkpoint sizes of roughly 7Gb which should take < 1min even with full random reads; so even when imports are fully disabled, what is not making sense is why would the checkpointing be taking well over an hour?

One other thing that's noticed, but not measured, i.e. mostly anecdotal is that for a period of <1hr when postgres is restarted, the system performs mostly fine and checkpoints are completing in <5min; so it may be that after a while some (OS/postgres) buffers are filling up and causing this issue?

Full iostat/iotop, configuration, checkpoint stats, etc. are pasted below for completeness. Highlights are:

checkpoint_segments=512

shared_buffers=16GB

checkpoint_timeout=15min

checkpoint_completion_target=0.1

Regards,

Steve

---

Checkpoint stats:

db=# select * from pg_stat_bgwriter;

checkpoints_timed 6

checkpoints_req 3

checkpoint_write_time 26346184

checkpoint_sync_time 142

buffers_checkpoint 4227065

buffers_clean 4139841

maxwritten_clean 8261

buffers_backend 9128583

buffers_backend_fsync 0

buffers_alloc 9311478

stats_reset 2015-03-17 11:14:21.5649

---

postgres log file - checkpoint log entries:

2015-03-17 11:25:25 LOG: checkpoint complete: wrote 855754 buffers (40.8%); 0 transaction log file(s) added, 0 removed, 500 recycled; write=2988.185 s, sync=0.044 s, total=2988.331 s; sync files=110, longest=0.003 s, average=0.000 s

2015-03-17 11:25:25 LOG: checkpoint starting: xlog time

2015-03-17 11:59:54 LOG: parameter "checkpoint_completion_target" changed to "0.9"

2015-03-17 13:30:20 LOG: checkpoint complete: wrote 1012112 buffers (48.3%); 0 transaction log file(s) added, 0 removed, 512 recycled; write=7494.228 s, sync=0.021 s, total=7494.371 s; sync files=119, longest=0.001 s, average=0.000 s

2015-03-17 13:30:20 LOG: checkpoint starting: xlog time

2015-03-17 14:21:53 LOG: parameter "checkpoint_completion_target" changed to "0.1"

2015-03-17 16:00:58 LOG: checkpoint complete: wrote 1411979 buffers (67.3%); 0 transaction log file(s) added, 696 removed, 900 recycled; write=9036.898 s, sync=0.020 s, total=9038.538 s; sync files=109, longest=0.000 s, average=0.000 s

2015-03-17 16:00:58 LOG: checkpoint starting: time

2015-03-17 16:28:40 LOG: checkpoint complete: wrote 345183 buffers (16.5%); 0 transaction log file(s) added, 2001 removed, 0 recycled; write=1660.333 s, sync=0.018 s, total=1661.816 s; sync files=93, longest=0.002 s, average=0.000 s

2015-03-17 17:28:40 LOG: checkpoint starting: time

2015-03-17 18:54:47 LOG: checkpoint complete: wrote 602037 buffers (28.7%); 0 transaction log file(s) added, 0 removed, 500 recycled; write=5166.540 s, sync=0.039 s, total=5166.657 s; sync files=122, longest=0.003 s, average=0.000 s

2015-03-17 18:54:47 LOG: checkpoint starting: xlog time

---

iostat -x snapshot:

avg-cpu: %user %nice %system %iowait %steal %idle

0.50 0.00 2.35 15.09 0.00 82.05

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util

sda 0.00 0.00 0.00 5.00 0.00 2056.00 822.40 0.00 0.00 0.00 0.00 0.00 0.00

sdb 0.00 0.00 1055.00 549.00 41166.50 22840.00 79.81 5.28 3.28 4.94 0.10 0.62 100.00

---

vmstat 60 output

# vmstat 60

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa

5 3 877508 1251152 74476 98853728 0 0 87 1891 0 0 1 5 92 2

6 5 877508 915044 74940 99237840 0 0 46588 41857 6993 41784 8 4 76 12

2 4 877508 1676008 75292 98577936 0 0 46847 34540 4778 17175 9 3 75 13

---

sysctl settings for dirty pages

vm.dirty_background_bytes = 0

vm.dirty_background_ratio = 5

vm.dirty_bytes = 0

vm.dirty_expire_centisecs = 3000

vm.dirty_ratio = 10

vm.dirty_writeback_centisecs = 500

---

# free -m

total used free shared buffers cached

Mem: 128905 126654 2250 0 70 95035

-/+ buffers/cache: 31549 97355

Swap: 15255 856 14399

---

postgres settings:

# cat postgresql.conf |grep check

checkpoint_segments = 512 # in logfile segments, min 1, 16MB each

checkpoint_timeout = 15min # range 30s-1h

checkpoint_completion_target = 0.1 # checkpoint target duration, 0.0 - 1.0

checkpoint_warning = 10min # 0 disables

log_checkpoints = on

shared_buffers = 16384MB

bgwriter_lru_maxpages = 500

#bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers scanned/round

wal_level = 'archive'

archive_mode = on

archive_command = 'cd .' # we can also use exit 0, anything that

max_wal_senders = 0

wal_keep_segments = 500

hot_standby = off

---

iotop snapshot:

Total DISK READ: 41.63 M/s | Total DISK WRITE: 31.43 M/s

TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND

32101 be/4 postgres 10.25 M/s 1085.86 K/s 0.00 % 96.80 % postgres: checkpointer process

56661 be/4 postgres 6.84 M/s 591.61 K/s 0.00 % 90.91 % postgres: dbauser db [local] COPY

56751 be/4 postgres 6.97 M/s 838.73 K/s 0.00 % 88.00 % postgres: dbauser db [local] COPY

56744 be/4 postgres 6.13 M/s 958.55 K/s 0.00 % 85.48 % postgres: dbauser db [local] COPY

56621 be/4 postgres 6.77 M/s 1288.05 K/s 0.00 % 83.96 % postgres: dbauser db [local] COPY

32102 be/4 postgres 8.05 M/s 1340.47 K/s 0.00 % 82.47 % postgres: writer process

1005 be/0 root 0.00 B/s 0.00 B/s 0.00 % 5.81 % [txg_sync]

32103 be/4 postgres 0.00 B/s 10.41 M/s 0.00 % 0.52 % postgres: wal writer process

---

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Very slow checkpoints