The following bug has been logged on the website:
Bug reference: 13962
Logged by: Nick Bales
Email address: nick.bales@rackspace.com
PostgreSQL version: 9.3.9
Operating system: CentOS 6.6
Description:
On several of my PostgreSQL 9.3.x clusters with streaming replication, the
number of transaction log files (files in pg_xlog, not the archived logs) on
the standby nodes is growing past the theoretical maximum
(wal_keep_segments + 2 * checkpoint_segments + 1). Most are appropriately
recycled at each restartpoint, as can be seen in log entries like:
restartpoint complete: wrote 128 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 1 recycled
However, at semi-regular intervals, no xlogs are added, removed, or
recycled:
restartpoint complete: wrote 135 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled
The number of orphaned files over a given period exactly matches the number
of restartpoint log entries in which no files were recycled. There is also no
log entry indicating that a new transaction log file was added, even though
the number of xlogs in the directory keeps growing. Furthermore, once a file
is orphaned, it sticks around for the life of the system, ultimately
exhausting all disk space on long-running systems.
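To check the correlation described above on another standby, something like the following rough Python sketch could count the restartpoints that touched no files. The function name count_idle_restartpoints and the sample list are made up for illustration, and the pattern assumes each log entry sits on a single line and starts with the text quoted in this report; a real log line will carry a log_line_prefix and possibly trailing timing fields, which the search ignores.

```python
import re

# Matches the portion of a "restartpoint complete" entry quoted in this report.
RESTARTPOINT_RE = re.compile(
    r"restartpoint complete: wrote \d+ buffers \([\d.]+%\); "
    r"(\d+) transaction log file\(s\) added, (\d+) removed, (\d+) recycled"
)


def count_idle_restartpoints(lines):
    """Return (total restartpoints, restartpoints with 0 added/removed/recycled)."""
    total = 0
    idle = 0
    for line in lines:
        match = RESTARTPOINT_RE.search(line)
        if match is None:
            continue  # not a restartpoint summary line
        total += 1
        if all(int(group) == 0 for group in match.groups()):
            idle += 1
    return total, idle


# Hypothetical sample input built from the two entries quoted above.
sample = [
    "restartpoint complete: wrote 128 buffers (0.0%); "
    "0 transaction log file(s) added, 0 removed, 1 recycled",
    "restartpoint complete: wrote 135 buffers (0.0%); "
    "0 transaction log file(s) added, 0 removed, 0 recycled",
]
print(count_idle_restartpoints(sample))
```

The second number returned can then be compared with the growth in the pg_xlog file count over the same period.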
Relevant settings:
wal_level = hot_standby
wal_keep_segments = 200
checkpoint_segments = 128
checkpoint_timeout = 5min
checkpoint_completion_target = 0.7
checkpoint_warning = 30s
log_checkpoints = on
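
For reference, plugging these settings into the ceiling given in the description yields the expected upper bound. This is only a quick arithmetic sketch in Python: the formula is the one stated in this report, and the 16 MB segment size is an assumption (the default build value), since the report does not state it.

```python
# Settings as listed in this report.
wal_keep_segments = 200
checkpoint_segments = 128

# Ceiling as stated in the description:
#   wal_keep_segments + 2 * checkpoint_segments + 1
expected_max_segments = wal_keep_segments + 2 * checkpoint_segments + 1

# Assumption: default 16 MB WAL segment size (not stated in the report).
segment_size_mb = 16
expected_max_mb = expected_max_segments * segment_size_mb

print(expected_max_segments, expected_max_mb)
```

Under those assumptions, a pg_xlog directory holding more than 457 files (about 7.1 GB) would be past the stated maximum.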