On Fri, Nov 03, 2017 at 01:43:32AM +0000, tao tony wrote:
> I had an asynchronous streaming replication HA cluster. Each node had 64GB of memory. PostgreSQL is 9.6.2, deployed on CentOS 6.
>
> Last month the database was killed by the OS kernel's OOM killer; the checkpointer process was killed.
If you still have logs, was it killed during a large query? Perhaps one using
a hash aggregate?
> I noticed the checkpointer process occupied more than 20GB of memory, and it was growing every day. On the hot-standby
> node, the recovery process occupied as much memory as the checkpointer process.
The "resident" RAM of a postgres subprocess is often just the fraction of
shared_buffers it has read or written. The checkpointer must necessarily read
all dirty pages from shared_buffers and write them out to disk (by way of the
page cache), which is why its RSS can grow toward the full 32GB of
shared_buffers. And the recovery process is continuously writing into
shared_buffers.
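To make that concrete, here is a rough sketch (not a diagnostic tool) that takes the RES and SHR columns from the top output quoted below and estimates how much memory each process holds privately. The helper names parse_size and private_bytes are made up for illustration, and it assumes top's usual unit suffixes (k/m/g/t, bare numbers in KiB). Since top rounds to whole gigabytes here, anything under about half a GB of private memory will show up as zero.

```python
# Sketch: estimate private (non-shared) memory from top's RES and SHR columns.
# Assumption: suffixes k/m/g/t are binary units; a bare number is KiB.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a top-style size such as '25g' or '34.9g' to bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text) * 1024)


def private_bytes(res, shr):
    """RES minus SHR: the part of resident memory that is not shared."""
    return max(parse_size(res) - parse_size(shr), 0)


# (name, RES, SHR) copied by hand from the quoted top output.
procs = [
    ("startup process", "25g", "25g"),
    ("checkpointer process", "25g", "25g"),
]

# Adding up RES counts the same shared_buffers pages once per process.
naive_total = sum(parse_size(res) for _, res, _ in procs)

# Adding up RES - SHR counts only what each process holds privately.
private_total = sum(private_bytes(res, shr) for _, res, shr in procs)

print("sum of RES:        %.1f GiB" % (naive_total / 1024**3))
print("sum of RES - SHR:  %.1f GiB" % (private_total / 1024**3))
```

So the "more than 50GB" figure comes from counting the same shared memory twice; at top's resolution, neither process shows any meaningful private memory.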
> Now on the standby node, the checkpointer and recovery processes use more than 50GB of memory, as shown below, and I worry that someday
> the cluster will be killed by the OS again.
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 167158 postgres  20   0 34.9g  25g  25g S  0.0 40.4 46:36.86 postgres: startup process recovering 00000004000008550000004B
> 167162 postgres  20   0 34.9g  25g  25g S  0.0 40.2 17:58.38 postgres: checkpointer process
>
> shared_buffers = 32GB
Also, what is work_mem set to?
Justin