Discussion: PostgreSQL host swapping hard with abundant free memory
Greetings,

We have been having some pretty painful site outages related to heavy swapping, even though the system may appear to have as much as 10GB of free memory. It seems that as soon as the system hits ~22GB (of 32GB) of memory usage it starts to swap; as soon as we go below ~22GB, swap is released.

During the worst outages we see:

- heavy swapping (10-15GB)
- heavy disk IO (latencies up to 600ms)
- heavy CPU load: ~4.0 per core (load average over 100 on 24 cores)
- available memory: 6-8GB

Here's the host information; it's a dedicated physical host, not a VM:

- Ubuntu 14.04.1 LTS
- Linux db004 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 GNU/Linux
- 24 cores, 32GB RAM, 32GB swapfile
- RAID 10 with SSDs

Postgres version:

- postgresql-9.3.5
- postgis-2.1.4 extension

Here's our /etc/sysctl.conf:

    kernel.shmmax = 16850395136
    kernel.shmall = 4113866
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 50
    vm.swappiness = 0
    vm.dirty_ratio = 10              ## maximum 3.2GB dirty cache size
    vm.dirty_background_ratio = 5    ## ratio at which disk begins to flush cache (1.6GB)

We are setting our shared_buffers, work_mem and maintenance_work_mem very high, which I suspect is the reason for the swapping. The huge mystery to me is that during heavy swap there appears to be ~8-10GB of free memory in cache. We wondered whether that unused memory in cache was being consumed by our memory settings in postgresql.conf:

    shared_buffers = 8GB            ## 25% of system memory (32GB)
    maintenance_work_mem = 8GB      ## autovacuuming on, 3 workers (defaults)
    work_mem = 256MB                ## we have as many as 170 connections

We average around 170 connections and have been moving everything to pgbouncer to reduce this count. A rough estimate of the worst-case cost with work_mem set to 256MB:

    170 x 256MB = 43520MB (~42.5GB)

Is it possible that the high work_mem setting is causing the connections to hold on to the extra available memory? This is our assumption, so we plan on dialing down work_mem and maintenance_work_mem to sane values.
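Rather than theorizing, one way to check what the backends actually hold (a sketch assuming procps-style `ps`; note that each backend's RSS includes the shared-buffer pages it has touched, so the sum overstates real usage):

```shell
# Sum the resident set size (RSS) of every postgres process, in MB.
# Caveat: RSS counts touched shared-buffer pages once per backend,
# so shared memory is double-counted across backends.
ps -C postgres -o rss= | awk '{ sum += $1 } END { printf "%d MB\n", sum/1024 }'
```

If the total (minus the double-counted shared buffers) is far below 22GB, the pressure is coming from somewhere other than work_mem.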
We have also ordered more RAM (increasing from 32GB to 96GB), but I would like to understand what is happening. Here are the values we are going to change:

    maintenance_work_mem = 1GB
    work_mem = 64MB

One last thing: we disabled THP (transparent huge pages) because we were seeing compaction stalls:

    [db003.prod:~] root% egrep 'compact_(fail|stall)' /proc/vmstat
    compact_stall 34682729
    compact_fail 32915396

Here are our competing theories of why we are swapping as much as 10-15GB with 6-10GB of free memory:

1.) maintenance_work_mem and work_mem are set too high, causing postgres to allocate too much memory
2.) The 6-10GB of free memory observed during swapping is postgresql's shared buffer
3.) Linux filesystem caching is tuned incorrectly, maybe SSD related?
4.) This is a NUMA-related issue similar to the MySQL swap insanity issue:
    http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
    http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
    http://www.postgresql.org/message-id/CAGTBQpacrSDcN10rTwRbH+AGm2_y0Qao6CJDoyvEp504iFbdrw@mail.gmail.com

Note, zone_reclaim_mode appears to be disabled on our database host:

    root% cat /proc/sys/vm/zone_reclaim_mode
    0

Hoping someone here can help us sort this out, because it's a huge mystery for us. We are going to add more RAM, but I'm trying to understand what is happening.

Thank you kindly.

Christian Gough
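For what it's worth, the back-of-envelope worst case under the old settings adds up to well over physical RAM (pure arithmetic; actual usage would be far lower, since not every backend allocates a full work_mem at once):

```shell
# Worst-case allocation with the old settings (all numbers in MB):
shared_buffers=8192          # 8GB, allocated once (shared)
maintenance_work_mem=8192    # 8GB, per autovacuum worker
autovacuum_workers=3
work_mem=256                 # per sort/hash operation, per backend
connections=170

total=$(( shared_buffers + maintenance_work_mem * autovacuum_workers \
        + work_mem * connections ))
echo "${total} MB"           # 76288 MB, i.e. ~74.5GB on a 32GB host
```

Even this understates the ceiling, since a single backend can run more than one work_mem-sized operation at a time.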
On 4/17/15 1:25 PM, Christian Gough wrote:

> Here are our competing theories of why we are swapping as much as 10-15GB with 6GB-10GB of free memory:

How are you determining that you're swapping? How are you measuring 'free memory'?

> 1.) maintenance_work_mem and work_mem are set too high causing postgres to allocate too much memory

There's not much that uses maintenance_work_mem: (AUTO)VACUUM, CREATE INDEX... I think that's it. Also, I believe it's internally limited to 1GB.

As for work_mem, it's hard to say. As your math shows, if *every* connection suddenly allocated a full work_mem then you'd be in trouble, but that would be a pretty extreme situation. Even though a backend can have multiple work_mem-consuming operations in use at once, so it could theoretically use several times work_mem, in reality I think it's very rare for every backend to use even a fraction of work_mem. But there's no need to theorize here; work_mem will be reported as allocated to each backend and will not be considered free memory by the OS.

> 2.) The 6GB - 10GB of free memory observed during swapping is postgresql's shared buffer

If anything, I'd expect the OS to avoid swapping a shared memory segment (if not flat-out refuse to swap it). Shared memory certainly shouldn't be counted as free either. The only tricky bit here is that many OSes report shared memory as part of the memory footprint *for each backend*. So if you have 170 backends and 8GB of shared buffers, it could look like you're using ~1.36TB of memory (which you're obviously not).

> 3.) Linux filesystem caching is tuned incorrectly, maybe SSD related?

Possibly, though I don't think it'd be SSD-related...

> 4.) This is a NUMA related issue similar to the Mysql Swap Insanity issue:

Since you have zone_reclaim_mode turned off, I don't think so.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
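As an aside on measuring 'free memory': on a 3.13 kernel, /proc/meminfo has no MemAvailable field (it arrived in 3.14), so reclaimable page cache has to be added back by hand. A rough sketch, assuming the standard /proc/meminfo layout:

```shell
# Kernels before 3.14 lack MemAvailable, so approximate it as
# MemFree + Buffers + Cached (all /proc/meminfo values are in kB).
awk '/^(MemFree|Buffers|Cached):/ { sum += $2 }
     END { printf "approx available: %.1f GB\n", sum/1024/1024 }' /proc/meminfo

# Swap actually in use right now:
awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 }
     END { printf "swap used: %.1f GB\n", (t - f)/1024/1024 }' /proc/meminfo
```

The "approx available" number is what `free` shows as free + buffers/cache; if the box swaps while that number is large, the pages being evicted are ones the kernel considers unreclaimable (anonymous or locked), not cache.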