Discussion: ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine


ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine

From
Montana Low
Date:
I'm running postgres-9.3 on a 30GB EC2 Xen instance with Linux kernel 3.16.3. The log shows numerous "ERROR: out of memory" messages, which abort client requests, even though 23GB appears to be available in the OS cache.

There is no swap on the box. Postgres sits behind pgbouncer, which shields it from the 200 real clients by limiting connections to 32; there are rarely more than 20 active connections. Postgres max_connections is set very high for historical reasons. There is also a 4GB Java process running on the box.

relevant postgresql.conf:

max_connections = 1000                  # (change requires restart)
shared_buffers = 7GB                    # min 128kB
work_mem = 40MB                         # min 64kB
maintenance_work_mem = 1GB              # min 1MB
effective_cache_size = 20GB



sysctl.conf:

vm.swappiness = 0
vm.overcommit_memory = 2
kernel.shmmax=34359738368
kernel.shmall=8388608



log example:

ERROR:  out of memory
DETAIL:  Failed on request of size 67108864.
STATEMENT:  SELECT  "package_texts".* FROM "package_texts"  WHERE "package_texts"."id" = $1 LIMIT 1



example pg_top, showing 23GB available in cache:

last pid:  6607;  load avg:  3.59,  2.32,  2.61;       up 16+09:17:29 20:49:51
18 processes: 1 running, 17 sleeping
CPU states: 22.5% user,  0.0% nice,  4.9% system, 63.2% idle,  9.4% iowait
Memory: 29G used, 186M free, 7648K buffers, 23G cached
DB activity: 2479 tps,  1 rollbs/s, 217 buffer r/s, 99 hit%,  11994 row r/s, 3820 row w/s  
DB I/O:     0 reads/s,     0 KB/s,     0 writes/s,     0 KB/s  
DB disk: 149.8 GB total, 46.7 GB free (68% used)
Swap:



example top showing the only other significant 4GB process on the box:

top - 21:05:09 up 16 days,  9:32,  2 users,  load average: 2.73, 2.91, 2.88
Tasks: 147 total,   3 running, 244 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.1 us,  4.1 sy,  0.0 ni, 62.9 id,  9.8 wa,  0.0 hi,  0.7 si,  0.3 st
KiB Mem:  30827220 total, 30642584 used,   184636 free,     7292 buffers
KiB Swap:        0 total,        0 used,        0 free. 23449636 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7407 postgres  20   0 7604928  10172   7932 S  29.6  0.0   2:51.27 postgres
10469 postgres  20   0 7617716 176032 160328 R  11.6  0.6   0:01.48 postgres
10211 postgres  20   0 7630352 237736 208704 S  10.6  0.8   0:03.64 postgres
18202 elastic+  20   0 8726984 4.223g   4248 S   9.6 14.4 883:06.79 java
 9711 postgres  20   0 7619500 354188 335856 S   7.0  1.1   0:08.03 postgres
 3638 postgres  20   0 7634552 1.162g 1.127g S   6.6  4.0   0:50.42 postgres
 

Re: ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine

From
Tom Lane
Date:
Montana Low <montanalow@gmail.com> writes:
> I'm running postgres-9.3 on a 30GB ec2 xen instance w/ linux kernel 3.16.3.
> I receive numerous Error: out of memory messages in the log, which are
> aborting client requests, even though there appears to be 23GB available in
> the OS cache.

Perhaps the postmaster is being started with a ulimit setting that
restricts process size?

            regards, tom lane


Re: ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine

From
"Tomas Vondra"
Date:
On 22 October 2014, 0:25, Montana Low wrote:
> I'm running postgres-9.3 on a 30GB ec2 xen instance w/ linux kernel
> 3.16.3.
> I receive numerous Error: out of memory messages in the log, which are
> aborting client requests, even though there appears to be 23GB available
> in
> the OS cache.
>
> There is no swap on the box. Postgres is behind pgbouncer to protect from
> the 200 real clients, which limits connections to 32, although there are
> rarely more than 20 active connections, even though postgres
> max_connections is set very high for historic reasons. There is also a 4GB
> java process running on the box.
>
>
>
>
> relevant postgresql.conf:
>
> max_connections = 1000                  # (change requires restart)
> shared_buffers = 7GB                    # min 128kB
> work_mem = 40MB                         # min 64kB
> maintenance_work_mem = 1GB              # min 1MB
> effective_cache_size = 20GB
>
>
>
> sysctl.conf:
>
> vm.swappiness = 0
> vm.overcommit_memory = 2

This means you have 'no overcommit', so the amount of memory that can be
committed is limited by overcommit_ratio + swap. The default value for
overcommit_ratio is 50% of RAM, and as you have no swap, that effectively
means only 50% of the RAM is available to the system.

If you want to verify this, check /proc/meminfo - see the lines
CommitLimit (the current limit) and Committed_AS (committed address space).
Once Committed_AS reaches the limit, it's game over.
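
To make that check concrete, here is a minimal Python sketch. It assumes the kernel's documented rule for vm.overcommit_memory = 2, CommitLimit = (RAM pages - huge pages) * overcommit_ratio / 100 + swap pages, and a 4 kB page size. The function names are made up for illustration; the sample figures are the MemTotal, CommitLimit and Committed_AS values from the /proc/meminfo output posted later in this thread, and 67108864 bytes is the failed request size from the log.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on typical x86-64 Linux


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """CommitLimit in kB for overcommit_memory = 2, computed in whole pages."""
    ram_pages = (mem_total_kb - hugetlb_kb) // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    return (ram_pages * overcommit_ratio // 100 + swap_pages) * PAGE_KB


# Values copied from the meminfo output in this thread.
SAMPLE = """\
MemTotal:       30827220 kB
SwapTotal:             0 kB
CommitLimit:    15413608 kB
Committed_AS:   14690544 kB
"""

info = parse_meminfo(SAMPLE)
limit = commit_limit_kb(info["MemTotal"], info["SwapTotal"], 50)
headroom_kb = info["CommitLimit"] - info["Committed_AS"]
request_kb = 67108864 // 1024  # the failed 64 MB request from the log

print("computed CommitLimit:", limit, "kB")
print("headroom:", headroom_kb, "kB")
print("64 MB requests that still fit:", headroom_kb // request_kb)
```

With these numbers the computed limit matches the reported CommitLimit, and only about 700 MB of commit headroom is left, so a handful of concurrent 64 MB requests is enough to hit the limit regardless of how much page cache is reclaimable.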

There are different ways to fix this, or at least improve the situation:

(1) increasing overcommit_ratio (clearly, 50% is way too low -
something like 90% might be more appropriate on 30GB RAM without swap)

(2) adding swap (say, a small ephemeral drive, with swappiness=10 or
something like that)
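
As a rough comparison of the two options, the sketch below computes the resulting commit limit for a few settings. It assumes the documented formula for vm.overcommit_memory = 2 (RAM * overcommit_ratio / 100 + swap, ignoring huge pages) and 4 kB pages; MemTotal is the value reported for this machine, while the 4 GB swap size is purely hypothetical and chosen only for illustration.

```python
PAGE_KB = 4  # assumption: 4 kB pages


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """CommitLimit in kB when vm.overcommit_memory = 2 (huge pages ignored)."""
    ram_pages = mem_total_kb // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    return (ram_pages * overcommit_ratio // 100 + swap_pages) * PAGE_KB


MEM_TOTAL_KB = 30827220                 # MemTotal reported for this machine
HYPOTHETICAL_SWAP_KB = 4 * 1024 * 1024  # assumed 4 GB swap device, illustration only

scenarios = {
    "ratio 50, no swap": commit_limit_kb(MEM_TOTAL_KB, 0, 50),
    "ratio 90, no swap": commit_limit_kb(MEM_TOTAL_KB, 0, 90),
    "ratio 95, no swap": commit_limit_kb(MEM_TOTAL_KB, 0, 95),
    "ratio 50, 4 GB swap": commit_limit_kb(MEM_TOTAL_KB, HYPOTHETICAL_SWAP_KB, 50),
}

for name, limit in scenarios.items():
    print(f"{name:22s} {limit:>10d} kB  ({limit / 1024**2:.1f} GiB)")
```

Raising the ratio moves the limit far more than a small swap device does, because swap only adds its own size to the limit.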

Tomas



Re: ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine

From
Montana Low
Date:
I didn't realize that about overcommit_ratio. It was at 50; I've changed it to 95. I'll see if that clears up the problem going forward.

# cat /proc/meminfo
MemTotal:       30827220 kB
MemFree:          153524 kB
MemAvailable:   17941864 kB
Buffers:            6188 kB
Cached:         24560208 kB
SwapCached:            0 kB
Active:         20971256 kB
Inactive:        8538660 kB
Active(anon):   12460680 kB
Inactive(anon):    36612 kB
Active(file):    8510576 kB
Inactive(file):  8502048 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             50088 kB
Writeback:           160 kB
AnonPages:       4943740 kB
Mapped:          7571496 kB
Shmem:           7553176 kB
Slab:             886428 kB
SReclaimable:     858936 kB
SUnreclaim:        27492 kB
KernelStack:        4208 kB
PageTables:       188352 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15413608 kB
Committed_AS:   14690544 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       59012 kB
VmallocChunk:   34359642367 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:    31465472 kB
DirectMap2M:           0 kB



# sysctl -a:

vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.min_free_kbytes = 22207
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 0
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0







Re: ERROR: out of memory | with 23GB cached 7GB reserved on 30GB machine

From
Montana Low
Date:
Increasing overcommit_ratio to 95 solved the problem; the box is now using its memory as expected without needing to resort to swap.
