Memory leak (possibly connected to postgis) leading to server crash

Поиск

Список

Период

Сортировка

От	Roman Cervenak
Тема	Memory leak (possibly connected to postgis) leading to server crash
Дата	6 декабря 2019 г. 09:22:42
Msg-id	CAGjExY1C40d=cqbf6nfeNDCLWk57MhFu3mgR0+KRPb8+_uW_KA@mail.gmail.com обсуждение исходный текст
Ответы	Re: Memory leak (possibly connected to postgis) leading to servercrash
Список	pgsql-bugs

Дерево обсуждения

Hello,

this may be a serious issue and I am not very experienced with reporting this kind of stuff, I will try my best. I believe there may be a memory leak somewhere in postgres/postgis, which can eat up all the available memory, leading to postgres crash (signal 9 kill of backend, postmaster terminating others, recovery).

My setup and workload:

Postgres 12.1 (Ubuntu 12.1-1.pgdg18.04+1) installed via apt on Ubuntu server 18.04 LTS, in VM in Azure cloud.

PostGIS version: 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

VM has 8cores, 56 GB RAM, 7 TB RAID from managed disks (azure network storage)

shared_buffers = 12GB

work_mem = 256MB

maintenance_work_mem = 2GB

I was running 8 clients (on different VMs, over network) using the database, batch processing geographic stuff. Each worker is running one query at a time, so pretty much 8 parallel queries 100% of the time. Queries themselves are fairly short (5-60 seconds), SELECTing rows by geometry index (GIST) and doing stuff like ST_SimplifyPreserveTopology, ST_Buffer, ST_Union on them, essentially using all 8 cores to 100%. There is a lot of things to process, so this was going on for maybe 12 hours, when I noticed (in "htop") that memory usage is unusually high, somewhere about 50 GB. It was suspicious, because it is way higher than it should be with 12 GB shared buffers and 256MB work_mem with 8 clients, and it continued to grow slowly (I could see numbers in RES column in htop slowly rise for all backends). Until eventually:

2019-12-06 00:04:24 UTC [21487-8] LOG: server process (PID 52059) was terminated by signal 9: Killed

2019-12-06 00:04:24 UTC [21487-10] LOG: terminating any other active server processes
2019-12-06 00:04:24 UTC [91091-1] WARNING: terminating connection because of crash of another server process
2019-12-06 00:04:24 UTC [91091-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

...

FATAL: the database system is in recovery mode

...

I realize, this is probably not enough for you. But I will be running this workload again soon, so if this repeats, what should I do to help you diagnose this?

В списке pgsql-bugs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Memory leak (possibly connected to postgis) leading to server crash