Re: Cannot allocate memory

Поиск
Список
Период
Сортировка
От JOIGNY Michael @Neteven
Тема Re: Cannot allocate memory
Дата
Msg-id e47eec7f-9dbc-cbe3-dd01-32cbee90f332@neteven.com
обсуждение исходный текст
Ответ на Re: Cannot allocate memory  (Hannah Huang <hannah.huang.y@gmail.com>)
Список pgsql-admin

Hi Hannah,

For the read load, this comes from the fact that during the migration, a vacuum was made (backup / restore).

Execution times are basically the same, no major difference.

We don't see any major degradation in performance but rather random memory errors when we get to around ~ 220GB / 377GB memory usage on our server. On average we are at 166GB used.

Here is an example below, memory screenshot and postgres logs at the same time :

memory

We see the first errors of 11:08 a.m. once the first ram peak then around 11:44 a.m. and 11:51 a.m.

2020-09-24 11:08:16.087 CEST [54111]: [2-1] user=,db=,app=,client= DETAIL:  Failed on request of size 2048 in memory context "CacheMemoryContext".                                                                
2020-09-24 11:08:16.088 CEST [54109]: [2-1] user=,db=,app=,client= DETAIL:  Failed on request of size 85408 in memory context "TopMemoryContext".                                                                 
2020-09-24 11:08:16.102 CEST [206338]: [245-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".                                      
2020-09-24 11:08:52.331 CEST [161313]: [62195-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 50331648 in memory context "ExecutorState".                       
2020-09-24 11:08:52.416 CEST [38548]: [18-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 20 in memory context "MessageContext".                                        
2020-09-24 11:09:23.532 CEST [165335]: [146-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 4 in memory context "MessageContext".                                       
2020-09-24 11:09:26.541 CEST [25318]: [130-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".                                       
2020-09-24 11:09:30.013 CEST [57650]: [274-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".
2020-09-24 11:09:33.333 CEST [28889]: [57-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".
2020-09-24 11:09:33.344 CEST [138441]: [75422-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 100663296 in memory context "ExecutorState".


2020-09-24 11:40:42.947 CEST [278349]: [224-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed while creating memory context "ExprContext".                                                  
2020-09-24 11:40:42.957 CEST [67802]: [69-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".                                       
2020-09-24 11:40:42.982 CEST [147716]: [86655-1] user=yy,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 50331648 in memory context "ExecutorState".
2020-09-24 11:41:41.390 CEST [29148]: [130-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".            
2020-09-24 11:41:46.677 CEST [68287]: [61-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".             
2020-09-24 11:41:51.623 CEST [139332]: [24-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".
2020-09-24 11:41:56.466 CEST [103039]: [87-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".            
2020-09-24 11:43:27.745 CEST [103028]: [289-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".           
2020-09-24 11:43:34.057 CEST [84831]: [44-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 48 in memory context "MessageContext".             
2020-09-24 11:43:40.780 CEST [43096]: [56-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".             
2020-09-24 11:43:47.387 CEST [99915]: [106-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".            
2020-09-24 11:44:34.639 CEST [101187]: [87-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".            
2020-09-24 11:44:59.846 CEST [99961]: [106-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".            
2020-09-24 11:44:59.858 CEST [182506]: [549-1] user=uu_fr,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 24 in memory context "MessageContext".

2020-09-24 11:50:58.903 CEST [276980]: [165-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed while creating memory context "ExprContext".           
2020-09-24 11:51:07.226 CEST [67666]: [158-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed while creating memory context "ExprContext".            
2020-09-24 11:51:16.258 CEST [30638]: [187-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 8344 in memory context "ExecutorState".
2020-09-24 11:51:16.261 CEST [214950]: [8-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 8344 in memory context "ExecutorState".
2020-09-24 11:51:16.745 CEST [67802]: [89-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".
2020-09-24 11:51:17.265 CEST [196961]: [27-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 16 in memory context "MessageContext".
2020-09-24 11:51:17.266 CEST [182506]: [31574-1] user=uu_fr,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".
2020-09-24 11:51:17.692 CEST [278113]: [126-1] user=xx,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 216 in memory context "MessageContext".
2020-09-24 11:51:19.293 CEST [215439]: [2-1] user=,db=,app=,client= DETAIL:  Failed on request of size 85408 in memory context "TopMemoryContext".
2020-09-24 11:51:19.298 CEST [215438]: [2-1] user=,db=,app=,client= DETAIL:  Failed while creating memory context "dynahash".
2020-09-24 11:51:19.344 CEST [182506]: [31603-1] user=uu_fr,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 40 in memory context "MessageContext".
2020-09-24 11:51:20.218 CEST [215488]: [2-1] user=,db=,app=,client= DETAIL:  Failed on request of size 85408 in memory context "TopMemoryContext".
2020-09-24 11:51:20.229 CEST [215485]: [2-1] user=,db=,app=,client= DETAIL:  Failed on request of size 424 in memory context "CacheMemoryContext".
2020-09-24 11:51:21.233 CEST [215513]: [8-1] user=uu,db=mydb,app=[unknown],client=localhost DETAIL:  Failed on request of size 8344 in memory context "unnamed prepared statement".
2020-09-24 11:51:21.297 CEST [215524]: [2-1] user=,db=,app=,client= DETAIL:  Failed while creating memory context "ExprContext".



I have a feeling that postgres memory is limited, while there is still usable memory left on the server.

Regards.

Le 23/09/2020 à 10:00, Hannah Huang a écrit :


On 23 Sep 2020, at 2:14 pm, Hannah Huang <hannah.huang.y@gmail.com> wrote:



On 23 Sep 2020, at 12:59 am, JOIGNY Michael @Neteven <mjoigny@neteven.com> wrote:

Hi Hannah,

Thanks for your feedback.

Here are the CPU / IO graphs over the same period.

You will find on the graphs the annotation which specifies the moment of the migration.


<cpu.png>

<io_requests.png>

<io_bytes.png>

<io_time.png>


Le 22/09/2020 à 15:10, Hannah Huang a écrit :


On 22 Sep 2020, at 6:20 pm, JOIGNY Michael @Neteven <mjoigny@neteven.com> wrote:

Hi,

I would like to come back to you about my memory problem on postgres 12.

We had the same configuration under postgres 11.8, we disabled JIT (enabled by default under postgres 12) for segfault problems.

To illustrate the change in memory behavior, here is a screenshot before / after migration :
<memory.png>


Do you have any idea what could change the behavior in this way? another parameter enabled by default under postgres 12 like JIT?

Regards.

Le 18/09/2020 à 12:03, JOIGNY Michael @Neteven a écrit :

Hi Community,

I'm asking for your lights because i'm having memory problems with postgres.

Examples of logs :

FATAL: could not fork new process for connection: Cannot allocate memory could not fork new process for connection: Cannot allocate memory
out of memory  DETAIL:  Failed on request of size 32800 in memory context "HashBatchContext".
out of memory DETAIL Failed on request of size 288 in memory context "CacheMemoryContext".


We use postgresql (primary/standby) with pgbouncer as a pooler, and repmgr as replication manager.

We have ~ 2000 connections at the same time with ~ 20/30 are active. (we need to set a high number of connexion on postgres, because our app uses a lot of different users, and each user on each app server needs multiple and constant connexions).

Here is my configuration :

system :
Debian : 9.13
Memory : 380 Go
Postgres : 12.4-1.pgdg90+1
Pgbouncer : 1.14
kernel.shmmax = 202591600640
kernel.shmall = 49460840

postgres :

dynamic_shared_memory_type = posix    # the default is the first option
max_connections = 2600            # (change requires restart)
work_mem = 96MB                # min 64kB
maintenance_work_mem = 8GB        # min 1MB
shared_buffers = 64GB            # min 128kB
temp_buffers = 32MB            # min 800kB
wal_buffers = 16MB            # min 32kB, -1 sets based on shared_buffers
effective_cache_size = 270GB

pgbouncer :

max_client_conn = 6000
default_pool_size = 2590
reserve_pool_size = 5
pool_mode = session

Do you think that our parameters are not correct compared to our configuration? Do you have an idea ?

Best regards.

Michael.

System settings: 
After 9.2, PostgreSQL switched to POSIX shared memory. So now it requires fewer bytes of System V shared memory. 

You don’t need to configure SHMMAX to that high value which is ~188GB, same as SHMALL, I would set them back to default value. 

But, from the second email you sent it seems like an issue related to the version upgrade (from 11 to 12). So I don’t think you should change OS parameters at this moment. I would suggest you provide more stats on CPU (system, user, wa,ni) and disk IO - before and after. 

Thanks, Hannah

I can see:

CPU stats: CPU it spends more time on user job and the wait time on IO decreases, time spent on kernal space is about same as before. 
IO stats: read drops dramatically, write slightly increases. This explains why CPU wait on IO decreases.

I would like to know how is the performance of application queries. For example, the before and after execution time of the most time consuming queries. Are they still about the same? Do you have any database performance analytics tools installed?  I used to use pgbadger to analyze  database logs which would show me the past performance statistics of the database. We now have an understanding about how the OS performance looks like.  It would be good to have a general picture of the database performance from the db layer.

Thanks, 
Hannah





Continue with my question regards application performance - Does database experience performance degradation? Or just simply more failed SQL with the particular memory error thrown out? 

Further question:

How often does the memory errors occur? Are they all caused by one particular SQL or a certain type of SQL? 

Would you mind sending out the log within a particular time range?




Thanks,
Suya

-- 
Cordialement.
--------------------

NETEVEN
Michael JOIGNY
Service Exploitation - Ingénieur Systèmes
190 Avenue Jean Jaures - 75019 Paris
mjoigny@neteven.com / adminsys@neteven.com
www.neteven.com
Вложения

В списке pgsql-admin по дате отправления:

Предыдущее
От: Ganesh Korde
Дата:
Сообщение: Re: The default database account can be accessed without a password
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: HIPAA Business Associate Agreement (BAA)