Re: [GENERAL] core system is getting unresponsive because over 300 cpu load

From	pinker
Subject	Re: [GENERAL] core system is getting unresponsive because over 300 cpu load
Date	11 October 2017, 04:28:52
Msg-id	1507674532671-0.post@n3.nabble.com
In reply to	Re: [GENERAL] core system is getting unresponsive because over 300cpu load (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: [GENERAL] core system is getting unresponsive because over 300cpu load Re: [GENERAL] core system is getting unresponsive because over 300cpu load Re: [GENERAL] core system is getting unresponsive because over 300cpu load
List	pgsql-general

Tomas Vondra-4 wrote
> What is "CPU load"? Perhaps you mean "load average"?

Yes, I wasn't exact: I mean system CPU usage. It can be seen here, in the
graph from yesterday's failure (after 6 p.m.):
<http://www.postgresql-archive.org/file/t342733/cpu.png>
As one can see, the connection spikes follow the CPU spikes...


Tomas Vondra-4 wrote
> Also, what are the basic system parameters (number of cores, RAM), it's
> difficult to help without knowing that.

I actually wrote everything in the first post:
80 CPU and 4 sockets
over 500GB RAM


Tomas Vondra-4 wrote
> Well, 3M transactions over ~2h period is just ~450tps, so nothing
> extreme. Not sure how large the transactions are, of course.

There is quite a lot going on: most of the transactions are complicated stored procedures.
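
As a quick sanity check of the throughput figure quoted above, here is a minimal sketch of the arithmetic. It assumes exactly 3,000,000 transactions over exactly 2 hours, whereas the quote only gives approximate numbers.

```python
def transactions_per_second(transactions: int, hours: float) -> float:
    """Average transactions per second over a period given in hours."""
    return transactions / (hours * 3600)


# Assumed inputs: exactly 3M transactions in exactly 2 hours.
tps = transactions_per_second(3_000_000, 2)
print(f"{tps:.0f} tps")
```

With these inputs the average comes out a little under the quoted ~450 tps, so the conclusion that the raw rate is not extreme still holds. It says nothing about how heavy each transaction is.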


Tomas Vondra-4 wrote
> Something gets executed on the database. We have no idea what it is, but
> it should be in the system logs. And you should see the process in 'top'
> with large amounts of virtual memory ...

Yes, it would be much easier if it were just a single query showing up in top,
but most of the CPU is eaten by the system itself and I'm not sure why. Judging
by the page table size and the anon pages, I suppose it is NUMA related.
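
To put numbers on the page table and anon page sizes mentioned above, here is a minimal sketch that extracts the PageTables and AnonPages fields from /proc/meminfo-style text. The sample values are made up for illustration and are not measurements from this system.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kernel_memory_summary(text: str) -> dict:
    """Return PageTables and AnonPages converted from kB to GiB."""
    info = parse_meminfo(text)
    wanted = ("PageTables", "AnonPages")
    return {name: info[name] / (1024 * 1024) for name in wanted if name in info}


# Made-up sample, NOT taken from the system discussed in this thread.
SAMPLE = """MemTotal:       528000000 kB
AnonPages:       41943040 kB
PageTables:      10485760 kB
HugePages_Total:        0
"""

summary = kernel_memory_summary(SAMPLE)
print(summary)
```

On a live Linux host the same function could be fed the contents of /proc/meminfo, sampled before and during a spike to see whether the two values grow together.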



Tomas Vondra-4 wrote
> Another possibility is a run-away query that consumes a lot of work_mem.

That was exactly my first guess. work_mem is set to ~350MB, and I see a lot of
stored procedures with unnecessary WITH clauses (i.e. materialization), each
followed right away by an IN query over those results (hash).
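
Since work_mem is a per-operation limit (each sort or hash node in each backend can use up to that much), a rough upper bound on memory demand can be sketched as below. The backend count and the number of memory-hungry nodes per query are hypothetical placeholders, not figures from this system.

```python
def worst_case_work_mem_gib(work_mem_mb: float, backends: int,
                            nodes_per_query: int) -> float:
    """Upper bound, in GiB, if every backend runs one query whose
    sort/hash nodes each use their full work_mem allowance."""
    return work_mem_mb * backends * nodes_per_query / 1024


# work_mem of ~350MB is from the post; the other two numbers are assumptions.
estimate = worst_case_work_mem_gib(work_mem_mb=350, backends=200,
                                   nodes_per_query=2)
print(f"worst case: {estimate:.1f} GiB")
```

This is a pessimistic bound, since real queries rarely fill every allowance at once, but it shows how a connection spike combined with a large work_mem can multiply quickly.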



Tomas Vondra-4 wrote
> Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
> and then you can decide.

Thank you for the tip. I always measure it but hadn't here. The result is
0.992969610990056, so a further increase is rather pointless.
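
For reference, a minimal sketch of how a ratio like this is usually derived from the two counters named in the quote. The query text and the sample counter values are assumptions for illustration; only the column names blks_hit and blks_read come from the thread.

```python
# Assumed query: sums the counters over all databases in the cluster.
HIT_RATIO_SQL = """
SELECT sum(blks_hit) AS blks_hit, sum(blks_read) AS blks_read
FROM pg_stat_database;
"""


def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared buffers."""
    total = blks_hit + blks_read
    if total == 0:
        return 0.0  # no block activity recorded yet
    return blks_hit / total


# Made-up counters, not the values behind the figure quoted above.
ratio = cache_hit_ratio(blks_hit=993_000, blks_read=7_000)
print(f"{ratio:.3f}")
```

Note that blks_read counts blocks not found in shared buffers; some of those reads may still be served from the operating system's page cache rather than from disk.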


Tomas Vondra-4 wrote
> You may also make the bgwriter more aggressive - that won't really
> improve the hit ratio, it will only make enough room for the backends.

Yes, I probably will.
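
A minimal sketch of what "more aggressive" could look like in practice: it only generates the SQL statements and does not connect to anything. The three parameter names and their stock defaults are standard PostgreSQL settings; the "aggressive" values are hypothetical and would need testing on the actual workload.

```python
# Stock PostgreSQL defaults for the background writer.
DEFAULTS = {
    "bgwriter_delay": "200ms",
    "bgwriter_lru_maxpages": "100",
    "bgwriter_lru_multiplier": "2.0",
}

# Hypothetical, untested values chosen only to illustrate the direction.
AGGRESSIVE = {
    "bgwriter_delay": "50ms",
    "bgwriter_lru_maxpages": "1000",
    "bgwriter_lru_multiplier": "4.0",
}


def alter_system_statements(settings: dict) -> list:
    """Build ALTER SYSTEM statements plus a configuration reload."""
    stmts = [f"ALTER SYSTEM SET {name} = '{value}';"
             for name, value in settings.items()]
    stmts.append("SELECT pg_reload_conf();")
    return stmts


statements = alter_system_statements(AGGRESSIVE)
for stmt in statements:
    print(stmt)
```

These settings take effect on a configuration reload, so no restart should be needed to try them.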


Tomas Vondra-4 wrote
> But I don't quite see how this could cause the severe problems you have,
> as I assume this is kinda regular behavior on that system. Hard to say
> without more data.

I can provide you with any data you need :)


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


