Discussion: [GENERAL] core system is getting unresponsive because over 300 cpu load
Hi to all!

We've got a problem with a very serious, repetitive incident on our core system. Namely, cpu load spikes to 300-400 and the whole db becomes unresponsive. From the db point of view nothing special is happening: memory looks fine, disk I/O is ok, and the only problem is the huge cpu load. The kernel parameters that increase with the load are always the same:

* page tables size
* Committed_AS
* Active anon

<http://www.postgresql-archive.org/file/t342733/pagetables.png>

The total number of connections also increases very fast (but I suppose it's a symptom, not the root cause of the cpu load) and exceeds max_connections (1000).

System:
* CentOS Linux release 7.2.1511 (Core)
* Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
* postgresql95-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
* postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
* 4 sockets / 80 cores
* vm.dirty_background_bytes = 0
* vm.dirty_background_ratio = 2
* vm.dirty_bytes = 0
* vm.dirty_expire_centisecs = 3000
* vm.dirty_ratio = 20
* vm.dirty_writeback_centisecs = 500

After the first incident we changed:
* increased shared_buffers to 16GB (completely on huge pages; previously 2GB)
* adjusted vm.nr_hugepages to 8000 (we've got 2MB pages)
* changed vm.overcommit_memory = 2 and vm.overcommit_ratio = 99
* disabled transparent huge pages (unfortunately they were set to 'always' before)

It's a highly transactional db. Today I've run:

    select now(), txid_current();

and the result: 3 339 351 transactions between 2017-10-10 14:42 and 2017-10-10 16:24.

* db size: 1.1TB
* RAM: over 500GB
* biggest tables (the rest aren't big): 369 GB, 48 GB, 48 GB, 34 GB, 23 GB, 19 GB, 19 GB, 17 GB, 16 GB, 12 GB, 9910 MB

We have captured some db statistics, for instance bgwriter and buffercache. Today the load spikes happened at:
1) 10:44
2) 11:04 (and then several times during the day)

The premiere was yesterday about 6PM. What we observed back then was, for instance, an autovacuum process to prevent wraparound on the biggest table (369GB). We did a vacuum freeze manually after this happened, but before that we gathered statistics with this query:

    SELECT oid::regclass::text AS table,
           age(relfrozenxid) AS xid_age,
           mxid_age(relminmxid) AS mxid_age,
           least(
               (SELECT setting::int FROM pg_settings
                 WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid),
               (SELECT setting::int FROM pg_settings
                 WHERE name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid)
           ) AS tx_before_wraparound_vacuum,
           pg_size_pretty(pg_total_relation_size(oid)) AS size,
           pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum
      FROM pg_class
     ORDER BY tx_before_wraparound_vacuum;

The biggest table, which was being vacuumed, looked like:

    217310511 | 8156548 | -17310511 | 369 GB | 2017-09-30 01:57:33.972068+02

So, from the kernel stats we know that the failure happens when the db is trying to allocate some huge amount of pages (page tables size, anons, Committed_AS). But what is triggering this situation?

I suppose it could be lazy autovacuum (just standard settings). So autovacuum had to read the whole 369GB yesterday to clean xids; today it did the same on some other tables.

Another idea is a too-small shared_buffers setting. Today it looked like:

<http://www.postgresql-archive.org/file/t342733/buffercache1040.png>

("c" means count; the number after "c" is the usage count, so "c5dirty" means the count of dirty pages with usagecount=5.) That is the snapshot before and after the failure at 10:44. Before and after the spike at 11:04:

<http://www.postgresql-archive.org/file/t342733/buffercache1104.png>

My interpretation of it is the following: the count of clean buffers with high usagecount is decreasing, and the count of buffers with usagecount 0 and 1 is very unstable -> so the buffers have no time to get older in shared buffers and are thrown out?

bgwriter stats:

<http://www.postgresql-archive.org/file/t342733/bgwriter.png>

The biggest number of buffers is cleaned by backends - so there are no free buffers with usagecount 0 and LWLocks happen? So would increasing shared_buffers be a solution?

Please help, it's happening quite often and I'm not sure which way is the right one...

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
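[A side note on the huge pages change above: 16GB of shared_buffers on 2MB pages already needs 8192 pages, slightly more than the configured vm.nr_hugepages = 8000, and the shared memory segment is a bit larger than shared_buffers alone. A quick sanity check, using only the numbers from the post:

```shell
# Sanity-check huge page sizing (numbers from the post; adjust for your box).
shared_buffers_mb=16384      # 16GB shared_buffers
hugepage_mb=2                # 2MB huge pages
nr_hugepages=8000            # vm.nr_hugepages from the post

pages_needed=$(( shared_buffers_mb / hugepage_mb ))
echo "pages needed for shared_buffers alone: ${pages_needed}"
echo "pages configured:                      ${nr_hugepages}"
# The shared memory segment is slightly larger than shared_buffers itself,
# so vm.nr_hugepages should have some headroom above pages_needed.
```
]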
On Tue, Oct 10, 2017 at 2:40 PM, pinker <pinker@onet.eu> wrote:
> Hi to all!
>
> We've got problem with a very serious repetitive incident on our core
> system. Namely, cpu load spikes to 300-400 and the whole db becomes
> unresponsive. From db point of view nothing special is happening, memory
> looks fine, disks io's are ok and the only problem is huge cpu load. Kernel
> parameters that are increasing with load are always the same:

The solution here is to reduce the number of connections, usually via some kind of connection pooling. Any db server will have its maximum throughput at around the number of connections == the number of cpu cores (give or take a factor of 2). Outside that range performance falls off, and the curve has a very sharp knee on the high side as the number of connections goes up. Reduce connections and the db runs faster; increase them and it slows down until it eventually falls over. pgbouncer and pgpool-II are useful on the db end; look at pooling options on the app side as well.
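[For reference, a minimal pgbouncer configuration along these lines might look like the sketch below; the database name, auth file path and pool sizes are illustrative placeholders, not values from this thread:

```ini
[databases]
coredb = host=127.0.0.1 port=5432 dbname=coredb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling gives the best connection reuse for OLTP,
; but is incompatible with session state (prepared statements, etc.)
pool_mode = transaction
; with 80 cores, a server-side pool in the low hundreds is usually plenty
default_pool_size = 100
max_client_conn = 2000
```

The application then connects to port 6432 instead of 5432, and the 1000+ client connections share a much smaller set of real backends.]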
Thank you Scott, we are planning to do it today. But are you sure it will help in this case?
2017-10-10 23:40 GMT+03:00 pinker <pinker@onet.eu>:
> We've got problem with a very serious repetitive incident on our core
> system. Namely, cpu load spikes to 300-400 and the whole db becomes
> unresponsive. From db point of view nothing special is happening, memory
> looks fine, disks io's are ok and the only problem is huge cpu load. Kernel
> parameters that are increasing with load are always the same:
Can you provide output of `iostat -myx 10` at the “peak” moments, please?
Also, it'd be good to look at more detailed bgwriter/checkpointer stats.
You can find more details in this post: http://blog.postgresql-consulting.com/2017/03/deep-dive-into-postgres-stats_27.html
(You might want to reset 'shared' stats here.)
Victor Yegorov
On 10/10/2017 10:40 PM, pinker wrote:
> Hi to all!
>
> We've got problem with a very serious repetitive incident on our core
> system. Namely, cpu load spikes to 300-400 and the whole db becomes

What is "CPU load"? Perhaps you mean "load average"?

Also, what are the basic system parameters (number of cores, RAM)? It's difficult to help without knowing that.

> unresponsive. From db point of view nothing special is happening, memory
> looks fine, disks io's are ok and the only problem is huge cpu load. Kernel
> parameters that are increasing with load are always the same:
>
> * page tables size
> * Committed_AS
> * Active anon
>
> <http://www.postgresql-archive.org/file/t342733/pagetables.png>
>
> and the total number of connections are increasing very fast (but I suppose
> it's the symptom not the root cause of cpu load) and exceed max_connections
> (1000).

I doubt you have 1000 cores in your system, so 1000 connections active at the same time are guaranteed to cause issues. What we see quite often is a minor hiccup (an occasional slow query) snowballing into much more serious trouble exactly because of this: queries get slower for some reason, the application starts opening more connections (through a built-in connection pool) to run more queries, that further increases the pressure, slows the queries even more, ...

As Scott suggested, you should consider using a connection pool.

> System:
> * CentOS Linux release 7.2.1511 (Core)
> ... snip ...

> It's a highly transactional db. Today I've run:
> select now(), txid_current();
> and the results:
> 3 339 351 transactions between 2017-10-10 14:42 and 2017-10-10 16:24

Well, 3M transactions over a ~2h period is just ~450tps, so nothing extreme. Not sure how large the transactions are, of course.

> ... snip ...
>
> So, from the kernel stats we know that the failure happens when db is trying
> to alocate some huge amount of pages (page tables size, anons, commited_as).
> But what is triggering this situation?

Something gets executed on the database. We have no idea what it is, but it should be in the system logs. And you should see the process in 'top' with large amounts of virtual memory ...

> I suppose it could be lazy autovacuum (just standard settings). So
> autovacuum had to read whole 369gb yesterday to clean xids. today did the
> same on some other tables.

Possible, but it shouldn't allocate more than maintenance_work_mem, so I don't see why it would allocate so much virtual memory.

Another possibility is a run-away query that consumes a lot of work_mem.

> Another idea is too small shared buffers setting.
> ... snip ...
> bgwriter stats:
> <http://www.postgresql-archive.org/file/t342733/bgwriter.png>

Yes, this suggests you probably have shared_buffers set too low, but it's impossible to say whether increasing the size will help - perhaps your active set (the part of the DB you regularly access) is simply way too big. Measure the cache hit ratio (see pg_stat_database.blks_hit and blks_read), and then you can decide.

You may also make the bgwriter more aggressive - that won't really improve the hit ratio, it will only make enough room for the backends. But I don't quite see how this could cause the severe problems you have, as I assume this is kinda regular behavior on that system. Hard to say without more data.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
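[The hit-ratio check suggested here boils down to blks_hit / (blks_hit + blks_read) from pg_stat_database, e.g. `SELECT blks_hit, blks_read FROM pg_stat_database WHERE datname = current_database();`. A sketch of the arithmetic with made-up counter values:

```shell
# Hypothetical counter values, not from the thread; on a real system take
# blks_hit and blks_read from pg_stat_database for your database.
blks_hit=99300000
blks_read=700000

awk -v h="$blks_hit" -v r="$blks_read" \
    'BEGIN { printf "cache hit ratio: %.4f\n", h / (h + r) }'
```

Note blks_read counts reads from the OS, which may still be served from the kernel page cache, so the "real" disk hit ratio is usually even better than this number suggests.]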
Victor Yegorov wrote
> Can you provide output of `iostat -myx 10` at the “peak” moments, please?

Sure, please find it here: https://pastebin.com/f2Pv6hDL

Victor Yegorov wrote
> Also, it'd be good to look in more detailed bgwriter/checkpointer stats.
> You can find more details in this post: http://blog.postgresql-consulting.com/2017/03/deep-dive-into-postgres-stats_27.html
> (You might want to reset 'shared' stats here.)

Thank you for the link, it's a really nice explanation. Here you'll find the full bgwriter stats: https://pastebin.com/VA8pyfXj
On Tue, Oct 10, 2017 at 3:53 PM, pinker <pinker@onet.eu> wrote:
> Victor Yegorov wrote
>> Can you provide output of `iostat -myx 10` at the “peak” moments, please?
>
> sure, please find it here:
> https://pastebin.com/f2Pv6hDL

Ouch, unless I'm reading that wrong, your IO subsystem seems to be REALLY slow.
Scott Marlowe-2 wrote
> Ouch, unless I'm reading that wrong, your IO subsystem seems to be REALLY
> slow.

It's a huge array where a lot is happening, for instance data snapshots :/ The LUN this db sits on is dm-7. I'm a DBA with zero knowledge about arrays, so any advice will be much appreciated :)
2017-10-11 0:53 GMT+03:00 pinker <pinker@onet.eu>:
> > Can you provide output of `iostat -myx 10` at the “peak” moments, please?
>
> sure, please find it here:
> https://pastebin.com/f2Pv6hDL
Looks like `sdg` and `sdm` are the ones used most.
Can you describe what's on those devices? Do you have WAL and DB sitting together?
Where are the DB log files stored?
> Here you'll find the
> full bgwriter stats: https://pastebin.com/VA8pyfXj
Can you, please, provide the output of this query (linked from the article mentioned): https://gist.github.com/lesovsky/4587d70f169739c01d4525027c087d14
And also this query:
SELECT name,version,source FROM pg_settings WHERE source NOT IN ('default','override');
Victor Yegorov
Tomas Vondra-4 wrote
> What is "CPU load"? Perhaps you mean "load average"?

Yes, I wasn't exact: I mean system cpu usage. It can be seen here - it's the graph from yesterday's failure (after 6 p.m.):
<http://www.postgresql-archive.org/file/t342733/cpu.png>
So as one can see, connection spikes follow cpu spikes...

Tomas Vondra-4 wrote
> Also, what are the basic system parameters (number of cores, RAM), it's
> difficult to help without knowing that.

I have actually written everything in the first post:
80 CPUs and 4 sockets, over 500GB RAM

Tomas Vondra-4 wrote
> Well, 3M transactions over ~2h period is just ~450tps, so nothing
> extreme. Not sure how large the transactions are, of course.

There's quite a lot going on. Most of them are complicated stored procedures.

Tomas Vondra-4 wrote
> Something gets executed on the database. We have no idea what it is, but
> it should be in the system logs. And you should see the process in 'top'
> with large amounts of virtual memory ...

Yes, it would be much easier if it were just a single query at the top of the list, but most of the cpu is eaten by the system itself and I'm not sure why. I suppose it's because of the page tables size, and the anon pages are NUMA related.

Tomas Vondra-4 wrote
> Another possibility is a run-away query that consumes a lot of work_mem.

That was exactly my first guess. work_mem is set to ~350MB, and I see a lot of stored procedures with unnecessary WITH clauses (i.e. materialization) followed immediately by an IN query over the results of that (hash).

Tomas Vondra-4 wrote
> Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
> and then you can decide.

Thank you for the tip. I always do it but hadn't here; the result is 0.992969610990056 - so increasing it is rather pointless.

Tomas Vondra-4 wrote
> You may also make the bgwriter more aggressive - that won't really
> improve the hit ratio, it will only make enough room for the backends.

Yes, I probably will.

Tomas Vondra-4 wrote
> But I don't quite see how this could cause the severe problems you have,
> as I assume this is kinda regular behavior on that system. Hard to say
> without more data.

I can provide you with any data you need :)
On 10/10/2017 3:28 PM, pinker wrote:
> It was exactly my first guess. work_mem is set to ~ 350MB and I see a lot of
> stored procedures with unnecessary WITH clauses (i.e. materialization) and
> right after it IN query with results of that (hash).

1000 connections all doing queries that each need one work_mem will consume 1000 * 350MB == 350GB of your ram, and many queries use several work_mems. If the vast majority of your operations are OLTP and only access a few rows, then a large work_mem is NOT a good idea. If you're doing large aggregate operations like OLAP for reporting or whatever, then that's another story, but that sort of thing generally does NOT use 1000 connections.

--
john r pierce, recycling bits in santa cruz
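[The same estimate redone with the actual setting from this thread (work_mem = 393216kB, i.e. 384MB); this is still a lower bound on the worst case, since a single query can use several work_mem allocations:

```shell
# Worst-case memory if every connection uses one full work_mem.
connections=1000
work_mem_kb=393216   # work_mem from the posted pg_settings (384MB)

total_gb=$(( connections * work_mem_kb / 1024 / 1024 ))
echo "worst case: ${total_gb} GB"   # one work_mem per backend; real queries can use several
```
]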
Victor Yegorov wrote
> Looks like `sdg` and `sdm` are the ones used most.
> Can you describe what's on those devices? Do you have WAL and DB sitting
> together?
> Where DB log files are stored?

It's multipath, with the same LUN for PGDATA and pg_log, but a separate one for xlogs and archives.

    mpatha dm-4 IBM ,2145
    size=2.0T features='0' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=0 status=active
    | |- 7:0:1:2 sdg 8:96  active undef running
    | `- 8:0:1:2 sdm 8:192 active undef running
    `-+- policy='round-robin 0' prio=0 status=enabled
      |- 7:0:0:2 sdd 8:48  active undef running
      `- 8:0:0:2 sdj 8:144 active undef running

Victor Yegorov wrote
> Can you, please, provide the output of this query (linked from the article
> mentioned):
> https://gist.github.com/lesovsky/4587d70f169739c01d4525027c087d14

    00:26:51.226024|120 days 03:05:37.987175|0.6|7.99|300.63|0.46|12673500.4|162.00|0.34|0.51|0.37|1.22|26.721|27.7|41.8|30.6|4.47|34.27|--------------------------------------|21532|124|6510377185|9920323|449049896|677360078|2321057|495798075|0

Victor Yegorov wrote
> And also this query:
> SELECT name,version,source FROM pg_settings WHERE source NOT IN
> ('default','override');

    application_name             | client               | psql
    archive_command              | configuration file   | <deleted>
    archive_mode                 | configuration file   | on
    autovacuum                   | configuration file   | on
    autovacuum_max_workers       | configuration file   | 10
    checkpoint_completion_target | configuration file   | 0.9
    checkpoint_timeout           | configuration file   | 480
    client_encoding              | client               | UTF8
    DateStyle                    | configuration file   | ISO, MDY
    default_statistics_target    | configuration file   | 350
    default_text_search_config   | configuration file   | pg_catalog.english
    effective_cache_size         | configuration file   | 52428800
    enable_indexscan             | configuration file   | on
    huge_pages                   | configuration file   | on
    lc_messages                  | configuration file   | en_US.UTF-8
    lc_monetary                  | configuration file   | en_US.UTF-8
    lc_numeric                   | configuration file   | en_US.UTF-8
    lc_time                      | configuration file   | en_US.UTF-8
    listen_addresses             | configuration file   | *
    log_autovacuum_min_duration  | configuration file   | 0
    log_checkpoints              | configuration file   | on
    log_connections              | configuration file   | on
    log_destination              | configuration file   | stderr
    log_directory                | configuration file   | pg_log
    log_disconnections           | configuration file   | on
    log_duration                 | configuration file   | off
    log_filename                 | configuration file   | postgresql-%a.log
    log_line_prefix              | configuration file   | %t [%p]: [%l-1] user=%u,db=%d
    log_lock_waits               | configuration file   | on
    log_min_duration_statement   | configuration file   | 0
    log_rotation_age             | configuration file   | 1440
    log_rotation_size            | configuration file   | 0
    log_temp_files               | configuration file   | 0
    log_timezone                 | configuration file   | Poland
    log_truncate_on_rotation     | configuration file   | on
    logging_collector            | configuration file   | on
    maintenance_work_mem         | configuration file   | 2097152
    max_connections              | configuration file   | 1000
    max_stack_depth              | environment variable | 2048
    max_wal_senders              | configuration file   | 10
    max_wal_size                 | configuration file   | 640
    random_page_cost             | configuration file   | 1
    shared_buffers               | configuration file   | 2097152
    temp_buffers                 | configuration file   | 16384
    TimeZone                     | configuration file   | Poland
    track_functions              | configuration file   | all
    track_io_timing              | configuration file   | off
    wal_buffers                  | configuration file   | 2048
    wal_keep_segments            | configuration file   | 150
    wal_level                    | configuration file   | hot_standby
    work_mem                     | configuration file   | 393216+
On 10/11/2017 12:28 AM, pinker wrote:
> Tomas Vondra-4 wrote
>> What is "CPU load"? Perhaps you mean "load average"?
>
> Yes, I wasn't exact: I mean system cpu usage, it can be seen here - it's the
> graph from yesterday's failure (after 6p.m.):
> <http://www.postgresql-archive.org/file/t342733/cpu.png>
> So as one can see connections spikes follow cpu spikes...

I'm probably a bit dumb (after all, it's 1AM over here), but can you explain the CPU chart? I'd understand percentages (say, 75% CPU used), but what do the seconds / fractions mean? E.g. when the system time reaches 5 seconds, what does that mean?

> Tomas Vondra-4 wrote
>> Also, what are the basic system parameters (number of cores, RAM), it's
>> difficult to help without knowing that.
>
> I have actually written everything in the first post:
> 80 CPU and 4 sockets
> over 500GB RAM

Apologies, missed that bit.

> Tomas Vondra-4 wrote
>> Well, 3M transactions over ~2h period is just ~450tps, so nothing
>> extreme. Not sure how large the transactions are, of course.
>
> It's quite a lot going on. Most of them are complicated stored procedures.

OK.

> Tomas Vondra-4 wrote
>> Something gets executed on the database. We have no idea what it is, but
>> it should be in the system logs. And you should see the process in 'top'
>> with large amounts of virtual memory ...
>
> Yes, it would be much easier if it would be just single query from the top,
> but the most cpu is eaten by the system itself and I'm not sure why. I
> suppose because of page tables size and anon pages is NUMA related.

Have you tried profiling using perf? That usually identifies hot spots pretty quickly - either in PostgreSQL code or in the kernel.

> Tomas Vondra-4 wrote
>> Another possibility is a run-away query that consumes a lot of work_mem.
>
> It was exactly my first guess. work_mem is set to ~ 350MB and I see a lot of
> stored procedures with unnecessary WITH clauses (i.e. materialization) and
> right after it IN query with results of that (hash).

Depends on how much data is in the CTEs. We don't really allocate all of work_mem at once, but bit by bit.

> Tomas Vondra-4 wrote
>> Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
>> and then you can decide.
>
> Thank you for the tip. I always do it but haven't here, so the result is
> 0.992969610990056 - so increasing it is rather pointless.

Yeah.

> Tomas Vondra-4 wrote
>> You may also make the bgwriter more aggressive - that won't really
>> improve the hit ratio, it will only make enough room for the backends.
>
> yes i probably will

On the other hand, the numbers are rather low. I mean, the backends seem to be evicting ~15k buffers over a 5-minute period, which is pretty much nothing (~400kB/s). I wouldn't bother tuning this.

> Tomas Vondra-4 wrote
>> But I don't quite see how this could cause the severe problems you have,
>> as I assume this is kinda regular behavior on that system. Hard to say
>> without more data.
>
> I can provide you with any data you need :)

What I meant is that if the system evicts this amount of buffers all the time (i.e. there doesn't seem to be any sudden spike), then it's unlikely to be the cause (or related to it).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2017-10-10 13:40:07 -0700, pinker wrote:
> and the total number of connections are increasing very fast (but I suppose
> it's the symptom not the root cause of cpu load) and exceed max_connections
> (1000).

Others mentioned already that that's worth improving.

> System:
> * CentOS Linux release 7.2.1511 (Core)
> * Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux

Some versions of this kernel have had serious problems with transparent hugepages. I'd try turning that off. I think it defaults to off even in that version, but also make sure zone_reclaim_mode is disabled.

> * postgresql95-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
>
> * 4 sockets/80 cores

9.6 has quite some scalability improvements over 9.5. I don't know whether it's feasible for you to update, but if so, it's worth trying.

How about taking a perf profile to investigate?

> * vm.dirty_background_bytes = 0
> * vm.dirty_background_ratio = 2
> * vm.dirty_bytes = 0
> * vm.dirty_expire_centisecs = 3000
> * vm.dirty_ratio = 20
> * vm.dirty_writeback_centisecs = 500

I'd suggest monitoring /proc/meminfo for the amount of Dirty and Writeback memory, and see whether rapid changes therein coincide with periods of slowdown.

Greetings,

Andres Freund
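[A minimal way to watch the counters mentioned here; the 5-second interval and the extra Committed_AS field (which spiked in the incidents) are just suggestions:

```shell
# Print Dirty/Writeback (and Committed_AS, which spiked in the incidents)
# from /proc/meminfo every 5 seconds, with a timestamp. Stop with Ctrl-C.
while true; do
    date '+%H:%M:%S'
    grep -E '^(Dirty|Writeback|Committed_AS):' /proc/meminfo
    sleep 5
done
```

Logging this to a file across an incident makes it easy to line the kernel counters up with the db-side spikes afterwards.]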
Tomas Vondra-4 wrote
> I'm probably a bit dumb (after all, it's 1AM over here), but can you
> explain the CPU chart? I'd understand percentages (say, 75% CPU used)
> but what do the seconds / fractions mean? E.g. when the system time
> reaches 5 seconds, what does that mean?

Hehe, no, you've just spotted a mistake - it's supposed to be 50 cores :) out of 80 in total.

Tomas Vondra-4 wrote
> Have you tried profiling using perf? That usually identifies hot spots
> pretty quickly - either in PostgreSQL code or in the kernel.

I was always afraid of the overhead, but maybe it's time to start ...

Tomas Vondra-4 wrote
> What I meant is that if the system evicts this amount of buffers all the
> time (i.e. there doesn't seem to be any sudden spike), then it's
> unlikely to be the cause (or related to it).

I was actually thinking about a scenario where different sessions want to read/write from or to many different relfilenodes at one time - could that cause page swapping between shared buffers and the os cache? We see that context switches on the cpu are increasing as well. The kernel documentation says that using page tables instead of the Translation Lookaside Buffer (TLB) is very costly, and on some blogs I have seen recommendations that using huge pages (so more addresses can fit in the TLB) will help here. But postgresql, unlike oracle, cannot use them for anything other than page buffering (so 16GB) ... so process memory still needs to use 4k pages. Or memory fragmentation?
Andres Freund wrote
> Others mentioned already that that's worth improving.

Yes, we are just setting up pgbouncer.

Andres Freund wrote
> Some versions of this kernel have had serious problems with transparent
> hugepages. I'd try turning that off. I think it defaults to off even in
> that version, but also make sure zone_reclaim_mode is disabled.

Yes, I'm aware of that, so I always set it to 'never'. But thank you for the zone_reclaim_mode tip.

Andres Freund wrote
> 9.6 has quite some scalability improvements over 9.5. I don't know
> whether it's feasible for you to update, but if so, It's worth trying.
>
> How about taking perf profile to investigate?

Both are on my to-do list :)

Andres Freund wrote
> I'd suggest monitoring /proc/meminfo for the amount of Dirty and
> Writeback memory, and see whether rapid changes therein coincide with
> periodds of slowdown.

Yes, I was monitoring it the whole day, and that's the reason why I've changed dirty_background_ratio, but both of them were flat - without any bigger spikes.
On Tue, Oct 10, 2017 at 01:40:07PM -0700, pinker wrote:
> Hi to all!
>
> We've got problem with a very serious repetitive incident on our core
> system. Namely, cpu load spikes to 300-400 and the whole db becomes
> unresponsive. From db point of view nothing special is happening, memory
> looks fine, disks io's are ok and the only problem is huge cpu load. Kernel
> parameters that are increasing with load are always the same:

> * disabled transparent huge pages (they were set before unfortunately to
> 'always')

Did you also try disabling KSM?

    echo 2 | sudo tee /sys/kernel/mm/ksm/run

I believe for us that was affecting a postgres VM (QEMU/KVM) and maybe not postgres itself. Worth a try?

https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

Justin
On 10/11/2017 02:26 AM, pinker wrote:
> Tomas Vondra-4 wrote
>> I'm probably a bit dumb (after all, it's 1AM over here), but can you
>> explain the CPU chart? I'd understand percentages (say, 75% CPU used)
>> but what do the seconds / fractions mean? E.g. when the system time
>> reaches 5 seconds, what does that mean?
>
> hehe, no you've just spotted a mistake, it suppose to be 50 cores :)
> out of 80 in total

Ah, so it should say '50 cores' instead of '5s'? Well, that's a busy system I guess.

> Tomas Vondra-4 wrote
>> Have you tried profiling using perf? That usually identifies hot spots
>> pretty quickly - either in PostgreSQL code or in the kernel.
>
> I was always afraid because of overhead, but maybe it's time to start ...

I don't follow. If you're not in trouble, a little bit of additional overhead is not an issue (but you generally don't need profiling at that moment). If you're already in trouble, then spending a bit of CPU time on a basic CPU profile is certainly worth it.

> Tomas Vondra-4 wrote
>> What I meant is that if the system evicts this amount of buffers all the
>> time (i.e. there doesn't seem to be any sudden spike), then it's
>> unlikely to be the cause (or related to it).
>
> I was actually been thinking about scenario where different sessions
> want to at one time read/write from or to many different relfilenodes,
> what could cause page swap between shared buffers and os cache?

Perhaps. If the sessions only do reads, that would not be visible in buffers_backend, I believe (not sure ATM, would have to check the source). But it'd be visible in buffers_alloc and certainly in blks_read.

> we see that context switches on cpu are increasing as well. kernel
> documentation says that using page tables instead of Translation
> Lookaside Buffer (TLB) is very costly and on some blogs have seen
> recomendations that using huge pages (so more addresses can fit in
> TLB) will help here but postgresql, unlike oracle, cannot use it for
> anything else than page buffering (so 16gb) ... so process memory
> still needs to use 4k pages.

The context switches are likely due to a large number of runnable processes competing for the CPU. Also, memory bandwidth is increasingly an issue on big boxes ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 10, 2017 at 4:28 PM, pinker <pinker@onet.eu> wrote:
>
> Yes, it would be much easier if it would be just single query from the top,
> but the most cpu is eaten by the system itself and I'm not sure why.

You are experiencing a context switch storm. The OS is spending so much time trying to switch between 1,000+ processes that it doesn't have any time left to do much else.
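[One way to quantify such a storm is to watch the kernel's cumulative context-switch counter; vmstat's 'cs' column reports the same thing per interval. A small sketch:

```shell
# Context switches per second, from the cumulative 'ctxt' counter in /proc/stat.
c1=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
c2=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/sec: $(( c2 - c1 ))"
```

Comparing this figure during a quiet period against an incident shows whether the spike in switching really tracks the spike in system CPU.]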
Any chance this situation has something to do with the recently revealed Meltdown vulnerability?
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html