Re: CPU spikes and transactions

From: Merlin Moncure
Subject: Re: CPU spikes and transactions
Msg-id: CAHyXU0yXYpCXN4=81ZDRQu-oGzrcq2qNAXDpyz4oiQPPAGk4ew@mail.gmail.com
In reply to: Re: CPU spikes and transactions  (Dave Owens <dave@teamunify.com>)
List: pgsql-performance
On Tue, May 13, 2014 at 6:04 PM, Dave Owens <dave@teamunify.com> wrote:
Hi,

Apologies for resurrecting this old thread, but it seems like this is better than starting a new conversation.

We are now running 9.1.13 and have doubled the CPU and memory.  So 2x 16-core Opteron 6276 (32 cores total), and 64GB memory.  shared_buffers is set to 20GB, effective_cache_size to 40GB.

We were able to record perf data during the latest incident of high CPU utilization. perf report is below:

Samples: 31M of event 'cycles', Event count (approx.): 16289978380877 
 44.74%       postmaster  [kernel.kallsyms]             [k] _spin_lock_irqsave                                     
 15.03%       postmaster  postgres                      [.] 0x00000000002ea937                                     
  3.14%       postmaster  postgres                      [.] s_lock                                                 
  2.30%       postmaster  [kernel.kallsyms]             [k] compaction_alloc                                       
  2.21%       postmaster  postgres                      [.] HeapTupleSatisfiesMVCC                                 
  1.75%       postmaster  postgres                      [.] hash_search_with_hash_value                            
  1.25%       postmaster  postgres                      [.] ExecScanHashBucket                                     
  1.20%       postmaster  postgres                      [.] SHMQueueNext                                           
  1.05%       postmaster  postgres                      [.] slot_getattr                                           
  1.04%             init  [kernel.kallsyms]             [k] native_safe_halt                                       
  0.73%       postmaster  postgres                      [.] LWLockAcquire                                          
  0.59%       postmaster  [kernel.kallsyms]             [k] page_fault                                             
  0.52%       postmaster  postgres                      [.] ExecQual                                               
  0.40%       postmaster  postgres                      [.] ExecStoreTuple                                         
  0.38%       postmaster  postgres                      [.] ExecScan                                               
  0.37%       postmaster  postgres                      [.] check_stack_depth                                      
  0.35%       postmaster  postgres                      [.] SearchCatCache                                         
  0.35%       postmaster  postgres                      [.] CheckForSerializableConflictOut                        
  0.34%       postmaster  postgres                      [.] LWLockRelease                                          
  0.30%       postmaster  postgres                      [.] _bt_checkkeys                                          
  0.28%       postmaster  libc-2.12.so                  [.] memcpy                                                 
  0.27%       postmaster  [kernel.kallsyms]             [k] get_pageblock_flags_group                              
  0.27%       postmaster  postgres                      [.] int4eq                                                 
  0.27%       postmaster  postgres                      [.] heap_page_prune_opt                                    
  0.27%       postmaster  postgres                      [.] pgstat_init_function_usage                             
  0.26%       postmaster  [kernel.kallsyms]             [k] _spin_lock                                             
  0.25%       postmaster  postgres                      [.] _bt_compare                                            
  0.24%       postmaster  postgres                      [.] pgstat_end_function_usage

...please let me know if we need to produce the report differently to be useful.

We will begin reducing shared_buffers incrementally over the coming days.


This is definitely pointing at THP (transparent huge pages) compaction, which is increasingly emerging as a possible culprit for CPU spikes that occur suddenly and resolve just as suddenly.  The evidence I see is:

*) Lots of time in kernel
*) "compaction_alloc"
*) otherwise normal postgres profile (not lots of time in s_lock, LWLock, or other weird things)
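
The "lots of time in kernel" point can be checked by summing the report rows by kernel versus user space. Here is a minimal Python sketch, assuming the plain-text `perf report` layout shown above (percentage, command, object, `[k]` or `[.]`, symbol). The regex and the `summarize` name are illustrative only and not part of perf.

```python
import re
from collections import defaultdict

# Matches rows such as:
#   44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
# Assumed layout: percent, command, shared object, [k] or [.], symbol.
ROW = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[([k.])\]\s+(\S+)")

def summarize(report_text):
    """Sum sampled percentages by mode: 'kernel' for [k], 'user' for [.]."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header lines such as "Samples: ..." are skipped
        pct, _comm, _dso, mode, _sym = m.groups()
        totals["kernel" if mode == "k" else "user"] += float(pct)
    return dict(totals)
```

Feeding it the saved report text (for example from `perf report --stdio`) gives a quick kernel/user split. Only the rows actually present in the text are counted, so a truncated report gives a lower bound for each side.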


Please check the value of THP (see here: http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadaoop-workloads/; the issue shows up in Hadoop and various other workloads).  If it is enabled, consider disabling it...this will revert to pre-Linux 6 behavior.  If you are going to attack this from the point of view of lowering shared_buffers, do not bother with incremental steps...head straight for 2GB or it's unlikely the problem will be fixed.  THP compaction is not a postgres problem...MySQL is affected, as are other server platforms.  If THP is indeed causing the problem, it couldn't hurt to get on the horn with the Linux guys.  Last I heard they claimed this kind of thing was fixed, but I don't know where things stand now.
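
For checking the THP value, here is a small Python sketch. It assumes the setting lives in one of two sysfs files (the mainline kernel path, or the `redhat_transparent_hugepage` path used on RHEL 6 family kernels) and that the kernel marks the active choice with square brackets. The function names are made up for illustration; adjust the paths for your distribution.

```python
from pathlib import Path

# Assumed sysfs locations: mainline kernel path first, then the path
# used by RHEL 6 family kernels. Other systems may differ.
CANDIDATE_PATHS = (
    "/sys/kernel/mm/transparent_hugepage/enabled",
    "/sys/kernel/mm/redhat_transparent_hugepage/enabled",
)

def parse_thp_setting(text):
    """Return the bracketed (active) value, e.g.
    '[always] madvise never' -> 'always'. None if nothing is bracketed."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def read_thp_setting(paths=CANDIDATE_PATHS):
    """Try each candidate file; return the active value or None."""
    for p in paths:
        try:
            return parse_thp_setting(Path(p).read_text())
        except OSError:
            continue  # file missing or unreadable, try the next one
    return None

if __name__ == "__main__":
    print("THP enabled setting:", read_thp_setting())
```

If this reports `always`, THP is on. Disabling it is typically done by writing `never` to the same file as root; check your distribution's documentation for how to make that persist across reboots.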

merlin
