Re: Cpu usage 100% on slave. s_lock problem.

From Merlin Moncure
Subject Re: Cpu usage 100% on slave. s_lock problem.
Msg-id CAHyXU0xAKkjRA03GXP71yqqqpdxepH5p5qZwddjPTOWQOkjoHQ@mail.gmail.com
In reply to Cpu usage 100% on slave. s_lock problem.  (Дмитрий Дегтярёв <degtyaryov@gmail.com>)
Replies Re: Cpu usage 100% on slave. s_lock problem.
List pgsql-performance
On Tue, Aug 27, 2013 at 2:57 AM, Дмитрий Дегтярёв <degtyaryov@gmail.com> wrote:
> Hello.
>
> Exist 2 identical server DELL PowerEdge™ R720, CPU Dual Intel® Xeon® E5-2620
> Hexa-Core inkl, RAM 256Gb, RAID-10 8 x 600 GB SAS 6 Gb/s 15000 rpm.
>
> $ cat /etc/fedora-release
> Fedora release 19
>
> $ postgres --version
> postgres (PostgreSQL) 9.2.4
>
> Data ~220Gb and Indexes ~140Gb
>
> iowait ~0.2-0.5. Disk usage only write ~0-2 Mb/sec
>
> On each installed pg_bouncer. Pool size 24.
>
> On Master in peak load ~1200 request/sec, ~30 ms/request avg, 24 CPU ~95% -
> this is no problem
> $ perf top
>  21,14%  [kernel]                 [k] isolate_freepages_block
>  12,27%  [unknown]                [.] 0x00007fc1bb303be0
>   5,93%  postgres                 [.] hash_search_with_hash_value
>   4,85%  libbz2.so.1.0.6          [.] 0x000000000000a6e0
>   2,70%  postgres                 [.] PinBuffer
>   2,34%  postgres                 [.] slot_deform_tuple
>   1,92%  libbz2.so.1.0.6          [.] BZ2_compressBlock
>   1,85%  postgres                 [.] LWLockAcquire
>   1,69%  postgres                 [.] heap_page_prune_opt
>   1,48%  postgres                 [.] _bt_checkkeys
>   1,40%  [kernel]                 [k] page_fault
>   1,36%  postgres                 [.] _bt_compare
>   1,23%  postgres                 [.] heap_hot_search_buffer
>   1,19%  [kernel]                 [k] get_pageblock_flags_group
>   1,18%  postgres                 [.] AllocSetAlloc
>
> On Slave max ~400-500 request/sec, ~200 and up 24 ms/request avg, 24 CPU
> ~95% - this is problem
> $ perf top
>  30,10%  postgres               [.] s_lock
>  22,90%  [unknown]              [.] 0x0000000000729cfe
>   4,98%  postgres               [.] RecoveryInProgress.part.9
>   4,89%  postgres               [.] LWLockAcquire
>   4,57%  postgres               [.] hash_search_with_hash_value
>   3,50%  postgres               [.] PinBuffer
>   2,31%  postgres               [.] heap_page_prune_opt


It looks like you're hitting spinlock contention inside
heap_page_prune_opt(), which is commented:
 * Note: this is called quite often.  It's important that it fall out quickly
 * if there's not any use in pruning.

This in turn calls RecoveryInProgress() which spinlocks in order to
get a guaranteed result.  At that call site, we are told:
/*
* We can't write WAL in recovery mode, so there's no point trying to
* clean the page. The master will likely issue a cleaning WAL record soon
* anyway, so this is no particular loss.
*/

So ISTM it's not necessary to pedantically check RecoveryInProgress() on
each and every call of this routine (or at least, we should be able to
reduce the number of spinlock acquisitions).

Hm, what if we exposed LocalRecoveryInProgress through a function
which would approximately satisfy the condition
"MightRecoveryInProgress()", on the basis that the condition only moves
in one direction?  That could lead to optimization around the spinlock
in hot path cases like this, where getting 'TRUE' incorrectly is mostly
harmless...
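
A minimal sketch of that idea, not PostgreSQL's actual code: every name
below (spin_acquire, recovery_in_progress_exact,
might_recovery_in_progress, end_recovery, run_demo) is hypothetical, and
a C11 atomic_flag stands in for the real spinlock. It assumes the flag
only ever moves from TRUE to FALSE, so the lock-free variant can return
a stale TRUE but never a stale FALSE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for shared-memory state guarded by a spinlock. */
static atomic_flag shared_lock = ATOMIC_FLAG_INIT;
static bool shared_recovery_in_progress = true;   /* only moves TRUE -> FALSE */
static unsigned long spinlock_acquisitions = 0;   /* instrumentation for the sketch */

/* Backend-local cached copy of the shared flag. */
static bool local_recovery_in_progress = true;

void spin_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&shared_lock, memory_order_acquire))
        ;                                         /* busy-wait until the lock is free */
    spinlock_acquisitions++;                      /* counted while holding the lock */
}

void spin_release(void)
{
    atomic_flag_clear_explicit(&shared_lock, memory_order_release);
}

/* Exact answer: takes the lock for as long as the local cache says TRUE. */
bool recovery_in_progress_exact(void)
{
    if (!local_recovery_in_progress)
        return false;                             /* FALSE is final, no lock needed */
    spin_acquire();
    local_recovery_in_progress = shared_recovery_in_progress;
    spin_release();
    return local_recovery_in_progress;
}

/* Approximate answer: never locks; may return a stale TRUE. */
bool might_recovery_in_progress(void)
{
    return local_recovery_in_progress;
}

/* Simulates the end of recovery flipping the shared flag. */
void end_recovery(void)
{
    spin_acquire();
    shared_recovery_in_progress = false;
    spin_release();
}

/* Walks through the states and returns the number of lock acquisitions. */
unsigned long run_demo(void)
{
    bool r;

    r = might_recovery_in_progress();             /* lock-free */
    assert(r && spinlock_acquisitions == 0);
    r = recovery_in_progress_exact();             /* takes the lock */
    assert(r && spinlock_acquisitions == 1);
    end_recovery();                               /* takes the lock */
    r = might_recovery_in_progress();             /* stale TRUE, still lock-free */
    assert(r && spinlock_acquisitions == 2);
    r = recovery_in_progress_exact();             /* refreshes the cache */
    assert(!r && spinlock_acquisitions == 3);
    r = might_recovery_in_progress();
    assert(!r);
    r = recovery_in_progress_exact();             /* early return, no lock */
    assert(!r && spinlock_acquisitions == 3);
    return spinlock_acquisitions;
}
```

In this sketch the local cache is refreshed only when some other code
path calls the exact variant, so a hot-path caller that skips pruning on
a stale TRUE would keep skipping until that happens. Whether that delay
is acceptable is a design question for the real code.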

merlin

