On Tue, Nov 10, 2015 at 8:26 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Sun, Nov 8, 2015 at 7:22 AM, 莎士比亚说: <657985552@qq.com> wrote:
>> Hi Merlin and others;
>>
>> yesterday I hit the PG problem again, and "perf top" during the event
>> shows the following:
>> PerfTop: 11574 irqs/sec  kernel: 2.2%  exact: 0.0% [4000Hz cycles], (all, 32 CPUs)
>>
>>  81.39%  postgres      [.] s_lock
>>   5.42%  postgres      [.] LWLockAcquire
>>   4.59%  postgres      [.] LWLockRelease
>>   3.06%  postgres      [.] TransactionIdIsInProgress
>>   0.38%  postgres      [.] PinBuffer
>>   0.31%  postgres      [.] TransactionIdPrecedes
>>   0.27%  postgres      [.] UnpinBuffer
>>   0.19%  postgres      [.] TransactionIdIsCurrentTransactionId
>>   0.16%  postgres      [.] heap_hot_search_buffer
>>   0.15%  [kernel]      [k] number.isra.1
>>   0.14%  [kernel]      [k] kallsyms_expand_symbol.constprop.1
>>   0.10%  [kernel]      [k] module_get_kallsym
>>   0.10%  libc-2.17.so  [.] __strcmp_sse42
>>   0.09%  [kernel]      [k] _raw_spin_lock
>>   0.09%  postgres      [.] hash_search_with_hash_value
>>
>> Is this a spinlock problem? I need everyone's help to solve it. Thanks!
>
> Yup, spinlock problem. These can be difficult. What we need now is
> some metrics. They must be captured during the load event:
>
> *) number of active queries with average duration:
> SELECT query, count(*), avg(now() - query_start) FROM
> pg_stat_activity where state != 'idle' GROUP BY 1;
>
> *) context switches via "vmstat 1": get a snapshot during load and
> during non-load time for comparison
>
> Solution to this problem will probably be one or more of:
> 1) Significantly reduce shared_buffers (say, to 4GB)
>
> 2) Upgrade database to 9.4 and hope for the best
>
> 3) Capture backtrace during load event to determine exactly which path
> is going into spinlock
>
> 4) Install pgbouncer or another connection pooler to limit active
> queries on database
>
> 5) Install experimental patches to custom compiled database to test
> and verify a hypothetical fix
>
> Out of those 5 things, which are possible for you to do? The best-case
> scenario is that you have a non-essential server that reproduces the
> issue.
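For what it's worth, items (1) through (3) above could be captured together in something like the following sketch, run while the load event is happening. This assumes psql connects as a superuser, gdb is installed, and the server is 9.2 or later (so pg_stat_activity has the state and pid columns); the PID selection is only an illustration.

```shell
#!/bin/sh
# Sketch: capture diagnostics during a spinlock-contention load event.

# (1) Active queries with average duration (Merlin's query above):
psql -c "SELECT query, count(*), avg(now() - query_start)
         FROM pg_stat_activity WHERE state != 'idle' GROUP BY 1;"

# (2) Context switches: ten one-second samples; watch the "cs" column
# and compare against the same sample taken at a quiet time:
vmstat 1 10

# (3) Backtrace of one active backend, to see which code path is
# spinning; pick any active PID from pg_stat_activity:
PID=$(psql -At -c "SELECT pid FROM pg_stat_activity
                   WHERE state = 'active' LIMIT 1;")
gdb -p "$PID" --batch -ex "bt"
```

Grabbing several backtraces a few seconds apart from different backends makes it much more likely that the common s_lock caller shows up.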
Can we also see the output of:
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
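(Note the redhat_ prefix is specific to RHEL 6 kernels; on most other kernels the path is /sys/kernel/mm/transparent_hugepage/. A rough sketch that checks whichever path exists, and how one would temporarily disable THP for a test, is:)

```shell
# Sketch: report the transparent hugepage settings, whichever path
# this kernel uses (RHEL 6 uses the redhat_ prefix, others do not).
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/redhat_transparent_hugepage/defrag \
         /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    [ -e "$f" ] && echo "$f: $(cat "$f")"
done

# To test with THP off (as root; not persistent across reboots):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
```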
merlin