BUG #9342: CPU / Memory Run-away

From: cnielsen@atlassian.com
Subject: BUG #9342: CPU / Memory Run-away
Message-ID: 20140224204749.29537.62919@wrigleys.postgresql.org
Replies: Re: BUG #9342: CPU / Memory Run-away (Heikki Linnakangas <hlinnakangas@vmware.com>)
         Re: BUG #9342: CPU / Memory Run-away (Peter Geoghegan <pg@heroku.com>)
List: pgsql-bugs

The following bug has been logged on the website:

Bug reference:      9342
Logged by:          Christopher Nielsen
Email address:      cnielsen@atlassian.com
PostgreSQL version: 9.2.6
Operating system:   CentOS 6.5
Description:


Hello,

Thanks very much for authoring such a full-featured and useful database.
It’s been a rock-solid component of many applications I’ve worked with, and
I am very glad to see it grow.

There is an issue we’ve run into recently that could be a database bug, and
it would be great to get someone’s input.

Over the last few months, and with increasing frequency, we have seen
Postgres run away with CPU and disk resources.

When this occurs, performance degrades and response times for all queries
increase.

This seems to happen regardless of query load, and often when traffic to
the application is very low.

Here is some more information that characterizes the issue we found.

Some symptoms include:

    * The machine load is very high (many times the number of CPUs
      available)
    * The CPU is 100% utilized, with 85% user and 15% sys
    * We do not see the increase in iowait that we might expect from a
      storage issue
    * All query results take significantly longer to be delivered
    * The issue can be worked around by restarting the database
        - This gives us downtime we would like to avoid
    * Under strace, we often see workers stuck on semop syscalls
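
For anyone who wants to compare numbers, here is a minimal sketch of how a
user/sys/iowait split like the one above could be sampled from /proc/stat on
Linux. It is only an illustration: the function names are hypothetical, and
it is not the tool that produced the figures in this report.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of
# Linux /proc/stat (see proc(5)). Later fields (guest time) are ignored
# because guest time is already included in "user".
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        # Match only the aggregate line, not per-core lines such as "cpu0".
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Share of elapsed CPU time per field between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval=1.0, path="/proc/stat"):
    """Read two samples `interval` seconds apart and return percentages."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return cpu_percentages(before, after)
```

Calling `sample(5.0)` during an incident and again during normal operation
would show whether the "user" and "system" shares really dominate while
"iowait" stays flat.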

Our setup looks like this:

    System Properties:

        PG Version:     9.2.6
        OS and Version: CentOS 6.5 running 2.6.32-431.3.1
        CPU:            80 cores (Intel Xeon)
        Memory:         252 GB

    Postgresql Setup:

        * 1 master and 3 slaves using streaming replication from the master
        * WALs are archived using rsync in the archive_command
        * WAL files on the master are stored on a separate partition
        * Both the pgdata and pg_xlog partitions use the CFQ I/O scheduler
            - We plan to switch both to either deadline or noop, if that
              could help
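
As a rough sketch of how the active I/O scheduler could be confirmed on each
partition's block device before and after such a switch: on Linux the kernel
lists the available schedulers in /sys/block/<device>/queue/scheduler and
marks the active one in square brackets. The function names and the device
name "sda" below are placeholders, not taken from our machines.

```python
import re


def active_scheduler(scheduler_text):
    """Return the bracketed (active) entry from a queue/scheduler file."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    if match is None:
        raise ValueError("no active scheduler marked in %r" % scheduler_text)
    return match.group(1)


def available_schedulers(scheduler_text):
    """Return every scheduler named in the file, active or not."""
    return [name.strip("[]") for name in scheduler_text.split()]


def read_scheduler(device):
    """Read the active scheduler for a block device name such as "sda"."""
    # `device` is a hypothetical placeholder; substitute the device that
    # backs the pgdata or pg_xlog partition.
    with open("/sys/block/%s/queue/scheduler" % device) as f:
        return active_scheduler(f.read())
```

For example, `active_scheduler("noop anticipatory deadline [cfq]")` returns
`"cfq"`.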

We have core dumps and stack traces available, and would love to collaborate
with anyone who has time to look into the issue with us.

Thanks very much again,

Chris
