Обсуждение: Re: cpu bound postgresql setup. Firstly many thanks for responding. I am concerned because the load averages have increased and users complaining of slowness. I do not change settings frequenly. I was curious if there is any half dead component in th

От:
Rajesh Kumar Mallah
Дата:

On 6/23/10, Kevin Grittner <> wrote:
> Rajesh Kumar Mallah <> wrote:
>> PasteBin for the vmstat output
>> http://pastebin.com/mpHCW9gt
>>
>> On Wed, Jun 23, 2010 at 8:22 PM, Rajesh Kumar Mallah
>> <> wrote:
>>> Dear List ,
>>>
>>> I observe that my postgresql (ver 8.4.2) dedicated server has
>>> turned cpu bound and there is a high load average in the server >
>>> 50 usually.
>>> The server has 2 Quad Core CPUs already and there are 6 or 8
>>> drives in raid 10 , there is negligable i/o wait. There is 32GB
>>> ram and no swapping.
>>>
>>> When i strace processes at random i see lot of lseek
>>> (XXX,0,SEEK_END) calls which i feel were not that frequent
>>> before. can any pointers be got for investigating the high cpu
>>> usage by postgresql processes.
>
> I'm not clear on what problem you are experiencing.  Using a lot of
> your hardware's capacity isn't a problem in itself -- are you
> getting poor response time?  Poor throughput?  Some other problem?
> Is it continuous, or only when certain queries run?
>
> One thing that is apparent is that you might want to use a
> connection pool, or if you're already using one you might want to
> configure it to reduce the maximum number of active queries.  With
> eight cores and eight drives, your best throughput is going to be at
> somewhere around 24 active connections, and you appear to be going
> to at least twice that.
>
> If you can provide a copy of your postgresql.conf settings (without
> comments) and an EXPLAIN ANALYZE of a slow query, along with the
> schema information for the tables used by the query, you'll probably
> get useful advice on how to adjust your configuration, indexing, or
> query code to improve performance.
>
> -Kevin
>

--
Sent from Gmail for mobile | mobile.google.com

От:
"Kevin Grittner"
Дата:

Your response somehow landed in the subject line, apparently
truncated.  I'll extract that to the message body and reply to what
made it through.

Rajesh Kumar Mallah <> wrote:

> Firstly many thanks for responding. I am concerned because the
> load averages have increased and users complaining  of slowness.

If performance has gotten worse, then something has changed.  It
would be helpful to know what.  More users?  New software?  Database
growth?  Database bloat?  (etc.)

> I do not change settings frequenly.

That doesn't mean your current settings can't be changed to make
things better.

> I was curious if there is  any half dead component in th

Have you reviewed what shows up if you run (as a database
superuser)?:

  select * from pg_stat_activity;

You might want to review this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

От:
Rajesh Kumar Mallah
Дата:

Dear List,

1. It was found that too many stray queries were getting generated
from rouge users and bots
    we controlled using some manual methods.

2. We have made application changes and some significant changes have been done.

3. we use xfs  and our controller has BBU , we changed barriers=1 to
barriers=0 as
    i learnt that having barriers=1 on xfs  and fsync  as the sync
method, the advantage
    of BBU is lost unless barriers is = 0 (correct me if my
understanding is wrong)

4. We had implemented partitioning using exclusion constraints ,
parent relnship
    was removed from quite a lot of old partition tables.

our postgresql.conf

--------------------------------------
# cat postgresql.conf  | grep -v "^\s*#" | grep -v "^\s*$"

listen_addresses = '*'          # what IP address(es) to listen on;
port = 5432                             # (change requires restart)
max_connections = 300                   # (change requires restart)
shared_buffers = 10GB                   # min 128kB
work_mem = 4GB                          # min 64kB
fsync = on                              # turns forced synchronization on or off
synchronous_commit = on         # immediate fsync at commit
checkpoint_segments = 30                # in logfile segments, min 1, 16MB each
archive_mode = on               # allows archiving to be done
archive_command = '/opt/scripts/archive_wal.sh %p %f '
archive_timeout = 600           # force a logfile segment switch after this
effective_cache_size = 18GB
constraint_exclusion = on       # on, off, or partition
logging_collector = on          # Enable capturing of stderr and csvlog
log_directory = '/var/log/postgresql'           # directory where log
files are written,
log_filename = 'postgresql.log' # log file name pattern,
log_truncate_on_rotation = on           # If on, an existing log file of the
log_rotation_age = 1d                   # Automatic rotation of logfiles will
log_error_verbosity = verbose           # terse, default, or verbose messages
log_min_duration_statement = 5000       # -1 is disabled, 0 logs all statements
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'                     # locale for system
error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting
default_text_search_config = 'pg_catalog.english'
add_missing_from = on
custom_variable_classes = 'general'             # list of custom
variable class names
general.report_level = ''
general.disable_audittrail2 = ''
general.employee=''


Also i would like to apologize that some of the discussions on this problem
 inadvertently became private between me & kevin.


On Thu, Jun 24, 2010 at 12:10 AM, Rajesh Kumar Mallah
<> wrote:
> It was nice to go through the interesting posting guidelines.  i shall
> be analyzing the slow queries more objectively tomorrow during the
> peak hours. I really hope it sould be possible to track down the
> problem.
>
> On 6/23/10, Kevin Grittner <> wrote:
>> Rajesh Kumar Mallah <> wrote:
>>
>>> did you suggest at some point that number of backend per core
>>> should be preferebly 3 ?
>>
>> I've found the number of *active* backends is optimal around (2 *
>> cores) + spindles.  You said you had eight cores and eight or ten
>> spindles, so I figure a connection pool limited to somewhere around
>> 24 active connections is ideal.  (Depending on how you set up your
>> pool, you may need a higher total number of connections to keep 24
>> active.)
>>
>> -Kevin
>>
>
> --
> Sent from Gmail for mobile | mobile.google.com
>

От:
"Kevin Grittner"
Дата:

I'm not clear whether you still have a problem, or whether the
changes you mention solved your issues.  I'll comment on potential
issues that leap out at me.

Rajesh Kumar Mallah <> wrote:

> 3. we use xfs  and our controller has BBU , we changed barriers=1
> to barriers=0 as i learnt that having barriers=1 on xfs  and fsync
> as the sync method, the advantage of BBU is lost unless barriers
> is = 0 (correct me if my understanding is wrong)

We use noatime,nobarrier in /etc/fstab.  I'm not sure where you're
setting that, but if you have a controller with BBU, you want to set
it to whichever disables write barriers.

> max_connections = 300

As I've previously mentioned, I would use a connection pool, in
which case this wouldn't need to be that high.

> work_mem = 4GB

That's pretty high.  That much memory can be used by each active
connection, potentially for each of several parts of the active
query on each connection.  You should probably set this much lower
in postgresql.conf and boost it if necessary for individual queries.

> effective_cache_size = 18GB

With 32GB RAM on the machine, I would probably set this higher --
somewhere in the 24GB to 30GB range, unless you have specific
reasons to believe otherwise.  It's not that critical, though.

> add_missing_from = on

Why?  There has been discussion of eliminating this option -- do you
have queries which rely on the non-standard syntax this enables?

> Also i would like to apologize that some of the discussions on
> this problem inadvertently became private between me & kevin.

Oops.  I failed to notice that.  Thanks for bringing it back to the
list.  (It's definitely in your best interest to keep it in front of
all the other folks here, some of whom regularly catch things I miss
or get wrong.)

If you still do have slow queries, please follow up with details.

-Kevin

От:
Rajesh Kumar Mallah
Дата:

On Thu, Jun 24, 2010 at 8:57 PM, Kevin Grittner
<> wrote:
> I'm not clear whether you still have a problem, or whether the
> changes you mention solved your issues.  I'll comment on potential
> issues that leap out at me.

It shall require more observation to know if the "problem" is solved.
my  "problem"  was high load average in the server . We find that
when ldavg is between 10-20 responses of applications were acceptable
ldavg  > 40 makes things slower.

What prompted me to post to list is that the server transitioned from
being IO bound to CPU bound and 90% of syscalls being
lseek(XXX, 0, SEEK_END) = YYYYYYY

>
> Rajesh Kumar Mallah <> wrote:
>
>> 3. we use xfs  and our controller has BBU , we changed barriers=1
>> to barriers=0 as i learnt that having barriers=1 on xfs  and fsync
>> as the sync method, the advantage of BBU is lost unless barriers
>> is = 0 (correct me if my understanding is wrong)
>
> We use noatime,nobarrier in /etc/fstab.  I'm not sure where you're
> setting that, but if you have a controller with BBU, you want to set
> it to whichever disables write barriers.

as per suggestion in discussions on some other thread I set it
in /etc/fstab.

>
>> max_connections = 300
>
> As I've previously mentioned, I would use a connection pool, in
> which case this wouldn't need to be that high.

We do use connection pooling provided to mod_perl server
via Apache::DBI::Cache. If i reduce this i *get* "too many
connections from non-superuser ... "  error. Will pgpool - I/II
still applicable in this scenario ?


>
>> work_mem = 4GB
>
> That's pretty high.  That much memory can be used by each active
> connection, potentially for each of several parts of the active
> query on each connection.  You should probably set this much lower
> in postgresql.conf and boost it if necessary for individual queries.

hmmm.. it was 8GB for many months !

i shall reduce it further, but will it not result in usage of too many
temp files
and saturate i/o?



>
>> effective_cache_size = 18GB
>
> With 32GB RAM on the machine, I would probably set this higher --
> somewhere in the 24GB to 30GB range, unless you have specific
> reasons to believe otherwise.  It's not that critical, though.

i do not remember well but there is a system view that (i think)
guides at what stage the marginal returns of increasing it
starts disappearing , i had set it a few years back.


>
>> add_missing_from = on
>
> Why?  There has been discussion of eliminating this option -- do you
> have queries which rely on the non-standard syntax this enables?

unfortunately yes.

>
>> Also i would like to apologize that some of the discussions on
>> this problem inadvertently became private between me & kevin.
>
> Oops.  I failed to notice that.  Thanks for bringing it back to the
> list.  (It's definitely in your best interest to keep it in front of
> all the other folks here, some of whom regularly catch things I miss
> or get wrong.)
>
> If you still do have slow queries, please follow up with details.


I have now set log_min_duration_statement = 5000
and there are few queries that come to logs.

please comment on the connection pooling aspect.

Warm  Regards
Rajesh Kumar Mallah.

>
> -Kevin
>

От:
Rajesh Kumar Mallah
Дата:

>i do not remember well but there is a system view that (i think)
>guides at what stage the marginal returns of increasing it
>starts disappearing , i had set it a few years back.

Sorry the above comment was regarding setting shared_buffers
not effective_cache_size.



On Thu, Jun 24, 2010 at 10:55 PM, Rajesh Kumar Mallah
<> wrote:
> On Thu, Jun 24, 2010 at 8:57 PM, Kevin Grittner
> <> wrote:
>> I'm not clear whether you still have a problem, or whether the
>> changes you mention solved your issues.  I'll comment on potential
>> issues that leap out at me.
>
> It shall require more observation to know if the "problem" is solved.
> my  "problem"  was high load average in the server . We find that
> when ldavg is between 10-20 responses of applications were acceptable
> ldavg  > 40 makes things slower.
>
> What prompted me to post to list is that the server transitioned from
> being IO bound to CPU bound and 90% of syscalls being
> lseek(XXX, 0, SEEK_END) = YYYYYYY
>
>>
>> Rajesh Kumar Mallah <> wrote:
>>
>>> 3. we use xfs  and our controller has BBU , we changed barriers=1
>>> to barriers=0 as i learnt that having barriers=1 on xfs  and fsync
>>> as the sync method, the advantage of BBU is lost unless barriers
>>> is = 0 (correct me if my understanding is wrong)
>>
>> We use noatime,nobarrier in /etc/fstab.  I'm not sure where you're
>> setting that, but if you have a controller with BBU, you want to set
>> it to whichever disables write barriers.
>
> as per suggestion in discussions on some other thread I set it
> in /etc/fstab.
>
>>
>>> max_connections = 300
>>
>> As I've previously mentioned, I would use a connection pool, in
>> which case this wouldn't need to be that high.
>
> We do use connection pooling provided to mod_perl server
> via Apache::DBI::Cache. If i reduce this i *get* "too many
> connections from non-superuser ... "  error. Will pgpool - I/II
> still applicable in this scenario ?
>
>
>>
>>> work_mem = 4GB
>>
>> That's pretty high.  That much memory can be used by each active
>> connection, potentially for each of several parts of the active
>> query on each connection.  You should probably set this much lower
>> in postgresql.conf and boost it if necessary for individual queries.
>
> hmmm.. it was 8GB for many months !
>
> i shall reduce it further, but will it not result in usage of too many
> temp files
> and saturate i/o?
>
>
>
>>
>>> effective_cache_size = 18GB
>>
>> With 32GB RAM on the machine, I would probably set this higher --
>> somewhere in the 24GB to 30GB range, unless you have specific
>> reasons to believe otherwise.  It's not that critical, though.
>
> i do not remember well but there is a system view that (i think)
> guides at what stage the marginal returns of increasing it
> starts disappearing , i had set it a few years back.
>
>
>>
>>> add_missing_from = on
>>
>> Why?  There has been discussion of eliminating this option -- do you
>> have queries which rely on the non-standard syntax this enables?
>
> unfortunately yes.
>
>>
>>> Also i would like to apologize that some of the discussions on
>>> this problem inadvertently became private between me & kevin.
>>
>> Oops.  I failed to notice that.  Thanks for bringing it back to the
>> list.  (It's definitely in your best interest to keep it in front of
>> all the other folks here, some of whom regularly catch things I miss
>> or get wrong.)
>>
>> If you still do have slow queries, please follow up with details.
>
>
> I have now set log_min_duration_statement = 5000
> and there are few queries that come to logs.
>
> please comment on the connection pooling aspect.
>
> Warm  Regards
> Rajesh Kumar Mallah.
>
>>
>> -Kevin
>>
>

От:
"Benjamin Krajmalnik"
Дата:

Rajesh,

I had a similar situation a few weeks ago whereby performance all of a
sudden decreased.
The one tunable which resolved the problem in my case was increasing the
number of checkpoint segments.
After increasing them, everything went back to its normal state.


> -----Original Message-----
> From: 
[mailto:pgsql-performance-
> ] On Behalf Of Rajesh Kumar Mallah
> Sent: Thursday, June 24, 2010 11:27 AM
> To: Kevin Grittner
> Cc: 
> Subject: Re: [PERFORM] cpu bound postgresql setup.
>
> >i do not remember well but there is a system view that (i think)
> >guides at what stage the marginal returns of increasing it
> >starts disappearing , i had set it a few years back.
>
> Sorry the above comment was regarding setting shared_buffers
> not effective_cache_size.
>
>
>
> On Thu, Jun 24, 2010 at 10:55 PM, Rajesh Kumar Mallah
> <> wrote:
> > On Thu, Jun 24, 2010 at 8:57 PM, Kevin Grittner
> > <> wrote:
> >> I'm not clear whether you still have a problem, or whether the
> >> changes you mention solved your issues.  I'll comment on potential
> >> issues that leap out at me.
> >
> > It shall require more observation to know if the "problem" is
solved.
> > my  "problem"  was high load average in the server . We find that
> > when ldavg is between 10-20 responses of applications were
acceptable
> > ldavg  > 40 makes things slower.
> >
> > What prompted me to post to list is that the server transitioned
from
> > being IO bound to CPU bound and 90% of syscalls being
> > lseek(XXX, 0, SEEK_END) = YYYYYYY
> >
> >>
> >> Rajesh Kumar Mallah <> wrote:
> >>
> >>> 3. we use xfs  and our controller has BBU , we changed barriers=1
> >>> to barriers=0 as i learnt that having barriers=1 on xfs  and fsync
> >>> as the sync method, the advantage of BBU is lost unless barriers
> >>> is = 0 (correct me if my understanding is wrong)
> >>
> >> We use noatime,nobarrier in /etc/fstab.  I'm not sure where you're
> >> setting that, but if you have a controller with BBU, you want to
set
> >> it to whichever disables write barriers.
> >
> > as per suggestion in discussions on some other thread I set it
> > in /etc/fstab.
> >
> >>
> >>> max_connections = 300
> >>
> >> As I've previously mentioned, I would use a connection pool, in
> >> which case this wouldn't need to be that high.
> >
> > We do use connection pooling provided to mod_perl server
> > via Apache::DBI::Cache. If i reduce this i *get* "too many
> > connections from non-superuser ... "  error. Will pgpool - I/II
> > still applicable in this scenario ?
> >
> >
> >>
> >>> work_mem = 4GB
> >>
> >> That's pretty high.  That much memory can be used by each active
> >> connection, potentially for each of several parts of the active
> >> query on each connection.  You should probably set this much lower
> >> in postgresql.conf and boost it if necessary for individual
queries.
> >
> > hmmm.. it was 8GB for many months !
> >
> > i shall reduce it further, but will it not result in usage of too
> many
> > temp files
> > and saturate i/o?
> >
> >
> >
> >>
> >>> effective_cache_size = 18GB
> >>
> >> With 32GB RAM on the machine, I would probably set this higher --
> >> somewhere in the 24GB to 30GB range, unless you have specific
> >> reasons to believe otherwise.  It's not that critical, though.
> >
> > i do not remember well but there is a system view that (i think)
> > guides at what stage the marginal returns of increasing it
> > starts disappearing , i had set it a few years back.
> >
> >
> >>
> >>> add_missing_from = on
> >>
> >> Why?  There has been discussion of eliminating this option -- do
you
> >> have queries which rely on the non-standard syntax this enables?
> >
> > unfortunately yes.
> >
> >>
> >>> Also i would like to apologize that some of the discussions on
> >>> this problem inadvertently became private between me & kevin.
> >>
> >> Oops.  I failed to notice that.  Thanks for bringing it back to the
> >> list.  (It's definitely in your best interest to keep it in front
of
> >> all the other folks here, some of whom regularly catch things I
miss
> >> or get wrong.)
> >>
> >> If you still do have slow queries, please follow up with details.
> >
> >
> > I have now set log_min_duration_statement = 5000
> > and there are few queries that come to logs.
> >
> > please comment on the connection pooling aspect.
> >
> > Warm  Regards
> > Rajesh Kumar Mallah.
> >
> >>
> >> -Kevin
> >>
> >
>
> --
> Sent via pgsql-performance mailing list (pgsql-
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

От:
Alvaro Herrera
Дата:

Excerpts from Rajesh Kumar Mallah's message of jue jun 24 13:25:32 -0400 2010:

> What prompted me to post to list is that the server transitioned from
> being IO bound to CPU bound and 90% of syscalls being
> lseek(XXX, 0, SEEK_END) = YYYYYYY

It could be useful to find out what file is being seeked.  Correlate the
XXX with files in /proc/<pid>/fd (at least on Linux) to find out more.

--
Álvaro Herrera <>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

От:
"Kevin Grittner"
Дата:

Rajesh Kumar Mallah <> wrote:
> Kevin Grittner <> wrote:

>>> max_connections = 300
>>
>> As I've previously mentioned, I would use a connection pool, in
>> which case this wouldn't need to be that high.
>
> We do use connection pooling provided to mod_perl server
> via Apache::DBI::Cache. If i reduce this i *get* "too many
> connections from non-superuser ... "  error. Will pgpool - I/II
> still applicable in this scenario ?

Yeah, you can't reduce this setting without first having a
connection pool in place which will limit how many connections are
in use.  We haven't used any of the external connection pool
products for PostgreSQL yet, because when we converted to PostgreSQL
we were already using a pool built into our application framework.
This pool queues requests for database transactions and has one
thread per connection in the database pool to pull and service
objects which encapsulate the logic of the database transaction.

We're moving to new development techniques, since that framework is
over ten years old now, but the overall approach is going to stay
the same -- because it has worked so well for us.  By queuing
requests beyond the number which can keep all the server's resources
busy, we avoid wasting resources on excessive context switching and
(probably more significant) contention for locks.  At one point our
busiest server started to suffer performance problems under load,
and we were able to fix them by simple configuring the connection
pool to half its previous size -- both response time and throughput
improved.

-Kevin

От:
Bruce Momjian
Дата:

Benjamin Krajmalnik wrote:
> Rajesh,
>
> I had a similar situation a few weeks ago whereby performance all of a
> sudden decreased.
> The one tunable which resolved the problem in my case was increasing the
> number of checkpoint segments.
> After increasing them, everything went back to its normal state.

Did you get a db server log message suggesting in increasing that
setting? I hope so.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

От:
"Benjamin Krajmalnik"
Дата:

Bruce,
Unfortunately not.  The behavior I had was ebbs and flows.  On FreeBSD,
I was seeing a lot of kernel wait states in top.  So every few minutes,
responsiveness of the db was pretty bad.  8.4.4/amd64 on FreeBSD 7.2

> -----Original Message-----
> From: Bruce Momjian [mailto:]
> Sent: Monday, June 28, 2010 3:45 PM
> To: Benjamin Krajmalnik
> Cc: Rajesh Kumar Mallah; Kevin Grittner; pgsql-
> 
> Subject: Re: [PERFORM] cpu bound postgresql setup.
>
> Benjamin Krajmalnik wrote:
> > Rajesh,
> >
> > I had a similar situation a few weeks ago whereby performance all of
> a
> > sudden decreased.
> > The one tunable which resolved the problem in my case was increasing
> the
> > number of checkpoint segments.
> > After increasing them, everything went back to its normal state.
>
> Did you get a db server log message suggesting in increasing that
> setting? I hope so.
>
> --
>   Bruce Momjian  <>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + None of us is going to be here forever. +

От:
Bruce Momjian
Дата:

Benjamin Krajmalnik wrote:
> > -----Original Message-----
> > From: Bruce Momjian [mailto:]
> > Sent: Monday, June 28, 2010 3:45 PM
> > To: Benjamin Krajmalnik
> > Cc: Rajesh Kumar Mallah; Kevin Grittner; pgsql-
> > 
> > Subject: Re: [PERFORM] cpu bound postgresql setup.
> >
> > Benjamin Krajmalnik wrote:
> > > Rajesh,
> > >
> > > I had a similar situation a few weeks ago whereby performance all of
> > a
> > > sudden decreased.
> > > The one tunable which resolved the problem in my case was increasing
> > the
> > > number of checkpoint segments.
> > > After increasing them, everything went back to its normal state.
> >
> > Did you get a db server log message suggesting in increasing that
> > setting? I hope so.

> Unfortunately not.  The behavior I had was ebbs and flows.  On FreeBSD,
> I was seeing a lot of kernel wait states in top.  So every few minutes,
> responsiveness of the db was pretty bad.  8.4.4/amd64 on FreeBSD 7.2

Bummer.  What is supposed to happen is if you are checkpointing too
frequently ( < 30 seconds), you get the suggestion about increasing
checkpoint_segments. I have not heard of cases where you are not
checkpointing too frequently, and increasing checkpoint_segments helps.

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +