Обсуждение: UPDATES hang every 5 minutes

Поиск
Список
Период
Сортировка

UPDATES hang every 5 minutes

От
mr19
Дата:
I have a process that updates ~ 1500 rows in a table once a second.  Every 5
minutes (almost exactly) the update takes ~ 15 seconds (normally < 1).  I
have run htop/top on the machine during this time period and do not see
anything unusual.  I am running postgres 8.1.8 on a FC6 box.

Any type of internal accounting/logging set to 5 minute intervals by default
within postgres?  Right now I am looking at pg_stats activity to see if
there is anything there but so far have had no luck.

TIA
Marc
--
View this message in context: http://www.nabble.com/UPDATES-hang-every-5-minutes-tf4231199.html#a12037580
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: UPDATES hang every 5 minutes

От
Steve Crawford
Дата:
mr19 wrote:
> I have a process that updates ~ 1500 rows in a table once a second.  Every 5
> minutes (almost exactly) the update takes ~ 15 seconds (normally < 1)....

autovacuum_naptime perhaps?

Cheers,
Steve


Re: UPDATES hang every 5 minutes

От
Tom Lane
Дата:
mr19 <marc_rossi@yahoo.com> writes:
> I have a process that updates ~ 1500 rows in a table once a second.  Every 5
> minutes (almost exactly) the update takes ~ 15 seconds (normally < 1).

Checkpoints?

> I have run htop/top on the machine during this time period and do not see
> anything unusual.

Try watching "vmstat 1" --- your machine is probably I/O bound not
CPU bound.

            regards, tom lane

Re: UPDATES hang every 5 minutes

От
Joseph S
Дата:
Keep an eye on  pg_stat_activity and pg_locks to see if any lock
contention is going on.

mr19 wrote:
> I have a process that updates ~ 1500 rows in a table once a second.  Every 5
> minutes (almost exactly) the update takes ~ 15 seconds (normally < 1).  I
> have run htop/top on the machine during this time period and do not see
> anything unusual.  I am running postgres 8.1.8 on a FC6 box.
>
> Any type of internal accounting/logging set to 5 minute intervals by default
> within postgres?  Right now I am looking at pg_stats activity to see if
> there is anything there but so far have had no luck.
>
> TIA
> Marc

Re: UPDATES hang every 5 minutes

От
Greg Smith
Дата:
On Tue, 7 Aug 2007, mr19 wrote:

> I have a process that updates ~ 1500 rows in a table once a second.  Every 5
> minutes (almost exactly) the update takes ~ 15 seconds (normally < 1).

Lots of updates will trigger checkpoints and, if you have auto-vacuum
turned on, regular vacuum activity--either of which could contribute to
your long delays.  A 15 second long pause sounds more like checkpoints to
me.  Try increasing checkpoint_warning in your postgresql.conf file to its
maximum of 3600 and restart the server when you can tolerate a small
service disruption; that will get you a note in the logs every time one
happens so you can see if they line up with the slowdowns.

You may want to crank up log_min_messages to see what's going on as well,
at least to INFO as a start.  If you can tolerate the verbosity for a bit
while testing, you'll probably find out more than ever wanted to know
about what internals are happening during your slowdown point if you
increase that to DEBUG2, then set log_min_duration_statement to 2 seconds
so it logs all the long updates.  Probably want to put "%m" somewhere in
your log_line_prefix as well so you can see timestamps on all that data.

To get more specific advice, report if you see anything interesting in
your log files and send some details about the PostgreSQL version you're
using and the settings in the postgresql.conf file that you've changed
from their defaults.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: UPDATES hang every 5 minutes

От
Tom Lane
Дата:
Greg Smith <gsmith@gregsmith.com> writes:
> Try increasing checkpoint_warning in your postgresql.conf file to its
> maximum of 3600 and restart the server when you can tolerate a small
> service disruption;

You don't need a server restart to change checkpoint_warning --- SIGHUP
("pg_ctl reload") should be enough.

            regards, tom lane

Re: UPDATES hang every 5 minutes

От
"Scott Marlowe"
Дата:
On 8/9/07, Greg Smith <gsmith@gregsmith.com> wrote:
> On Tue, 7 Aug 2007, mr19 wrote:
>
> > I have a process that updates ~ 1500 rows in a table once a second.  Every 5
> > minutes (almost exactly) the update takes ~ 15 seconds (normally < 1).
>
> Lots of updates will trigger checkpoints and, if you have auto-vacuum
> turned on, regular vacuum activity--either of which could contribute to
> your long delays.  A 15 second long pause sounds more like checkpoints to
> me.  Try increasing checkpoint_warning in your postgresql.conf file to its
> maximum of 3600 and restart the server when you can tolerate a small
> service disruption; that will get you a note in the logs every time one
> happens so you can see if they line up with the slowdowns.

Wouldn't that be the other way around, set checkpoint_warning to 1 so
it triggers every time the checkpoint happens?

Note you don't have to stop / restart, just reload, and the default
checkpoint timeout is 5 minutes.

Would increasing the checkpoint_timeout and adjusting the bgwriter
settings help here?

Re: UPDATES hang every 5 minutes

От
Greg Smith
Дата:
On Thu, 9 Aug 2007, Scott Marlowe wrote:

> Wouldn't that be the other way around, set checkpoint_warning to 1 so
> it triggers every time the checkpoint happens?

The log message appears if the checkpoints happen more frequently than the
value, so setting to 1 would only trigger a warning if you got a
checkpoint more than once a second.  Using the max of 3600 will spit out a
warning every time there's a checkpoint as long as they happen more than
once per hour, which means in any normal configurations you'll get every
one of them logged.

> Note you don't have to stop / restart, just reload, and the default
> checkpoint timeout is 5 minutes.

I suggested a couple of other changes as well which is why I suggested
touching things during a maintenance window, but as you and Tom have
pointed out you can adjust checkpoint_warning without taking the server
down.  The fact that the default timeout matches exactly when he's seeing
the slowdowns, once every five minutes, is almost certainly the smoking
gun that this is a checkpoint issue, but Marc should confirm that before
just assuming it's the case.

> Would increasing the checkpoint_timeout and adjusting the bgwriter
> settings help here?

That's why I suggested he give some more configuration info before anyone
could say how to solve the problem.  For all we know, moving
checkpoint_timeout upward will just shift the period between pauses to one
based on checkpoint_segments instead, which might not be a big difference.
Also, increasing the timeout has the potential to make the spikes even
longer when they finally do happen.

Engineering the pauses out using the background writer in this sort of
situation (very frequent updates) can be very hard to do.  There's new
code coming in 8.3 that addresses this issue head-on, it can be tricky to
accomplish in the current production releases.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: UPDATES hang every 5 minutes

От
Marc Rossi
Дата:
Thanks for all the replies.  As of right now I think I have it narrowed down to checkpoints based on the iostat
activityI see when the hangs occur as well as the checkpoint_timeout defaulting to 5 min. 

I've upped checkpoint_warnings to 3600 to confirm but also made a few other changes.  I moved the pg_xlog dir to
anotherdisk (unfortunately it is the root volume) as well as made changes to the bgwriter settings as shown below
(takenfrom a post in the pgsql-performance list) 

bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers scanned/round
 bgwriter_lru_maxpages = 200             # 0-1000 buffers max written/round
 bgwriter_all_percent = 10.0             # 0-100% of all buffers scanned/round
 bgwriter_all_maxpages = 600             # 0-1000 buffers max written/round

I won't know until tomorrow if this solves anything but will post back to this thread for others who may experience the
sameproblem in the future. 

Thanks again,
Marc


----- Original Message ----
From: Greg Smith <gsmith@gregsmith.com>
Cc: mr19 <marc_rossi@yahoo.com>; pgsql-general@postgresql.org
Sent: Thursday, August 9, 2007 11:10:42 PM
Subject: Re: [GENERAL] UPDATES hang every 5 minutes

On Thu, 9 Aug 2007, Scott Marlowe wrote:

> Wouldn't that be the other way around, set checkpoint_warning to 1 so
> it triggers every time the checkpoint happens?

The log message appears if the checkpoints happen more frequently than the
value, so setting to 1 would only trigger a warning if you got a
checkpoint more than once a second.  Using the max of 3600 will spit out a
warning every time there's a checkpoint as long as they happen more than
once per hour, which means in any normal configurations you'll get every
one of them logged.

> Note you don't have to stop / restart, just reload, and the default
> checkpoint timeout is 5 minutes.

I suggested a couple of other changes as well which is why I suggested
touching things during a maintenance window, but as you and Tom have
pointed out you can adjust checkpoint_warning without taking the server
down.  The fact that the default timeout matches exactly when he's seeing
the slowdowns, once every five minutes, is almost certainly the smoking
gun that this is a checkpoint issue, but Marc should confirm that before
just assuming it's the case.

> Would increasing the checkpoint_timeout and adjusting the bgwriter
> settings help here?

That's why I suggested he give some more configuration info before anyone
could say how to solve the problem.  For all we know, moving
checkpoint_timeout upward will just shift the period between pauses to one
based on checkpoint_segments instead, which might not be a big difference.
Also, increasing the timeout has the potential to make the spikes even
longer when they finally do happen.

Engineering the pauses out using the background writer in this sort of
situation (very frequent updates) can be very hard to do.  There's new
code coming in 8.3 that addresses this issue head-on, it can be tricky to
accomplish in the current production releases.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD




Re: UPDATES hang every 5 minutes

От
Greg Smith
Дата:
On Thu, 9 Aug 2007, Marc Rossi wrote:

> as well as made changes to the bgwriter settings as shown below (taken
> from a post in the pgsql-performance list)
>bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers scanned/round
>bgwriter_lru_maxpages = 200             # 0-1000 buffers max written/round
>bgwriter_all_percent = 10.0             # 0-100% of all buffers scanned/round
>bgwriter_all_maxpages = 600             # 0-1000 buffers max written/round

Be warned that these are settings from a much more powerful server than it
sounds like you have, and I wouldn't be surprised to find your average
performance tanks as a result.  Making the background writer this
aggressive will waste a lot of I/O, and unless you've got a lot of spare
I/O to waste (which was the case on the source of this tuning) it can make
your problem worse.  I'd hesitate to recommend setting either percentage
over 5% or either maxpages>100 as a first step on a production system.
You may be in for a bad day tomorrow.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: UPDATES hang every 5 minutes

От
Marc Rossi
Дата:
Greg -

Thanks for the heads up.  The box in question is a dual cpu (xeon dual cores) with 8 gig & a pair of 10k 146gb raid 1
arrays. I have the pg_xlog dir on one array (along with the OS) & the rest of the data on the other array by itself. 

Given that this is a production system I'm going to tone things down a bit as you suggested prior to the open today.
WhileI don't like the 10-20 second pauses every 5 minutes it's a system I need to have running and I'd rather not take
thechance of bringing the system to its knees.   

A couple of quick questions.  On the fly I can change these params and use 'pg_ctl reload" to put them in effect,
correct?  That way I can play a little today and see what the effects are.  Also, I have my checkpoint_segments set to
10,if I were to lower this (say 5) would this possible have the effect of checkpointing a little more often with less
data? (right now I hit the checkpoint_timeout). 


Thanks again,
Marc

----- Original Message ----
From: Greg Smith <gsmith@gregsmith.com>
To: Marc Rossi <marc_rossi@yahoo.com>
Cc: pgsql-general@postgresql.org
Sent: Friday, August 10, 2007 2:36:28 AM
Subject: Re: [GENERAL] UPDATES hang every 5 minutes

On Thu, 9 Aug 2007, Marc Rossi wrote:

> as well as made changes to the bgwriter settings as shown below (taken
> from a post in the pgsql-performance list)
>bgwriter_lru_percent = 20.0             # 0-100% of LRU buffers scanned/round
>bgwriter_lru_maxpages = 200             # 0-1000 buffers max written/round
>bgwriter_all_percent = 10.0             # 0-100% of all buffers scanned/round
>bgwriter_all_maxpages = 600             # 0-1000 buffers max written/round

Be warned that these are settings from a much more powerful server than it
sounds like you have, and I wouldn't be surprised to find your average
performance tanks as a result.  Making the background writer this
aggressive will waste a lot of I/O, and unless you've got a lot of spare
I/O to waste (which was the case on the source of this tuning) it can make
your problem worse.  I'd hesitate to recommend setting either percentage
over 5% or either maxpages>100 as a first step on a production system.
You may be in for a bad day tomorrow.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD




Re: UPDATES hang every 5 minutes

От
Greg Smith
Дата:
On Fri, 10 Aug 2007, Marc Rossi wrote:

> Thanks for the heads up.  The box in question is a dual cpu (xeon dual
> cores) with 8 gig & a pair of 10k 146gb raid 1 arrays.  I have the
> pg_xlog dir on one array (along with the OS) & the rest of the data on
> the other array by itself.

Yeah, that's kinda what I thought.  Here's a quick way to think about the
background writer parameters.  Pages are 8KB, and the default
bgwriter_delay will execute every 200ms or 5 times per second.  If you set
one of the maxpages parameters to 100, that means that component of the
background writer might write as much as 8KB*100*5=4MB/sec worth of data
to the disk continuously.  The config you were copying had the maxpages
parameters set to 200 and 600, which meant that in a heavy update
situation the background writer might be writing out (2+6)*4=32MB/second
*on top of* all the other reads ands writes going on.  That's a pretty
substantial additional load to add to just two pairs of disks.

Also, anything written by the all-scan writer (which was by far the more
aggressive one in that setup) has the potential to be a wasted write if
that particular page gets changed again before the next checkpoint, which
sounds highly likely given how you described your workload.  Making those
parameters too high will actually make the system less efficient--and that
can make checkpoints worse.

> A couple of quick questions.  On the fly I can change these params and
> use 'pg_ctl reload" to put them in effect, correct?

I believe all of the ones you'll be touching can be adjusted that way.
To confirm a change took, pop into psql and do "show <parameter>" to see
the current value.

> Also, I have my checkpoint_segments set to 10, if I were to lower this
> (say 5) would this possible have the effect of checkpointing a little
> more often with less data?  (right now I hit the checkpoint_timeout).

You're thinking in the right terms here.  It's possible that by forcing
more frequent checkpoints, the impact of each of them will be reduced.
However, the net will be a less efficient system, because checkpointing is
intensive.  And having twice as many of them doubles how often you get
this spike in response times.

The other approach is to increase the timeout and the segments, so maybe
you're only running into this every 10 minutes or more which makes the
problem less annoying, then try to keep the buffers clean between
checkpoints using the background writer.

What's shared_buffers set to on this system?  A third possibility is to
reduce that and rely more on the operating system to buffer the data for
you.  If there's less data in the buffer cache, it will certainly take
less time to write things out at checkpoint time; there's obviously
downsides to that in terms of how fast regular queries execute.

It will take some experimentation here to get this right.  Checkpoint
problems like you're running into aren't something you knock out in a day
if you're new to this.  Try not to make too big a step at any time and
expect you'll have to dig into this a bit.  I would recommend you start by
intalling the contrib/pg_buffercache module against your database (read
README.pg_buffercache for an intro), which lets you watch what's in the
buffer cache at any time, so you can tell the balance of clean vs. dirty
pages and what tables they're in.  That will let you monitor how effective
the background writer is doing and estimate how bad the checkpoint is
going to be before it happens.  Here's a sample query to get you started:

SELECT c.relname, isdirty, count(*) AS buffers
   FROM pg_class c INNER JOIN pg_buffercache b
   ON b.relfilenode = c.relfilenode INNER JOIN pg_database d
   ON (b.reldatabase = d.oid AND d.datname = current_database())
   GROUP BY c.relname,isdirty
   ORDER BY 2 DESC;

I have something I'm working on that covers a lot of this topic at
http://developer.postgresql.org/index.php/Buffer_Cache%2C_Checkpoints%2C_and_the_BGW
but that's probably a little too low-level for you to chew on usefully
right now.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: UPDATES hang every 5 minutes

От
Marc Rossi
Дата:
Ok, partial day results.  Looks like my changes have not solved the problem, just spread it out a little more (as would
beexpected based on your responses).  The delays are now shorter (about half) but occur more frequently (maybe 1x /
minute). The params I used are: 

bgwriter_lru_percent = 5.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5.0
bgwriter_all_maxpages = 100
wal_buffers = 20

checkpoint_segments = 10
shared_buffers = 196608

I'm going to do some reading today/tonight with the docs/links you have provided and give it another shot on Monday.
Rightnow I'm leaning towards lowering my shared_buffers param as you suggested as third possibility.  I will also take
alook at the buffer cache data to get a better feel. 

It seems to me that the real solution is for me to stop using the database as an IPC system to pass somewhat
time-criticaldata between processes.  Given the time constraints I'm working under this unfortunately was the quickest
route.  

Again, I can't say thanks enough for the great info you have provided.  I'll continue to post any changes/results I
make/see.

Marc




Re: UPDATES hang every 5 minutes

От
Steve Crawford
Дата:
> It seems to me that the real solution is for me to stop using the database as an IPC system to pass somewhat
time-criticaldata between processes.  Given the time constraints I'm working under this unfortunately was the quickest
route.

At least for the first 5 minutes. :) I was wondering about that 1,500
updates/second. PostgreSQL is probably not the optimal solution for IPC.

If you are trying to deal with passing this data among machines,
memcached may be a good solution. Pretty easy to configure and clients
for lots of languages. Blazing fast, too.

Cheers,
Steve