Обсуждение: strange pauses

Поиск
Список
Период
Сортировка

strange pauses

От
Adrian Moisey
Дата:
Hi

I've noticed a strange pattern in our logs.

We only log anything that takes longer then 5 seconds.

If I tail the logs, I see a steady stream of functions that took longer
then 5 seconds.  But every now and then I see nothing for 3 minutes and
after that a whole bunch (about 20 - 30) queries all finishing at the
same second.

During this 3 minutes of nothing being logged, the db is still active
and I can access our website

Looks to me like "something" is "locking" and causing requests to be
delayed.  Once this "lock" unlocks, everything goes through.

Sorry for the bad description, can anybody help?  Where do I start looking?

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: strange pauses

От
Richard Huxton
Дата:
Adrian Moisey wrote:
>
> If I tail the logs, I see a steady stream of functions that took longer
> then 5 seconds.  But every now and then I see nothing for 3 minutes and
> after that a whole bunch (about 20 - 30) queries all finishing at the
> same second.

Search this list for references to "checkpoints". If you run
vmstat/iostat for a bit you should see bursts of disk activity at those
times.

Counter-intuitively you probably want *more* checkpoints (with less data
to write out each time), but see the list archives for discussion. Also
check the manuals for some background info.

HTH

--
   Richard Huxton
   Archonet Ltd

Re: strange pauses

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
> Adrian Moisey wrote:
>> If I tail the logs, I see a steady stream of functions that took longer
>> then 5 seconds.  But every now and then I see nothing for 3 minutes and
>> after that a whole bunch (about 20 - 30) queries all finishing at the
>> same second.

> Search this list for references to "checkpoints". If you run
> vmstat/iostat for a bit you should see bursts of disk activity at those
> times.

A different line of thought is that something is taking out an exclusive
lock on a table that all these queries need.  If vmstat doesn't show any
burst of activity during these pauses, look into pg_locks.

            regards, tom lane

Re: strange pauses

От
"Scott Marlowe"
Дата:
On Jan 16, 2008 4:42 AM, Adrian Moisey <adrian@careerjunction.co.za> wrote:
> Hi
>
> I've noticed a strange pattern in our logs.
>
> We only log anything that takes longer then 5 seconds.
>
> If I tail the logs, I see a steady stream of functions that took longer
> then 5 seconds.  But every now and then I see nothing for 3 minutes and
> after that a whole bunch (about 20 - 30) queries all finishing at the
> same second.
>
> During this 3 minutes of nothing being logged, the db is still active
> and I can access our website
>
> Looks to me like "something" is "locking" and causing requests to be
> delayed.  Once this "lock" unlocks, everything goes through.

Hmmm.  Are these queries actually all finishing at the same time or
just showing up in the logs at the same time.  Just thinking there
might be some kind of issue going on with logging.  Are you using
syslog or postgresql stderr logging?

Re: strange pauses

От
Greg Smith
Дата:
On Wed, 16 Jan 2008, Richard Huxton wrote:

> Search this list for references to "checkpoints". If you run vmstat/iostat
> for a bit you should see bursts of disk activity at those times.

The most straightforward way to prove or disprove that the slow queries
line up with checkpoints is to set to checkpoint_warning to a high value
(3600 should work), which should log every checkpoint, and then see if
they show up at the same time in the logs.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: strange pauses

От
Adrian Moisey
Дата:
Hi

>> Search this list for references to "checkpoints". If you run
>> vmstat/iostat for a bit you should see bursts of disk activity at
>> those times.
>
> The most straightforward way to prove or disprove that the slow queries
> line up with checkpoints is to set to checkpoint_warning to a high value
> (3600 should work), which should log every checkpoint, and then see if
> they show up at the same time in the logs.

You guys were spot on.  During these pauses the IO goes up high.

I've got the following set:
checkpoint_timeout = 5min
checkpoint_warning = 3600s

log_min_messages = info

But I see nothing in the logs about checkpoints

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: strange pauses

От
Brian Herlihy
Дата:
We had the same situation, and did two things

1.  Reduce checkpoint timeout
2.  Reduce quantity of data going into database (nice if it's possible!)

1 alone wasn't enough to eliminate the delays, but it did make each delay small enough that the user interface was only
minimallyaffected.  Previously, the delays were causing timeouts in the user interface. 

Our symptoms were that the queries finishing "at the same time" were appearing in clusters every 5 minutes + some
seconds,which happens to be the checkpoint timeout.  Seems a new checkpoint timeout is started only after the
checkpointis complete, hence 5 minute plus, rather than exactly 5 minutes. 

Brian


----- Original Message ----
From: Adrian Moisey <adrian@careerjunction.co.za>

Hi

>> Search this list for references to "checkpoints". If you run
>> vmstat/iostat for a bit you should see bursts of disk activity at
>> those times.
>
> The most straightforward way to prove or disprove that the slow
 queries
> line up with checkpoints is to set to checkpoint_warning to a high
 value
> (3600 should work), which should log every checkpoint, and then see
 if
> they show up at the same time in the logs.

You guys were spot on.  During these pauses the IO goes up high.

I've got the following set:
checkpoint_timeout = 5min
checkpoint_warning = 3600s

log_min_messages = info

But I see nothing in the logs about checkpoints

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

---------------------------(end of
 broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that
 your
       message can get through to the mailing list cleanly




Re: strange pauses

От
"Scott Marlowe"
Дата:
On Jan 17, 2008 12:50 AM, Adrian Moisey <adrian@careerjunction.co.za> wrote:
> Hi
>
> >> Search this list for references to "checkpoints". If you run
> >> vmstat/iostat for a bit you should see bursts of disk activity at
> >> those times.
> >
> > The most straightforward way to prove or disprove that the slow queries
> > line up with checkpoints is to set to checkpoint_warning to a high value
> > (3600 should work), which should log every checkpoint, and then see if
> > they show up at the same time in the logs.
>
> You guys were spot on.  During these pauses the IO goes up high.

You need to tune your background writer process.  See this link:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

Re: strange pauses

От
Adrian Moisey
Дата:
Hi

>> If I tail the logs, I see a steady stream of functions that took
>> longer then 5 seconds.  But every now and then I see nothing for 3
>> minutes and after that a whole bunch (about 20 - 30) queries all
>> finishing at the same second.
>
> Search this list for references to "checkpoints". If you run
> vmstat/iostat for a bit you should see bursts of disk activity at those
> times.
>
> Counter-intuitively you probably want *more* checkpoints (with less data
> to write out each time), but see the list archives for discussion. Also
> check the manuals for some background info.

We use a lot of checkpoints in our database, we don't care about these
much.  We think this is causing the checkpoint to write a lot of data at
checkpoint time.  Can we stop this behavior ?

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: strange pauses

От
Adrian Moisey
Дата:
Hi

>>> If I tail the logs, I see a steady stream of functions that took
>>> longer then 5 seconds.  But every now and then I see nothing for 3
>>> minutes and after that a whole bunch (about 20 - 30) queries all
>>> finishing at the same second.
>>
>> Search this list for references to "checkpoints". If you run
>> vmstat/iostat for a bit you should see bursts of disk activity at
>> those times.
>>
>> Counter-intuitively you probably want *more* checkpoints (with less
>> data to write out each time), but see the list archives for
>> discussion. Also check the manuals for some background info.
>
> We use a lot of checkpoints in our database, we don't care about these

We use a lot of temporary tables in our database, sorry.

> much.  We think this is causing the checkpoint to write a lot of data at
> checkpoint time.  Can we stop this behavior ?
>

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: strange pauses

От
Alvaro Herrera
Дата:
Adrian Moisey wrote:
> Hi
>
>>>> If I tail the logs, I see a steady stream of functions that took longer
>>>> then 5 seconds.  But every now and then I see nothing for 3 minutes and
>>>> after that a whole bunch (about 20 - 30) queries all finishing at the
>>>> same second.
>>>
>>> Search this list for references to "checkpoints". If you run
>>> vmstat/iostat for a bit you should see bursts of disk activity at those
>>> times.
>>>
>>> Counter-intuitively you probably want *more* checkpoints (with less data
>>> to write out each time), but see the list archives for discussion. Also
>>> check the manuals for some background info.
>>
>> We use a lot of checkpoints in our database, we don't care about these
>
> We use a lot of temporary tables in our database, sorry.
>
>> much.  We think this is causing the checkpoint to write a lot of data at
>> checkpoint time.  Can we stop this behavior ?

No, data written in temp tables do not cause extra I/O on checkpoint.
Catalog changes due to temp table creation could, but I doubt that's the
problem.

Perhaps, if you want to avoid I/O caused by temp tables (but it's not at
checkpoint time, so perhaps this has nothing to do with your problem),
you could try raising temp_buffers.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: strange pauses

От
"Scott Marlowe"
Дата:
On Jan 18, 2008 4:58 AM, Adrian Moisey <adrian@careerjunction.co.za> wrote:
>
> We use a lot of checkpoints in our database, we don't care about these
> much.  We think this is causing the checkpoint to write a lot of data at
> checkpoint time.  Can we stop this behavior ?

You can smooth out the checkpoint behaviour by using that link I
posted earlier to tune your bgwriter to write things out ahead of when
the checkpoint will occurr.

Re: strange pauses

От
Adrian Moisey
Дата:
Hi

> Perhaps, if you want to avoid I/O caused by temp tables (but it's not at
> checkpoint time, so perhaps this has nothing to do with your problem),
> you could try raising temp_buffers.

How can I find out if temp_buffers is being exceeded ?

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842

Re: strange pauses

От
Decibel!
Дата:
On Jan 21, 2008, at 1:58 AM, Adrian Moisey wrote:
>> Perhaps, if you want to avoid I/O caused by temp tables (but it's
>> not at
>> checkpoint time, so perhaps this has nothing to do with your
>> problem),
>> you could try raising temp_buffers.
>
> How can I find out if temp_buffers is being exceeded ?

You could monitor the pgsql_tmp directory under the appropriate
database directory ($PGDATA/base/oid_number_of_the_database). You
could also see how many pages temporary objects in that connection
are using; you'd have to find the temp schema that your session is
using (\dn pg_temp* from psql), and then

SELECT sum(relpages) FROM pg_class c JOIN pg_namespace n ON
(c.relnamespace=n.oid) AND n.nspname='pg_temp_blah';
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



Вложения