Thread: long wait times in ProcessCatchupEvent()

long wait times in ProcessCatchupEvent()

From
bock@openit.de (Julian v. Bock)
Date:
Hi

I have the problem that on our servers it happens regularly under a
certain workload (several times per minute) that all backend processes
get a SIGUSR1 and spend several seconds in ProcessCatchupEvent(). At
100-200 connections (most of them idle) this causes the system load to
skyrocket. I am not really familiar with the code but my wild guess is
that the processes spend most of their time waiting for spinlocks.

We have reduced the number of connections as much as possible for now
but it still makes up for roughly 50% of the total CPU time.  Has
anyone experienced a similar problem?

I can reproduce the issue on a test system with production data but it
is not so easy to pinpoint what exactly causes the problem. The queries
are basically tsearch2 full text searches over moderately big tables
(~35GB). The queries are performed by functions which aggregate data
from partitions in temporary tables, cache some data, and perform
calculations before returning it to the user.

The PostgreSQL version is 8.3.12, the test server has 8 amd64 cores
and 16GB of ram. I experimented with shared_buffers between 1GB and
4GB but it doesn't make much of a difference. Disk IO doesn't seem to
be an issue here.

Regards,
Julian v. Bock

--
Julian v. Bock               Projektleitung Software-Entwicklung
OpenIT GmbH                  Tel +49 211 239 577-0
In der Steele 33a-41         Fax +49 211 239 577-10
D-40599 Düsseldorf           http://www.openit.de
________________________________________________________________
HRB 38815 Amtsgericht Düsseldorf             USt-Id DE 812951861
Geschäftsführer: Oliver Haakert, Maurice Kemmann

Re: long wait times in ProcessCatchupEvent()

From
Tom Lane
Date:
bock@openit.de (Julian v. Bock) writes:
> I have the problem that on our servers it happens regularly under a
> certain workload (several times per minute) that all backend processes
> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent().

This is fixed in 8.4 and up.
http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php

If you aren't willing to move off 8.3 you might be able to ameliorate
the problem by reducing the volume of catalog changes, but that can be
pretty hard if you're dependent on temp tables.

            regards, tom lane
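
One way to read the advice about reducing catalog changes: creating or dropping a temporary table changes the system catalogs, while emptying an existing table with DELETE is ordinary DML. The Python sketch below only builds the SQL strings a session might send and counts the CREATE/DROP statements among them. The class name `ScratchTable`, the table name `scratch_results`, and its columns are hypothetical and do not come from this thread, and whether this pattern lowers catalog traffic enough on 8.3 is an assumption that would need to be measured.

```python
class ScratchTable:
    """Plans SQL for one session that reuses a single temp table.

    Hypothetical helper: it returns SQL strings and never contacts a server.
    The first call creates the table (a catalog change); later calls empty
    it with DELETE, which is ordinary DML.
    """

    def __init__(self, name="scratch_results", columns="id integer, score real"):
        self.name = name
        self.columns = columns
        self.created = False

    def prepare(self):
        """Return the statement to run before the table is filled again."""
        if not self.created:
            self.created = True
            return f"CREATE TEMP TABLE {self.name} ({self.columns})"
        return f"DELETE FROM {self.name}"


def count_ddl(statements):
    """Count statements that create or drop a table."""
    return sum(s.startswith(("CREATE ", "DROP ")) for s in statements)


# Reuse: one table per session, emptied between calls.
session = ScratchTable()
reused = [session.prepare() for _ in range(5)]

# The pattern being replaced: create and drop the table on every call.
naive = []
for _ in range(5):
    naive.append("CREATE TEMP TABLE scratch_results (id integer, score real)")
    naive.append("DROP TABLE scratch_results")

print(count_ddl(reused), count_ddl(naive))
```

In this sketch five calls issue one CREATE with reuse, against ten CREATE/DROP statements with the create-and-drop pattern.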

Re: long wait times in ProcessCatchupEvent()

From
Craig James
Date:
On 12/29/10 6:28 AM, Julian v. Bock wrote:
> I have the problem that on our servers it happens regularly under a
> certain workload (several times per minute) that all backend processes
> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent(). At
> 100-200 connections (most of them idle) this causes the system load to
> skyrocket. I am not really familiar with the code but my wild guess is
> that the processes spend most of their time waiting for spinlocks.
>
> We have reduced the number of connections as much as possible for now
> but it still makes up for roughly 50% of the total CPU time.  Has
> anyone experienced a similar problem?
>
> I can reproduce the issue on a test system with production data but it
> is not so easy to pinpoint what exactly causes the problem. The queries
> are basically tsearch2 full text searches over moderately big tables
> (~35GB). The queries are performed by functions which aggregate data
> from partitions in temporary tables, cache some data, and perform
> calculations before returning it to the user.
>
> The PostgreSQL version is 8.3.12, the test server has 8 amd64 cores
> and 16GB of ram. I experimented with shared_buffers between 1GB and
> 4GB but it doesn't make much of a difference. Disk IO doesn't seem to
> be an issue here.

This sounds like the exact same problem I had on Postgres 8.3 and 8.4:

http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Updating to Postgres version 9 fixed it. Here is what appeared to be the best analysis of what was happening, but we
never confirmed it.

http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php

Craig


Re: long wait times in ProcessCatchupEvent()

From
Tom Lane
Date:
Craig James <craig_james@emolecules.com> writes:
> On 12/29/10 6:28 AM, Julian v. Bock wrote:
>> I have the problem that on our servers it happens regularly under a
>> certain workload (several times per minute) that all backend processes
>> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent().

> This sounds like the exact same problem I had on Postgres 8.3 and 8.4:

> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

> Updating to Postgres version 9 fixed it. Here is what appeared to be the best analysis of what was happening, but we
> never confirmed it.

> http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php

It happened for you on 8.4 too?  In that previous thread you were still
on 8.3.  If you did see it on 8.4 then it wasn't sinval ...

            regards, tom lane

Re: long wait times in ProcessCatchupEvent()

From
Craig James
Date:
On 12/29/10 11:58 AM, Tom Lane wrote:
> Craig James <craig_james@emolecules.com> writes:
>> On 12/29/10 6:28 AM, Julian v. Bock wrote:
>>> I have the problem that on our servers it happens regularly under a
>>> certain workload (several times per minute) that all backend processes
>>> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent().
>
>> This sounds like the exact same problem I had on Postgres 8.3 and 8.4:
>
>> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php
>
>> Updating to Postgres version 9 fixed it. Here is what appeared to be the best analysis of what was happening, but we
>> never confirmed it.
>
>> http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php
>
> It happened for you on 8.4 too?  In that previous thread you were still
> on 8.3.  If you did see it on 8.4 then it wasn't sinval ...

My mistake - it was only 8.3.

Craig


Re: long wait times in ProcessCatchupEvent()

From
Mladen Gogala
Date:
On 12/29/2010 2:58 PM, Tom Lane wrote:
> It happened for you on 8.4 too?  In that previous thread you were still
> on 8.3.  If you did see it on 8.4 then it wasn't sinval ...
>
>             regards, tom lane
>
May I ask what exactly is "sinval"? I took a look at Craig's problem and
your description but I wasn't able to figure out what is sinval lock and
what does it lock? I apologize if the question is stupid.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


Re: long wait times in ProcessCatchupEvent()

From
"Kevin Grittner"
Date:
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:

> May I ask what exactly is "sinval"? I took a look at Craig's
> problem and your description but I wasn't able to figure out what
> is sinval lock and what does it lock? I apologize if the question
> is stupid.

This area could probably use a README file, but you can get a good
idea from the comment starting on line 30 of this file:


http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/storage/ipc/sinvaladt.c;h=7910346dd55512be13712ea2342586d705bb0b35

It has to do with communication between processes regarding
invalidation of the shared cache.

-Kevin
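
To make "communication between processes regarding invalidation" a little more concrete, here is a toy Python model, not PostgreSQL code. It assumes a shared queue with a fixed capacity, a set of idle backends that never read the queue on their own, and a catch-up signal sent once the slowest reader is a full queue behind. The class `InvalidationQueue`, the capacity of 50, and the two wake-up policies are all made up for illustration; the thread itself does not describe how the real queue or the 8.4 fix works.

```python
class InvalidationQueue:
    """Toy shared queue of cache-invalidation messages (illustrative only)."""

    def __init__(self, capacity, backend_ids):
        self.capacity = capacity
        self.posted = 0                                  # messages posted so far
        self.read_pos = {b: 0 for b in backend_ids}      # messages read per backend
        self.largest_wakeup = 0                          # most backends signalled at once

    def backlog(self):
        """Messages the slowest backend has not read yet."""
        return self.posted - min(self.read_pos.values())

    def post(self, wake_everyone):
        """Some backend changed a catalog: add one message for all readers."""
        self.posted += 1
        if self.backlog() < self.capacity:
            return
        if wake_everyone:
            woken = list(self.read_pos)                  # signal every backend
        else:
            # Signal only the backend that is furthest behind.
            woken = [min(self.read_pos, key=self.read_pos.get)]
        for backend in woken:
            self.read_pos[backend] = self.posted         # backend catches up
        self.largest_wakeup = max(self.largest_wakeup, len(woken))


def simulate(wake_everyone, backends=100, capacity=50, changes=200):
    """Post `changes` messages and report the largest simultaneous wake-up."""
    queue = InvalidationQueue(capacity, range(backends))
    for _ in range(changes):
        queue.post(wake_everyone)
    return queue.largest_wakeup


print(simulate(True), simulate(False))
```

Under these assumptions the wake-everyone policy signals all 100 idle backends in one burst, which matches the symptom reported at the top of the thread, while the wake-one policy never signals more than a single backend at a time.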

Re: long wait times in ProcessCatchupEvent()

From
bock@openit.de (Julian v. Bock)
Date:
Hi

>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> bock@openit.de (Julian v. Bock) writes:
>> I have the problem that on our servers it happens regularly under a
>> certain workload (several times per minute) that all backend
>> processes get a SIGUSR1 and spend several seconds in
>> ProcessCatchupEvent().

TL> This is fixed in 8.4 and up.
TL> http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php

Thanks for the quick reply.

TL> If you aren't willing to move off 8.3 you might be able to
TL> ameliorate the problem by reducing the volume of catalog changes,
TL> but that can be pretty hard if you're dependent on temp tables.

Upgrading to 8.4 or 9.0 is not possible at the moment but if it is only
catalog changes I can probably work around that.

Regards,
Julian v. Bock
