Discussion: long wait times in ProcessCatchupEvent()
Hi

I have the problem that on our servers it happens regularly under a certain workload (several times per minute) that all backend processes get a SIGUSR1 and spend several seconds in ProcessCatchupEvent(). At 100-200 connections (most of them idle) this causes the system load to skyrocket. I am not really familiar with the code but my wild guess is that the processes spend most of their time waiting for spinlocks.

We have reduced the number of connections as much as possible for now but it still makes up for roughly 50% of the total CPU time. Has anyone experienced a similar problem?

I can reproduce the issue on a test system with production data but it is not so easy to pinpoint what exactly causes the problem. The queries are basically tsearch2 full text searches over moderately big tables (~35GB). The queries are performed by functions which aggregate data from partitions in temporary tables, cache some data, and perform calculations before returning it to the user.

The PostgreSQL version is 8.3.12, the test server has 8 amd64 cores and 16GB of ram. I experimented with shared_buffers between 1GB and 4GB but it doesn't make much of a difference. Disk IO doesn't seem to be an issue here.

Regards,
Julian v. Bock

--
Julian v. Bock, Projektleitung Software-Entwicklung
OpenIT GmbH
In der Steele 33a-41, D-40599 Düsseldorf
Tel +49 211 239 577-0, Fax +49 211 239 577-10
http://www.openit.de
________________________________________________________________
HRB 38815 Amtsgericht Düsseldorf, USt-Id DE 812951861
Geschäftsführer: Oliver Haakert, Maurice Kemmann
bock@openit.de (Julian v. Bock) writes:
> I have the problem that on our servers it happens regularly under a
> certain workload (several times per minute) that all backend processes
> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent().

This is fixed in 8.4 and up.
http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php

If you aren't willing to move off 8.3 you might be able to ameliorate the problem by reducing the volume of catalog changes, but that can be pretty hard if you're dependent on temp tables.

regards, tom lane
On 12/29/10 6:28 AM, Julian v. Bock wrote:
> I have the problem that on our servers it happens regularly under a
> certain workload (several times per minute) that all backend processes
> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent(). At
> 100-200 connections (most of them idle) this causes the system load to
> skyrocket. I am not really familiar with the code but my wild guess is
> that the processes spend most of their time waiting for spinlocks.
>
> We have reduced the number of connections as much as possible for now
> but it still makes up for roughly 50% of the total CPU time. Has
> anyone experienced a similar problem?
>
> I can reproduce the issue on a test system with production data but it
> is not so easy to pinpoint what exactly causes the problem. The queries
> are basically tsearch2 full text searches over moderately big tables
> (~35GB). The queries are performed by functions which aggregate data
> from partitions in temporary tables, cache some data, and perform
> calculations before returning it to the user.
>
> The PostgreSQL version is 8.3.12, the test server has 8 amd64 cores
> and 16GB of ram. I experimented with shared_buffers between 1GB and
> 4GB but it doesn't make much of a difference. Disk IO doesn't seem to
> be an issue here.

This sounds like the exact same problem I had on Postgres 8.3 and 8.4:

http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Updating to Postgres version 9 fixed it. Here is what appeared to be the best analysis of what was happening, but we never confirmed it.

http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php

Craig
Craig James <craig_james@emolecules.com> writes:
> On 12/29/10 6:28 AM, Julian v. Bock wrote:
>> I have the problem that on our servers it happens regularly under a
>> certain workload (several times per minute) that all backend processes
>> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent().

> This sounds like the exact same problem I had on Postgres 8.3 and 8.4:
> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php
> Updating to Postgres version 9 fixed it. Here is what appeared to be the best analysis of what was happening, but we never confirmed it.
> http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php

It happened for you on 8.4 too? In that previous thread you were still on 8.3. If you did see it on 8.4 then it wasn't sinval ...

regards, tom lane
On 12/29/10 11:58 AM, Tom Lane wrote:
> Craig James <craig_james@emolecules.com> writes:
>> On 12/29/10 6:28 AM, Julian v. Bock wrote:
>>> I have the problem that on our servers it happens regularly under a
>>> certain workload (several times per minute) that all backend processes
>>> get a SIGUSR1 and spend several seconds in ProcessCatchupEvent().
>
>> This sounds like the exact same problem I had on Postgres 8.3 and 8.4:
>> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php
>> Updating to Postgres version 9 fixed it. Here is what appeared to be the best analysis of what was happening, but we never confirmed it.
>> http://archives.postgresql.org/pgsql-performance/2010-06/msg00464.php
>
> It happened for you on 8.4 too? In that previous thread you were still
> on 8.3. If you did see it on 8.4 then it wasn't sinval ...

My mistake - it was only 8.3.

Craig
On 12/29/2010 2:58 PM, Tom Lane wrote:
> It happened for you on 8.4 too? In that previous thread you were still
> on 8.3. If you did see it on 8.4 then it wasn't sinval ...
>
> regards, tom lane

May I ask what exactly "sinval" is? I took a look at Craig's problem and your description but I wasn't able to figure out what the sinval lock is and what it locks. I apologize if the question is stupid.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
> May I ask what exactly is "sinval"? I took a look at Craig's
> problem and your description but I wasn't able to figure out what
> is sinval lock and what does it lock? I apologize if the question
> is stupid.

This area could probably use a README file, but you can get a good idea from the comment starting on line 30 of this file:

http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/storage/ipc/sinvaladt.c;h=7910346dd55512be13712ea2342586d705bb0b35

It has to do with communication between processes regarding invalidation of the shared cache.

-Kevin
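To give a feel for why a catch-up signal delivered to every backend at once hurts so much at 100-200 connections, here is some toy arithmetic in Python. It is only a sketch and not PostgreSQL's code. It assumes, purely for illustration, that each woken backend needs one hypothetical shared lock for one time unit to catch up; the function name `wait_slots` is made up. As far as I read the commit Tom Lane linked, newer releases wake lagging backends one after another instead of all together, and the second branch models that case.

```python
def wait_slots(n_backends, wake_all_at_once):
    """Time units backends spend queued behind each other in one catch-up round.

    Assumption (not taken from the PostgreSQL source): every backend needs
    a single shared lock for exactly one time unit to catch up.
    """
    if wake_all_at_once:
        # All backends become runnable together; the i-th one in line
        # (counting from 0) waits for the i backends ahead of it.
        return sum(range(n_backends))
    # Woken strictly one after another: each backend finds the lock free.
    return 0


if __name__ == "__main__":
    for n in (8, 100, 200):
        print(n, wait_slots(n, True), wait_slots(n, False))
```

Under these assumptions the waiting grows roughly with the square of the connection count when everybody is woken together, which would fit the observation that cutting the number of connections helps.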
Hi

>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes:

TL> bock@openit.de (Julian v. Bock) writes:
>> I have the problem that on our servers it happens regularly under a
>> certain workload (several times per minute) that all backend
>> processes get a SIGUSR1 and spend several seconds in
>> ProcessCatchupEvent().

TL> This is fixed in 8.4 and up.
TL> http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php

Thanks for the quick reply.

TL> If you aren't willing to move off 8.3 you might be able to
TL> ameliorate the problem by reducing the volume of catalog changes,
TL> but that can be pretty hard if you're dependent on temp tables.

Upgrading to 8.4 or 9.0 is not possible at the moment but if it is only catalog changes I can probably work around that.

Regards,
Julian v. Bock
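One possible shape for such a workaround, sketched in Python: create each temporary table once per session and empty it with DELETE on later calls, instead of creating and dropping it every time. Creating or dropping a table changes rows in the system catalogs, so doing it once per session rather than once per call should cut the volume of catalog changes. Everything named here is hypothetical (`ScratchTable`, `RecordingCursor`, the table `scratch_hits`); the cursor is assumed to be any DB-API style object with an `execute()` method. DELETE is used rather than TRUNCATE on the assumption that TRUNCATE still updates the table's catalog entry.

```python
class ScratchTable:
    """Create a temp table once per session and reuse it afterwards."""

    def __init__(self, name, columns_sql):
        # Both values are trusted, hard-coded identifiers in this sketch.
        self.name = name
        self.columns_sql = columns_sql
        self.created = False

    def prepare(self, cursor):
        """Leave an empty scratch table ready for the next call."""
        if not self.created:
            # First use in this session: the only DDL we ever issue.
            cursor.execute(
                "CREATE TEMP TABLE %s (%s)" % (self.name, self.columns_sql)
            )
            self.created = True
        else:
            # Later uses: plain DML, no CREATE/DROP.
            cursor.execute("DELETE FROM %s" % self.name)


class RecordingCursor:
    """Stand-in for a database cursor that only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


if __name__ == "__main__":
    cur = RecordingCursor()
    scratch = ScratchTable("scratch_hits", "doc_id integer, rank real")
    for _ in range(3):
        scratch.prepare(cur)
    print("\n".join(cur.statements))
```

The `created` flag has to live as long as the database session does, so one `ScratchTable` object per connection is assumed. Rows removed with DELETE leave dead tuples behind, so a long-lived session may want to vacuum the scratch table now and then.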