Обсуждение: Stats collector eats my CPU
This is top of my 'top': 15483 postgres 15 0 147m 31m 284 R 17 0.8 29033:23 postgres 24599 postgres 15 0 1293m 274m 231m S 2 6.9 0:02.05 postgres 24598 postgres 15 0 1258m 99m 88m S 1 2.5 0:00.62 postgres the 15483 process is stats collector. At the moment server is almost idle but the stats collector is constantly taking 15-17% of CPU. My statistics related settings looks like this: track_activities on track_counts on default_statistics_target 100 I don't know if it matters at all, but maybe the reason is that the cluster is very large in the term of relation number (many schemes with identical table set). ------------------------------------------------------------------------------------ select count(*) from pg_class; count -------- 257477 ------------------------------------------------------------------------------------ I'm using PostgreSQL 8.3.1 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14) The 15% isn't much and I can afford it but what's strange there is really small trafic at the moment (in average about 5%), and the DML's are about 5% of it. So what is the stat collector doing? I'm affraid what will happend if the real query traffic will increase. Maybe the PGSTAT_STAT_INTERVAL is the issue in my case (vary large schema)? The PG is from yum. Any suggestions? Should I worry? Can I do anything about it? (adjusting schema will not be easy :D ) Wojtek Strzalka
wstrzalka <wstrzalka@gmail.com> writes:
> the 15483 process is stats collector. At the moment server is almost
> idle but the stats collector is constantly taking 15-17% of CPU.
> I don't know if it matters at all, but maybe the reason is that the
> cluster is very large in the term of relation number (many schemes
> with identical table set).
> select count(*) from pg_class;
>  count
> --------
>  257477
Ouch.  You might want to consider a schema redesign.  Usually, if you've
got a lot of tables with the same column-set, it's better to combine
them into one big table with an additional key column.
I'm sure the stats collector runtime is directly tied to having so many
tables --- it's trying to keep stats on each one of them individually.
            regards, tom lane
			
		On Wed, Oct 8, 2008 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > wstrzalka <wstrzalka@gmail.com> writes: >> the 15483 process is stats collector. At the moment server is almost >> idle but the stats collector is constantly taking 15-17% of CPU. > >> I don't know if it matters at all, but maybe the reason is that the >> cluster is very large in the term of relation number (many schemes >> with identical table set). > >> select count(*) from pg_class; >> count >> -------- >> 257477 > > Ouch. You might want to consider a schema redesign. Usually, if you've > got a lot of tables with the same column-set, it's better to combine > them into one big table with an additional key column. > > I'm sure the stats collector runtime is directly tied to having so many > tables --- it's trying to keep stats on each one of them individually. Unfortunately there are other competing issues with huge tables, like long vacuums. There's been some work to mitigate this, but we haven't turned the corner yet. IMNSHO, though, table partitioning is a feature that should be used...cautiously. merlin
			
				I have a large number of tables as well (though nowhere near 25000) and looked into autovacuum configuration:
oliver@gtwm.co.uk / 0845 456 1810 / 07814 828608
			
		
		
	The autovacuum daemon runs ANALYZE commands when necessary which will collect stats.
To test whether this is the cause, you could try to disable autovacuum temporarily and see if there's any effect. If so, read & understand the config. options and see what's relevant given how your tables are used. In my case I just increased autovacuum_naptime from 1min to 10min which reduced my processor usage from very little to practically negligible with no adverse effect. I believe you can define table-specific options if necessary.
That default_statistics_target parameter is larger than the default of 10, presumably by design? It'll also create more processing.
Regards
Oliver Kohll 
On 8 Oct 2008, at 12:07, wstrzalka wrote:
This is top of my 'top':
15483 postgres 15 0 147m 31m 284 R 17 0.8 29033:23 postgres
24599 postgres 15 0 1293m 274m 231m S 2 6.9 0:02.05 postgres
24598 postgres 15 0 1258m 99m 88m S 1 2.5 0:00.62 postgres
the 15483 process is stats collector. At the moment server is almost
idle but the stats collector is constantly taking 15-17% of CPU.
My statistics related settings looks like this:
track_activities on
track_counts on
default_statistics_target 100
I don't know if it matters at all, but maybe the reason is that the
cluster is very large in the term of relation number (many schemes
with identical table set).
------------------------------------------------------------------------------------
select count(*) from pg_class;
count
--------
257477
------------------------------------------------------------------------------------
I'm using PostgreSQL 8.3.1 on x86_64-redhat-linux-gnu, compiled by GCC
gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)
The 15% isn't much and I can afford it but what's strange there is
really small trafic at the moment (in average about 5%), and the DML's
are about 5% of it. So what is the stat collector doing? I'm affraid
what will happend if the real query traffic will increase.
Maybe the PGSTAT_STAT_INTERVAL is the issue in my case (vary large
schema)? The PG is from yum.
Any suggestions? Should I worry? Can I do anything about it?
(adjusting schema will not be easy :D )
Wojtek Strzalka
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
oliver@gtwm.co.uk / 0845 456 1810 / 07814 828608
www.gtwm.co.uk - company
www.gtportalbase.com - product
On Wed, 2008-10-08 at 09:34 -0400, Merlin Moncure wrote: > On Wed, Oct 8, 2008 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > wstrzalka <wstrzalka@gmail.com> writes: > >> the 15483 process is stats collector. At the moment server is almost > >> idle but the stats collector is constantly taking 15-17% of CPU. > > > >> I don't know if it matters at all, but maybe the reason is that the > >> cluster is very large in the term of relation number (many schemes > >> with identical table set). > > > >> select count(*) from pg_class; > >> count > >> -------- > >> 257477 > > > > Ouch. You might want to consider a schema redesign. Usually, if you've > > got a lot of tables with the same column-set, it's better to combine > > them into one big table with an additional key column. > > > > I'm sure the stats collector runtime is directly tied to having so many > > tables --- it's trying to keep stats on each one of them individually. > > Unfortunately there are other competing issues with huge tables, like > long vacuums. There's been some work to mitigate this, but we haven't > turned the corner yet. IMNSHO, though, table partitioning is a > feature that should be used...cautiously. Siebel has 20,000 tables in its data model and that's the biggest I know of. However, I've seen partitioned designs with more than 100,000 tables. 250,000 is a lot for Postgres, but we should be designing Postgres to cope with up to 1,000,000 tables or partitions. There's lots of data out there and if it doesn't come to us it will go elsewhere. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
For clarification 250k is total relation count. There is 85k plain tables + 45k toasts. Except stats collector overload, the cluster works fine ( no really surprising, while we are using world best database :D ) > On Wed, 2008-10-08 at 09:34 -0400, Merlin Moncure wrote: >> On Wed, Oct 8, 2008 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > wstrzalka <wstrzalka@gmail.com> writes: >> >> the 15483 process is stats collector. At the moment server is almost >> >> idle but the stats collector is constantly taking 15-17% of CPU. >> > >> >> I don't know if it matters at all, but maybe the reason is that the >> >> cluster is very large in the term of relation number (many schemes >> >> with identical table set). >> > >> >> select count(*) from pg_class; >> >> count >> >> -------- >> >> 257477 >> > >> > Ouch. You might want to consider a schema redesign. Usually, if you've >> > got a lot of tables with the same column-set, it's better to combine >> > them into one big table with an additional key column. >> > >> > I'm sure the stats collector runtime is directly tied to having so many >> > tables --- it's trying to keep stats on each one of them individually. >> >> Unfortunately there are other competing issues with huge tables, like >> long vacuums. There's been some work to mitigate this, but we haven't >> turned the corner yet. IMNSHO, though, table partitioning is a >> feature that should be used...cautiously. > Siebel has 20,000 tables in its data model and that's the biggest I know > of. However, I've seen partitioned designs with more than 100,000 > tables. 250,000 is a lot for Postgres, but we should be designing > Postgres to cope with up to 1,000,000 tables or partitions. There's lots > of data out there and if it doesn't come to us it will go elsewhere. -- Pozdrowienia, Wojciech Strzałka
2008/10/8 Wojciech Strzałka <wstrzalka@gmail.com>: > > For clarification 250k is total relation count. > There is 85k plain tables + 45k toasts. > > Except stats collector overload, the cluster works fine ( no really > surprising, while we are using world best database :D ) Couple of other points here: *) stats collector process is very long lived and does things, so will accumulate cpu time according to top over the shorter lived backend connections *) most servers are multi core these days...stats will eat at most one core maybe this is a non-problem? are you seeing other issues outside of high cpu time according to top? merlin
No. The only side effects is constant CPU load on this level (15%) regardless if there is DB activity or no. The machine is 2*quad so 15 of 800 isn't really much - but I afraid what will be, if I'll get more significant traffic there. As you mentioned the total time summary is the effect of uptime which is about half a year now. > 2008/10/8 Wojciech Strzałka <wstrzalka@gmail.com>: >> >> For clarification 250k is total relation count. >> There is 85k plain tables + 45k toasts. >> >> Except stats collector overload, the cluster works fine ( no really >> surprising, while we are using world best database :D ) > Couple of other points here: > *) stats collector process is very long lived and does things, so will > accumulate cpu time according to top over the shorter lived backend > connections > *) most servers are multi core these days...stats will eat at most one core > maybe this is a non-problem? are you seeing other issues outside of > high cpu time according to top? > merlin -- Pozdrowienia, Wojciech Strzałka