Discussion: Postgres becoming slow, only full vacuum fixes it
Hi,
The problem: Postgres becomes slower day after day, and only a full vacuum fixes it.
Information you may need to evaluate :
The problem affects all tables and queries, as far as I can tell, but we can focus on a single table for clarity.
The queries I am running to test the speed are :
INSERT INTO "AWAITINGSTATUSSMPP" VALUES('143428', '1111', 1, '2012-06-16 13:39:19', '111');
DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
After a full vacuum, they run in about 100ms.
Today, before the full vacuum, they were taking around 500ms.
Below is an explain analyze of the commands AFTER a full vacuum. I did not run it beforehand, so I cannot post the corresponding output from before the vacuum. So, after the full vacuum:
explain analyze INSERT INTO "AWAITINGSTATUSSMPP" VALUES('143428', '1111', 1, '2012-06-16 13:39:19', '111');
"Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)"
"Trigger for constraint FK_AWAITINGSTATUSSMPP_MESSAGES: time=0.131 calls=1"
"Trigger bucardo_add_delta: time=0.454 calls=1"
"Trigger bucardo_triggerkick_MassSMs: time=0.032 calls=1"
"Total runtime: 0.818 ms"
explain analyze DELETE FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
"Seq Scan on "AWAITINGSTATUSSMPP" (cost=0.00..2.29 rows=1 width=6) (actual time=0.035..0.035 rows=0 loops=1)"
" Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))"
"Trigger bucardo_triggerkick_MassSMs: time=0.066 calls=1"
"Total runtime: 0.146 ms"
explain analyze SELECT * FROM "AWAITINGSTATUSSMPP" WHERE "SMSCMSGID" = '1111' AND "CONNECTIONID" = 1;
"Seq Scan on "AWAITINGSTATUSSMPP" (cost=0.00..2.29 rows=1 width=557) (actual time=0.028..0.028 rows=0 loops=1)"
" Filter: ((("SMSCMSGID")::text = '1111'::text) AND ("CONNECTIONID" = 1))"
"Total runtime: 0.053 ms"
Below are the metadata of the table :
=====================================
CREATE TABLE "AWAITINGSTATUSSMPP"
(
"MESSAGEID" bigint NOT NULL,
"SMSCMSGID" character varying(50) NOT NULL,
"CONNECTIONID" smallint NOT NULL,
"EXPIRE_TIME" timestamp without time zone NOT NULL,
"RECIPIENT" character varying(20) NOT NULL,
"CLIENT_MSG_ID" character varying(255),
CONSTRAINT "PK_AWAITINGSTATUSSMPP" PRIMARY KEY ("SMSCMSGID", "CONNECTIONID"),
CONSTRAINT "FK_AWAITINGSTATUSSMPP_MESSAGES" FOREIGN KEY ("MESSAGEID")
REFERENCES "MESSAGES" ("ID") MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
ALTER TABLE "AWAITINGSTATUSSMPP" OWNER TO postgres;
GRANT ALL ON TABLE "AWAITINGSTATUSSMPP" TO "MassSMsUsers";
CREATE INDEX "IX_AWAITINGSTATUSSMPP_MSGID_RCP"
ON "AWAITINGSTATUSSMPP"
USING btree
("MESSAGEID", "RECIPIENT");
CREATE TRIGGER bucardo_add_delta
AFTER INSERT OR UPDATE OR DELETE
ON "AWAITINGSTATUSSMPP"
FOR EACH ROW
EXECUTE PROCEDURE bucardo."bucardo_add_delta_SMSCMSGID|CONNECTIONID"();
CREATE TRIGGER "bucardo_triggerkick_MassSMs"
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON "AWAITINGSTATUSSMPP"
FOR EACH STATEMENT
EXECUTE PROCEDURE bucardo."bucardo_triggerkick_MassSMs"();
=====================================
The table only has about 200 records because it is used as temporary storage: records are constantly inserted and deleted.
BUT please don't fixate on this fact because, as I already said, the speed problem is not restricted to this table. The same problem appears with the following query:
UPDATE "MESSAGES" SET "SENT" = "SENT" + 1 WHERE "ID" = 143447;
and the MESSAGES table mostly receives inserts and few deletes...
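Worth noting for the MESSAGES case: in PostgreSQL's MVCC, an UPDATE writes a new row version and leaves the old one dead, so an update-heavy table accumulates dead rows just like a delete-heavy one. A sketch to watch this (standard view; table names taken from this post):

```sql
-- Each UPDATE of "SENT" leaves a dead row version behind until vacuum
-- reclaims it; n_dead_tup shows the current backlog.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('MESSAGES', 'AWAITINGSTATUSSMPP');
```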
My postgresql.conf file :
======================
port = 5433 # (change requires restart)
max_connections = 100 # (change requires restart)
shared_buffers = 256MB # min 128kB. DoubleIP - Default was 32MB
synchronous_commit = off # immediate fsync at commit. DoubleIP - Default was on
effective_cache_size = 512MB # DoubleIP - Default was 128MB
log_destination = 'stderr' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
silent_mode = on # Run server silently.
log_line_prefix = '%t %d %u ' # special values:
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
autovacuum_naptime = 28800 # time between autovacuum runs. DoubleIP - default was 1min
autovacuum_vacuum_threshold = 100 # min number of row updates before
autovacuum_vacuum_scale_factor = 0.0 # fraction of table size before vacuum. DoubleIP - default was 0.2
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
=======================
As you will see, I have altered the shared_buffers and synchronous_commit values.
The shared_buffers setting had the default value of 32MB. When I changed it to 256MB the problem still appears, but it takes longer to show up (3-4 days). With 32MB it appeared faster, probably after 24 hours.
Also, I have changed the autovacuum daemon to run every 8 hours, but I adjusted its thresholds to make sure it vacuums pretty much all tables (any table in which at least 100 rows have changed).
Please note, though, that my problem existed even before playing around with the autovacuum. This is why I tried to change its values in the first place.
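If the goal is to vacuum a few hot tables aggressively without touching the cluster-wide naptime, 8.4 also supports per-table autovacuum settings; a sketch (the numbers are illustrative, not recommendations):

```sql
-- Per-table autovacuum storage parameters (PostgreSQL 8.4+): have this
-- queue table vacuumed after ~50 row changes, independently of the
-- cluster-wide thresholds.
ALTER TABLE "AWAITINGSTATUSSMPP" SET (
    autovacuum_vacuum_threshold = 50,
    autovacuum_vacuum_scale_factor = 0.0
);
```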
The server is synchronized with another server using bucardo. Bucardo process is running on the other server.
The same problem appears on the 2nd server too... after 3-4 days, postgres is running slower and slower.
Our server configuration :
DELL PowerEdge T610 Tower Chassis for Up to 8x 3.5" HDDs
2x Intel Xeon E5520 Processor (2.26GHz, 8M Cache, 5.86 GT/s QPI, Turbo, HT), 1066MHz Max Memory
8GB Memory,1333MHz
2 x 146GB SAS 15k 3.5" HD Hot Plug
6 x 1TB SATA 7.2k 3.5" Additional HD Hot Plug
PERC 6/i RAID Controller Card 256MB PCIe, 2x4 Connectors
SUSE Linux Enterprise Server 10, SP2
The 2 HDs are set up with RAID-1
The 6 HDs are set up with RAID-5
Linux is running on the RAID-1 configuration
Postgres is running on the RAID-5 configuration
Finally a top before and after the full vacuum :
top - 11:27:44 up 72 days, 13:27, 37 users, load average: 1.05, 1.31, 1.45
Tasks: 279 total, 3 running, 276 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.6%us, 0.8%sy, 0.0%ni, 95.5%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8166432k total, 7963116k used, 203316k free, 115344k buffers
Swap: 2097144k total, 2097020k used, 124k free, 2337636k cached
top - 11:30:58 up 72 days, 13:31, 38 users, load average: 1.53, 1.59, 1.53
Tasks: 267 total, 2 running, 265 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3%us, 0.4%sy, 0.0%ni, 98.0%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8166432k total, 6016268k used, 2150164k free, 61092k buffers
Swap: 2097144k total, 2010204k used, 86940k free, 2262896k cached
I hope I have provided enough info and hope that someone can point me in the right direction.
Thank you very much even for reading up to here !
Best regards,
Kiriakos
Sorry, forgot to mention the most obvious and important information :
My postgres is 8.4.2
On Sep 24, 2012, at 13:33, Kiriakos Tsourapas wrote:
Hello,
1) upgrade your PostgreSQL installation, there have been numerous bugfix releases since 8.4.2
2) you'll have to show us an explain analyze of the slow queries. Looking at the ones you provided, everything runs in less than 1ms.
3) with 200 records you'll always have a seqscan
4) how much memory do you have? shared_buffers = 256MB and effective_cache_size = 512MB look OK only if you have between 1 and 2GB of RAM
5) synchronous_commit = off should only be used if you have a battery-backed write cache.
6) autovacuum_naptime should be changed only if autovacuum is constantly running (so if you have dozens of databases in your cluster)
7) are you sure the problem isn't related to Bucardo ?
Julien
-- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.
On Monday, September 24, 2012 02:21:09 PM Julien Cigar wrote:
> 5) synchronous_commit = off should only be used if you have a battery-backed write cache.

Huh? Are you possibly confusing this with full_page_writes?

Greetings,
Andres
--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/24/2012 14:34, Andres Freund wrote:
> On Monday, September 24, 2012 02:21:09 PM Julien Cigar wrote:
>> 5) synchronous_commit = off should only be used if you have a battery-backed write cache.
> Huh? Are you possibly confusing this with full_page_writes?

Indeed...! Sorry for that.
(Note that you still have a (very) small chance of losing data with synchronous_commit = off if your server crashes between two "commit chunks".)
On Monday, September 24, 2012 02:53:59 PM Julien Cigar wrote:
> Indeed...! Sorry for that.
> (Note that you still have a (very) small chance of losing data with synchronous_commit = off if your server crashes between two "commit chunks".)

Sure, you have a chance of losing the last few transactions, but you won't corrupt anything. That's the entire point of the setting ;)

Greetings,
Andres
Hi,

Thank you for your response. Please find below my answers/comments.

On Sep 24, 2012, at 15:21, Julien Cigar wrote:

> 1) upgrade your PostgreSQL installation, there have been numerous bugfix releases since 8.4.2
Not possible right now. It will have to be the last solution.

> 2) you'll have to show us an explain analyze of the slow queries.
Will do so in a couple of days, when it gets slow again.

> 3) with 200 records you'll always have a seqscan
Does it really matter? I mean, with 200 records any query should be ultra fast. Right?

> 4) how much memory do you have?
I have included the server specs and the results of the top commands, showing that we have 8GB RAM and how much memory is used/cached/swapped. Personally, I don't quite understand the Linux memory figures, but I have posted them hoping you may see something I don't.

> 5) synchronous_commit = off should only be used if you have a battery-backed write cache.
I agree with the comments that followed my post. I have changed it knowing there is a small risk, but hoping it will help our performance.

> 6) autovacuum_naptime should be changed only if autovacuum is constantly running
As I said, changing the autovacuum values did not change the problem. So you may as well assume we have the default values for autovacuuming... the problem existed with the default values too.

> 7) are you sure the problem isn't related to Bucardo ?
Not at all sure... I have no idea. Can you suggest a way to figure it out?

Thank you
On 09/24/2012 15:51, Kiriakos Tsourapas wrote:

>> 3) with 200 records you'll always have a seqscan
> Does it really matter? I mean, with 200 records any query should be ultra fast. Right?
right..!

>> 4) how much memory do you have?
> I have included the server specs and the results of the top commands, showing that we have 8GB RAM and how much memory is used/cached/swapped.
With 8GB of RAM I would start with shared_buffers at 1GB and effective_cache_size at 4GB. I would also change the default work_mem to 32MB and maintenance_work_mem to 512MB.

>> 7) are you sure the problem isn't related to Bucardo ?
> Not at all sure... I have no idea. Can you suggest a way to figure it out?
Unfortunately I have never used Bucardo, but make sure that it's not a problem with your network (and that you understand all the challenges involved in multi-master replication).
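For reference, the suggested starting point above collected as a postgresql.conf fragment (values for an 8GB machine; starting points to tune from, not recommendations):

```
shared_buffers = 1GB            # was 256MB in the posted config
effective_cache_size = 4GB      # was 512MB in the posted config
work_mem = 32MB                 # 8.4 default is 1MB
maintenance_work_mem = 512MB    # 8.4 default is 16MB
```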
I remember having a server with 8.4.4 where we had multiple problems with autovacuum. If I am not mistaken, there are some bugs related to vacuum until 8.4.7. I would suggest you upgrade to the latest 8.4.x version.

BR,
MirrorX
Hi,

On 24 September 2012 20:33, Kiriakos Tsourapas <ktsour@gmail.com> wrote:
> The problem : Postgres is becoming slow, day after day, and only a full vacuum fixes the problem.
> [...]
> After a full vacuum, they run in about 100ms.
> Today, before the full vacuum, they were taking around 500ms.

I had a similar issue and I disabled cost-based autovacuum:

autovacuum_vacuum_cost_delay = -1

-1 means that vacuum_cost_delay will be used, and the default value for vacuum_cost_delay is 0 (i.e. off).

Of course you need to change the other autovacuum settings, but you did that.
--
Ondrej Ivanic (ondrej.ivanic@gmail.com)
On 24/09/12 22:33, Kiriakos Tsourapas wrote: > Hi, > > The problem : Postgres is becoming slow, day after day, and only a full vacuum fixes the problem. > > > > My postgresql.conf file : > ====================== > port = 5433 # (change requires restart) > max_connections = 100 # (change requires restart) > shared_buffers = 256MB # min 128kB. DoubleIP - Default was 32MB > synchronous_commit = off # immediate fsync at commit. DoubleIP - Default was on > effective_cache_size = 512MB # DoubleIP - Default was 128MB > log_destination = 'stderr' # Valid values are combinations of > logging_collector = on # Enable capturing of stderr and csvlog > silent_mode = on # Run server silently. > log_line_prefix = '%t %d %u ' # special values: > log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and > autovacuum_naptime = 28800 # time between autovacuum runs. DoubleIP - default was 1min > autovacuum_vacuum_threshold = 100 # min number of row updates before > autovacuum_vacuum_scale_factor = 0.0 # fraction of table size before vacuum. DoubleIP - default was 0.2 > datestyle = 'iso, mdy' > lc_messages = 'en_US.UTF-8' # locale for system error message > lc_monetary = 'en_US.UTF-8' # locale for monetary formatting > lc_numeric = 'en_US.UTF-8' # locale for number formatting > lc_time = 'en_US.UTF-8' # locale for time formatting > default_text_search_config = 'pg_catalog.english' > Given that vacuum full fixes the issue I suspect you need to have autovacuum set wake up much sooner, not later. So autovacuum_naptime = 28800 or even = 60 (i.e the default) is possibly too long. We have several database here where I change this setting to 10 i.e: autovacuum_naptime = 10s in order to avoid massive database bloat and queries that get slower and slower... 
You might want to be a bit *less* aggressive with autovacuum_vacuum_scale_factor - I usually have this at 0.1, i.e:

autovacuum_vacuum_scale_factor = 0.1

otherwise you will be vacuuming all the time - which is usually not what you want (not for all your tables anyway).

regards

Mark
Thank you,

I will take this into consideration, since upgrading to 9 will be much harder, I assume...

On Sep 24, 2012, at 15:23, MirrorX wrote:
> i remember having a server with 8.4.4 where we had multiple problems with
> autovacuum.
> if i am not mistaken there are some bugs related with vacuum until 8.4.7.
> i would suggest you to upgrade to the latest 8.4.x version
Hi Mark,

When the problem appears, vacuuming is not helping. I ran vacuum manually and the problem was still there. Only full vacuum worked.

As far as I have understood, autovacuuming is NOT doing a FULL vacuum. So, messing around with its values should not help me in any way.

Thanks

> Given that vacuum full fixes the issue, I suspect you need to have autovacuum wake up much sooner, not later. So autovacuum_naptime = 28800, or even = 60 (i.e. the default), is possibly too long. We have several databases here where I change this setting to 10, i.e.:
>
> autovacuum_naptime = 10s
>
> in order to avoid massive database bloat and queries that get slower and slower...
>
> You might want to be a bit *less* aggressive with autovacuum_vacuum_scale_factor - I usually have this at 0.1, i.e:
>
> autovacuum_vacuum_scale_factor = 0.1
>
> otherwise you will be vacuuming all the time - which is usually not what you want (not for all your tables anyway).
>
> regards
>
> Mark
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
Hi,

Suggestion noted. Nevertheless, I cannot imagine how it would help. Actually, the cost_delay makes autovacuum freeze when it takes more time than expected; therefore, having it enabled should help the system.

I may try it as a last possible resolution (remember that I have to wait a couple of days for the problem to appear, so any test I perform will take days to figure out whether it helped!!!)

> I had a similar issue and I disabled cost-based auto vacuum:
> autovacuum_vacuum_cost_delay = -1
>
> -1 says that vacuum_cost_delay will be used instead, and the default value
> for vacuum_cost_delay is 0 (i.e. off)
>
> Of course you need to change other autovacuum settings, but you did that.
>
> --
> Ondrej Ivanic
> (ondrej.ivanic@gmail.com)
Kiriakos Tsourapas wrote:
> When the problem appears, vacuuming is not helping. I ran vacuum
> manually and the problem was still there. Only full vacuum worked.
>
> As far as I have understood, autovacuuming is NOT doing FULL
> vacuum. So, messing around with its values should not help me in
> any way.

That is absolutely wrong. A regular vacuum, or an autovacuum not hopelessly crippled by your configuration, will prevent the table bloat which is slowing things down. It does not, however, fix bloat once it has occurred; a normal vacuum then is like closing the barn door after the horse has already bolted -- it would have prevented the problem if done in time, but it won't cure it.

A VACUUM FULL in version 8.4 will fix bloat of the table's heap, but will tend to bloat the indexes. You should probably fix your autovacuum configuration (making it *at least* as aggressive as the default), CLUSTER the affected table(s) to fix both table and index bloat, and schedule an upgrade to the latest bug-fix release of major version 8.4.

http://www.postgresql.org/support/versioning/

Minor releases (where the version number only changes after the second dot) only contain fixes for bugs and security problems, and never require a dump/load or pg_upgrade run. If you insist on running with known bugs, you can expect problems.

-Kevin
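The CLUSTER suggestion, sketched against the table from the original post (the index name below is an assumption for illustration; any suitable index on the table works):

```sql
-- CLUSTER rewrites the heap in index order and rebuilds all indexes,
-- curing both heap and index bloat in one pass. It takes an exclusive
-- lock on the table, so schedule it for a quiet window.
CLUSTER "AWAITINGSTATUSSMPP" USING "AWAITINGSTATUSSMPP_pkey";  -- index name assumed

-- Refresh planner statistics afterwards.
ANALYZE "AWAITINGSTATUSSMPP";
```

The `CLUSTER table USING index` syntax is available from 8.4; once a table has been clustered, a bare `CLUSTER "AWAITINGSTATUSSMPP";` reuses the same index.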
Hi,

On 25 September 2012 21:10, Kiriakos Tsourapas <ktsour@gmail.com> wrote:
> Suggestion noted.
> Nevertheless, I cannot imagine what it would help. Actually, the cost_delay
> makes autovacuum freeze when it takes more time than expected, therefore,
> having it enabled should help the system.

Yes, and I think that "freeze" might be part of your problem. You can:
- turn off cost-based auto vacuum
- or properly set the cost parameters: vacuum_cost_page_hit (1), vacuum_cost_page_miss (10), vacuum_cost_page_dirty (20) and vacuum_cost_limit (200)

In order to "freeze", i.e. reach vacuum_cost_limit, auto vacuum needs to:
- vacuum up to 200 buffers found in the shared buffer cache (200 / vacuum_cost_page_hit = 200)
- or vacuum up to 20 buffers that have to be read from disk (200 / vacuum_cost_page_miss = 20)
- or dirty up to 10 blocks that were previously clean (200 / vacuum_cost_page_dirty = 10)

Basically, you can fiddle with all three parameters until the cows come home, or just disable cost-based auto vacuum. I think your configuration can handle aggressive auto vacuum.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
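If disabling the delay outright feels too drastic, the budget side of the same accounting can be raised instead. A postgresql.conf sketch (the value shown is illustrative, not tuned):

```
# Alternative to turning the delay off: give each cost cycle a bigger
# budget, so autovacuum hits the limit - and sleeps - far less often.
#autovacuum_vacuum_cost_limit = -1   # default: falls back to vacuum_cost_limit
vacuum_cost_limit = 1000             # default is 200; 1000 is an illustrative value
```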
Dear all,

I am taking your suggestions one step at a time.

I changed my configuration to a much more aggressive autovacuum policy (0.5% for analyzing and 1% for autovacuum).

autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
#autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.005

I had tables with 180,000 records and another with 2M records, so the default value of 0.2 for autovacuum would mean that 36,000 and 400K records would have to change respectively, delaying the vacuum for many days.

I will monitor for the next 2-3 days and post back the results.

Thank you all for your suggestions so far.

Kiriakos
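For what it's worth, an alternative to lowering the global scale factors: from 8.4 on, the thresholds can be set per table, so only the hot tables get the aggressive treatment while everything else keeps the defaults. A sketch using the table from the original post:

```sql
-- Per-table autovacuum settings (storage parameters, available from 8.4).
-- Only this table is vacuumed at 1% churn; the global 20% default stays
-- in effect for every other table.
ALTER TABLE "AWAITINGSTATUSSMPP"
    SET (autovacuum_vacuum_scale_factor = 0.01,
         autovacuum_analyze_scale_factor = 0.005);
```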
[resending because I accidentally failed to include the list]

Kiriakos Tsourapas wrote:
> I am taking your suggestions one step at a time.
>
> I changed my configuration to a much more aggressive autovacuum
> policy (0.5% for analyzing and 1% for autovacuum).
>
> autovacuum_naptime = 1min
> autovacuum_vacuum_threshold = 50
> #autovacuum_analyze_threshold = 50
> autovacuum_vacuum_scale_factor = 0.01
> autovacuum_analyze_scale_factor = 0.005
>
> I had tables with 180,000 records and another with 2M records, so
> the default value of 0.2 for autovacuum would mean that 36,000 and
> 400K records would have to change respectively, delaying the vacuum
> for many days.

I am concerned that your initial email said that you had this setting:

autovacuum_naptime = 28800

This is much too high for most purposes; small, frequently-modified tables won't be kept in good shape with this setting. Perhaps you're not having that problem at the moment, but it's risky to assume that you don't and never will. When autovacuum wakes up and there is nothing to do, it should go back to sleep very quickly.

Don't expect too much from just making autovacuum run more often until you have eliminated existing bloat (autovacuum generally just limits further growth of bloat) and updated to the latest 8.4 minor release. The following bug fixes are among many you are living without until you upgrade:

- Prevent show_session_authorization() from crashing within autovacuum processes (Tom Lane)

- Fix persistent slowdown of autovacuum workers when multiple workers remain active for a long time (Tom Lane)
  The effective vacuum_cost_limit for an autovacuum worker could drop to nearly zero if it processed enough tables, causing it to run extremely slowly.

- Fix VACUUM so that it always updates pg_class.reltuples/relpages (Tom Lane)
  This fixes some scenarios where autovacuum could make increasingly poor decisions about when to vacuum tables.
- Fix btree index corruption from insertions concurrent with vacuuming (Tom Lane)
  An index page split caused by an insertion could sometimes cause a concurrently-running VACUUM to miss removing index entries that it should remove. After the corresponding table rows are removed, the dangling index entries would cause errors (such as "could not read block N in file ...") or, worse, silently wrong query results after unrelated rows are re-inserted at the now-free table locations. This bug has been present since release 8.2, but occurs so infrequently that it was not diagnosed until now. If you have reason to suspect that it has happened in your database, reindexing the affected index will fix things.

- Ensure autovacuum worker processes perform stack depth checking properly (Heikki Linnakangas)
  Previously, infinite recursion in a function invoked by auto-ANALYZE could crash worker processes.

- Only allow autovacuum to be auto-canceled by a directly blocked process (Tom Lane)
  The original coding could allow inconsistent behavior in some cases; in particular, an autovacuum could get canceled after less than the deadlock_timeout grace period.

- Improve logging of autovacuum cancels (Robert Haas)

-Kevin
Hi Kevin,
On Sep 26, 2012, at 14:39, Kevin Grittner wrote:
> I am concerned that your initial email said that you had this
> setting:
>
> autovacuum_naptime = 28800
>
> This is much too high for most purposes; small, frequently-modified
> tables won't be kept in good shape with this setting. Perhaps you're
> not having that problem at the moment, but it's risky to assume that
> you don't and never will. When autovacuum wakes up and there is
> nothing to do it should go back to sleep very quickly.
I used the 28800 (8 hours) setting after I realized that the default 1min was not helping.
I also changed other parameters when I changed it to 8 hours, to make sure tables would be auto vacuumed.
The problem with my setting was that autovacuum gets stopped if a lock is needed on the table. So, it was a very bad choice to run it every 8 hours, because usually it got stopped and never did anything.
So, I turned back to the original setting of 1min but changed the autovacuum_vacuum_scale_factor to 1% instead of 20%. Hopefully tables will be more frequently vacuumed now and the problem will not appear again.
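One way to confirm that the new settings are actually firing is to watch the per-table activity timestamps in the statistics views (a sketch; these columns exist in 8.4's pg_stat_user_tables):

```sql
-- When did autovacuum/autoanalyze last touch each table? NULLs mean
-- a table has never been processed since the stats were last reset.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum DESC NULLS LAST;
```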
> Don't expect too much from just making autovacuum run more often
> until you have eliminated existing bloat (autovacuum generally just
> limits further growth of bloat) and updated to the latest 8.4 minor
> release. The following bug fixes are among many you are living
> without until you upgrade:
Can you please suggest a way to
- find if there is existing bloat
- eliminate it
Thank you
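For the first question, a quick approximation via the stats collector (a sketch; dead-tuple counts require the stats collector to be enabled, and pg_relation_size/pg_size_pretty are standard 8.4 functions):

```sql
-- Tables carrying the most dead rows relative to live rows are the
-- likely bloat suspects; the on-disk size gives a sanity check.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_relation_size(relid)) AS heap_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

Eliminating existing bloat is what Kevin described earlier in the thread: CLUSTER the affected tables (or VACUUM FULL followed by REINDEX).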
Dear all,

Just letting you know that making the autovacuum policy more aggressive seems to have fixed the problem. It's been 4 days now and everything is running smoothly.

Just a reminder, what I changed was:

autovacuum_vacuum_scale_factor = 0.01
autovacuum_analyze_scale_factor = 0.005

making autovacuum run at 1% instead of 20% (the default) and the analyze run at 0.5% instead of 10%.

Maybe it's more aggressive than needed... I will monitor and post back.

Thank you all for your help.
Kiriakos Tsourapas, 25.09.2012 13:01:
> Thank you,
>
> I will take this into consideration, since upgrading to 9 will be much harder I assume...

I think an upgrade from 8.3 to 8.4 was "harder" due to the removal of a lot of implicit type casts. 8.4 to 9.x shouldn't be that problematic after all (but will take longer due to the required dump/reload).
On Tue, Sep 25, 2012 at 5:24 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> I think an upgrade from 8.3 to 8.4 was "harder" due to the removal of a lot
> of implicit type casts.

FYI, it was from 8.2 to 8.3 that implicit casts were removed.
> -----Original Message-----
> From: Thomas Kellerer [mailto:spam_eater@gmx.net]
> Sent: Tuesday, September 25, 2012 7:24 AM
> To: pgsql-performance@postgresql.org
> Subject: Re: Postgres becoming slow, only full vacuum fixes it
>
> Kiriakos Tsourapas, 25.09.2012 13:01:
> > Thank you,
> >
> > I will take this into consideration, since upgrading to 9 will be
> > much harder I assume...
>
> I think an upgrade from 8.3 to 8.4 was "harder" due to the removal of a
> lot of implicit type casts.
> 8.4 to 9.x shouldn't be that problematic after all (but will take
> longer due to the required dump/reload)

Actually, 8.3 to 8.4 required a db dump/restore. When upgrading from 8.4 to 9.x, pg_upgrade can be used without a dump/restore.

Regards,
Igor Neyman
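For reference, a pg_upgrade invocation might look like the fragment below. This is a sketch only: the paths and the 9.1 target are assumptions for illustration, both clusters must be stopped first, and binaries for both versions must be installed side by side.

```
# Hypothetical Debian-style layout; substitute your own bindir/datadir paths.
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/8.4/bin \
  --new-bindir=/usr/lib/postgresql/9.1/bin \
  --old-datadir=/var/lib/postgresql/8.4/main \
  --new-datadir=/var/lib/postgresql/9.1/main
```

Running it with --check first performs a dry-run compatibility report without modifying either cluster.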
I am posting back to let you know that the DB has been working fine since the changes in the autovacuum settings. I am including the changes I made for later reference, for anyone who may face similar issues.

Thank you all for your time and help!

On Sep 28, 2012, at 9:52, Kiriakos Tsourapas wrote:
> Dear all,
>
> Just letting you know that making the autovacuum policy more aggressive seems to have fixed the problem.
> It's been 4 days now and everything is running smoothly.
>
> Just a reminder, what I changed was:
> autovacuum_vacuum_scale_factor = 0.01
> autovacuum_analyze_scale_factor = 0.005
> making autovacuum run at 1% instead of 20% (the default) and the analyze run at 0.5% instead of 10%.
>
> Maybe it's more aggressive than needed... I will monitor and post back.
>
> Thank you all for your help.