Обсуждение: Massive table bloat
Hi! I've set up some system to track slow page executions in one of our (as yet not live) web apps. The tracking itself is handled completely within the database using a function. Within a very short time (approx. 1 week) and although we haven't got that much traffic on our testpages, the table in question as grown beyond a size of 23 GB, even though a SELECT count(*) on it will tell me that it only contains 235 rows. I'm sure I must be missing something obvious here... Here's the DDL for the table: CREATE TABLE stats.slowpages ( url text NOT NULL, lastexecduration integer NOT NULL, avgslowexecduration integer, execcount integer, lastexectime timestamp without time zone, site_id integer NOT NULL, slowestexecduration integer, totaltimespent bigint, CONSTRAINT "slowpages_pkey" PRIMARY KEY (url) )WITHOUT OIDS; -- Indexes CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree (lastexecduration); CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree (avgslowexecduration); CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree (execcount); CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree (lastexectime); CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree (site_id); CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING btree (url, site_id); CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree (totaltimespent); And this here is the function we use to insert or update entries in this table: CREATE or REPLACE FUNCTION "stats"."iou_slowpages"( IN "_site_id" integer, IN "_url" text, IN "_duration" integer) RETURNS void AS $BODY$ BEGIN LOOP UPDATE stats.slowpages SET avgslowexecduration = ((avgslowexecduration*execcount)+_duration)/(execcount+1) , execcount = execcount+1 , lastexectime = now() , lastexecduration = _duration , totaltimespent = totaltimespent + _duration , slowestexecduration = CASE WHEN _duration > slowestexecduration THEN _duration ELSE slowestexecduration END WHERE url = _url AND site_id = _site_id; IF found THEN RETURN; END IF; BEGIN INSERT INTO stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura tion,totaltimespent,execcount,lastexectime,site_id) VALUES (_url, _duration, _duration,_duration,_duration, 1, now(), _site_id); RETURN; EXCEPTION WHEN unique_violation THEN END; END LOOP; END; $BODY$ LANGUAGE 'plpgsql' VOLATILE; _site_id is a small integer value, _url is a full URL string to a page and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7. Any idea about what I may be missing here? Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Furth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jurg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
On 3 March 2010 15:33, Markus Wollny <Markus.Wollny@computec.de> wrote: > Hi! > > I've set up some system to track slow page executions in one of our (as > yet not live) web apps. The tracking itself is handled completely within > the database using a function. Within a very short time (approx. 1 week) > and although we haven't got that much traffic on our testpages, the > table in question as grown beyond a size of 23 GB, even though a SELECT > count(*) on it will tell me that it only contains 235 rows. I'm sure I > must be missing something obvious here... > > Here's the DDL for the table: > > CREATE TABLE stats.slowpages > ( > url text NOT NULL, > lastexecduration integer NOT NULL, > avgslowexecduration integer, > execcount integer, > lastexectime timestamp without time zone, > site_id integer NOT NULL, > slowestexecduration integer, > totaltimespent bigint, > CONSTRAINT "slowpages_pkey" PRIMARY KEY (url) > )WITHOUT OIDS; > > -- Indexes > CREATE INDEX idx_slowpages_duration ON stats.slowpages USING btree > (lastexecduration); > CREATE INDEX idx_slowpages_avgduration ON stats.slowpages USING btree > (avgslowexecduration); > CREATE INDEX idx_slowpages_execcount ON stats.slowpages USING btree > (execcount); > CREATE INDEX idx_slowpages_lastexec ON stats.slowpages USING btree > (lastexectime); > CREATE INDEX idx_slowpages_site ON stats.slowpages USING btree > (site_id); > CREATE UNIQUE INDEX uidx_slowpages_url_site ON stats.slowpages USING > btree (url, site_id); > CREATE INDEX idx_slowpages_totaltimespent ON stats.slowpages USING btree > (totaltimespent); > > And this here is the function we use to insert or update entries in this > table: > > CREATE or REPLACE FUNCTION "stats"."iou_slowpages"( > IN "_site_id" integer, > IN "_url" text, > IN "_duration" integer) > RETURNS void AS > $BODY$ > BEGIN > LOOP > > UPDATE stats.slowpages > SET avgslowexecduration = > ((avgslowexecduration*execcount)+_duration)/(execcount+1) > , execcount = execcount+1 > , lastexectime = now() > , lastexecduration = _duration > , totaltimespent = totaltimespent + _duration > , slowestexecduration = CASE WHEN _duration > > slowestexecduration > THEN _duration ELSE slowestexecduration END > WHERE url = _url AND site_id = _site_id; > IF found THEN > RETURN; > END IF; > > BEGIN > INSERT INTO > stats.slowpages(url,lastexecduration,avgslowexecduration,slowestexecdura > tion,totaltimespent,execcount,lastexectime,site_id) > VALUES (_url, _duration, _duration,_duration,_duration, 1, > now(), _site_id); > RETURN; > EXCEPTION WHEN unique_violation THEN > > END; > END LOOP; > END; > $BODY$ > LANGUAGE 'plpgsql' VOLATILE; > > _site_id is a small integer value, _url is a full URL string to a page > and _duration is a value in milliseconds. We're on PostgreSQL 8.3.7. > > Any idea about what I may be missing here? > > Kind regards > > Markus > If you update rows, it actually creates a new version of it. The old one doesn't get removed until the VACUUM process cleans it up, so maybe you need to run that against the database? Regards Thom
do a vacuum analyze verbose on it, and see if it complains about FSM (free space map) setting. Which it probably will be.
> -----Ursprüngliche Nachricht----- > Von: Thom Brown [mailto:thombrown@gmail.com] > Gesendet: Mittwoch, 3. März 2010 16:56 > An: Markus Wollny > Cc: pgsql-general@postgresql.org > Betreff: Re: [GENERAL] Massive table bloat > If you update rows, it actually creates a new version of it. > The old one doesn't get removed until the VACUUM process > cleans it up, so maybe you need to run that against the database? I already do on a nightly basis (which is probably not often enough in this case) and have got autovacuum running. I'll checkinto FSM settings as suggested by Grzegorz Jaśkiewicz, there's probably half a solution to the problem there, the otherhalf being probably the autovacuum daemon not visiting this table nearly often enough. Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276