Re: cost and actual time
From | Manfred Koizar
---|---
Subject | Re: cost and actual time
Date |
Msg-id | 94095vcb2p7db03jmbhnu1agh3k2iqrodf@4ax.com
In reply to | Re: cost and actual time (Chantal Ackermann <chantal.ackermann@biomax.de>)
List | pgsql-performance
On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
<chantal.ackermann@biomax.de> wrote:
>Nested Loop: 53508.86 msec
>Merge Join: 113066.81 msec
>Hash Join: 439344.44 msec

Chantal, you might have reached the limit of what Postgres (or any
other database?) can do for you with these data structures.  Time for
something completely different: try calculating the counts in advance.

    CREATE TABLE occ_stat (
        did INT NOT NULL,
        gid INT NOT NULL,
        cnt INT NOT NULL
    ) WITHOUT OIDS;

    CREATE INDEX occ_stat_dg ON occ_stat(did, gid);
    CREATE INDEX occ_stat_gd ON occ_stat(gid, did);

There is *no* UNIQUE constraint on (did, gid).  You get the numbers
you're after with

    SELECT did, sum(cnt) AS cnt
      FROM occ_stat
     WHERE gid = 'whatever'
     GROUP BY did
     ORDER BY cnt DESC;

occ_stat is initially loaded by

    INSERT INTO occ_stat
    SELECT did, gid, count(*)
      FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
     GROUP BY did, gid;

Doing it in chunks (WHERE sid BETWEEN a::bigint AND b::bigint) might
be faster.  You have to block any INSERT/UPDATE/DELETE activity on
d_o and g_o while you do the initial load.  If it takes too long, see
below for how to do it in the background; hopefully the load task
will catch up some day :-)

Keeping occ_stat current:

    CREATE RULE d_o_i AS ON INSERT TO d_o DO (
        INSERT INTO occ_stat
        SELECT NEW.did, g_o.gid, 1
          FROM g_o
         WHERE g_o.sid = NEW.sid);

    CREATE RULE d_o_d AS ON DELETE TO d_o DO (
        INSERT INTO occ_stat
        SELECT OLD.did, g_o.gid, -1
          FROM g_o
         WHERE g_o.sid = OLD.sid);

On UPDATE, do both.  Create a similar set of rules for g_o.

These rules will create a lot of duplicates on (did, gid) in
occ_stat.  Updating existing rows and inserting only new combinations
might seem obvious, but that method has concurrency problems (cf. the
thread "Hard problem with concurrency" on -hackers).
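[Editor's note: on a modern PostgreSQL (9.5 or later, so not available when this mail was written), the duplicate-row design can be avoided entirely with a UNIQUE index and INSERT ... ON CONFLICT, which handles the concurrency problem the thread above discusses. A hedged sketch, using a trigger instead of a rule; the function and trigger names are made up for illustration:]

```sql
-- Requires PostgreSQL 9.5+.  With a unique key, each (did, gid)
-- keeps a single running count instead of accumulating delta rows.
CREATE UNIQUE INDEX occ_stat_uni ON occ_stat(did, gid);

CREATE FUNCTION d_o_ins() RETURNS trigger AS $$
BEGIN
  INSERT INTO occ_stat (did, gid, cnt)
  SELECT NEW.did, g_o.gid, 1
    FROM g_o
   WHERE g_o.sid = NEW.sid
  ON CONFLICT (did, gid) DO UPDATE
    SET cnt = occ_stat.cnt + EXCLUDED.cnt;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER d_o_ins_trig AFTER INSERT ON d_o
  FOR EACH ROW EXECUTE PROCEDURE d_o_ins();
```

[With this variant the periodic reorganisation below becomes unnecessary, though the per-row upsert cost on the INSERT path remains.]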
So occ_stat calls for reorganisation from time to time:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    CREATE TEMP TABLE t (did INT, gid INT, cnt INT) WITHOUT OIDS;
    INSERT INTO t
    SELECT did, gid, sum(cnt)
      FROM occ_stat
     GROUP BY did, gid
    HAVING count(*) > 1;
    DELETE FROM occ_stat USING t
     WHERE t.did = occ_stat.did AND t.gid = occ_stat.gid;
    INSERT INTO occ_stat SELECT * FROM t;
    DROP TABLE t;
    COMMIT;
    VACUUM ANALYZE occ_stat;  -- very important!!

Now this should work, but the rules could kill INSERT/UPDATE/DELETE
performance.  Depending on your rate of modifications you might be
forced to push the statistics calculation into the background.

    CREATE TABLE d_o_change (
        sid BIGINT NOT NULL,
        did INT NOT NULL,
        cnt INT NOT NULL
    ) WITHOUT OIDS;

    ... ON INSERT TO d_o DO (
        INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));

    ... ON DELETE TO d_o DO (
        INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));

    ... ON UPDATE TO d_o
        WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
        DO both

And the same for g_o.  You need a task that periodically scans
[dg]_o_change and does

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT <any row (or some rows) from x_o_change>;
    INSERT INTO occ_stat <see above>;
    DELETE <the selected row(s) from x_o_change>;
    COMMIT;

Don't forget to VACUUM!

If you invest a little more work, I guess you can combine the
reorganisation into the loader task ...

I have no idea whether this approach is better than what you have
now.  With a high INSERT/UPDATE/DELETE rate it may lead to a complete
performance disaster.  You have to try ...

Servus
 Manfred
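[Editor's note: one way the background task's placeholder transaction could be fleshed out for the d_o_change side, as a hedged sketch. The batch size (1000) and the names batch/src are made up for illustration; batching by ctid handles duplicate change rows correctly. The g_o_change side is analogous:]

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Grab a batch of pending changes; ctid identifies each physical
-- row, so duplicate (sid, did, cnt) entries are deleted one-for-one.
CREATE TEMP TABLE batch AS
  SELECT ctid AS src, sid, did, cnt
    FROM d_o_change
   LIMIT 1000;  -- hypothetical batch size

-- Expand each change into occ_stat delta rows, as in the initial load.
INSERT INTO occ_stat (did, gid, cnt)
SELECT b.did, g_o.gid, b.cnt
  FROM batch b
  JOIN g_o ON g_o.sid = b.sid;

-- Remove exactly the rows we processed.
DELETE FROM d_o_change
 WHERE ctid IN (SELECT src FROM batch);

DROP TABLE batch;
COMMIT;
```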