Re: cost and actual time
From | Manfred Koizar
---|---
Subject | Re: cost and actual time
Date |
Msg-id | 94095vcb2p7db03jmbhnu1agh3k2iqrodf@4ax.com
In reply to | Re: cost and actual time (Chantal Ackermann <chantal.ackermann@biomax.de>)
List | pgsql-performance
On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
<chantal.ackermann@biomax.de> wrote:
>Nested Loop: 53508.86 msec
>Merge Join: 113066.81 msec
>Hash Join: 439344.44 msec

Chantal, you might have reached the limit of what Postgres (or any
other database?) can do for you with these data structures.  Time for
something completely different: try calculating the counts in advance.

    CREATE TABLE occ_stat (
        did INT NOT NULL,
        gid INT NOT NULL,
        cnt INT NOT NULL
    ) WITHOUT OIDS;

    CREATE INDEX occ_stat_dg ON occ_stat(did, gid);
    CREATE INDEX occ_stat_gd ON occ_stat(gid, did);

There is *no* UNIQUE constraint on (did, gid).  You get the numbers
you're after with

    SELECT did, sum(cnt) AS cnt
      FROM occ_stat
     WHERE gid = 'whatever'
     GROUP BY did
     ORDER BY cnt DESC;

occ_stat is initially loaded by

    INSERT INTO occ_stat
    SELECT did, gid, count(*)
      FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
     GROUP BY did, gid;

Doing it in chunks (WHERE sid BETWEEN a::bigint AND b::bigint) might
be faster.  You have to block any INSERT/UPDATE/DELETE activity on
d_o and g_o while you do the initial load.  If it takes too long, see
below for how to do it in the background; hopefully the load task
will catch up some day :-)

Keeping occ_stat current:

    CREATE RULE d_o_i AS ON INSERT TO d_o DO (
        INSERT INTO occ_stat
        SELECT NEW.did, g_o.gid, 1
          FROM g_o
         WHERE g_o.sid = NEW.sid);

    CREATE RULE d_o_d AS ON DELETE TO d_o DO (
        INSERT INTO occ_stat
        SELECT OLD.did, g_o.gid, -1
          FROM g_o
         WHERE g_o.sid = OLD.sid);

On UPDATE, do both.  Create a similar set of rules for g_o.

These rules will create a lot of duplicates on (did, gid) in
occ_stat.  Updating existing rows and inserting only new combinations
might seem obvious, but that method has concurrency problems (cf. the
thread "Hard problem with concurrency" on -hackers).
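[Editor's note: on a modern PostgreSQL (9.5 or later, so not available when this mail was written), the duplicate-row design can be avoided entirely with a UNIQUE index and INSERT ... ON CONFLICT, which handles the concurrency problem the thread above discusses. A hedged sketch, using a trigger instead of a rule; the function and trigger names are made up for illustration:]

```sql
-- Requires PostgreSQL 9.5+.  With a unique key, each (did, gid)
-- keeps a single running count instead of accumulating delta rows.
CREATE UNIQUE INDEX occ_stat_uni ON occ_stat(did, gid);

CREATE FUNCTION d_o_ins() RETURNS trigger AS $$
BEGIN
  INSERT INTO occ_stat (did, gid, cnt)
  SELECT NEW.did, g_o.gid, 1
    FROM g_o
   WHERE g_o.sid = NEW.sid
  ON CONFLICT (did, gid) DO UPDATE
    SET cnt = occ_stat.cnt + EXCLUDED.cnt;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER d_o_ins_trig AFTER INSERT ON d_o
  FOR EACH ROW EXECUTE PROCEDURE d_o_ins();
```

[With this variant the periodic reorganisation below becomes unnecessary, though the per-row upsert cost on the INSERT path remains.]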
So occ_stat calls for reorganisation from time to time:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    CREATE TEMP TABLE t (did INT, gid INT, cnt INT) WITHOUT OIDS;
    INSERT INTO t
    SELECT did, gid, sum(cnt)
      FROM occ_stat
     GROUP BY did, gid
    HAVING count(*) > 1;
    DELETE FROM occ_stat USING t
     WHERE t.did = occ_stat.did AND t.gid = occ_stat.gid;
    INSERT INTO occ_stat SELECT * FROM t;
    DROP TABLE t;
    COMMIT;
    VACUUM ANALYZE occ_stat;  -- very important!!

Now this should work, but the rules could kill INSERT/UPDATE/DELETE
performance.  Depending on your rate of modifications you might be
forced to push the statistics calculation into the background.

    CREATE TABLE d_o_change (
        sid BIGINT NOT NULL,
        did INT NOT NULL,
        cnt INT NOT NULL
    ) WITHOUT OIDS;

    ... ON INSERT TO d_o DO (
        INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));

    ... ON DELETE TO d_o DO (
        INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));

    ... ON UPDATE TO d_o
        WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
        DO both

And the same for g_o.  You need a task that periodically scans
[dg]_o_change and does

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT <any row (or some rows) from x_o_change>;
    INSERT INTO occ_stat <see above>;
    DELETE <the selected row(s) from x_o_change>;
    COMMIT;

Don't forget to VACUUM!

If you invest a little more work, I guess you can combine the
reorganisation into the loader task ...

I have no idea whether this approach is better than what you have
now.  With a high INSERT/UPDATE/DELETE rate it may lead to a complete
performance disaster.  You have to try ...

Servus
 Manfred
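[Editor's note: one way the background task's placeholder transaction could be fleshed out for the d_o_change side, as a hedged sketch. The batch size (1000) and the names batch/src are made up for illustration; batching by ctid handles duplicate change rows correctly. The g_o_change side is analogous:]

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Grab a batch of pending changes; ctid identifies each physical
-- row, so duplicate (sid, did, cnt) entries are deleted one-for-one.
CREATE TEMP TABLE batch AS
  SELECT ctid AS src, sid, did, cnt
    FROM d_o_change
   LIMIT 1000;  -- hypothetical batch size

-- Expand each change into occ_stat delta rows, as in the initial load.
INSERT INTO occ_stat (did, gid, cnt)
SELECT b.did, g_o.gid, b.cnt
  FROM batch b
  JOIN g_o ON g_o.sid = b.sid;

-- Remove exactly the rows we processed.
DELETE FROM d_o_change
 WHERE ctid IN (SELECT src FROM batch);

DROP TABLE batch;
COMMIT;
```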