Re: Performance problems testing with Spamassassin 3.1.0

Поиск
Список
Период
Сортировка
От Matthew Schumacher
Тема Re: Performance problems testing with Spamassassin 3.1.0
Дата
Msg-id 42EAA410.9010700@aptalaska.net
обсуждение исходный текст
Ответ на Re: Performance problems testing with Spamassassin 3.1.0  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Performance problems testing with Spamassassin 3.1.0
Re: Performance problems testing with Spamassassin 3.1.0
Список pgsql-performance
Ok, here is where I'm at, I reduced the proc down to this:

CREATE FUNCTION update_token (_id INTEGER,
                              _token BYTEA,
                              _spam_count INTEGER,
                              _ham_count INTEGER,
                              _atime INTEGER)
RETURNS VOID AS
$$
BEGIN
  LOOP
    UPDATE bayes_token
      SET spam_count = spam_count + _spam_count,
          ham_count  = ham_count + _ham_count,
          atime      = _atime
      WHERE id = _id
      AND token = _token;

    IF found THEN
      RETURN;
    END IF;

    INSERT INTO bayes_token VALUES (_id, _token, _spam_count,
_ham_count, _atime);
    IF FOUND THEN
      UPDATE bayes_vars SET token_count = token_count + 1 WHERE id = _id;
      IF NOT FOUND THEN
        RAISE EXCEPTION 'unable to update token_count in bayes_vars';
        return FALSE;
      END IF;

       RETURN;
    END IF;

    RETURN;
  END LOOP;
END;
$$
LANGUAGE plpgsql;

All it's doing is trying the update before the insert to get around the
problem of not knowing which is needed.  With only 2-3 of the queries
implemented I'm already back to running about the same speed as the
original SA proc that is going to ship with SA 3.1.0.

All of the queries are using indexes so at this point I'm pretty
convinced that the biggest problem is the sheer number of queries
required to run this proc 200 times for each email (once for each token).

I don't see anything that could be done to make this much faster on the
postgres end, it's looking like the solution is going to involve cutting
down the number of queries some how.

One thing that is still very puzzling to me is why this runs so much
slower when I put the data.sql in a transaction.  Obviously transactions
are acting different when you call a proc a zillion times vs an insert
query.

Anyway, if anyone else has any ideas I'm all ears, but at this point
it's looking like raw query speed is needed for this app and while I
don't care for mysql as a database, it does have the speed going for it.

schu

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Performance problems testing with Spamassassin 3.1.0
Следующее
От: "Dario"
Дата:
Сообщение: Re: Left joining against two empty tables makes a query