Обсуждение: COMMIT within function?

Поиск
Список
Период
Сортировка

COMMIT within function?

От
Dawid Kuroczko
Дата:
Suppose I have vacuum_values() function, which removes all
"no longer referenced" by parent column.  Kind of function
to be run from time to time to clean table from crud.
It looks like this:
CREATE FUNCTION vacuum_values() RETURNS void AS $$
        DECLARE
                r RECORD;
        BEGIN
                FOR r IN SELECT value_id FROM values NATURAL LEFT JOIN
other_tab WHERE other_tab.value_id IS NULL FOR UPDATE OF values LOOP
                        DELETE FROM values WHERE value_id = r.value_id;
                END LOOP;
                RETURN;
        END;
        $$ LANGUAGE plpgsql;

Here, as the query runs against two table values (2 mln. rows) and
ther_tab (20 mln. rows) it is relatively slow...  However there is
a chance that while this query goes, and goes, some rows will become
referenced once more... and the DELETE will fail because of FOREIGN
KEY, and the whole function will ROLLBACK...  Is there a way to force
"ignore errors" or something?  As far as I checked, I can catch errors,
but I don't really can stop the ROLLBACK.  There are SAVEPOINTs but
I guess they are useful for explicit ROLLBACK TO SAVEPOINT...

Of course I can move all this logic outside of backend, and make
the backend just 'do' the DELETEs, ignoring errors...  But still,
it should be doable in the procedural languages aswell.....

   Regards,
      Dawid

Re: COMMIT within function?

От
Pierre-Frédéric Caillaud
Дата:
> Suppose I have vacuum_values() function, which removes all
> "no longer referenced" by parent column.  Kind of function
> to be run from time to time to clean table from crud.
> It looks like this:

    I suppose you have a good reason to not use a foreign key with "ON DELETE
CASCADE" ?

>                 FOR r IN SELECT value_id FROM values NATURAL LEFT JOIN
> other_tab WHERE other_tab.value_id IS NULL FOR UPDATE OF values LOOP
>                         DELETE FROM values WHERE value_id = r.value_id;
>                 END LOOP;
>                 RETURN;

    I don't remember the exact syntax (look in the DELETE docs) but you can
certainly put a left join inside a delete and do it all at once with only
one query, and it'll be faster to boot.

Re: COMMIT within function?

От
Dawid Kuroczko
Дата:
On Mon, 22 Nov 2004 00:16:07 +0100, Pierre-Frédéric Caillaud
<lists@boutiquenumerique.com> wrote:
>
> > Suppose I have vacuum_values() function, which removes all
> > "no longer referenced" by parent column.  Kind of function
>         I suppose you have a good reason to not use a foreign key with "ON DELETE
> CASCADE" ?

Well, the issue here is saving space and speed with
lots of repeatable data.  Like e-mail addresses, most
of them are frequently reused, so instead of a table

CREATE TABLE messages (author text, ...);

I create two:
CREATE TABLE authors (author_id serial PRIMARY KEY, author text UNIQUE
NOT NULL);
CREATE TABLE messages (author_id integer REFERENCES authors, ...);

...and a matching view, and a function/rule which "invisibly"
changes author to author_id whenever data is added (with
authors table being updated when necessary).

Now, after some time I remove old  messages, and some of authors become
"unreferenced" (think: From-s of spam messages).  It would be nice to vacuum
them out.  The problem is when one of those authors "shows up"  after
long absence between our SELECT and actual DELETE.  For a busy table
(this happen to be one) it is quite possible. :)

Ah, and ON DELETE CASCADE would mean I would loose perfectly
 good messages. Having LOCK on the table is also not-so-good
an idea (think: authors with 2mln rows, messags with 20mln rows).

> >                 FOR r IN SELECT value_id FROM values NATURAL LEFT JOIN
> > other_tab WHERE other_tab.value_id IS NULL FOR UPDATE OF values LOOP
> >                         DELETE FROM values WHERE value_id = r.value_id;
> >                 END LOOP;
> >                 RETURN;
>
>         I don't remember the exact syntax (look in the DELETE docs) but you can
> certainly put a left join inside a delete and do it all at once with only
> one query, and it'll be faster to boot.

Well, DELETE FROM ... WHERE ... is nice, but it will be explicitly
"all-or-nothing",
whereas with FUNCTION I have a ghost of hope that it may not be atomic. :)
...and I don't think you can do OUTER JOIN without subselect using DELETE FROM
WHERE.

   Regards,
        Dawid

Re: COMMIT within function?

От
Michael Fuhr
Дата:
On Sun, Nov 21, 2004 at 07:29:26PM +0100, Dawid Kuroczko wrote:

> Of course I can move all this logic outside of backend, and make
> the backend just 'do' the DELETEs, ignoring errors...  But still,
> it should be doable in the procedural languages aswell.....

In PostgreSQL 8.0, PL/pgSQL functions can trap errors without rolling
back the entire transaction:

http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

BEGIN
    DELETE FROM values WHERE value_id = r.value_id;
EXCEPTION
    WHEN foreign_key_violation THEN
    NULL;
END;

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: COMMIT within function?

От
Dawid Kuroczko
Дата:
On Sun, 21 Nov 2004 20:10:03 -0700, Michael Fuhr <mike@fuhr.org> wrote:
http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
>
> BEGIN
>     DELETE FROM values WHERE value_id = r.value_id;
> EXCEPTION
>     WHEN foreign_key_violation THEN
>         NULL;
> END;

Ahh, exactly what I was looking for. :)  The thing I didn't notice
was that, while exception causes rollback to "BEGIN", it does
not mean to the beginning of the function.  In other words
I didn't nest BEGIN...END blocks and all I got from using
exceptions was that they did not show any errors. :)

Thank you! I am now enlightened.  This works perfect, exactly
as I hoped it would. :)

   Regards,
      dawid