Re: Danger of idiomatic plpgsql loop for merging data

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: Danger of idiomatic plpgsql loop for merging data
Дата
Msg-id AANLkTinjnAfrLTQTJ1iUtrwBJ5sVdndYw3RC8Zx9w_xw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Danger of idiomatic plpgsql loop for merging data  ("J. Greg Davidson" <jgd@well.com>)
Список pgsql-general
On Tue, Aug 3, 2010 at 11:32 PM, J. Greg Davidson <jgd@well.com> wrote:
> Hi fellow PostgreSQL hackers,
>
> First, a thank you to Merlin for commenting on my earlier post!
> I've run into another dangerous problem since the earlier post.
>
> I began converting from the plpgsql loop idiom for merging data
> into a COALESCE(find(), create(), find()) idiom and ran into a
> problem with the latter.  I'll hold the full code to the end of
> this post and summarize the situation first.
>
> I'm using this idiom in 48 get_ functions associated with
> 48 different tables!  In all cases, the functions should be
> idempotent with monotonic side effects, i.e. when the data
> desired is not present (the first time they are called),
> they create the data; in all cases, they return a reference
> (primary key value) to the data.
>
> Here's the new problem: Depending on the idiom I use, when I
> nest a call to one of these get_ functions as an argument to
> another function, the outer function does not see the new data!
>
> To be more specific, in the call:
>   SELECT access_foo( get_foo(foo_data) )
> where access_foo expects a reference to a row of TABLE foos
> and get_foo returns such a reference, possibly creating the
> desired row,
>
> when I write get_foo() with the plpgsql loop idiom, it seems
> to always return a reference which access_foo can use immediately.
> On the other hand, when I use the COALESCE(find(), create(), find())
> idiom and the get_foo() function created a new row, access_foo
> fails to find it!
>
> In all cases saying:
>   SELECT get_foo(foo_data);
>   SELECT access_foo( get_foo(foo_data) );
> works fine since if the data needed to be added, it was done
> in the separate earlier transaction.
>
> Because of this problem, I'm abandoning my original preference
> for the COALESCE(find(), create(), find()) idiom and I'm adding
> a check to the plpgsql LOOP idiom to prevent it going infinite.
>
> For those who'd like to see the gory details, here is the
> code, simplified as much as possible (and with a suffix on
> the type name to please Merlin:):
>
> -- (0) The table in question:
>
> -- The trailing underscores can be read as "field" or "slot"
> -- which is sometimes useful to avoid clashes with reserved
> -- words, local variables, etc.
>
> CREATE TABLE foos (
>  ref_ foo_reftype PRIMARY KEY DEFAULT next_foo_ref();
>  name_ text UNIQUE NOT NULL;
> );
>
> -- (1) The idiom from the PostgreSQL reference manual, which
> -- unfortunately can go into an infinite loop if a trigger
> -- should raise a unique_violation exception.
>
> -- The underscore prefix can be read as "local"
> -- and is sometimes useful to avoid name clashes, etc.
>
> CREATE OR REPLACE
> FUNCTION get_foo(text) RETURNS foo_reftype AS $$
> DECLARE
>  _ref foo_reftype;
> BEGIN
>  LOOP
>    SELECT ref_ INTO _ref
>      FROM foos WHERE name_ = $1;
>    IF FOUND THEN RETURN _ref; END IF;
>    BEGIN
>      INSERT INTO foos(name_) VALUES($1);
>    EXCEPTION
>      WHEN unique_violation THEN
>      -- maybe another thread?
>    END;
>  END LOOP;
> END;
> $$ LANGUAGE plpgsql STRICT;
>
> -- (2) Where I was originally going
> -- to avoid the infinite loop problem,
> -- and also hoping to get better performance
> -- on the most common case where the
> -- first call to old_foo() finds the row
> -- (since SQL functions are inlined into
> -- the execution plan):
>
> CREATE OR REPLACE
> FUNCTION old_foo(text) RETURNS foo_reftype AS $$
>  SELECT ref_ FROM foos WHERE name_ = $1
> $$ LANGUAGE SQL STRICT;
>
> CREATE OR REPLACE
> FUNCTION new_foo(text) RETURNS foo_reftype AS $$
> DECLARE
>  this regprocedure := 'new_foo(text)';
>  _ref foo_reftype;
> BEGIN
>  INSERT INTO foos(name_) VALUES ($1)
>    RETURNING ref_ INTO _ref;
>  RETURN _ref;
> EXCEPTION
>  WHEN unique_violation THEN
>    -- maybe another thread?
>    RAISE NOTICE '% "%" unique_violation', this, $1;
>    RETURN NULL;
> END;
> $$ LANGUAGE plpgsql STRICT;
>
> CREATE OR REPLACE
> FUNCTION get_foo(text) RETURNS foo_reftype AS $$
>  SELECT COALESCE(
>    old_foo($1), new_foo($1), old_foo($1)
>  )
> $$ LANGUAGE sql STRICT;
>
> -- (3) Where I'm going now, although it feels
> -- like a patch (I hate :-( patches!):
>
> CREATE OR REPLACE
> FUNCTION get_foo(text) RETURNS foo_reftype AS $$
> DECLARE
>  _ref foo_reftype;
>  killroy_was_here boolean := false;
>  this regprocedure := 'get_foo(text)';
> BEGIN
>  LOOP
>    SELECT ref_ INTO _ref
>      FROM foos WHERE name_ = $1;
>    IF FOUND THEN RETURN _foo; END IF;
>    IF killroy_was_here THEN
>      RAISE EXCEPTION '% "%" loops!', this, $1;
>    END IF;
>    killroy_was_here := true;
>    BEGIN
>      INSERT INTO foos(name_) VALUES($1);
>    EXCEPTION
>      WHEN unique_violation THEN -- another thread?
>      RAISE NOTICE '% "%" unique_violation', this, $1;
>    END;
>  END LOOP;
> END;
> $$ LANGUAGE plpgsql;

The infinite loop check is good but you missed the most important
part: you need to be checking sqlerrm to see where the unique
violation is coming from.  Your original issue was that some dependent
trigger was causing the error which is getting caught here.  Your
check should ONLY be handling unique violations on the table 'foos'.

The error message (sqlerrm) will look something like this:
ERROR:  duplicate key value violates unique constraint "a_constraint_pkey"

I would do something like this:
      WHEN unique_violation THEN -- another TABLE?
        this_table := false;

        IF SQLERRM ~ 'unique constraint "a_constraint_pkey"' THEN
          this_table := true;
        END IF;

        IF SQLERRM ~ 'unique constraint "another_unique_constraint"' THEN
          this_table := true;
        END IF;

       IF NOT this_table
         RAISE '%', SQLERRM USING ERRCODE = 'unique_violation';
       END IF;

yes, this is awful.  hopefully your constraints have useful names that
are unique.  IMNSHO the fully schema qualified table name should be in
the error message.

merlin

В списке pgsql-general по дате отправления:

Предыдущее
От: Rikard Bosnjakovic
Дата:
Сообщение: Embedded text column versus referenced text
Следующее
От:
Дата:
Сообщение: tgname munged