Обсуждение: Danger of idiomatic plpgsql loop for merging data
Hi fellow PostgreSQL hackers, I just got burned by the idiomatic loop documented in the PostgreSQL manual as Example 39-2. Exceptions with UPDATE/INSERT I have now replaced this "standard" idiom with a safer one described below. What went wrong: It seems that the table I was either inserting into or selecting from had a trigger inserting some associated data which was sometimes raising a unique_violation exception, turning the "standard" idiom into an infinite loop! My (simplified) old code looked like this: CREATE TABLE foos ( foo_ foo PRIMARY KEY DEFAULT next_foo(); name_ text UNIQUE NOT NULL; ); CREATE OR REPLACE FUNCTION get_foo(text) RETURNS foo AS $$ DECLARE _foo foo; BEGIN LOOP SELECT foo_ INTO _foo FROM foos WHERE name_ = $1; IF FOUND THEN RETURN _foo; END IF; BEGIN INSERT INTO foos(name_) VALUES($1); EXCEPTION WHEN unique_violation THEN -- maybe another thread? END; END LOOP; END; $$ LANGUAGE plpgsql STRICT; My (simplified) new code is longer but more flexible, safer and adds logging: CREATE OR REPLACE FUNCTION old_foo(text) RETURNS foo AS $$ SELECT foo_ FROM foos WHERE name_ = $1 $$ LANGUAGE SQL STRICT; CREATE OR REPLACE FUNCTION new_foo(text) RETURNS foo AS $$ DECLARE this regprocedure := 'new_foo(text)'; _foo foo; BEGIN INSERT INTO foos(name_) VALUES ($1) RETURNING foo_ INTO _foo; RETURN _ref; EXCEPTION WHEN unique_violation THEN -- maybe another thread? RAISE NOTICE '% "%" unique_violation', this, $1; RETURN NULL; END; $$ LANGUAGE plpgsql STRICT; CREATE OR REPLACE FUNCTION get_foo(text) RETURNS foo AS $$ SELECT COALESCE( old_foo($1), new_foo($1), old_foo($1) ) $$ LANGUAGE sql STRICT; _Greg
On Wed, Jul 28, 2010 at 5:27 PM, J. Greg Davidson <jgd@well.com> wrote: > Hi fellow PostgreSQL hackers, > > I just got burned by the idiomatic loop > documented in the PostgreSQL manual as > > Example 39-2. Exceptions with UPDATE/INSERT > > I have now replaced this "standard" idiom > with a safer one described below. > > What went wrong: > > It seems that the table I was either > inserting into or selecting from had > a trigger inserting some associated > data which was sometimes raising a > unique_violation exception, turning the > "standard" idiom into an infinite loop! > > My (simplified) old code looked like this: > > CREATE TABLE foos ( > foo_ foo PRIMARY KEY DEFAULT next_foo(); > name_ text UNIQUE NOT NULL; > ); > > CREATE OR REPLACE > FUNCTION get_foo(text) RETURNS foo AS $$ > DECLARE > _foo foo; > BEGIN > LOOP > SELECT foo_ INTO _foo > FROM foos WHERE name_ = $1; > IF FOUND THEN RETURN _foo; END IF; > BEGIN > INSERT INTO foos(name_) VALUES($1); > EXCEPTION > WHEN unique_violation THEN > -- maybe another thread? > END; > END LOOP; > END; > $$ LANGUAGE plpgsql STRICT; > > My (simplified) new code is longer but > more flexible, safer and adds logging: > > CREATE OR REPLACE > FUNCTION old_foo(text) RETURNS foo AS $$ > SELECT foo_ FROM foos WHERE name_ = $1 > $$ LANGUAGE SQL STRICT; > > CREATE OR REPLACE > FUNCTION new_foo(text) RETURNS foo AS $$ > DECLARE > this regprocedure := 'new_foo(text)'; > _foo foo; > BEGIN > INSERT INTO foos(name_) VALUES ($1) > RETURNING foo_ INTO _foo; > RETURN _ref; > EXCEPTION > WHEN unique_violation THEN > -- maybe another thread? > RAISE NOTICE '% "%" unique_violation', this, $1; > RETURN NULL; > END; > $$ LANGUAGE plpgsql STRICT; > > CREATE OR REPLACE > FUNCTION get_foo(text) RETURNS foo AS $$ > SELECT COALESCE( > old_foo($1), new_foo($1), old_foo($1) > ) > $$ LANGUAGE sql STRICT; hm. It's theoretically possible, although highly unlikely, in read committed for this to return null. The loop is the *only* way to guarantee insert/update without retrying in the application. I do however like the use of coalesce as a sugary SQL way if doing 'if else'. Also nice use of regprocedure in the logging. However the original problem of the exception handler catching the wrong unique violation is an bug ISTM in your exception handling -- that should have been caught in the cascaded handler and handled/rethrown in such a way as to not loop you. Couple of style notes (my 0.02$). you've obviously put a lot of thought into your style but I found your example a bit hard to follow at first, so I'll present an alternative: *) Highly prefer named input args in pl/pgsql (don't use $1, etc). *) I prefer to suffix composite types _t -- your example would be easier to read: create type foo_t as (...) CREATE TABLE foos ( foo foo_t PRIMARY KEY DEFAULT next_foo(); name_ text UNIQUE NOT NULL; ); *) Don't like the suffix underscore (what does it mean?). *) I personally reserve plurals for arrays not tables. most people don't though so no big deal: create table foo ( foo foo_t, -- you can do this foos foo_t[] ); merlin
On Thu, Jul 29, 2010 at 10:06 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Wed, Jul 28, 2010 at 5:27 PM, J. Greg Davidson <jgd@well.com> wrote: >> Hi fellow PostgreSQL hackers, >> >> I just got burned by the idiomatic loop >> documented in the PostgreSQL manual as >> >> Example 39-2. Exceptions with UPDATE/INSERT >> >> I have now replaced this "standard" idiom >> with a safer one described below. >> >> What went wrong: >> >> It seems that the table I was either >> inserting into or selecting from had >> a trigger inserting some associated >> data which was sometimes raising a >> unique_violation exception, turning the >> "standard" idiom into an infinite loop! >> >> My (simplified) old code looked like this: >> >> CREATE TABLE foos ( >> foo_ foo PRIMARY KEY DEFAULT next_foo(); >> name_ text UNIQUE NOT NULL; >> ); >> >> CREATE OR REPLACE >> FUNCTION get_foo(text) RETURNS foo AS $$ >> DECLARE >> _foo foo; >> BEGIN >> LOOP >> SELECT foo_ INTO _foo >> FROM foos WHERE name_ = $1; >> IF FOUND THEN RETURN _foo; END IF; >> BEGIN >> INSERT INTO foos(name_) VALUES($1); >> EXCEPTION >> WHEN unique_violation THEN >> -- maybe another thread? >> END; >> END LOOP; >> END; >> $$ LANGUAGE plpgsql STRICT; >> >> My (simplified) new code is longer but >> more flexible, safer and adds logging: >> >> CREATE OR REPLACE >> FUNCTION old_foo(text) RETURNS foo AS $$ >> SELECT foo_ FROM foos WHERE name_ = $1 >> $$ LANGUAGE SQL STRICT; >> >> CREATE OR REPLACE >> FUNCTION new_foo(text) RETURNS foo AS $$ >> DECLARE >> this regprocedure := 'new_foo(text)'; >> _foo foo; >> BEGIN >> INSERT INTO foos(name_) VALUES ($1) >> RETURNING foo_ INTO _foo; >> RETURN _ref; >> EXCEPTION >> WHEN unique_violation THEN >> -- maybe another thread? >> RAISE NOTICE '% "%" unique_violation', this, $1; >> RETURN NULL; >> END; >> $$ LANGUAGE plpgsql STRICT; >> >> CREATE OR REPLACE >> FUNCTION get_foo(text) RETURNS foo AS $$ >> SELECT COALESCE( >> old_foo($1), new_foo($1), old_foo($1) >> ) >> $$ LANGUAGE sql STRICT; > > hm. It's theoretically possible, although highly unlikely, in read > committed for this to return null. The loop is the *only* way to > guarantee insert/update without retrying in the application. > > I do however like the use of coalesce as a sugary SQL way if doing 'if > else'. Also nice use of regprocedure in the logging. However the > original problem of the exception handler catching the wrong unique > violation is an bug ISTM in your exception handling -- that should > have been caught in the cascaded handler and handled/rethrown in such > a way as to not loop you. Probably better would be to have the unique constraint handler grep sqlerrm to ensure that the violation is coming from the table being merged (otherwise simply rethrow). That's a lot safer/faster than using sub-transaction in every dependent trigger. Also ISTM any infinite loop like that could use a safety mechanism for paranoia purposes. If you _do_ rethrow, make sure to use the version of raise that supports throwing the specific sql error code. The fact that this is so difficult to get right is really an indictment of the SQL language TBH (it was only introduced in 2008). This is a perfectly reasonable thing to want to do but a proper implementation is quite nasty. This is highly requested feature and our workaround is (let's be frank) a kludge. merlin
Hi fellow PostgreSQL hackers, First, a thank you to Merlin for commenting on my earlier post! I've run into another dangerous problem since the earlier post. I began converting from the plpgsql loop idiom for merging data into a COALESCE(find(), create(), find()) idiom and ran into a problem with the latter. I'll hold the full code to the end of this post and summarize the situation first. I'm using this idiom in 48 get_ functions associated with 48 different tables! In all cases, the functions should be idempotent with monotonic side effects, i.e. when the data desired is not present (the first time they are called), they create the data; in all cases, they return a reference (primary key value) to the data. Here's the new problem: Depending on the idiom I use, when I nest a call to one of these get_ functions as an argument to another function, the outer function does not see the new data! To be more specific, in the call: SELECT access_foo( get_foo(foo_data) ) where access_foo expects a reference to a row of TABLE foos and get_foo returns such a reference, possibly creating the desired row, when I write get_foo() with the plpgsql loop idiom, it seems to always return a reference which access_foo can use immediately. On the other hand, when I use the COALESCE(find(), create(), find()) idiom and the get_foo() function created a new row, access_foo fails to find it! In all cases saying: SELECT get_foo(foo_data); SELECT access_foo( get_foo(foo_data) ); works fine since if the data needed to be added, it was done in the separate earlier transaction. Because of this problem, I'm abandoning my original preference for the COALESCE(find(), create(), find()) idiom and I'm adding a check to the plpgsql LOOP idiom to prevent it going infinite. For those who'd like to see the gory details, here is the code, simplified as much as possible (and with a suffix on the type name to please Merlin:): -- (0) The table in question: -- The trailing underscores can be read as "field" or "slot" -- which is sometimes useful to avoid clashes with reserved -- words, local variables, etc. CREATE TABLE foos ( ref_ foo_reftype PRIMARY KEY DEFAULT next_foo_ref(); name_ text UNIQUE NOT NULL; ); -- (1) The idiom from the PostgreSQL reference manual, which -- unfortunately can go into an infinite loop if a trigger -- should raise a unique_violation exception. -- The underscore prefix can be read as "local" -- and is sometimes useful to avoid name clashes, etc. CREATE OR REPLACE FUNCTION get_foo(text) RETURNS foo_reftype AS $$ DECLARE _ref foo_reftype; BEGIN LOOP SELECT ref_ INTO _ref FROM foos WHERE name_ = $1; IF FOUND THEN RETURN _ref; END IF; BEGIN INSERT INTO foos(name_) VALUES($1); EXCEPTION WHEN unique_violation THEN -- maybe another thread? END; END LOOP; END; $$ LANGUAGE plpgsql STRICT; -- (2) Where I was originally going -- to avoid the infinite loop problem, -- and also hoping to get better performance -- on the most common case where the -- first call to old_foo() finds the row -- (since SQL functions are inlined into -- the execution plan): CREATE OR REPLACE FUNCTION old_foo(text) RETURNS foo_reftype AS $$ SELECT ref_ FROM foos WHERE name_ = $1 $$ LANGUAGE SQL STRICT; CREATE OR REPLACE FUNCTION new_foo(text) RETURNS foo_reftype AS $$ DECLARE this regprocedure := 'new_foo(text)'; _ref foo_reftype; BEGIN INSERT INTO foos(name_) VALUES ($1) RETURNING ref_ INTO _ref; RETURN _ref; EXCEPTION WHEN unique_violation THEN -- maybe another thread? RAISE NOTICE '% "%" unique_violation', this, $1; RETURN NULL; END; $$ LANGUAGE plpgsql STRICT; CREATE OR REPLACE FUNCTION get_foo(text) RETURNS foo_reftype AS $$ SELECT COALESCE( old_foo($1), new_foo($1), old_foo($1) ) $$ LANGUAGE sql STRICT; -- (3) Where I'm going now, although it feels -- like a patch (I hate :-( patches!): CREATE OR REPLACE FUNCTION get_foo(text) RETURNS foo_reftype AS $$ DECLARE _ref foo_reftype; killroy_was_here boolean := false; this regprocedure := 'get_foo(text)'; BEGIN LOOP SELECT ref_ INTO _ref FROM foos WHERE name_ = $1; IF FOUND THEN RETURN _foo; END IF; IF killroy_was_here THEN RAISE EXCEPTION '% "%" loops!', this, $1; END IF; killroy_was_here := true; BEGIN INSERT INTO foos(name_) VALUES($1); EXCEPTION WHEN unique_violation THEN -- another thread? RAISE NOTICE '% "%" unique_violation', this, $1; END; END LOOP; END; $$ LANGUAGE plpgsql; -- end code _Greg P.S. Sorry about the excessive underscores! I've been burned by having local variables confused with fields, so when I need to clearly differentiate such things, I use this convention. J. Greg Davidson
On Tue, Aug 3, 2010 at 11:32 PM, J. Greg Davidson <jgd@well.com> wrote: > Hi fellow PostgreSQL hackers, > > First, a thank you to Merlin for commenting on my earlier post! > I've run into another dangerous problem since the earlier post. > > I began converting from the plpgsql loop idiom for merging data > into a COALESCE(find(), create(), find()) idiom and ran into a > problem with the latter. I'll hold the full code to the end of > this post and summarize the situation first. > > I'm using this idiom in 48 get_ functions associated with > 48 different tables! In all cases, the functions should be > idempotent with monotonic side effects, i.e. when the data > desired is not present (the first time they are called), > they create the data; in all cases, they return a reference > (primary key value) to the data. > > Here's the new problem: Depending on the idiom I use, when I > nest a call to one of these get_ functions as an argument to > another function, the outer function does not see the new data! > > To be more specific, in the call: > SELECT access_foo( get_foo(foo_data) ) > where access_foo expects a reference to a row of TABLE foos > and get_foo returns such a reference, possibly creating the > desired row, > > when I write get_foo() with the plpgsql loop idiom, it seems > to always return a reference which access_foo can use immediately. > On the other hand, when I use the COALESCE(find(), create(), find()) > idiom and the get_foo() function created a new row, access_foo > fails to find it! > > In all cases saying: > SELECT get_foo(foo_data); > SELECT access_foo( get_foo(foo_data) ); > works fine since if the data needed to be added, it was done > in the separate earlier transaction. > > Because of this problem, I'm abandoning my original preference > for the COALESCE(find(), create(), find()) idiom and I'm adding > a check to the plpgsql LOOP idiom to prevent it going infinite. > > For those who'd like to see the gory details, here is the > code, simplified as much as possible (and with a suffix on > the type name to please Merlin:): > > -- (0) The table in question: > > -- The trailing underscores can be read as "field" or "slot" > -- which is sometimes useful to avoid clashes with reserved > -- words, local variables, etc. > > CREATE TABLE foos ( > ref_ foo_reftype PRIMARY KEY DEFAULT next_foo_ref(); > name_ text UNIQUE NOT NULL; > ); > > -- (1) The idiom from the PostgreSQL reference manual, which > -- unfortunately can go into an infinite loop if a trigger > -- should raise a unique_violation exception. > > -- The underscore prefix can be read as "local" > -- and is sometimes useful to avoid name clashes, etc. > > CREATE OR REPLACE > FUNCTION get_foo(text) RETURNS foo_reftype AS $$ > DECLARE > _ref foo_reftype; > BEGIN > LOOP > SELECT ref_ INTO _ref > FROM foos WHERE name_ = $1; > IF FOUND THEN RETURN _ref; END IF; > BEGIN > INSERT INTO foos(name_) VALUES($1); > EXCEPTION > WHEN unique_violation THEN > -- maybe another thread? > END; > END LOOP; > END; > $$ LANGUAGE plpgsql STRICT; > > -- (2) Where I was originally going > -- to avoid the infinite loop problem, > -- and also hoping to get better performance > -- on the most common case where the > -- first call to old_foo() finds the row > -- (since SQL functions are inlined into > -- the execution plan): > > CREATE OR REPLACE > FUNCTION old_foo(text) RETURNS foo_reftype AS $$ > SELECT ref_ FROM foos WHERE name_ = $1 > $$ LANGUAGE SQL STRICT; > > CREATE OR REPLACE > FUNCTION new_foo(text) RETURNS foo_reftype AS $$ > DECLARE > this regprocedure := 'new_foo(text)'; > _ref foo_reftype; > BEGIN > INSERT INTO foos(name_) VALUES ($1) > RETURNING ref_ INTO _ref; > RETURN _ref; > EXCEPTION > WHEN unique_violation THEN > -- maybe another thread? > RAISE NOTICE '% "%" unique_violation', this, $1; > RETURN NULL; > END; > $$ LANGUAGE plpgsql STRICT; > > CREATE OR REPLACE > FUNCTION get_foo(text) RETURNS foo_reftype AS $$ > SELECT COALESCE( > old_foo($1), new_foo($1), old_foo($1) > ) > $$ LANGUAGE sql STRICT; > > -- (3) Where I'm going now, although it feels > -- like a patch (I hate :-( patches!): > > CREATE OR REPLACE > FUNCTION get_foo(text) RETURNS foo_reftype AS $$ > DECLARE > _ref foo_reftype; > killroy_was_here boolean := false; > this regprocedure := 'get_foo(text)'; > BEGIN > LOOP > SELECT ref_ INTO _ref > FROM foos WHERE name_ = $1; > IF FOUND THEN RETURN _foo; END IF; > IF killroy_was_here THEN > RAISE EXCEPTION '% "%" loops!', this, $1; > END IF; > killroy_was_here := true; > BEGIN > INSERT INTO foos(name_) VALUES($1); > EXCEPTION > WHEN unique_violation THEN -- another thread? > RAISE NOTICE '% "%" unique_violation', this, $1; > END; > END LOOP; > END; > $$ LANGUAGE plpgsql; The infinite loop check is good but you missed the most important part: you need to be checking sqlerrm to see where the unique violation is coming from. Your original issue was that some dependent trigger was causing the error which is getting caught here. Your check should ONLY be handling unique violations on the table 'foos'. The error message (sqlerrm) will look something like this: ERROR: duplicate key value violates unique constraint "a_constraint_pkey" I would do something like this: WHEN unique_violation THEN -- another TABLE? this_table := false; IF SQLERRM ~ 'unique constraint "a_constraint_pkey"' THEN this_table := true; END IF; IF SQLERRM ~ 'unique constraint "another_unique_constraint"' THEN this_table := true; END IF; IF NOT this_table RAISE '%', SQLERRM USING ERRCODE = 'unique_violation'; END IF; yes, this is awful. hopefully your constraints have useful names that are unique. IMNSHO the fully schema qualified table name should be in the error message. merlin
On Wed, 2010-08-04 at 11:12 -0400, Merlin Moncure wrote: > The infinite loop check is good but you missed the most important > part: you need to be checking sqlerrm to see where the unique > violation is coming from. Your original issue was that some dependent > trigger was causing the error which is getting caught here. Your > check should ONLY be handling unique violations on the table 'foos'. > > The error message (sqlerrm) will look something like this: > ERROR: duplicate key value violates unique constraint "a_constraint_pkey" > > I would do something like this: > WHEN unique_violation THEN -- another TABLE? > this_table := false; > > IF SQLERRM ~ 'unique constraint "a_constraint_pkey"' THEN > this_table := true; > END IF; > > IF SQLERRM ~ 'unique constraint "another_unique_constraint"' THEN > this_table := true; > END IF; > > IF NOT this_table > RAISE '%', SQLERRM USING ERRCODE = 'unique_violation'; > END IF; > > yes, this is awful. hopefully your constraints have useful names that > are unique. IMNSHO the fully schema qualified table name should be in > the error message. > > merlin My infinite loop check is probably paranoia if I put in the check you suggest. The check you suggest is absolutely correct, yet it cannot be coded portably. The unique constraints have whatever name PostgreSQL generates in response to the PRIMARY KEY or UNIQUE keywords. I have to deal with 48 different tables in the current codebase, so both maintenance and boilerplate reduction are important. This leads me to suggest the following new idiom for this kind of function, starting with two necessary utility functions: -- definitions needed here moved to bottom of message CREATE OR REPLACE FUNCTION errm_is_from_table(text, regclass) RETURNS boolean AS $$ -- Warning: Non-portable implementation! -- Based on current PostgreSQL SQLERRM strings like: -- duplicate key value violates unique constraint ... -- ... "foos_pkey" SELECT $1 LIKE '%"' || $2 || '_%"%' $$ LANGUAGE sql; CREATE OR REPLACE FUNCTION errm_not_from_table( text, regclass, regprocedure, VARIADIC text[] = '{}'::TEXT[] ) RETURNS boolean AS $$ BEGIN IF NOT errm_is_from_table($1, $2) THEN RETURN true; END IF; RAISE NOTICE '% raised: %', $3::regproc || '(' || array_to_string($4, ',') || '): ', $1; RETURN false; END; $$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION get_foo(text) RETURNS foo_reftype AS $$ DECLARE _ref foo_reftype; _table regclass := 'foos'; this regprocedure := 'get_foo(text)'; BEGIN LOOP SELECT ref_ INTO _ref FROM foos WHERE name_ = $1; IF FOUND THEN RETURN _foo; END IF; BEGIN INSERT INTO foos(name_) VALUES($1); EXCEPTION WHEN unique_violation THEN IF errm_not_from_table(ERRM, _table, _this, $1) THEN RAISE; -- re-raises original exception END IF; END; END LOOP; END; $$ LANGUAGE plpgsql; If I could move the re-raising into errm_not_from_table() then I could make things even cleaner, but I don't know if that's possible. Here are the omitted definitions needed to make this simplified example code work: CREATE DOMAIN foo_reftype AS INTEGER; CREATE TABLE foos ( ref_ foo_reftype PRIMARY KEY, name_ text UNIQUE NOT NULL ); CREATE SEQUENCE foos__seq OWNED BY foos.ref_; CREATE FUNCTION next_foo_ref() RETURNS foo_reftype AS $$ SELECT nextval('foos__seq')::foo_reftype $$ LANGUAGE sql; ALTER TABLE foos ALTER COLUMN ref_ SET DEFAULT next_foo_ref(); _Greg J. Greg Davidson