Обсуждение: storing intermediate results in recursive plpgsql functions
Hello, I have a table 'entity'. It has an id column and a parent_id column that refers to another entity in the same table. I often need to calculate all of the "descendants" of a particular id. I wrote a plpgsql function 'get_descendants' that, given an id, recurses over all the children of that id, and the children's children, etc... As it recurses, it puts the ids of the descendants into a one-column table called 'descendants'. Since the return of this function needs to be a setof int4, and as best I can tell there's no way to return a setof from a recursive plpgsql function (nowhere internally to store the intermediate result set), I call this function from a wrapper sql function that simply calls the recursive function, and then selects from the 'descendants' table. This all works great until two users simultaneously call get_descendants() wanting the descendants of two different people. They of course get mixed and matched in the descendants table. I thought I knew of a couple of different ways to possibly solve the problem but I'm finding out that they don't work so well after all. First I thought I would just have the function create 'descendants' as a temporary table instead of a permanent one, but since table names get converted to OIDs when I load the function into the db, I'm getting Relation Not Found errors. I'm guessing that trying to refer to temporary tables from stored procedures is not a common practice. :-) Then I thought I'd just have the sql function obtain an access exclusive lock on the 'descendants' table, call the recursive plpgsql function, select the results from 'descendants', then unlock the 'descendants' table. Then the second instance of get_descendants would sit and wait for descendants to come free. That doesn't work because the sql function wants the last statement called to be the one that returns the setof int4, not the commit; that is necessary. Right now the only way I can think of to do this is to have a bit flag that the client can examine to see if anyone else is currently calling get_descendants and to wait if so and try again to obtain the bit for itself in a second or so. I am the first to admit that this would be a horrendous hack (yeah, like the rest of what I've already done isn't ;) I feel like I'm missing something obvious. But it seems the exact combination I have here (sql function calling a recursive plpgsql function and then selecting the result set from a table) is just enough to thwart all the usual techniques. I have a short mental list of things I should have done/could do to maybe help: 1. Stop using recursive functions and represent the family tree in Joe Celko fashion (I learned about his existence after the creation of this database, unfortunately) 2. Figure out how to store an array of integers as the intermediate result set on every pass of a recursive plpgsql function and return them as a setof int4 when I'm done recursing. I remember trying a while on this and finding it impossible. I still feel like I'm missing some obvious solution. Anyone have any insight? Thanks, Fran
Fran Fabrizio <ffabrizio@mmrd.com> writes: > I feel like I'm missing something obvious. Well, you could use a temp table if you access it via EXECUTEd commands. Or you could use a non-temp table and just not unlock; let the unlock happen at normal end of transaction. The latter is probably easier. regards, tom lane
This is a followup to a thread last Friday. I'll restate the problem and already-tried solutions, including Tom Lane's from Friday, then I'll ask for more ideas. :-) Requirement: I need a plpgsql function that can return to me the list of all child id's given a parent id. This will query a table 'entities' with 'id' and 'parent_id' fields. Therefore, the nature of the data is such that this function will need to be recursive. Additionally, this function needs to be multiuser-safe in case two connections call it at the same time with different parent id's. Limitations/Assumptions (correct me if I am wrong): 1. plpgsql functions cannot return setof's. So, if I want the set of child ids, I need to wrap this plpgsql function with an sql function, which can return setof. 2. The only way to pass the result set from the plpgsql to the sql function is storing it externally to the function- i.e. in a (temporary or permanent) table. 3. I can create a table called descendants that has one column, id. If it's a permanent table, I'll have to lock it so that a concurrent call to the function cannot intersperse id's in the result set. If it's a temporary table, it will only be visible to it's own connection anyhow. 4. If I take the permanent table route, I run into a transaction problem. I cannot have the wrapper sql function lock the table, because the last statement has to be a select that returns the setof integers, not the commit to unlock the table. If I put the commit before the 'select descendants.id from descendants', it's not multiuser-safe. If I omit the commit, the table stays locked for the rest of the life of the connection (apprently sql functions do not automatically end transactions/release locks like plpgsql functions do). Similarly, if I lock the table inside the inner plpgsql function, it gets unlocked before the wrapper sql function returns the result set (because every plpgsql function is run inside a transaction and the table gets released as soon as the plpgsql function returns.) So, it appears there's no safe way to do it via permanent table. This was one of Tom's suggestions, but since a plpgsql function cannot return setof, and I need to rely on a wrapper sql function for that, I'm not sure how that could work. 5. If I go with a temporary table, there is the 'when do i create it' issue. I cannot create it in the sql function because i need conditional logic to say "if it doesnt already exist for this connection, create it". If I try to create it when it already exists, I of course get an error. On the other hand, I don't know of a way to check for temporary table existence from within plpgsql in order to know whether to create it or not. If that is possible, I might have a working solution there. The plpgsql function would check for temp table existence, create it if not, delete previous contents, and populate it. The wrapper would select from the temp table for the return set. Multiuser safe since it's a temp table. But not at all sure if it's even possible. At this point, I believe my only solution to this problem is to convert the recursive data in the table into Joe Celko tree style (which would reduce the problem to "select id from entities where left > (select left from entities where id = $parentid) and right < (select right from entities where id = $parentid). However, if anyone can think of a way to do this with the current table, this would be preferable for the short and medium-term. I appreciate all the help I've received so far, hopefully we can get this solved before too long. Thanks, Fran
Fran Fabrizio <ffabrizio@mmrd.com> writes: > (apprently sql functions do not automatically end > transactions/release locks like plpgsql functions do) Say what? There's no difference on that score for any kind of function; none of them end transactions or release locks at exit. regards, tom lane
>>(apprently sql functions do not automatically end >>transactions/release locks like plpgsql functions do) >> > > Say what? There's no difference on that score for any kind of function; > none of them end transactions or release locks at exit. If that's the case, then this is the second time in a week I've found huge errors in this darn PostgreSQL Developer's Handbook I bought. I'm coming very close to tossing this thing in the garbage. "When a PL/pgSQL function locks a table, the lock is released when the PL/pgSQL function returns". "You can't have transactions in PL/pgSQL functions. Every function is executed in one transaction." This book is really starting to tick me off. Grr. I'm amazed that there are errors of this magnitude. The authors are Ewald Geschwinde and Hans-Jurgen Schonig, does anyone know their credentials? Thanks for clearing that up, Tom. -Fran
Fran Fabrizio <ffabrizio@mmrd.com> writes: > If that's the case, then this is the second time in a week I've found > huge errors in this darn PostgreSQL Developer's Handbook I bought. I'm > coming very close to tossing this thing in the garbage. > "When a PL/pgSQL function locks a table, the lock is released when the > PL/pgSQL function returns". This is bogus... > "You can't have transactions in PL/pgSQL functions. Every function is > executed in one transaction." This is perfectly true: the transaction in which the calling query is contained also contains the operations executed in the called function. However, evidently the context misled you to think it meant that the function has its own transaction. > This book is really starting to tick me off. Grr. I'm amazed that > there are errors of this magnitude. The authors are Ewald Geschwinde > and Hans-Jurgen Schonig, does anyone know their credentials? Can't say that I recognize either name. But you might as well let them know of the mistakes you find, so that they can fix 'em in future editions (if any). regards, tom lane
> >>(apprently sql functions do not automatically end > >>transactions/release locks like plpgsql functions do) > > > > Say what? There's no difference on that score for any kind of function; > > none of them end transactions or release locks at exit. > > "When a PL/pgSQL function locks a table, the lock is released when the > PL/pgSQL function returns". I don't see any way in which that could be a true statement. Which is not to say that it couldn't be a true statement :) > "You can't have transactions in PL/pgSQL functions. Every function is > executed in one transaction." This is a true statement (although not worded very clearly). Let me give a shot at making it more understandable: You can not start or end a transaction inside a function. The reason for that is rather simple: PostgreSQL does not support nested transactions. Since all functions are already wrapped in a transaction, implicit or explicit, they are already inside a transaction, and therefore can not start a new transaction. For the same reason, you can not end a transaction because the function is considered an atomic action. Greg
On Tue, 5 Mar 2002, Tom Lane wrote: > Fran Fabrizio <ffabrizio@mmrd.com> writes: > > If that's the case, then this is the second time in a week I've found > > huge errors in this darn PostgreSQL Developer's Handbook I bought. I'm > > coming very close to tossing this thing in the garbage. > > > "When a PL/pgSQL function locks a table, the lock is released when the > > PL/pgSQL function returns". > > This is bogus... > > > "You can't have transactions in PL/pgSQL functions. Every function is > > executed in one transaction." > > This is perfectly true: the transaction in which the calling query is > contained also contains the operations executed in the called function. > However, evidently the context misled you to think it meant that the > function has its own transaction. According to our docs: ---------- If you do a LOCK TABLE in PL/pgSQL, the lock will not be released until the calling transaction is finished. and You also cannot have transactions in PL/pgSQL procedures. The entire function (and other functions called from therein) is executed in a transaction and PostgreSQL rolls back the results if something goes wrong. ---------- So if the function is executed in a transaction and the transaction ends when the function does, doesn't that mean the lock is released when the function ends? Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net 56K Nationwide Dialup from $16.00/mo at Pop4 Networking Online Campground Directory http://www.camping-usa.com Online Giftshop Superstore http://www.cloudninegifts.com ==========================================================================
Vince Vielhaber <vev@michvhf.com> writes: > So if the function is executed in a transaction and the transaction > ends when the function does, doesn't that mean the lock is released > when the function ends? But the transaction *doesn't* end when the function does. The function must have been called from some SQL query (eg, a SELECT) and there is an explicit or implicit transaction wrapped around every query issued by an application. The earliest the lock can possibly be released is when the current application-given query ends. If the query is inside an explicit BEGIN/COMMIT block then the lock will be held till you COMMIT or ROLLBACK. Feel free to propose better wording for that part of the plpgsql docs... regards, tom lane