Discussion: SQL design pattern for a delta trigger?
IS there such a thing? I can be the first to consider this.

What I am aiming for is a solution with a couple of coupled tables, one of which represents state through time and the other represents transactions or deltas on the state. With one field (a floating point number) in the state table (or should I say a number for each id field), it obviously has a time interval for which it is valid: a start time and an end time.

What I am after is a situation where the moment a record is inserted in the deltas table, a trigger function first looks to see if the id provided presently has a state in the state table. If not, then it creates one. Then, the end date for that state record gets set to the current time and a new record is inserted with the new state (computed by applying the delta to the value in the previous record for the state), the current date as the start date and null for the end date.

This seems like an obvious thing to try, but I am floundering a little and am therefore wondering if anyone has seen an SQL design pattern that talks about this, and a URL where I can see such a discussion and, better, an example. The first concern is to ensure that every record inserted into the deltas table is immediately reflected in the state table, and the second is that the history of state can be reconstructed from a suitable query on the state table.

I can do this easily in client code, but isn't this the sort of thing best suited to living within the database itself?

Thanks

Ted
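For concreteness, the two coupled tables described here might be sketched as follows. This is illustrative only; all table and column names (state, deltas, item_id, val) are hypothetical, not taken from any existing schema.

```sql
-- Sketch of the two-table layout: one table of state intervals,
-- one append-only table of deltas. Names are hypothetical.
CREATE TABLE state (
    state_id   serial PRIMARY KEY,
    item_id    integer NOT NULL,
    val        double precision NOT NULL,
    start_time timestamp NOT NULL DEFAULT now(),
    end_time   timestamp                -- NULL marks the currently valid row
);

CREATE TABLE deltas (
    delta_id   serial PRIMARY KEY,
    item_id    integer NOT NULL,
    delta      double precision NOT NULL,
    created_at timestamp NOT NULL DEFAULT now()
);
```

With this layout, the row whose end_time IS NULL is the current state for an item, and the history can be reconstructed by ordering an item's rows on start_time.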
--- Ted Byers <r.ted.byers@rogers.com> wrote:
> IS there such a thing? I can be the first to consider this.

OOPS. The mind is faster than the fingers. That should have been "I can NOT be the first to consider this."

Ted
On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:
> IS there such a thing? I can be the first to consider this.
>
> What I am aiming for is a solution with a couple coupled tables, one of
> which represents state through time and the other represents
> transactions or deltas on the state. With one field (a floating point
> number) in the state table (or should I say a number for each id
> field), it obviously has a time interval for which it is valid: a start
> time and an end time.
>
> What I am after is a situation where the moment a record is inserted in
> the deltas table, a trigger function first looks to see if the id
> provided presently has a state in the state table. If not, then it
> creates one. Then, the end date for that state record gets set to the
> current time and a new record is inserted with the new state (computed
> by applying the delta to the value in the previous record for the
> state), the current date as the start date and null for the end date.
>
> This seems like an obvious thing to try, but I am floundering a little
> and am therefore wondering if anyone has seen an SQL design pattern
> that talks about this, and a URL where I can see such a discussion and,
> better, an example. The first concern is to ensure that every record
> inserted into the deltas table is immediately reflected in the state
> table, and the second is that the history of state can be reconstructed
> from a suitable query on the state table.
>
> I can do this easily in client code, but isn't this the sort of thing
> best suited to living within the database itself?

What you want to do here for handling the update v. insert is called an "UPSERT". Basically, what you do is run the update as if the row exists and catch the exception that is thrown if it doesn't, at which point you insert the record with the end date = now(). After that you can proceed normally with creating the new record with start date = now() and end date = NULL.
Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
--- Erik Jones <erik@myemma.com> wrote:
> On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:
>
> [snip]
>
> What you want to do here for handling the update v. insert is called
> an "UPSERT". Basically, what you do is run the update as if the row
> exists and catch the exception that is thrown if it doesn't, at which
> point you insert the record with the end date = now(). After that you
> can proceed normally with creating the new record with start date =
> now() and end date = NULL.

Thanks Erik. Do you know of a URL where this is discussed or where I can find an example? None of my books discuss this, and my search using Google has so far produced only noise.

Thanks again.

Ted
Ted Byers wrote:
> --- Erik Jones <erik@myemma.com> wrote:
>> What you want to do here for handling the update v. insert is called
>> an "UPSERT". Basically, what you do is run the update as if the row
>> exists and catch the exception that is thrown if it doesn't, at which
>> point you insert the record with the end date = now(). After that you
>> can proceed normally with creating the new record with start date =
>> now() and end date = NULL.
>
> Thanks Erik. Do you know of a URL where this is discussed or where I
> can find an example? None of my books discuss this, and my search
> using google has so far produced only noise.

You can do this with a conditional. Something like the following should work.

IF
    NOT (a query matching your data returns rows)
THEN
    INSERT (your new data)

AFAIK, the developers are working on implementing the {MERGE, UPDATE OR ON FAILURE INSERT, UPSERT} statement. Until then, you have to build your upsert manually.

Colin
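Colin's check-then-insert conditional can also be collapsed into a single SQL statement with NOT EXISTS. A sketch, against hypothetical table and column names (state, item_id, end_time); note the later discussion in this thread about the race this still leaves open without a lock or unique constraint:

```sql
-- Sketch: insert an initial state row only if no open row exists.
-- All names are illustrative; still racy under concurrent inserts.
INSERT INTO state (item_id, val, start_time, end_time)
SELECT 42, 0.0, now(), NULL
WHERE NOT EXISTS (
    SELECT 1 FROM state WHERE item_id = 42 AND end_time IS NULL
);
```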
On Dec 7, 2007, at 6:29 AM, Ted Byers wrote:
> --- Erik Jones <erik@myemma.com> wrote:
>> On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:
>>
>> [snip]
>>
>> What you want to do here for handling the update v. insert is called
>> an "UPSERT". Basically, what you do is run the update as if the row
>> exists and catch the exception that is thrown if it doesn't, at which
>> point you insert the record with the end date = now(). After that you
>> can proceed normally with creating the new record with start date =
>> now() and end date = NULL.
>
> Thanks Eric. Do you know of a URL where this is discussed or where I
> can find an example? None of my books discuss this, and my search
> using google has so far produced only noise.

http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE might be a good place to start.

Cheers,
Steve
On Dec 7, 2007, at 8:29 AM, Ted Byers wrote:
> --- Erik Jones <erik@myemma.com> wrote:
>> On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:
>>
>> [snip]
>>
>> What you want to do here for handling the update v. insert is called
>> an "UPSERT". Basically, what you do is run the update as if the row
>> exists and catch the exception that is thrown if it doesn't, at which
>> point you insert the record with the end date = now(). After that you
>> can proceed normally with creating the new record with start date =
>> now() and end date = NULL.
>
> Thanks Eric. Do you know of a URL where this is discussed or where I
> can find an example? None of my books discuss this, and my search
> using google has so far produced only noise.
>
> Thanks again.

The basic pseudo-code (not really SQL) is something like:

TRY:
    run UPDATE
EXCEPT not found:
    make INSERT

If you can give me some table layouts I can probably work out a better example for you.

Erik
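In PL/pgSQL, the try/except shape Erik sketches is a BEGIN ... EXCEPTION sub-block. A minimal sketch, assuming it lives inside a PL/pgSQL function body and that table_name has a UNIQUE constraint on id so the INSERT can actually fail:

```sql
-- Sketch of an exception-based UPSERT fragment (PL/pgSQL).
-- Assumes a UNIQUE constraint on table_name.id; names are Erik's
-- illustrative ones, not a real schema.
BEGIN
    INSERT INTO table_name (id, foo) VALUES (5, 'bar');
EXCEPTION WHEN unique_violation THEN
    -- the row already existed, so update it instead
    UPDATE table_name SET foo = 'bar' WHERE id = 5;
END;
```

The exception sub-block is what makes this safe against a concurrent insert, at the cost the manual notes for entering a block with an EXCEPTION clause.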
So two design patterns for a makeshift UPSERT have been presented - one is to check beforehand, and only insert if the item isn't present already, the other is to do the insert blindly and let PG check for you, and catch any exceptions.

I'm also wondering what people's ideas are for a sort of BULK UPSERT. I often find myself inserting the results of a SELECT and wanting a similar check for already existing rows. The idiom I've stumbled upon looks like this:

insert into foo (x, y, z)
select a, b, c from bar join bax ...
EXCEPT
select x, y, z from foo;

Namely, I subtract from the results to be inserted any rows that are already present in the target table. This can actually even be used for UPSERTing a single row, and has the virtue of being pure SQL, but I've wondered about its efficiency.

One alternative would be to iterate over the SELECT result with a procedural language, and do a series of UPSERTs, but that seems unlikely to be as efficient for a large result set.

Any comments about the relative merits of these or other alternatives? Thanks.

- John Burger
  MITRE
On Dec 8, 2007, at 7:54 AM, John D. Burger wrote:
> So two design patterns for a makeshift UPSERT have been presented -
> one is to check beforehand, and only insert if the item isn't present
> already ...

which will give the wrong results if there's any concurrent updates...

> , the other is to do the insert blindly and let PG check for you, and
> catch any exceptions.
>
> I'm also wondering what people's ideas are for a sort of BULK UPSERT.
> I often find myself inserting the results of a SELECT and wanting a
> similar check for already existing rows. The idiom I've stumbled upon
> looks like this:
>
> insert into foo (x, y, z)
> select a, b, c from bar join bax ...
> EXCEPT
> select x, y, z from foo;
>
> Namely, I subtract from the results to be inserted any rows that are
> already present in the target table. This can actually even be used
> for UPSERTing a single row, and has the virtue of being pure SQL, but
> I've wondered about its efficiency.

Worry more about its correctness. Doing entirely the wrong thing, quickly, isn't always what you want. If there's any concurrency involved at all, this is likely to do the wrong thing.

> One alternative would be to iterate over the SELECT result with a
> procedural language, and do a series of UPSERTs, but that seems
> unlikely to be as efficient for a large result set.

Just take the idiom that's been pointed out in the documentation and wrap a loop around it.

Cheers,
Steve
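Wrapping a loop around the per-row idiom, as Steve suggests, might look roughly like this inside a PL/pgSQL function. A sketch only: the tables (foo, bar, bax), columns, and the join key k are John's illustrative names plus a hypothetical key, and r must be declared RECORD in the function's DECLARE section. Under concurrency each iteration still needs the unique-constraint/exception guard from the documented idiom.

```sql
-- Sketch: bulk upsert by iterating a SELECT result row by row.
-- Assumes: r RECORD declared above; join key k is hypothetical.
FOR r IN SELECT a, b, c FROM bar JOIN bax USING (k) LOOP
    UPDATE foo SET z = r.c WHERE x = r.a AND y = r.b;
    IF NOT FOUND THEN
        -- no existing row matched, so insert one
        INSERT INTO foo (x, y, z) VALUES (r.a, r.b, r.c);
    END IF;
END LOOP;
```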
Thanks all.

I tried the appended code in a trigger function, but PostgreSQL won't take it. It complains that assets.quantity is not a scalar. However, the WHERE clause in that select statement guarantees that at most one record will be returned. An open position on a given kind of asset is represented by null in the end_valid_time field, and the combination of asset_type_id, portfolio_id and end_valid_time is certain to be unique, if there is a record for that asset type in that portfolio at all.

I thought I'd try checking for an open position first because the manual indicated that exception handling is quite expensive. But I must have missed something, because it doesn't like how I tried to define my trigger function.

I have four sequences, one each for four tables. Two of the tables are just lookup tables, for asset types and portfolios; trivial for a test case, with only an autoincrementing integer primary key and a "name". The other two are the ones of interest. Assets is treated as read only as far as the user is concerned. The user's data in the assets table is mediated through transactions inserted (and NEVER deleted or updated) into the transactions table. Assets has the minimal suite of columns: an autoincrementing integer primary key; asset_type_id and portfolio_id, both integers; quantity, which is a floating point number; and two dates, start_valid_time and end_valid_time. Transactions has only a transaction_id, portfolio_id, asset_type_id, quantity and transaction_date. There are of course foreign keys connecting the assets and transactions tables to the lookup tables, and a composite index on assets to make looking up records based on portfolio_id, asset_type_id and end_valid_time as quick as possible.

It couldn't be simpler, conceptually! Yet I must have missed something, because PostgreSQL won't accept the function body I show below.
If I can't get this working quickly, I may just resort to creating a stored procedure that takes the transaction details as arguments and processes both tables appropriately without relying on a trigger. :-(

Thanks for everyone's help.

Ted

===========================================
DECLARE
  id BIGINT;
  q DOUBLE PRECISION;
BEGIN
  SELECT assets.id INTO id, assets.quantity INTO q
  FROM assets
  WHERE assets.asset_type_id = NEW.asset_type_id
    AND assets.portfolio_id = NEW.portfolio_id
    AND assets.end_valid_time IS NULL;
  IF (id IS NULL) THEN
    INSERT INTO assets
      (asset_type_id, portfolio_id, quantity, start_valid_stime, end_valid_time)
    VALUES
      (NEW.asset_type_id, NEW.portfolio_id, NEW.quantity,
       NEW.transaction_date, NULL);
  ELSE
    UPDATE assets SET end_valid_time = NEW.transaction_date WHERE id = id;
    INSERT INTO assets
      (asset_type_id, portfolio_id, quantity, start_valid_stime, end_valid_time)
    VALUES
      (NEW.asset_type_id, NEW.portfolio_id, q + NEW.quantity,
       NEW.transaction_date, NULL);
  END
END
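One way the function body above might be repaired, as a sketch against the assets/transactions layout described in the message (start_valid_stime is taken to be a typo for start_valid_time). The changes: a single INTO list for both columns, a variable renamed so it cannot collide with the assets.id column in the WHERE clause of the UPDATE, END IF instead of a bare END, and a RETURN for the trigger function:

```sql
-- Sketch of a corrected PL/pgSQL trigger function body.
-- Assumes the column names from the schema described above.
DECLARE
    open_id BIGINT;            -- renamed: "WHERE id = id" is always true
    q       DOUBLE PRECISION;
BEGIN
    SELECT assets.id, assets.quantity
      INTO open_id, q          -- one INTO clause listing both targets
      FROM assets
     WHERE assets.asset_type_id = NEW.asset_type_id
       AND assets.portfolio_id  = NEW.portfolio_id
       AND assets.end_valid_time IS NULL;

    IF open_id IS NULL THEN
        INSERT INTO assets (asset_type_id, portfolio_id, quantity,
                            start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, NEW.quantity,
                NEW.transaction_date, NULL);
    ELSE
        -- close the open interval, then open a new one with the new total
        UPDATE assets
           SET end_valid_time = NEW.transaction_date
         WHERE id = open_id;
        INSERT INTO assets (asset_type_id, portfolio_id, quantity,
                            start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, q + NEW.quantity,
                NEW.transaction_date, NULL);
    END IF;
    RETURN NEW;                -- a row-level trigger function must return
END;
```

The "not a scalar" complaint comes from the double INTO: PL/pgSQL expects a single INTO with a list of targets, not one INTO per column.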
On Dec 7, 2007, at 11:42 AM, Colin Wetherbee wrote:
> You can do this with a conditional. Something like the following
> should work.
>
> IF
>     NOT (a query matching your data returns rows)
> THEN
>     INSERT (your new data)

There exists a race condition here unless you've locked your tables.
Vivek Khera wrote:
> On Dec 7, 2007, at 11:42 AM, Colin Wetherbee wrote:
>> You can do this with a conditional. Something like the following
>> should work.
>>
>> IF
>>     NOT (a query matching your data returns rows)
>> THEN
>>     INSERT (your new data)
>
> There exists a race condition here unless you've locked your tables.

Yes, clearly. In the context of the thread, I was assuming my algorithm would be implemented as an atomic transaction.

For what it's worth, the real algorithm would be as follows. I hadn't had enough coffee yet, and I forgot the UPDATE bit.

IF
    (a query matching your old data returns rows)
THEN
    UPDATE with your new data
ELSE
    INSERT your new data

Colin
On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
> For what it's worth, the real algorithm would be as follows. I hadn't
> had enough coffee yet, and I forgot the UPDATE bit.
>
> IF
>     (a query matching your old data returns rows)
> THEN
>     UPDATE with your new data
> ELSE
>     INSERT your new data

Still exists race condition. Your race comes from testing existence, then creating/modifying data afterwards. You need to make the test/set atomic else you have race.
--- Vivek Khera <khera@kcilink.com> wrote:
> On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
>> For what it's worth, the real algorithm would be as follows. I
>> hadn't had enough coffee yet, and I forgot the UPDATE bit.
>>
>> IF
>>     (a query matching your old data returns rows)
>> THEN
>>     UPDATE with your new data
>> ELSE
>>     INSERT your new data
>
> Still exists race condition. Your race comes from testing existence,
> then creating/modifying data afterwards. You need to make the
> test/set atomic else you have race.

Yes, but how do you do that in a stored function or procedure, or in a trigger? It would be obvious to me if I were writing this in C++ or Java, but how do you do it using SQL in an RDBMS?

I saw something about table locks, but that doesn't seem wise, WRT performance.

The classic example of a race condition, involving a bank account, was used in the manual to introduce the idea of a transaction, but we can't use a transaction in a trigger, can we?

It is one thing to point out a race condition, but a pointer to a solution that would work in the context of the problem at hand would be useful and appreciated.

Thanks all.

Ted
On Dec 10, 2007, at 4:48 PM, Ted Byers wrote:
> --- Vivek Khera <khera@kcilink.com> wrote:
>> Still exists race condition. Your race comes from testing existence,
>> then creating/modifying data afterwards. You need to make the
>> test/set atomic else you have race.
>
> Yes, but how do you do that in a stored function or procedure, or in a
> trigger? It would be obvious to me if I were writing this in C++ or
> Java, but how do you do it using SQL in an RDBMS?
>
> I saw something about table locks, but that doesn't seem wise, WRT
> performance.
>
> The classic example of a race condition, involving a bank account, was
> used in the manual to introduce the idea of a transaction, but we
> can't use a transaction in a trigger, can we?
>
> It is one thing to point out a race condition, but a pointer to a
> solution that would work in the context of the problem at hand would
> be useful and appreciated.

In a stored procedure you'd just execute the UPDATE and then check the FOUND variable to see if it found a row to update:

UPDATE table_name SET foo='bar' WHERE id=5;

IF NOT FOUND THEN
    INSERT INTO table_name (id, foo) VALUES (5, 'bar');
END IF;

Erik
Thanks Erik

> In a stored procedure you'd just execute the UPDATE and then check the
> FOUND variable to see if it found a row to update:
>
> UPDATE table_name SET foo='bar' WHERE id=5;
>
> IF NOT FOUND THEN
>     INSERT INTO table_name (id, foo) VALUES (5, 'bar');
> END IF;

To be clear, if I understand you correctly, with your example, if there is no record where id=5, nothing happens except FOUND is set to false?

Can I, then, declare a variable prior to your update statement, and then modify your update statement so that the value in a particular field on the row where id=5 can be captured? Bearing in mind this is to be in a row level trigger after an insert into table_name, something like:

DECLARE q DOUBLE;
UPDATE table_name SET foo='bar', q = table_name.quantity WHERE id=5;

And then follow that with something like:

IF FOUND THEN
    INSERT INTO another_table (baz, quantity) VALUES (foo, q + NEW.quantity);
ELSE
    INSERT INTO another_table (baz, quantity) VALUES (foo, NEW.quantity);
END IF

Thanks again,

Ted
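A variable cannot be assigned inside the SET list as the question above tries to do, but in PostgreSQL 8.2 and later the same effect is available with UPDATE ... RETURNING ... INTO. A sketch, reusing Erik's illustrative table_name example and assuming q is declared DOUBLE PRECISION in the DECLARE section and that table_name has a quantity column (the another_table insert targets and the 'bar' literal are likewise hypothetical):

```sql
-- Sketch: capture a column of the updated row via RETURNING ... INTO.
-- Since SET does not touch quantity, the returned value is unchanged.
UPDATE table_name
   SET foo = 'bar'
 WHERE id = 5
RETURNING quantity INTO q;

IF FOUND THEN
    INSERT INTO another_table (baz, quantity)
    VALUES ('bar', q + NEW.quantity);
ELSE
    INSERT INTO another_table (baz, quantity)
    VALUES ('bar', NEW.quantity);
END IF;
```

If no row matches, FOUND is false and q is simply left NULL, so the ELSE branch handles the missing-row case.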
--- On Mon, 12/10/07, Ted Byers <r.ted.byers@rogers.com> wrote: > but how do you > do it using SQL in an RDBMS? I believe that there is an ANSI SQL command "MERGE" that is yet to be implemented into PostgreSQL. Regards, Richard Broersma Jr.
Richard Broersma Jr wrote: > --- On Mon, 12/10/07, Ted Byers <r.ted.byers@rogers.com> wrote: > > > but how do you > > do it using SQL in an RDBMS? > > I believe that there is an ANSI SQL command "MERGE" that is yet to be implemented into PostgreSQL. IIRC the standard's definition of MERGE is still subject to the race condition :-) It seems mostly defined for OLAP, and assumes rather static data. -- Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J "La tristeza es un muro entre dos jardines" (Khalil Gibran)
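For reference, the SQL-standard MERGE mentioned above looks roughly like this (syntax per SQL:2003; it was not available in PostgreSQL at the time of this thread, and the table and column names here are purely illustrative):

```sql
-- Sketch of an ANSI SQL MERGE: update matching rows, insert the rest.
-- Not runnable on the PostgreSQL versions discussed in this thread.
MERGE INTO foo f
USING (SELECT a, b, c FROM bar) s
   ON (f.x = s.a)
WHEN MATCHED THEN
    UPDATE SET y = s.b, z = s.c
WHEN NOT MATCHED THEN
    INSERT (x, y, z) VALUES (s.a, s.b, s.c);
```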
Vivek Khera wrote:
> On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
>> For what it's worth, the real algorithm would be as follows. I
>> hadn't had enough coffee yet, and I forgot the UPDATE bit.
>>
>> IF (a query matching your old data returns rows) THEN UPDATE with
>> your new data ELSE INSERT your new data
>
> Still exists race condition. Your race comes from testing existence,
> then creating/modifying data afterwards. You need to make the
> test/set atomic else you have race.

I guess when I wrote that the algorithm would have to be implemented in an atomic manner, it fell on deaf ears.

That said, perhaps implementing a good MERGE would be not such a bad idea for PostgreSQL 8.4.

Colin
On 12/10/07, Colin Wetherbee <cww@denterprises.org> wrote:
> Vivek Khera wrote:
>> On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
>>> IF (a query matching your old data returns rows) THEN UPDATE with
>>> your new data ELSE INSERT your new data
>> Still exists race condition. Your race comes from testing existence,
>> then creating/modifying data afterwards. You need to make the
>> test/set atomic else you have race.
> I guess when I wrote that the algorithm would have to be implemented
> in an atomic manner, it fell on deaf ears.

The problem is that there isn't a good atomic method for that order of operations, short of locking the entire table first. A concurrent transaction might insert a row after your test but before your own INSERT. Even a SERIALIZABLE transaction won't help, as PostgreSQL doesn't implement predicate locking.

That's why the example in the docs is a loop with result checking on both operations, and requires a UNIQUE constraint to work correctly. If high concurrency isn't a concern, table locking is the simpler approach.
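The documented loop referred to here looks roughly like the following inside a PL/pgSQL function body. A sketch, reusing the thread's illustrative table_name example; the UNIQUE constraint on id is what makes the retry correct:

```sql
-- Sketch of the documented retry-loop UPSERT (PL/pgSQL fragment).
-- Requires a UNIQUE constraint on table_name.id for correctness.
LOOP
    UPDATE table_name SET foo = 'bar' WHERE id = 5;
    IF FOUND THEN
        EXIT;                   -- a row existed and was updated
    END IF;
    BEGIN
        INSERT INTO table_name (id, foo) VALUES (5, 'bar');
        EXIT;                   -- insert succeeded
    EXCEPTION WHEN unique_violation THEN
        -- another session inserted concurrently;
        -- fall through and retry the UPDATE on the next iteration
    END;
END LOOP;
```

Both operations are checked: the UPDATE via FOUND, the INSERT via the unique_violation handler, and the loop retries until one of them wins.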