Обсуждение: ctid & updates
1# I saw a post from Jan Wieck about how ctid can be used for a fast update. I noticed that ctid changes on update (as expected since it's really a new row). Is there anyway to get the new ctid from the update so later updates to the row can continue to use ctid to zero in on the row location? 2# Also, is ctid unique for each row? I would guess so since it looks like (and I'm guessing) that ctid is a page/row offset. So there should be only one thing at each ctid address. Is that correct that ctid is unique to a table? Can anything interesting be done with the empty space? Is there any way to find the maximum ctid and look for quantities of empty space? I assume a user-side program could use that data to see how much unused, yet allocated space there is in a table? That might be the "You don't have to be a PostgreSQL hacker" version of looking for compressible (via relocation) tables. Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200 1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230 200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200 1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232 200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22
Joshua b. Jore wrote: > 1# > I saw a post from Jan Wieck about how ctid can be used for a fast update. > I noticed that ctid changes on update (as expected since it's really a new > row). Is there anyway to get the new ctid from the update so later > updates to the row can continue to use ctid to zero in on the row > location? That's one of the details I'm still thinking about. And in the case of a cursor using a junk attribute it gets worse, because there is no easy way to push that new value back into the cursor's result set. But I doubt that this would become a real world problem ever. People who need to update one and the same DB row again and again during the same transaction are spaghetti-code-script- kiddies who know for sure that "a cursor's that thingy on ya screen that ya move withe mouse", so we're pretty safe here :-p > 2# > Also, is ctid unique for each row? Yes, per table. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Mon, 3 Jun 2002, Jan Wieck wrote: > Joshua b. Jore wrote: > > 1# > > I saw a post from Jan Wieck about how ctid can be used for a fast update. > > I noticed that ctid changes on update (as expected since it's really a new > > row). Is there anyway to get the new ctid from the update so later > > updates to the row can continue to use ctid to zero in on the row > > location? > > That's one of the details I'm still thinking about. And in > the case of a cursor using a junk attribute it gets worse, > because there is no easy way to push that new value back into > the cursor's result set. > > But I doubt that this would become a real world problem ever. > People who need to update one and the same DB row again and > again during the same transaction are spaghetti-code-script- > kiddies who know for sure that "a cursor's that thingy on ya > screen that ya move withe mouse", so we're pretty safe here True, some really bad code could be written. In my case I just wanted to do something less than optimal as a short term fix which might mean the same row is updated multiple times. That's a queueing/aggregation cache just waiting to get implemented to get around that (that's in my app, not Postgresql for others who are reading this) ;-). I just figured that if I was going to be dumb and update the same row multiple times it might as well be speedy. I did notice that this puts a wrinkle into MVCC. While MVCC operates normally this method of finding the row makes the conflicting update/delete miss the row and alter nothing. I was only playing with Read Committed but if I started two transactions, updated the row in one, then updated in another it acted normally (the second waited until the first completed). When the first txn committed the second went on and found that the row had disappeared and returned zero rows updated. I might just use that behaviour to detect conflicts (of course, only when possible). Neato keen. Thanks Jan Josh
"Joshua b. Jore" <josh@greentechnologist.org> writes: > I noticed that ctid changes on update (as expected since it's really a new > row). Is there anyway to get the new ctid from the update so later > updates to the row can continue to use ctid to zero in on the row > location? There's a function called something like currtid that takes the CTID of the possibly-obsoleted row and returns the CTID of its latest updated version. I believe this is exported because the ODBC driver uses it, so it's unlikely to go away, even though AFAIR it's not documented anywhere. A risk of using it is that CTID of an updated row cannot be trusted for very long --- once VACUUM has come by, you might find that CTID reassigned to some other row entirely. > Can anything interesting be done with the empty space? Is there any > way to find the maximum ctid and look for quantities of empty space? I don't think CTID gives you any useful hint about the amount of free space available on a page. regards, tom lane
Cool thanks Tom. This is awesome - as long as I don't mind being tied to PostgreSQL then this sort of thing is a nifty thing to make things go faster. Josh For the archives and others who are just reading this (I know I often just follow interesting threads to get ideas) I plan to use the tid two ways: fast access for updates to a row, detecting updates. I'll do something like UPDATE ... WHERE ctid = '(23,22)'::tid to do a really fast update. The number of modified rows will be either one (there is only one row per tid address) or zero (the row was altered since I took the row's tid). My application will notice the change and react accordingly - say get a new copy of the row from the table and see what the user wants to do. This addresses the issue of multiple users sending updates to the same row. Functions from src/backend/utils/adt/tid.c Both functions return the current tid for a row indicated by a tid. From my (very inexpert) reading of the source this might take any previous tid and get the current one. currtid takes the table OID and currtid2 takes the table name currtid(OID,TID) RETURNS TID currtid2(TEXT,TID) RETURNS TID An example: CREATE TABLE a ( j INTEGER ); INSERT INTO a (j) VALUES (0); INSERT INTO a (j) VALUES (1); SELECT tableoid,ctid,oid,* FROM a; tableoid | oid | ctid | i ----------+----------+-------+--- 63993118 | 63993128 | (0,1) | 0 63993118 | 63993129 | (0,2) | 1 /* * ctid starts counting from page zero, row 1 */ UPDATE a SET i = 0 WHERE i = 0; SELECT tableoid,ctid,oid,* FROM a; tableoid | oid | ctid | i ----------+----------+-------+--- 63993118 | 63993129 | (0,2) | 1 63993118 | 63993128 | (0,3) | 0 /* * ctid incremented on an update (since an update is really just a sneaky * sort of insert) */ SELECT currtid2('a'::text, '(0,1)'::tid) currtid2 ---------- (0,3) /* * locate the current row based on the old address */ UPDATE a SET i = 0 WHERE i = 0; SELECT tableoid,ctid,oid,* FROM a; tableoid | oid | ctid | i ----------+----------+-------+--- 63993118 | 63993129 | (0,2) | 1 63993118 | 63993128 | (0,4) | 0 /* * ctid incremented again, as expected */ SELECT currtid2('a'::text, '(0,1)'::tid) currtid2 ---------- (0,4) /* * currtid2 still finds the current row by using the old * tid which is two revisions old. Maybe this just sticks * around until the table is vacuumed. Make sure to not * try to use the tid over any vacuum operation since that * just moves everything around anyway. */ So if I tried to do an UPDATE operation where the row is located by ctid and nothing was updated then either the table was vacuumed or the row was obsoleted by another unknown update. The new ctid is available but it might be better to check with the user and get the new values or something like that. Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200 1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230 200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200 1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232 200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22 On Mon, 3 Jun 2002, Tom Lane wrote: > "Joshua b. Jore" <josh@greentechnologist.org> writes: > > I noticed that ctid changes on update (as expected since it's really a new > > row). Is there anyway to get the new ctid from the update so later > > updates to the row can continue to use ctid to zero in on the row > > location? > > There's a function called something like currtid that takes the > CTID of the possibly-obsoleted row and returns the CTID of its latest > updated version. I believe this is exported because the ODBC driver > uses it, so it's unlikely to go away, even though AFAIR it's not > documented anywhere. A risk of using it is that CTID of an updated > row cannot be trusted for very long --- once VACUUM has come by, > you might find that CTID reassigned to some other row entirely. > > > Can anything interesting be done with the empty space? Is there any > > way to find the maximum ctid and look for quantities of empty space? > > I don't think CTID gives you any useful hint about the amount of free > space available on a page. > > regards, tom lane >
Tom Lane wrote: > "Joshua b. Jore" <josh@greentechnologist.org> writes: > > I noticed that ctid changes on update (as expected since it's really a new > > row). Is there anyway to get the new ctid from the update so later > > updates to the row can continue to use ctid to zero in on the row > > location? > > There's a function called something like currtid that takes the > CTID of the possibly-obsoleted row and returns the CTID of its latest > updated version. I believe this is exported because the ODBC driver > uses it, so it's unlikely to go away, even though AFAIR it's not > documented anywhere. A risk of using it is that CTID of an updated > row cannot be trusted for very long --- once VACUUM has come by, > you might find that CTID reassigned to some other row entirely. But it should be safe within the transaction that did the update, right? And since cursors (in PG) cannot span multiple transactions, this would be the last piece I'm looking for to use in UPDATE ... WHERE CURRENT OF. Thanks. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@yahoo.com> writes: > Tom Lane wrote: >> There's a function called something like currtid that takes the >> CTID of the possibly-obsoleted row and returns the CTID of its latest >> updated version. I believe this is exported because the ODBC driver >> uses it, so it's unlikely to go away, even though AFAIR it's not >> documented anywhere. A risk of using it is that CTID of an updated >> row cannot be trusted for very long --- once VACUUM has come by, >> you might find that CTID reassigned to some other row entirely. > But it should be safe within the transaction that did the > update, right? Sure; VACUUM won't risk deleting tuples that were visible as of the start of the oldest open transaction, so anything that you found earlier in the current transaction will surely still be there, even if it's not the latest committed version anymore. I wouldn't trust a CTID older than the current transaction, however. regards, tom lane