Обсуждение: PL/pgSQL bug?
It seems that sometimes uncommitted data (dirty data?) could be seen in PL/pgSQL function. Below is a sample script to reproduce the problem: If you execute "SELECT myftest(1)" concurrently, you will see the subselect in the SELECT INTO... will produce: ERROR: More than one tuple returned by a subselect used as an expression. This is odd, since the coulum i is a primary key, and should never has duplicate values. If you comment out the SELECT INTO... statement, you could see a line something like: NOTICE: ctid (0,5) xmin 645188 xmax 645190 cmin 2 cmax 2 This is odd too, since xmax > 0 or cmax > 0 should never happen with visible tuples, in my understanding. I see these in 7.0.3, 7.1.2 and current. -- Tatsuo Ishii ---------------------------------------------------------------------- DROP TABLE t1; CREATE TABLE t1 (i INT PRIMARY KEY); DROP FUNCTION myftest(INT); CREATE FUNCTION myftest(INT) RETURNS INT AS 'DECLARE myid INT;DECLARE rec RECORD;key ALIAS FOR $1;BEGIN UPDATE t1 SET i = 1 WHERE i = 1; SELECT INTO tid,myid ctid,iFROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); FOR rec IN SELECT ctid,xmin,xmax,cmin,cmax from t1 LOOP RAISENOTICE ''ctid % xmin % xmax % cmin % cmax %'', rec.ctid,rec.xmin,rec.xmax,rec.cmin,rec.cmax; END LOOP; RETURN 0; END;'LANGUAGE'plpgsql';
Tatsuo Ishii wrote: > It seems that sometimes uncommitted data (dirty data?) could be seen > in PL/pgSQL function. > > Below is a sample script to reproduce the problem: If you execute > "SELECT myftest(1)" concurrently, you will see the subselect in the > SELECT INTO... will produce: > > ERROR: More than one tuple returned by a subselect used as an expression. > > This is odd, since the coulum i is a primary key, and should never has > duplicate values. > > If you comment out the SELECT INTO... statement, you could see a line > something like: > > NOTICE: ctid (0,5) xmin 645188 xmax 645190 cmin 2 cmax 2 > > This is odd too, since xmax > 0 or cmax > 0 should never happen with > visible tuples, in my understanding. > > I see these in 7.0.3, 7.1.2 and current. If that's the case, it must be a general problem with SPI that'll apply to any procedural language as well as user defined C function using SPI. When scans or functions are involved, PL/pgSQL uses SPI functionality to evaluate the expression. Jan > -- > Tatsuo Ishii > > ---------------------------------------------------------------------- > DROP TABLE t1; > CREATE TABLE t1 (i INT PRIMARY KEY); > DROP FUNCTION myftest(INT); > CREATE FUNCTION myftest(INT) > RETURNS INT > AS ' > DECLARE myid INT; > DECLARE rec RECORD; > key ALIAS FOR $1; > BEGIN > UPDATE t1 SET i = 1 WHERE i = 1; > SELECT INTO tid,myid ctid,i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); > FOR rec IN SELECT ctid,xmin,xmax,cmin,cmax from t1 LOOP > RAISE NOTICE ''ctid % xmin % xmax % cmin % cmax %'', rec.ctid,rec.xmin,rec.xmax,rec.cmin,rec.cmax; > END LOOP; > RETURN 0; > END; > ' > LANGUAGE 'plpgsql'; > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
I said: > SPI is doing CommandCounterIncrements at bizarre times, and in > particular you get fewer CommandCounterIncrements while planning > and executing a plpgsql function than you do while re-executing > an already-planned one. s/fewer/more/ ... guess I'm not fully awake yet ... but anyway, SPI's handling of CommandCounterIncrement is certainly broken. Particularly for cursors --- a CCI for every FETCH will not do, you want the whole scan to be run with the same commandId. regards, tom lane
Okay, I understand Tatsuo's original complaint, and I don't think it's a bug exactly --- it's MVCC/Read Committed operating as designed. Using the variant script I just posted and two *freshly started* backends, do: Backend 1: regression=# begin; BEGIN regression=# SELECT myftest(1); NOTICE: i 1 ctid (0,42) xmin 5701 xmax 0 cmin 3 cmax 0 NOTICE: i 2 ctid (0,43) xmin 5701 xmax 0 cmin 5 cmax 0myftest --------- 0 (1 row) Backend 2: regression=# SELECT myftest(1); [ backend 2 hangs; now go back and commit backend 1 ] NOTICE: i 1 ctid (0,40) xmin 5696 xmax 5701 cmin 1 cmax 3 NOTICE: i 1 ctid (0,44) xmin 5702 xmax 0 cmin 2 cmax 0 NOTICE: i 2 ctid (0,45) xmin 5702 xmax 0 cmin 4 cmax 0 NOTICE: Error occurred while executing PL/pgSQL function myftest NOTICE: line 10 at select into variables ERROR: More than one tuple returned by a subselect used as an expression. regression=# The second backend finds that it wants to update the same row backend 1 did, so it waits to see if 1 commits. After the commit, it decides it can do the update. Now, what will we see later in that same transaction? SELECT will consider the original row (ctid 40, here) to be still good --- it was deleted, sure enough, but by a transaction that has not committed as far as the current transaction is concerned. And the row inserted earlier in our own transaction is good too. So you see two rows with i=1. The only way to avoid this is to use Serializable mode, which would mean that backend 2 would've gotten an error on its UPDATE. However, if you do the same experiment a second time in the same backends, you get different results. This I think is a SPI bug: SPI is doing CommandCounterIncrements at bizarre times, and in particular you get fewer CommandCounterIncrements while planning and executing a plpgsql function than you do while re-executing an already-planned one. Not sure yet exactly how it should be changed. regards, tom lane
I said: > Not necessarily. It looks to me like someone is forgetting to do a > CommandCounterIncrement() between plpgsql statements. It's worse than that: someone is caching an out-of-date command counter value. Load the attached variant of Tatsuo's script, and then do this: regression=# SELECT myftest(1); NOTICE: i 1 ctid (0,30) xmin 5687 xmax 0 cmin 2 cmax 0 NOTICE: i 2 ctid (0,31) xmin 5687 xmax 0 cmin 4 cmax 0myftest --------- 0 (1 row) regression=# SELECT myftest(1); NOTICE: i 1 ctid (0,32) xmin 5688 xmax 0 cmin 1 cmax 0myftest --------- 0 (1 row) regression=# Neat eh? What happened to the i=2 line? If you start a fresh backend, the first execution of the function works. regards, tom lane DROP TABLE t1; CREATE TABLE t1 (i INT PRIMARY KEY); insert into t1 values(1); DROP FUNCTION myftest(INT); CREATE FUNCTION myftest(INT) RETURNS INT AS 'DECLARE myid INT;DECLARE rec RECORD;key ALIAS FOR $1;BEGIN UPDATE t1 SET i = 1 WHERE i = 1;INSERT INTO t1 VALUES (2); FOR rec IN SELECT i,ctid,xmin,xmax,cmin,cmax from t1 LOOP RAISE NOTICE ''i % ctid % xmin % xmax % cmin % cmax %'',rec.i,rec.ctid,rec.xmin,rec.xmax,rec.cmin,rec.cmax; END LOOP; SELECT INTO myid i FROM t1 WHERE i = (SELECT i FROMt1 WHERE i = 1);DELETE FROM t1 WHERE i = 2; RETURN 0; END;'LANGUAGE 'plpgsql';
Jan Wieck <JanWieck@Yahoo.com> writes: > If that's the case, it must be a general problem with SPI > that'll apply to any procedural language as well as user > defined C function using SPI. Not necessarily. It looks to me like someone is forgetting to do a CommandCounterIncrement() between plpgsql statements. Is this something that plpgsql should do, or should SPI do it? Not clear. regards, tom lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > NOTICE: ctid (0,5) xmin 645188 xmax 645190 cmin 2 cmax 2 > This is odd too, since xmax > 0 or cmax > 0 should never happen with > visible tuples, in my understanding. That's what the docs presently say, but they're in error --- nonzero xmax could represent a not-yet-committed deleting xact (or one that did commit, but not in your snapshot); or it could be from a deleting xact that rolled back. I get regression=# SELECT myftest(1); NOTICE: Error occurred while executing PL/pgSQL function myftest NOTICE: line 6 at SQL statement ERROR: parser: parse error at or near "ctid" regression=# so there's something wrong with the function as posted. regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > NOTICE: ctid (0,5) xmin 645188 xmax 645190 cmin 2 cmax 2 > > This is odd too, since xmax > 0 or cmax > 0 should never happen with > > visible tuples, in my understanding. > > That's what the docs presently say, but they're in error --- nonzero > xmax could represent a not-yet-committed deleting xact (or one that > did commit, but not in your snapshot); or it could be from a deleting > xact that rolled back. > > I get > > regression=# SELECT myftest(1); > NOTICE: Error occurred while executing PL/pgSQL function myftest > NOTICE: line 6 at SQL statement > ERROR: parser: parse error at or near "ctid" > regression=# > > so there's something wrong with the function as posted. > > regards, tom lane > Sorry, please try new one attatched below. DROP TABLE t1; CREATE TABLE t1 (i INT PRIMARY KEY); INSERT INTO t1 VALUES(1); DROP FUNCTION myftest(INT); CREATE FUNCTION myftest(INT) RETURNS INT AS 'DECLARE myid INT;DECLARE rec RECORD;key ALIAS FOR $1;BEGIN UPDATE t1 SET i = 1 WHERE i = 1; SELECT INTO myid i FROMt1 WHERE i = (SELECT i FROM t1 WHERE i = 1); FOR rec IN SELECT ctid,xmin,xmax,cmin,cmax from t1 LOOP RAISE NOTICE''ctid % xmin % xmax % cmin % cmax %'', rec.ctid,rec.xmin,rec.xmax,rec.cmin,rec.cmax; END LOOP; RETURN 0; END;'LANGUAGE'plpgsql';
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > If your theory is like that, I could see same effect without using > PL/pgSQL. But I see following in a session using psql (original row's > ctid = (0,2)) > [T1] begin; > [T2] begin; > [T1] update t1 set i = 1 where i = 1; > [T2] update t1 set i = 1 where i = 1; <-- waiting for T1 committed/aborted > [T1] end; > [T2] select ctid, i from t1; > test=# select ctid,i from t1; > ctid | i > -------+--- > (0,4) | 1 > (1 row) > So I only see one row from the last select in T2? I believe the reason for this is that in Read Committed mode, each separate query from the client computes a new snapshot (see SetQuerySnapshot calls in postgres.c). So, when your "select ctid, i from t1" query executes, it computes a snapshot that says T1 is committed, and then it doesn't see the row left over from T1. On the other hand, your plpgsql function operates inside a single client query and so it's using just one QuerySnaphot. One way to make the results equivalent is to compute a new QuerySnapshot for each SPI query. Quite aside from the cost of doing so, I do not think it makes sense, considering that the previous QuerySnapshot must be restored when we return from the function. Do we really want functions to see transaction status different from what's seen outside the function call? I doubt it. The other way to make the results the same is to omit the SetQuerySnapshot calls for successive client-issued queries in one transaction. This could perhaps be defended on logical grounds, but considering your complaint I'm not sure it would make people happier. regards, tom lane
> Okay, I understand Tatsuo's original complaint, and I don't think it's > a bug exactly --- it's MVCC/Read Committed operating as designed. Using > the variant script I just posted and two *freshly started* backends, do: I don't think so. Seems the problem is specific to PL/pgSQL (or SPI?). > The second backend finds that it wants to update the same row backend 1 > did, so it waits to see if 1 commits. After the commit, it decides it > can do the update. Now, what will we see later in that same > transaction? SELECT will consider the original row (ctid 40, here) > to be still good --- it was deleted, sure enough, but by a transaction > that has not committed as far as the current transaction is concerned. > And the row inserted earlier in our own transaction is good too. So > you see two rows with i=1. The only way to avoid this is to use > Serializable mode, which would mean that backend 2 would've gotten an > error on its UPDATE. If your theory is like that, I could see same effect without using PL/pgSQL. But I see following in a session using psql (original row's ctid = (0,2)) [T1] begin; [T2] begin; [T1] update t1 set i = 1 where i = 1; [T2] update t1 set i = 1 where i = 1; <-- waiting for T1 committed/aborted [T1] end; [T2] select ctid, i from t1; test=# select ctid,i from t1;ctid | i -------+---(0,4) | 1 (1 row) So I only see one row from the last select in T2? -- Tatsuo Ishii
> I believe the reason for this is that in Read Committed mode, > each separate query from the client computes a new snapshot (see > SetQuerySnapshot calls in postgres.c). So, when your > "select ctid, i from t1" query executes, it computes a snapshot > that says T1 is committed, and then it doesn't see the row left > over from T1. On the other hand, your plpgsql function operates > inside a single client query and so it's using just one QuerySnaphot. Oh I see. So the "problem" is not specific to PL/pgSQL, but exists in all our procedual languages. > One way to make the results equivalent is to compute a new QuerySnapshot > for each SPI query. Quite aside from the cost of doing so, I do not > think it makes sense, considering that the previous QuerySnapshot must > be restored when we return from the function. Do we really want > functions to see transaction status different from what's seen outside > the function call? I doubt it. > > The other way to make the results the same is to omit the > SetQuerySnapshot calls for successive client-issued queries in one > transaction. This could perhaps be defended on logical grounds, > but considering your complaint I'm not sure it would make people > happier. Ok, maybe another workaround might be adding a checking for cmax in the subselect: SELECT INTO myid i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); to make sure that cmax > 0? -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > Ok, maybe another workaround might be adding a checking for cmax in > the subselect: > SELECT INTO myid i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); > to make sure that cmax > 0? Huh? How would that help? regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > Ok, maybe another workaround might be adding a checking for cmax in > > the subselect: > > > SELECT INTO myid i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); > > > to make sure that cmax > 0? > > Huh? How would that help? According to the doc, tuples with cmax > 0 should not be visible to the current transaction, no? -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > According to the doc, tuples with cmax > 0 should not be visible to > the current transaction, no? The docs are wrong --- my mistake originally, and in fact I just fixed it in current sources. cmax != 0 only indicates that someone tried to delete the tuple; not that the someone ever committed, much less that their commit should be visible to you under MVCC rules. (Also, I believe the command counter starts at 0, so this test would only catch deletes that weren't the first command in their transaction, anyway. Testing xmax != 0 would avoid that issue, but not the fundamental problem of commit status.) regards, tom lane
> -----Original Message----- > From: Tom Lane > > I believe the reason for this is that in Read Committed mode, > each separate query from the client computes a new snapshot (see > SetQuerySnapshot calls in postgres.c). So, when your > "select ctid, i from t1" query executes, it computes a snapshot > that says T1 is committed, and then it doesn't see the row left > over from T1. On the other hand, your plpgsql function operates > inside a single client query and so it's using just one QuerySnaphot. > > One way to make the results equivalent is to compute a new QuerySnapshot > for each SPI query. Quite aside from the cost of doing so, I do not > think it makes sense, considering that the previous QuerySnapshot must > be restored when we return from the function. Do we really want > functions to see transaction status different from what's seen outside > the function call? Yes I do. > I doubt it. > > The other way to make the results the same is to omit the > SetQuerySnapshot calls for successive client-issued queries in one > transaction. What's different from SERIALIZABLE mode ? regards, Hiroshi Inoue
> > One way to make the results equivalent is to compute a new QuerySnapshot > > for each SPI query. Quite aside from the cost of doing so, I do not > > think it makes sense, considering that the previous QuerySnapshot must > > be restored when we return from the function. Do we really want > > functions to see transaction status different from what's seen outside > > the function call? > > Yes I do. Me too. Current behavior of procedural languages seem hard to understand for users. BTW, why must we restore the previous QuerySnapshot? We already break the rule (if it's a rule). For example, COPY TO calls SetQuerySnapshot (see tcop/utility.c). So, below produces "ERROR: More than one tuple returned by a subselect used as an expression": DROP TABLE t1; CREATE TABLE t1 (i INT PRIMARY KEY); DROP FUNCTION myftest(INT); CREATE FUNCTION myftest(INT) RETURNS INT AS ' UPDATE t1 SET i = 1 WHERE i = 1; SELECT i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1);'LANGUAGE 'sql'; while below does not throw an error: DROP TABLE t1; CREATE TABLE t1 (i INT PRIMARY KEY); DROP FUNCTION myftest(INT); CREATE FUNCTION myftest(INT) RETURNS INT AS ' UPDATE t1 SET i = 1 WHERE i = 1; COPY t1 TO ''/tmp/t1.data''; SELECT i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i =1);'LANGUAGE 'sql';
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > BTW, why must we restore the previous QuerySnapshot? We already break > the rule (if it's a rule). No we don't. There are no SetQuerySnapshot calls occuring *within* a query. An example of why that would be unacceptable: consider select myfunc(f1) from table where f2 = 42; Suppose executing myfunc() causes an unrestored SetQuerySnapshot call. Then, if other transactions are busy changing f2 values, the set of rows that this query returns could depend on the order in which rows are visited --- since whether it thinks a row with f2 = 42 is visible might depend on whether any previous rows had been matched (causing myfunc() to be evaluated). For that matter, it could depend on the form of the query plan used --- in some plans, myfunc() might be called while the scan is in progress, in others not till afterward. > For example, COPY TO calls SetQuerySnapshot > (see tcop/utility.c). That's just because postgres.c doesn't do it automatically for utility statements. There's still just one SetQuerySnapshot per query. regards, tom lane
> No we don't. There are no SetQuerySnapshot calls occuring *within* > a query. An example of why that would be unacceptable: consider > > select myfunc(f1) from table where f2 = 42; > > Suppose executing myfunc() causes an unrestored SetQuerySnapshot call. > Then, if other transactions are busy changing f2 values, the set of > rows that this query returns could depend on the order in which rows > are visited --- since whether it thinks a row with f2 = 42 is visible > might depend on whether any previous rows had been matched (causing > myfunc() to be evaluated). For that matter, it could depend on the > form of the query plan used --- in some plans, myfunc() might be called > while the scan is in progress, in others not till afterward. If so, FROM clause-less SELECT (select myfunc();) might be ok. > > For example, COPY TO calls SetQuerySnapshot > > (see tcop/utility.c). > > That's just because postgres.c doesn't do it automatically for utility > statements. There's still just one SetQuerySnapshot per query. I'm confused. In my example: CREATE FUNCTION myftest(INT) RETURNS INT AS ' UPDATE t1 SET i = 1 WHERE i = 1; COPY t1 TO ''/tmp/t1.data''; SELECT i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i =1);'LANGUAGE 'sql'; When COPY is invoked in the function, I thought SetQuerySnapshot is called. So "SELECT myftest(1);" would call SetQuerySnapshot twice, no? -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > CREATE FUNCTION myftest(INT) > RETURNS INT > AS ' > UPDATE t1 SET i = 1 WHERE i = 1; > COPY t1 TO ''/tmp/t1.data''; > SELECT i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); > ' > LANGUAGE 'sql'; > When COPY is invoked in the function, I thought SetQuerySnapshot is > called. Hmm, I think you are right. This means that calling SetQuerySnapshot in ProcessUtility is the *wrong* place to do it; or that there should be additional logic to suppress the call in this context. IMHO, anyway. regards, tom lane
Tatsuo Ishii wrote: > > > > One way to make the results equivalent is to compute a new QuerySnapshot > > > for each SPI query. Quite aside from the cost of doing so, I do not > > > think it makes sense, considering that the previous QuerySnapshot must > > > be restored when we return from the function. Do we really want > > > functions to see transaction status different from what's seen outside > > > the function call? > > > > Yes I do. > > Me too. Current behavior of procedural languages seem hard to > understand for users. > Yes it's a siginificant point. I've referred to the impropriety to use a unique snapshot thoughout a function call when this kind of bug(?) was reported. Who could take care of it in writing PL/pgSQL ? > BTW, why must we restore the previous QuerySnapshot? For example, in the case such as select .., some_func(item1), .. from a_table; SELECT always uses the same snapshot for all its internal fetch operations, so it seems reasonable for each some_func() to be called in the same snapshot. It's possible for a function to use a unique snapshot if there are only SELECT statements in the function but it's impossible if there are UPDATE/DELETE or SELECT .. FOR UPDATE statements etc.. We should be careful to handle such functions which have side effects. IMHO we shouldn't call such functions or shouldn't expect consistent results with the use of such funtions. OTOH select some_func(..); is a procedure call not a function call in reality. There seems to be no necessity to restore the previous QuerySnapshot when calling procedures and we could call any function as a procedure. regards, Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > It's possible for a function to use a unique snapshot > if there are only SELECT statements in the function > but it's impossible if there are UPDATE/DELETE or > SELECT .. FOR UPDATE statements etc. You are confusing snapshots (which determine visibility of the results of OTHER transactions) with command-counter incrementing (which determines visibility of the results of OUR OWN transaction). I agree that plpgsql's handling of command-counter changes is broken, but it does not follow that sprinkling the code with SetQuerySnapshot is wise. regards, tom lane
Tom Lane wrote: > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > It's possible for a function to use a unique snapshot > > if there are only SELECT statements in the function > > but it's impossible if there are UPDATE/DELETE or > > SELECT .. FOR UPDATE statements etc. > > You are confusing snapshots (which determine visibility of the results > of OTHER transactions) with command-counter incrementing (which > determines visibility of the results of OUR OWN transaction). I agree > that plpgsql's handling of command-counter changes is broken, but it > does not follow that sprinkling the code with SetQuerySnapshot is wise. Why do you blame PL/pgSQL for that? I don't see a single reference to the command counter from the PL/pgSQL sources. All it does is using SPI. So does "using SPI" by itself count as "boken"? If so, uh-oh, referential integrity is using SPI ... Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck <JanWieck@yahoo.com> writes: >> that plpgsql's handling of command-counter changes is broken, but it >> does not follow that sprinkling the code with SetQuerySnapshot is wise. > Why do you blame PL/pgSQL for that? I don't see a single > reference to the command counter from the PL/pgSQL sources. > All it does is using SPI. So does "using SPI" by itself count > as "boken"? Sorry: SPI is broken, not plpgsql. Does that make you feel better? regards, tom lane
Tom Lane wrote: > Jan Wieck <JanWieck@yahoo.com> writes: > >> that plpgsql's handling of command-counter changes is broken, but it > >> does not follow that sprinkling the code with SetQuerySnapshot is wise. > > > Why do you blame PL/pgSQL for that? I don't see a single > > reference to the command counter from the PL/pgSQL sources. > > All it does is using SPI. So does "using SPI" by itself count > > as "boken"? > > Sorry: SPI is broken, not plpgsql. Does that make you feel better? Not that it "makes my day". But it makes me feel better, thanks. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Tom Lane wrote: > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > It's possible for a function to use a unique snapshot > > if there are only SELECT statements in the function > > but it's impossible if there are UPDATE/DELETE or > > SELECT .. FOR UPDATE statements etc. > > You are confusing No. > snapshots (which determine visibility of the results > of OTHER transactions) Yes. > with command-counter incrementing (which > determines visibility of the results of OUR OWN transaction). Yes. > I agree > that plpgsql's handling of command-counter changes is broken, Probably yes but > but it > does not follow that sprinkling the code with SetQuerySnapshot is wise. > Should both command counter and snapshots be changed properly ? Please explain me why/how we could do with no snapshot change in read committed mode. regards, Hiroshi Inoue
Tom Lane wrote: > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > It's possible for a function to use a unique snapshot > > if there are only SELECT statements in the function > > but it's impossible if there are UPDATE/DELETE or > > SELECT .. FOR UPDATE statements etc. > > You are confusing No. > snapshots (which determine visibility of the results > of OTHER transactions) Yes. > with command-counter incrementing (which > determines visibility of the results of OUR OWN transaction). Yes. > I agree > that plpgsql's handling of command-counter changes is broken, Probably yes but > but it > does not follow that sprinkling the code with SetQuerySnapshot is wise. > Should both command counter and snapshots be changed properly ? Please explain me why/how we could do with no snapshot change in read committed mode. regards, Hiroshi Inoue
Tom Lane wrote: > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > It's possible for a function to use a unique snapshot > > if there are only SELECT statements in the function > > but it's impossible if there are UPDATE/DELETE or > > SELECT .. FOR UPDATE statements etc. > > You are confusing snapshots (which determine visibility of the results > of OTHER transactions) Please note that the meaning of snapshots for statements other than SELECT is different from that for SELECT. For example, 1) The result of SELECT .. FOR UPDATE may be different from that of SELECT for the same snapshot. 2) Once a snapshot given, the result of SELECT is dicisive but that of SELECT .. FOR UPDATE isn't. SELECT and SELECT .. FOR UPDATE are alike in appearance but quite different in nature. There's no real snapshot for SELECT .. FOR UPDATE in the current implementation. I suggested the implementation with the real snapshot (without the word snapshot though) once before 6.5. The implementation seems hard and the possibility isn't confirmed. Even though the implementation is possible, it requires the repeated computation of snapshot until the consisteny is satisfied, and so arbitrary snapshots aren't allowed. There's little meaning for SELECT statements and subsequent other statements like UPDATE/DELETE/SELECT .. FOR UPDATE to use the same snapshot in read committed mode. There's no consistency with the current handling of snapshots inside a function call. regards, Hiroshi Inoue
Tom Lane wrote: > > Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > NOTICE: ctid (0,5) xmin 645188 xmax 645190 cmin 2 cmax 2 > > This is odd too, since xmax > 0 or cmax > 0 should never happen with > > visible tuples, in my understanding. > > That's what the docs presently say, but they're in error --- nonzero > xmax could represent a not-yet-committed deleting xact (or one that > did commit, but not in your snapshot); or it could be from a deleting > xact that rolled back. or it can come from referential integrity triggers: hannu=# create table parent(parid integer primary key); NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'parent_pkey' for table 'parent' CREATE hannu=# create table child(cldid integer references parent on update cascade); NOTICE: CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s) CREATE hannu=# insert into parent values(1); INSERT 20652 1 hannu=# insert into child values(1); INSERT 20653 1 hannu=# update parent set parid=2; UPDATE 1 hannu=# select xmin,xmax,cmin,cmax,parid from parent;xmin | xmax | cmin | cmax | parid ------+------+------+------+------- 731 | 731 | 0 | 4 | 2 (1 row) Now I have a question: if xmax is not used in determining tuple visibility (as I had assumed earlier) then what is ? How does postgres decide that a tuple is deleted ? -------------------- Hannu
Tom Lane wrote: > > Hannu Krosing <hannu@tm.ee> writes: > > Tom Lane wrote: > >> That's what the docs presently say, but they're in error --- nonzero > >> xmax could represent a not-yet-committed deleting xact (or one that > >> did commit, but not in your snapshot); or it could be from a deleting > >> xact that rolled back. > > > or it can come from referential integrity triggers: > > Mmm, yeah, SELECT FOR UPDATE uses xmax to record the identity of a > transaction that has a row locked for update. In this case the xact > hasn't actually deleted the old row yet (and may never do so), but xmax > is set as though it has. > > > Now I have a question: if xmax is not used in determining tuple > > visibility (as I had assumed earlier) then what is ? > > There are additional status bits in each tuple (t_infomask) that > distinguish these various situations. The xmax field alone doesn't > tell you much, since you can't interpret it without context. As I understood it it should tell the trx id that invalidated this tuple, no ? If you must write t_infomask in the tuple anyhow, then why not clean up xmax on abort ? > I'm not sure why we bother to make xmin/xmax/etc visible to > applications. They're really of no value to an app AFAICS. > I guess they used to be of value at the time when time travel was possible and people did use xmax for documented purposes, i.e. recording tuple's lifetime and not for "other" stuff, especially without cleaning up after trx abort ;) I agree that they are losing their utility as we are moving away from the original notion of transaction ids (and oids) as something permanent that could be used for time travel or system auditing and recommending peole who need such features to reimplement those at application level, with triggers and explicitly defined fields. ------------------ Hannu
Hannu Krosing <hannu@tm.ee> writes: > Tom Lane wrote: >> That's what the docs presently say, but they're in error --- nonzero >> xmax could represent a not-yet-committed deleting xact (or one that >> did commit, but not in your snapshot); or it could be from a deleting >> xact that rolled back. > or it can come from referential integrity triggers: Mmm, yeah, SELECT FOR UPDATE uses xmax to record the identity of a transaction that has a row locked for update. In this case the xact hasn't actually deleted the old row yet (and may never do so), but xmax is set as though it has. > Now I have a question: if xmax is not used in determining tuple > visibility (as I had assumed earlier) then what is ? There are additional status bits in each tuple (t_infomask) that distinguish these various situations. The xmax field alone doesn't tell you much, since you can't interpret it without context. I'm not sure why we bother to make xmin/xmax/etc visible to applications. They're really of no value to an app AFAICS. regards, tom lane
Is this something that still needs fixing? > > I believe the reason for this is that in Read Committed mode, > > each separate query from the client computes a new snapshot (see > > SetQuerySnapshot calls in postgres.c). So, when your > > "select ctid, i from t1" query executes, it computes a snapshot > > that says T1 is committed, and then it doesn't see the row left > > over from T1. On the other hand, your plpgsql function operates > > inside a single client query and so it's using just one QuerySnaphot. > > Oh I see. So the "problem" is not specific to PL/pgSQL, but exists in > all our procedual languages. > > > One way to make the results equivalent is to compute a new QuerySnapshot > > for each SPI query. Quite aside from the cost of doing so, I do not > > think it makes sense, considering that the previous QuerySnapshot must > > be restored when we return from the function. Do we really want > > functions to see transaction status different from what's seen outside > > the function call? I doubt it. > > > > The other way to make the results the same is to omit the > > SetQuerySnapshot calls for successive client-issued queries in one > > transaction. This could perhaps be defended on logical grounds, > > but considering your complaint I'm not sure it would make people > > happier. > > Ok, maybe another workaround might be adding a checking for cmax in > the subselect: > > SELECT INTO myid i FROM t1 WHERE i = (SELECT i FROM t1 WHERE i = 1); > > to make sure that cmax > 0? > -- > Tatsuo Ishii > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I am not sure if there is a TODO item here, but if there is, please let me know. Thanks. > > -----Original Message----- > > From: Tom Lane > > > > I believe the reason for this is that in Read Committed mode, > > each separate query from the client computes a new snapshot (see > > SetQuerySnapshot calls in postgres.c). So, when your > > "select ctid, i from t1" query executes, it computes a snapshot > > that says T1 is committed, and then it doesn't see the row left > > over from T1. On the other hand, your plpgsql function operates > > inside a single client query and so it's using just one QuerySnaphot. > > > > One way to make the results equivalent is to compute a new QuerySnapshot > > for each SPI query. Quite aside from the cost of doing so, I do not > > think it makes sense, considering that the previous QuerySnapshot must > > be restored when we return from the function. Do we really want > > functions to see transaction status different from what's seen outside > > the function call? > > Yes I do. > > > I doubt it. > > > > The other way to make the results the same is to omit the > > SetQuerySnapshot calls for successive client-issued queries in one > > transaction. > > What's different from SERIALIZABLE mode ? > > regards, > Hiroshi Inoue > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > > I am not sure if there is a TODO item here, but if there is, please let > me know. Thanks. There seems to be no consensus on this item currently. IMHO both the command counters and the snapshots in a function should advance except the leading SELECT statements. Note that SELECT .. FOR UPDATE statements aren't SELECT statements. regards, Hiroshi Inoue > > > > > -----Original Message----- > > > From: Tom Lane > > > > > > I believe the reason for this is that in Read Committed mode, > > > each separate query from the client computes a new snapshot (see > > > SetQuerySnapshot calls in postgres.c). So, when your > > > "select ctid, i from t1" query executes, it computes a snapshot > > > that says T1 is committed, and then it doesn't see the row left > > > over from T1. On the other hand, your plpgsql function operates > > > inside a single client query and so it's using just one QuerySnaphot. > > > > > > One way to make the results equivalent is to compute a new > QuerySnapshot > > > for each SPI query. Quite aside from the cost of doing so, I do not > > > think it makes sense, considering that the previous QuerySnapshot must > > > be restored when we return from the function. Do we really want > > > functions to see transaction status different from what's seen outside > > > the function call? > > > > Yes I do. > > > > > I doubt it. > > > > > > The other way to make the results the same is to omit the > > > SetQuerySnapshot calls for successive client-issued queries in one > > > transaction. > > > > What's different from SERIALIZABLE mode ? > > > > regards, > > Hiroshi Inoue > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 >