Обсуждение: Strange RETURN NEXT behaviour in Postgres 8.0
Hi All I have a quite strange problem with RETURN NEXT statement. I have a big table with 500 millions of rows running on Postgres 8.0. Table "public.usno"Column | Type | Modifiers --------+--------+-----------ra | real | dec | real | bmag | real | rmag | real | ipix | bigint | errbox| box | Indexes: "box_ind" rtree (errbox) "ipix_ind" btree (ipix) "radec_ind" btree (ra, "dec") I actually wrote some procedures in PL/SQL using dynamical queries, and once I obtained the following error. ERROR: wrong record type supplied in RETURN NEXT CONTEXT: PL/pgSQL function "yyy" line 8 at return next To solve the problem, I used just the following simple PL/SQL functions, and a query "select * from yyy()" CREATE OR REPLACE FUNCTION xxx(refcursor) RETURNS refcursor AS ' DECLARE query varchar; BEGIN query = ''SELECT * FROM usno''; OPEN $1 FOR EXECUTE query; RETURN $1; END; ' LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION yyy() RETURNS SETOF usno AS ' DECLARE rec record; DECLARE cur refcursor; BEGIN cur=xxx(''curs_name''); LOOP FETCH cur into rec; RETURN NEXT rec; EXIT WHEN NOT FOUND; END LOOP; RETURN; END; ' LANGUAGE plpgsql; I was quite surprised by this errors, because I have tried the same functions on rather same (but smaller table) on Postgres 7.4.6 on my laptop without any problems. For debugging purposes, I just have created by hand on Postgres 8.0 machine the small table "q3c" with just two rows, but same structure as usno table. Table "public.q3c"Column | Type | Modifiers --------+--------+-----------ra | real | dec | real | bmag | real | rmag | real | ipix | bigint | errbox| box | Indexes: "box_ind1" rtree (errbox) "ipix_ind1" btree (ipix) "radec_ind1" btree (ra, "dec") And, after replacing "usno"->"q3c" in the xxx() and yyy(), the query "select * from yyy()" worked without problems!!! So, how can it be, that my PL/SQL functions works fine on one(smaller) table, but fails on another(bigger) table. Thanks in advance for any ideas. Sergey PS I have tried my code replacing the declaration "rec record;" by "rec TABLE_NAME%ROWTYPE", and it worked for both (big and small table), but I don't understand, why it doesn't work with the type "record". ------------------------------------------------------------ Sergey E. Koposov Sternberg Astronomical Institute, Moscow University (Russia) Max-Planck Institute for Astronomy (Germany) Internet: math@sai.msu.ru, http://lnfm1.sai.msu.su/~math/
"Sergey E. Koposov" <math@sai.msu.ru> writes: > LOOP > FETCH cur into rec; > RETURN NEXT rec; > EXIT WHEN NOT FOUND; > END LOOP; > RETURN; Don't you think you should have the EXIT *above* the RETURN NEXT? I would expect this to emit a bogus row of nulls after the last row returned by the cursor. (At least that's what I get with current sources. Pre-8.0 it might return the last row twice.) Running it on a 500-million-row table would quite possibly run out of memory or disk space, too, because RETURN NEXT accumulates all the results before the function is actually exited. regards, tom lane
"Sergey E. Koposov" <math@sai.msu.ru> writes: > Concerning to the exact form of my functions (using cursors, but still > collecting all the data in the memory). As I understand this is the only one > way (or just the simplest way ???) > to execute fully dynamic queries returned by C function in PL/SQL. > For the real functions which I use, instead of > query = ''SELECT * FROM usno''; > I have > query = my_C_function(some_args); Oh? I'd make a small side bet that the underlying error is in your C function --- possibly it's tromping on some data structure and the damage doesn't have an effect till later. If you can demonstrate the problem without using any custom C functions then I'd be interested to see a test case. regards, tom lane
> > For the real functions which I use, instead of > > > query = ''SELECT * FROM usno''; > > > I have > > > query = my_C_function(some_args); > > Oh? I'd make a small side bet that the underlying error is in your C > function --- possibly it's tromping on some data structure and the > damage doesn't have an effect till later. If you can demonstrate the > problem without using any custom C functions then I'd be interested to > see a test case. I want to clarify, that I have a problem even without my C functions!! And show the full exact(but long) test case, which I performed just now specially. I begin from table usno with 500 millions records wsdb=# \d usno Table "public.usno"Column | Type | Modifiers --------+--------+-----------ra | real | dec | real | bmag | real | rmag | real | ipix | bigint | errbox| box | Indexes: "box_ind" rtree (errbox) "ipix_ind" btree (ipix) "radec_ind" btree (ra, "dec") The declaration of the functions: CREATE OR REPLACE FUNCTION xxx(refcursor) RETURNS refcursor AS ' DECLARE query varchar; BEGIN query = ''SELECT * FROM usno''; OPEN $1 FOR EXECUTE query; RETURN $1; END; ' LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION yyy() RETURNS SETOF usno AS ' DECLARE rec record; DECLARE cur refcursor; BEGIN cur=xxx(''curs_name''); LOOP FETCH cur into rec; EXIT WHEN NOT FOUND; RETURN NEXT rec; END LOOP; RETURN; END; ' LANGUAGE plpgsql; wsdb=# \i q3c.sql CREATE FUNCTION CREATE FUNCTION wsdb=# select * from yyy(); ERROR: wrong record type supplied in RETURN NEXT CONTEXT: PL/pgSQL function "yyy" line 10 at return next We see the error ############################################# Now with q3c table instead of unso wsdb=# \d q3c Table "public.q3c"Column | Type | Modifiers --------+--------+-----------ra | real | dec | real | bmag | real | rmag | real | ipix | bigint | errbox| box | Indexes: "box_ind1" rtree (errbox) "ipix_ind1" btree (ipix) "radec_ind1" btree (ra, "dec") That table is not empty but filled by random numbers wsdb=# select * from q3c;ra | dec | bmag | rmag | ipix | errbox ----+-----+------+------+------+------------- 3 | 3 | 4 | 5 | 55 | (5,6),(3,4) 4 | 5 | 6 | 5 | 33 | (3,4),(1,2) (2 rows) Now the changed functions (notice, the only difference is replacing all occurencies of "usno" to "q3c") CREATE OR REPLACE FUNCTION xxx(refcursor) RETURNS refcursor AS ' DECLARE query varchar; BEGIN query = ''SELECT * FROM q3c''; OPEN $1 FOR EXECUTE query; RETURN $1; END; ' LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION yyy() RETURNS SETOF q3c AS ' DECLARE rec record; DECLARE cur refcursor; BEGIN cur=xxx(''curs_name''); LOOP FETCH cur into rec; EXIT WHEN NOT FOUND; RETURN NEXT rec; END LOOP; RETURN; END; ' LANGUAGE plpgsql; wsdb=# drop FUNCTION yyy(); DROP FUNCTION wsdb=# \i q3c.sql CREATE FUNCTION CREATE FUNCTION wsdb=# select * from yyy();ra | dec | bmag | rmag | ipix | errbox ----+-----+------+------+------+------------- 3 | 3 | 4 | 5 | 55 | (5,6),(3,4) 4 | 5 | 6 | 5 | 33 | (3,4),(1,2) (2 rows) We don't see the error. But the only change was the change from one big table to a smaller one with the precisely same structure. ###########################################
Sergey E. Koposov wrote: >>>For the real functions which I use, instead of >> >>>query = ''SELECT * FROM usno''; >> >>>I have >> >>>query = my_C_function(some_args); >> >>Oh? I'd make a small side bet that the underlying error is in your C >>function --- possibly it's tromping on some data structure and the >>damage doesn't have an effect till later. If you can demonstrate the >>problem without using any custom C functions then I'd be interested to >>see a test case. > > > I want to clarify, that I have a problem even without my C functions!! > > And show the full exact(but long) test case, which I performed just now > specially. > > I begin from table usno with 500 millions records > > wsdb=# \d usno > Table "public.usno" > Column | Type | Modifiers > --------+--------+----------- > ra | real | > dec | real | > bmag | real | > rmag | real | > ipix | bigint | > errbox | box | > Indexes: > "box_ind" rtree (errbox) > "ipix_ind" btree (ipix) > "radec_ind" btree (ra, "dec") This is just a shot in the dark, but I don't suppose you've dropped or modified any columns in "usno" have you? I seem to remember some subtle problems with dropped columns and plpgsql functions - could be one of those still left. It'd look like tablesize was the problem because of course no-one's got time to test with 500 million test rows. -- Richard Huxton Archonet Ltd
On Wed, 16 Feb 2005, Richard Huxton wrote: > This is just a shot in the dark, but I don't suppose you've dropped or > modified any columns in "usno" have you? > > I seem to remember some subtle problems with dropped columns and plpgsql > functions - could be one of those still left. It'd look like tablesize > was the problem because of course no-one's got time to test with 500 > million test rows. 1) I have static tables. I dont modify them! 2) My test table is q3c (with 2 rows), and the table with 500 millions of rows is not test table, it is the table with data :-) > -- > Richard Huxton > Archonet Ltd >
"Sergey E. Koposov" <math@sai.msu.ru> writes: > But concerning to the added columns, I can say that before the column errbox > didn't, and I revealed the problems began after adding it (together with > index creation on that column). Does starting a new backend session make the problem go away? regards, tom lane
> Does starting a new backend session make the problem go away? No
Richard Huxton <dev@archonet.com> writes: > I seem to remember some subtle problems with dropped columns and plpgsql > functions - could be one of those still left. For instance: regression=# create table usno (ra real, dec real, bmag real, rmag real,ipix int8); CREATE TABLE regression=# [ create Sergey's functions ] regression=# insert into usno values(1,2,3,4); INSERT 1132435 1 regression=# select * from yyy();ra | dec | bmag | rmag | ipix ----+-----+------+------+------ 1 | 2 | 3 | 4 | (1 row) regression=# alter table usno add column errbox box; ALTER TABLE regression=# select * from yyy();ra | dec | bmag | rmag | ipix | errbox ----+-----+------+------+------+-------- 1 | 2 | 3 | 4 | | (1 row) regression=# alter table usno drop column errbox; ALTER TABLE regression=# select * from yyy(); ERROR: wrong record type supplied in RETURN NEXT CONTEXT: PL/pgSQL function "yyy" line 8 at return next regression=# It looks like the code that handles returning a RECORD variable doesn't cope with dropped columns in the function result rowtype. (If you instead declare rec as usno%rowtype, you get a different set of misbehaviors after adding/dropping columns, so that code path isn't perfect either :-() regards, tom lane
> It looks like the code that handles returning a RECORD variable doesn't > cope with dropped columns in the function result rowtype. > > (If you instead declare rec as usno%rowtype, you get a different set > of misbehaviors after adding/dropping columns, so that code path isn't > perfect either :-() Isn't it amazing, Tom, that that column dropping code that we did up for 7.3 is STILL causing bugs :D Chris
On Wed, 16 Feb 2005, Tom Lane wrote: > Richard Huxton <dev@archonet.com> writes: > > I seem to remember some subtle problems with dropped columns and plpgsql > > functions - could be one of those still left. > > It looks like the code that handles returning a RECORD variable doesn't > cope with dropped columns in the function result rowtype. > > (If you instead declare rec as usno%rowtype, you get a different set > of misbehaviors after adding/dropping columns, so that code path isn't > perfect either :-() Finally I want to clarify, that after copying my "usno" table into another, the problems have disappeared. So I had experienced just exacty the bug with dropped columns. So, is there a chance that this bug will be fixed in some 8.X postgres ? Sergey
> "Sergey E. Koposov" <math@sai.msu.ru> writes: > > LOOP > > FETCH cur into rec; > > RETURN NEXT rec; > > EXIT WHEN NOT FOUND; > > END LOOP; > > RETURN; > > Don't you think you should have the EXIT *above* the RETURN NEXT? > I would expect this to emit a bogus row of nulls after the last row > returned by the cursor. (At least that's what I get with current > sources. Pre-8.0 it might return the last row twice.) Yes, surely EXIT should be written before RETURN NEXT, it was my error, (thanks, but I've found that error by myself, after posting my message) But that small bug does not affect the original problem. > Running it on a 500-million-row table would quite possibly run out of > memory or disk space, too, because RETURN NEXT accumulates all the > results before the function is actually exited. Yes, that's right, but I did not waited until the whole table was loaded in the function. The error, which is the subject of current thread occured just immediately after "select * from yyy()", so surely was not caused by memory overfilling. Concerning to the exact form of my functions (using cursors, but still collecting all the data in the memory). As I understand this is the only one way (or just the simplest way ???) to execute fully dynamic queries returned by C function in PL/SQL. For the real functions which I use, instead of query = ''SELECT * FROM usno''; I have query = my_C_function(some_args); (see full code in my first message) ------------------------------------------------------------ Sergey E. Koposov Sternberg Astronomical Institute, Moscow University (Russia) Max-Planck Institute for Astronomy (Germany) Internet: math@sai.msu.ru, http://lnfm1.sai.msu.su/~math/