Discussion: consistency check on SPI tuple count failed
Hi all,
the following code was working properly under Postgres 7.3.x;
now I'm running my regression tests with Postgres 7.4beta1 and I'm
getting the error in the subject.
CREATE TABLE test ( a integer, b integer );
INSERT INTO test VALUES ( 1 );

CREATE OR REPLACE FUNCTION foo(INTEGER)
RETURNS INTEGER AS '
BEGIN
    RETURN $1 + 1;
END;
' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION bar()
RETURNS INTEGER AS '
DECLARE
    my_ret RECORD;
BEGIN
    FOR my_ret IN
        SELECT foo(a) AS ret
        FROM test
    LOOP
        IF my_ret.ret = 3 THEN
            RETURN -1;
        END IF;
    END LOOP;

    RETURN 0;
END;
' LANGUAGE 'plpgsql';
Regards
Gaetano Mendola
			
I forgot to say: do a select bar(); at the end!
Gaetano
"Gaetano Mendola" <mendola@bigfoot.com> writes:
> the following code was working properly under Postgres 7.3.X
> I'm now running my regression test with Postgres 7.4beta1 and I'm
> having the error in subj.
I tried this and got
regression=# select bar();
 bar
-----
   0
(1 row)

regression=#
Anyone else see the problem?
        regards, tom lane
			
On Fri, 8 Aug 2003, Tom Lane wrote:
> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

I got the same thing as Gaetano on my just prior to beta1 system.
On Fri, 2003-08-08 at 11:55, Tom Lane wrote:
> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

Bar gives 0 for me as well.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> I got the same thing as Gaetano on my just prior to beta1 system.
Well, we couldn't have fixed it since beta1 --- there have been no changes
anywhere near SPI.  I'm thinking it must be platform-dependent.  What
are you guys using, exactly?
        regards, tom lane
			
On Fri, 8 Aug 2003, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there have been no changes
> anywhere near SPI.  I'm thinking it must be platform-dependent.  What
> are you guys using, exactly?

I'm using RedHat 9.
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there have been no changes
> anywhere near SPI.  I'm thinking it must be platform-dependent.  What
> are you guys using, exactly?
>
> regards, tom lane

kalman=# select version();
                                                   version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 7.4beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)

Regards
Gaetano Mendola
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?
>
> regards, tom lane

Incredible to believe, but after playing around that function started
to work. I'm not crazy.

I deleted the DB, stopped postgres, restarted postgres, created the DB,
created the language, and inserted my example. Again the error:

kalman=# select bar();
ERROR:  consistency check on SPI tuple count failed
CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
kalman=# select bar();
ERROR:  consistency check on SPI tuple count failed
CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Gaetano
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> Again the error:
> kalman=# select bar();
> ERROR:  consistency check on SPI tuple count failed
> CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> kalman=# select bar();
> ERROR:  consistency check on SPI tuple count failed
> CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
After adding a second row to the test table, I am able to reproduce
the above (including the core dump after second try) on an intel/linux
box, but *not* on HPUX.
I now suspect a memory-stomp kind of problem, like someone writing one
too many bytes in a struct.  HPUX tends to mask these in situations
where intel will not, because it uses MAXALIGN 8 rather than 4.
I have also just traced through _SPI_cursor_operation() in spi.c,
watched PortalRunFetch return 2, and then watched _SPI_checktuples read
zero from _SPI_current->processed.  How the heck could that happen?
Compiler bug, or am I just crazy?
        regards, tom lane
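A simplified model makes the alignment argument above concrete. This is an assumption for illustration, not how PostgreSQL's allocator actually works. Suppose every allocation is rounded up to a multiple of MAXALIGN. A 12-byte struct then gets 4 bytes of trailing padding when MAXALIGN is 8, but none when it is 4. A one-byte overrun would therefore land harmlessly in padding on HPUX, yet clobber whatever follows on Intel. The helper names `align_up` and `overrun_slack` are hypothetical. PostgreSQL's own macros for this arithmetic are `TYPEALIGN()` and `MAXALIGN()`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round len up to a multiple of align (which must be a power of two).
 * Mirrors the arithmetic of PostgreSQL's TYPEALIGN()/MAXALIGN() macros,
 * but is a standalone, simplified helper, not the real definition.
 */
static size_t align_up(size_t len, size_t align)
{
    return (len + (align - 1)) & ~(align - 1);
}

/*
 * Padding left after an object of `len` bytes if (by assumption) every
 * request is rounded up to a multiple of `maxalign`.  An overrun of at
 * most this many bytes only touches padding and goes unnoticed.
 */
static size_t overrun_slack(size_t len, size_t maxalign)
{
    return align_up(len, maxalign) - len;
}
```

For example, `overrun_slack(12, 4)` is 0 and `overrun_slack(12, 8)` is 4. This models Tom's first hypothesis only. As the rest of the thread shows, the real cause turned out to be a stale pointer rather than an overrun.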
			
On Fri, 8 Aug 2003, Tom Lane wrote:
> "Mendola Gaetano" <mendola@bigfoot.com> writes:
> > Again the error:
> >
> > kalman=# select bar();
> > ERROR:  consistency check on SPI tuple count failed
> > CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> > kalman=# select bar();
> > ERROR:  consistency check on SPI tuple count failed
> > CONTEXT:  PL/pgSQL function "bar" line 5 at for over select rows
> > server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
>
> After adding a second row to the test table, I am able to reproduce
> the above (including the core dump after second try) on an intel/linux
> box, but *not* on HPUX.
>
> I now suspect a memory-stomp kind of problem, like someone writing one
> too many bytes in a struct.  HPUX tends to mask these in situations
> where intel will not, because it uses MAXALIGN 8 rather than 4.
>
> I have also just traced through _SPI_cursor_operation() in spi.c,
> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
> zero from _SPI_current->processed.  How the heck could that happen?
> Compiler bug, or am I just crazy?

Not sure, but I got the same thing.  When I changed it to put the
result in a temporary int variable and then put it in, it started
working for me (returning 0); reverting to the original made it fail
again.  I'm going to try -O0 and see what happens there.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Fri, 8 Aug 2003, Tom Lane wrote:
>> I have also just traced through _SPI_cursor_operation() in spi.c,
>> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
>> zero from _SPI_current->processed.  How the heck could that happen?
>> Compiler bug, or am I just crazy?
> Not sure, but I got the same thing.  When I changed it to put the
> result in a temporary int variable and then put it in it started
> working for me (returning 0), reverting to the original made it fail
> again.  I'm going to try -O0 and see what happens there.
Oooohhhh ...
<lightbulb>
SPI_stack can move around as functions are entered/exited.
</lightbulb>
Wonder why we've not seen that kind of failure happen before?  Someone
(doubtless me) must have changed the coding of this routine since 7.3.
        regards, tom lane
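Below is a minimal C sketch of the failure mode described above, under simplifying assumptions. The names `Frame`, `stack`, `current`, `push_frame`, `run_fetch` and `fetch_into_current_frame` are hypothetical stand-ins for `SPI_stack`, `_SPI_current` and `PortalRunFetch`; this is not the real spi.c code.

The model works like this:

- The stack grows with `realloc`, so a nested SPI user can move every frame. Here `foo()`, executed once per row, is that nested user.
- In `current->processed = run_fetch(...)`, the compiler is free to load `current` before the call. The store can then go through a pointer into the old block.
- Copying the result into a local first, as in Stephan's temporary-variable experiment, forces `current` to be re-read after the stack has settled.

```c
#include <assert.h>
#include <stdlib.h>

/* One frame per active SPI-using call (hypothetical, simplified). */
typedef struct Frame
{
    long processed;             /* tuple count recorded for this frame */
} Frame;

/* Growable stack, loosely modeled on SPI_stack and _SPI_current. */
static Frame *stack = NULL;     /* realloc may move this array */
static int    depth = 0;        /* frames in use */
static int    capacity = 0;     /* frames allocated */
static Frame *current = NULL;   /* always &stack[depth - 1] */

/* Enter a nested SPI user; growing the array may relocate every frame. */
static void push_frame(void)
{
    if (depth == capacity)
    {
        int    newcap = capacity ? capacity * 2 : 1;
        Frame *p = realloc(stack, (size_t) newcap * sizeof(Frame));

        if (p == NULL)
            abort();
        stack = p;
        capacity = newcap;
    }
    stack[depth].processed = 0;
    depth++;
    current = &stack[depth - 1];
}

/* Leave a nested SPI user; `current` is recomputed from the new base. */
static void pop_frame(void)
{
    depth--;
    current = (depth > 0) ? &stack[depth - 1] : NULL;
}

/*
 * Hypothetical stand-in for PortalRunFetch: while fetching rows it runs
 * a function (foo() in the report) that uses SPI itself, so the stack is
 * pushed and popped and may have moved by the time this returns.
 */
static long run_fetch(long rows)
{
    push_frame();
    pop_frame();
    return rows;
}

/*
 * Risky form:   current->processed = run_fetch(rows);
 * C leaves unspecified whether `current` is read before or after the
 * call, so the store may go through a pointer into the old block,
 * which realloc may already have freed.
 *
 * Safer form: finish the call, then re-read `current` for the store.
 */
static void fetch_into_current_frame(long rows)
{
    long nfetched = run_fetch(rows);

    current->processed = nfetched;
}
```

Whether the risky form visibly breaks depends on whether `realloc` actually moved the block and on what the compiler chose to do. That is consistent with effects ranging from nothing at all to a core dump, and with Stephan's results changing when he altered the code.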
			
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> Incredible to believe, but after playing around that function started
> to work. I'm not crazy.
Yeah, it was a problem with storing into a possibly-obsolete pointer ---
the visible effects could range from nothing to a core dump depending on
whether the pointer was really out-of-date and what got clobbered if it
was.
Fix is in CVS.
        regards, tom lane