Обсуждение: Bug with STABLE function using the wrong snapshot (probably during planning)

Поиск
Список
Период
Сортировка

Bug with STABLE function using the wrong snapshot (probably during planning)

От
Matthijs Bomhoff
Дата:
Hi,

The bit of SQL below does not behave the way it should on postgres 8.4.4 (t=
ested by me) and 9.0.3 (verified independently on #postgresql).

The third statement in the quux() function calls the a_bar() function that =
should find a single row in the 'bar' table and return its value. This sing=
le row is INSERTed into the 'bar' table on the previous line. However, the =
SELECT statement in the a_bar() function throws the following error: "ERROR=
:  query returned no rows". It thus appears not to see the INSERTed value i=
n the 'bar' table. (The expected behavior is that the a_bar() function retu=
rns the value 500 instead of throwing an error.)

Removing the STABLE attribute from a_bar() works around the problem, as doe=
s moving the "INSERT INTO bar ..." statement out of the quux() function and=
 executing it before calling the quux() function itself.

Some initial debugging by RhodiumToad on #postgresql led to the following o=
bservation: The error occurs only when the "SELECT ... WHERE i =3D a_bar();=
" is being planned, not when it is being executed, with the snapshot being =
used to plan the query apparently being too old to see the result of the pr=
eceding insert.

By the way: the EXECUTE around the SELECT in the a_bar() function is probab=
ly not required to trigger the bug, but this is the version we tested.

Regards,

Matthijs Bomhoff



BEGIN;

CREATE TABLE foo(i INTEGER);
CREATE TABLE bar(i INTEGER);

CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$
DECLARE
  result INTEGER;
BEGIN
  EXECUTE 'SELECT i FROM bar' INTO STRICT result;
  RETURN result;
END
$EOF$ LANGUAGE plpgsql STABLE;

CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$
DECLARE
  result INTEGER;
BEGIN
  INSERT INTO foo(i) SELECT s.a FROM generate_series(1,1000,1) s(a);
  INSERT INTO bar(i) VALUES(500);
  SELECT INTO STRICT result COUNT(*) FROM foo WHERE i =3D a_bar();
  RETURN result;
END
$EOF$ LANGUAGE plpgsql;

SELECT quux();

ROLLBACK;

Re: Bug with STABLE function using the wrong snapshot (probably during planning)

От
Noah Misch
Дата:
Hi Matthijs,

Thanks for the report.

On Tue, Mar 22, 2011 at 04:31:47PM +0100, Matthijs Bomhoff wrote:
> The bit of SQL below does not behave the way it should on postgres 8.4.4 (tested by me) and 9.0.3 (verified
independentlyon #postgresql). 

On git master, too.

> The third statement in the quux() function calls the a_bar() function that should find a single row in the 'bar'
tableand return its value. This single row is INSERTed into the 'bar' table on the previous line. However, the SELECT
statementin the a_bar() function throws the following error: "ERROR:  query returned no rows". It thus appears not to
seethe INSERTed value in the 'bar' table. (The expected behavior is that the a_bar() function returns the value 500
insteadof throwing an error.) 
>
> Removing the STABLE attribute from a_bar() works around the problem, as does moving the "INSERT INTO bar ..."
statementout of the quux() function and executing it before calling the quux() function itself. 
>
> Some initial debugging by RhodiumToad on #postgresql led to the following observation: The error occurs only when the
"SELECT... WHERE i = a_bar();" is being planned, not when it is being executed, with the snapshot being used to plan
thequery apparently being too old to see the result of the preceding insert. 

Quite so.  All the core procedural languages have _SPI_execute_plan() manage
CommandCounterIncrement() and PushActiveSnapshot()/PopActiveSnapshot() for the
SQL statements they execute.  Many statements use a snapshot during planning,
but _SPI_prepare_plan() never pushes one.  Therefore, in this example, planning
uses the snapshot pushed in PortalRunSelect().  Expressions evaluated at plan
time miss any changes from earlier in the volatile function.  This is fine when
they merely give "wrong" answers: we might get an inferior selectivity estimate.
In your example, a function that actually needs to see the latest data to avoid
throwing an error, we do have a problem.

The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new
snapshot when analyze_requires_snapshot() returns true on the raw parse tree.
That strategy can break down in the other direction if the caller is STABLE;
consider this example:

  CREATE TABLE foo(i INTEGER);
  CREATE TABLE bar(i INTEGER);
  INSERT INTO foo(i) SELECT s.a FROM generate_series(1,2) s(a);
  INSERT INTO bar(i) VALUES(500);

  BEGIN;
  CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$
  DECLARE
    result INTEGER;
  BEGIN
    EXECUTE 'SELECT i FROM bar' INTO STRICT result;
    RETURN result;
  END
  $EOF$ LANGUAGE plpgsql STABLE;

  CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$
  BEGIN
    LOOP
      RAISE NOTICE 'iteration';
      EXECUTE 'SELECT COUNT(*) FROM foo WHERE i = a_bar()';
      PERFORM pg_sleep(3);
    END LOOP;
  END
  $EOF$ LANGUAGE plpgsql STABLE;

  SELECT quux();
  -- concurrently:
  -- INSERT INTO bar VALUES (501);

  ROLLBACK;

With the current code, the function call runs indefinitely.  With the
_SPI_prepare_plan() change, it fails during planning on the next iteration after
the concurrent change.  This seems less severe than the current bug, but it's
still not great.  We could preserve the behavior of that example by instead
adding a "read_only" parameter to SPI_prepare* (or defining new functions with
the parameter) and having that parameter control snapshot acquisition as it does
for SPI_execute*.  Opinions?  Better ideas?

> BEGIN;
>
> CREATE TABLE foo(i INTEGER);
> CREATE TABLE bar(i INTEGER);
>
> CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$
> DECLARE
>   result INTEGER;
> BEGIN
>   EXECUTE 'SELECT i FROM bar' INTO STRICT result;
>   RETURN result;
> END
> $EOF$ LANGUAGE plpgsql STABLE;
>
> CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$
> DECLARE
>   result INTEGER;
> BEGIN
>   INSERT INTO foo(i) SELECT s.a FROM generate_series(1,1000,1) s(a);
>   INSERT INTO bar(i) VALUES(500);
>   SELECT INTO STRICT result COUNT(*) FROM foo WHERE i = a_bar();
>   RETURN result;
> END
> $EOF$ LANGUAGE plpgsql;
>
> SELECT quux();
>
> ROLLBACK;

Re: Re: Bug with STABLE function using the wrong snapshot (probably during planning)

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
>> Some initial debugging by RhodiumToad on #postgresql led to the following observation: The error occurs only when
the"SELECT ... WHERE i = a_bar();" is being planned, not when it is being executed, with the snapshot being used to
planthe query apparently being too old to see the result of the preceding insert. 

> The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new
> snapshot when analyze_requires_snapshot() returns true on the raw parse tree.
> That strategy can break down in the other direction if the caller is STABLE;
> consider this example:

Yes.  I'm of the opinion that we should not change this.  In general,
marking functions STABLE that have major side effects (such as throwing
errors) is not a good idea, and putting such things into WHERE clauses
is a worse one.  We explicitly do not guarantee anything about timing or
order of evaluation in WHERE clauses, because to do so would cripple the
planner's ability to optimize them.  So I think this is a "don't do
that" case rather than "we should try to make planning happen with the
same snapshot that will be used at execution" case.

            regards, tom lane

Re: Re: Bug with STABLE function using the wrong snapshot (probably during planning)

От
Matthijs Bomhoff
Дата:
On May 13, 2011, at 12:46 AM, Tom Lane wrote:

> Noah Misch <noah@leadboat.com> writes:
>>> Some initial debugging by RhodiumToad on #postgresql led to the followi=
ng observation: The error occurs only when the "SELECT ... WHERE i =3D a_ba=
r();" is being planned, not when it is being executed, with the snapshot be=
ing used to plan the query apparently being too old to see the result of th=
e preceding insert.
>=20
>> The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new
>> snapshot when analyze_requires_snapshot() returns true on the raw parse =
tree.
>> That strategy can break down in the other direction if the caller is STA=
BLE;
>> consider this example:
>=20
> Yes.  I'm of the opinion that we should not change this.  In general,
> marking functions STABLE that have major side effects (such as throwing
> errors) is not a good idea, and putting such things into WHERE clauses
> is a worse one.  We explicitly do not guarantee anything about timing or
> order of evaluation in WHERE clauses, because to do so would cripple the
> planner's ability to optimize them.  So I think this is a "don't do
> that" case rather than "we should try to make planning happen with the
> same snapshot that will be used at execution" case.

First of all, thanks to both of you for your replies to my email; at least =
now I understand a bit more of what went wrong.

As I know almost nothing about the internals, I am not one to argue about w=
hether or not to change the current behavior. It would however perhaps be n=
ice to have the documentation of the volatility categories mention explicit=
ly that throwing an error is also considered a (major) side-effect. In part=
icular: apparently not only are modifications to the database itself consid=
ered side-effects, but so is "irregular" control flow within a procedural l=
anguage... Based on the current documentation, I thought that as my functio=
n made no changes to the database and returns the same results given the sa=
me arguments for all rows within a single statement, it could safely be mar=
ked as STABLE.

Regards,

Matthijs