Обсуждение: My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

Поиск
Список
Период
Сортировка

My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

От
Bryn Llewellyn
Дата:
*Summary*

My tests show that, when a WITHOUT HOLD cursor has to cache results (see Note 1), then the WHERE clause (if present) is stripped off the cursor's defining SELECT statement and the entire unrestricted result set is cached. But when a WITH HOLD cursor is used, then it’s the *restricted* result set that’s cached.

I do see that this wouldn't have a detectable effect when the cursor's defining query doesn't involve any volatile functions. But it does seem that too much data is cached in the "not holdable" case—and this seems to be a bad thing for space use and for speed.

I'd like to know the rationale for this design choice but I've been unable to find any mention of it in the PG doc (see Note 2).

My test design used this statement:

select v from rndm_series() [where rndm_filter()]

And I used a trivial table function "cursor_rows()” that simply iterated over all the rows that the cursor defined, from first through last, to display the cursor's result set three times in succession.

This lead to four runs: "WITH or WITHOUT HOLD" by "with or without the WHERE clause".

I might just as well have used a simple "series()" function wrapped around "generate_series()" that used "raise info" to report that it was called together with a simple "filter()" function that always returned true and, again, that used "raise info" to report that it was called. But I find it too hard to divert "raise info" output to a spool file and have it interleave properly with regular SQL output. So I used a volatile "rndm_series() that always returned ten rows but (using the random() built-in function) produced a different set of values on each call. And I used a volatile "rndm_filter()" function that similarly randomly returned TRUE or FALSE. These devices dramatized the effect I'm that reporting here and made it simple to record the results ordinarily with the \o meta-command.

I'm assuming that my PL/pgSQL functions are opaque to the planner and so that it cannot run the execution plan in backwards order and must, therefore, cash the cursor's result set.

I would have expected the *restricted* result set to be cached for both flavors of cursor—holdable and not. After all, we're taught to expect nonsense results when a volatile function's results are cached—so a cursor should be no different w.r.t. this thinking.

(Of course, I've read the CAUTION in the PG doc for the DECLARE statement that says « Scrollable cursors may give unexpected results if they invoke any volatile functions » and I wouldn't use a volatile function in real life.)
——————————————————————————————

*Note 1*

I read Laurenz's blogpost "WITH HOLD cursors and transactions in PostgreSQL" (www.cybertec-postgresql.com/en/with-hold-cursors-and-transactions-in-postgresql/) and I noted this:

«
[Some] execution plans [for a WITHOUT HOLD cursor] require the explicit keyword SCROLL for the cursor to become scrollable. Such cursors incur an overhead, because the server must cache the entire result set.
»

and then this:

«
PostgreSQL must calculate the complete result set [for a WITH HOLD cursor] at COMMIT time and cache it on the server.
»

In other words, both a WITH HOLD cursor (always) and a WITHOUT HOLD cursor (sometimes) must cache the entire result set.
——————————————————————————————

*Note 2*

Laurenz also wrote this in the "Is the PL/pgSQL refcursor useful in a modern three-tier app?" thread that I started:

«
 I understand your confusion, and I believe that the documentation could be improved... I personally find that reading the PostgreSQL documentation gets you far, but only so far: for deep understanding, you have to read the code... I have come to see it as an extension of the documentation that covers the details.
»
——————————————————————————————

*Testcase*

I used PG Version 15.2. The code that follows is self-contained. You need just to connect as an ordinary user to a database where it has the CREATE privilege. Copy it into, say, "t.sql" and start it in psql. (Make sure that you have the usual « AUTOCOMMIT = 'on' » setting.)

\c :db :u1

drop schema if exists s1 cascade;
create schema s1;
set search_path = s1, pg_catalog, pg_temp;

create procedure init_rndm()
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  perform setseed(0.0::double precision);
end;
$body$;

create function rndm_filter()
  returns boolean
  set search_path = pg_catalog, pg_temp
  language sql
as $body$
  select (random() > 0.7::float8);
$body$;

-- Ensure that the planner cannot know how the rows are delivered.
create function rndm_series()
  returns table(v int)
  set search_path = s1, pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  val int not null := 0;
  ctr int not null := 0;
begin
  loop
    val := val + 1;
    if (random() > 0.9::float8) then
      ctr := ctr + 1;
      exit when ctr > 10;
      v := val; return next;
    end if;
  end loop;
end;
$body$;

create procedure open_holdable_cursor(holdable in boolean, filter_series in boolean, caption inout text)
  language plpgsql
as $body$
declare
  cur constant refcursor not null := 'cur';

  without_filter constant text not null :=
   'declare cur scroll cursor without hold for '||
   'select v from rndm_series()';

  with_filter constant text not null :=
   without_filter||' where rndm_filter()';

  without_filter_holdable constant text not null :=
    replace(without_filter, 'without', 'with');

  with_filter_holdable constant text not null :=
    replace(with_filter, 'without', 'with');
begin
  begin
    close cur;
  exception when invalid_cursor_name then
    null;
  end;

  case holdable
    when false then
      case filter_series
        when false then   execute without_filter;
        when true  then   execute with_filter;
      end case;

    when true then
      case filter_series
        when false then   execute without_filter_holdable;
        when true  then   execute with_filter_holdable;
      end case;
  end case;

  declare
    holdable text not null := '';
    stmt     text not null := '';
  begin
    select
      case is_holdable
        when true  then 'with hold'
        when false then 'without hold'
      end,
      statement
    into holdable, stmt
    from pg_cursors where name = 'cur';

    stmt := replace(stmt, ' without hold', '');
    stmt := replace(stmt, ' with hold', '');
    stmt := replace(stmt, 'declare cur scroll cursor for ', '');
    caption := rpad(holdable||':', 14)||stmt;
  end;
end;
$body$;

create function cursor_rows()
  returns table(z int)
  set search_path = s1, pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  cur constant refcursor not null := 'cur';
begin
  move absolute 0 in cur;
  loop
    fetch next from cur into z;
    exit when not found;
    return next;
  end loop;
end;
$body$;
--------------------------------------------------------------------------------

\t on
\o spool.txt

start transaction;
  call init_rndm();
  call open_holdable_cursor(false, false, '');
  select z from cursor_rows();
  select z from cursor_rows();
  select z from cursor_rows();
rollback;

start transaction;
  call init_rndm();
  call open_holdable_cursor(false, true, '');
  select z from cursor_rows();
  select z from cursor_rows();
  select z from cursor_rows();
rollback;

call init_rndm();
call open_holdable_cursor(true, false, '');
select z from cursor_rows();
select z from cursor_rows();
select z from cursor_rows();

call init_rndm();
call open_holdable_cursor(true, true, '');
select z from cursor_rows();
select z from cursor_rows();
select z from cursor_rows();

\o
\t off

--------------------------------------------------------------------------------

Here's what my "spool.txt" contained:

 without hold: select v from rndm_series()

   9
  34
  35
  39
  40
  47
  91
 101
 136
 137

   9
  34
  35
  39
  40
  47
  91
 101
 136
 137

   9
  34
  35
  39
  40
  47
  91
 101
 136
 137

 without hold: select v from rndm_series() where rndm_filter()

  9
 91

  9
 34
 39
 47
 91

  35
 136
 137

 with hold:    select v from rndm_series()

   9
  34
  35
  39
  40
  47
  91
 101
 136
 137

   9
  34
  35
  39
  40
  47
  91
 101
 136
 137

   9
  34
  35
  39
  40
  47
  91
 101
 136
 137

 with hold:    select v from rndm_series() where rndm_filter()

  9
 91

  9
 91

  9
 91


--------------------------------------------------------------------------------

Re: My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

От
"David G. Johnston"
Дата:
On Fri, Mar 31, 2023 at 12:35 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
*Summary*

My tests show that, when a WITHOUT HOLD cursor has to cache results (see Note 1), then the WHERE clause (if present) is stripped off the cursor's defining SELECT statement and the entire unrestricted result set is cached. But when a WITH HOLD cursor is used, then it’s the *restricted* result set that’s cached.

I do see that this wouldn't have a detectable effect when the cursor's defining query doesn't involve any volatile functions. But it does seem that too much data is cached in the "not holdable" case—and this seems to be a bad thing for space use and for speed.


IIUC, all you've demonstrated here is the (sometimes) case for the WITHOUT HOLD cursor where a cache is not used (i.e., the typical case).  In this situation the executor, when asked to rewind back to the beginning, goes and restarts execution at the beginning (executor nodes form a tree, it is probable that certain nodes are more efficient at this "start over" thing that others - e.g., I suspect a materialize node sitting in the tree would prevent a sequential scan node from being asked to "start over"), which necessarily involves potentially re-evaluating volatile functions/expressions as noted.

David J.

Re: My tests show that a WITH HOLD cursor has subtly different semantics from a WITHOUT HOLD cursor

От
Bryn Llewellyn
Дата:
david.g.johnston@gmail.com wrote:

bryn@yugabyte.com wrote:

*Summary*

My tests show that, when a WITHOUT HOLD cursor has to cache results (see Note 1), then the WHERE clause (if present) is stripped off the cursor's defining SELECT statement and the entire unrestricted result set is cached. But when a WITH HOLD cursor is used, then it’s the *restricted* result set that’s cached.

I do see that this wouldn't have a detectable effect when the cursor's defining query doesn't involve any volatile functions. But it does seem that too much data is cached in the "not holdable" case—and this seems to be a bad thing for space use and for speed.

IIUC, all you've demonstrated here is the (sometimes) case for the WITHOUT HOLD cursor where a cache is not used (i.e., the typical case). In this situation the executor, when asked to rewind back to the beginning, goes and restarts execution at the beginning (executor nodes form a tree, it is probable that certain nodes are more efficient at this "start over" thing that others - e.g., I suspect a materialize node sitting in the tree would prevent a sequential scan node from being asked to "start over"), which necessarily involves potentially re-evaluating volatile functions/expressions as noted.

Forgive me. I don't understand your reply. I do understand (having read Laurenz's blog post) that sometimes the execution plan for the cursor's defining SELECT cannot be run backwards. I'm not sure that it matters whether this case is typical or not. It's enough that it can occur. And this is the case that I'm interested in. Laurenz says that in this case, for a WITHOUT HOLD cursor, the results must be cached to allow scrollability. And the results of my tests are consistent with this—up to a point.

However, my results show that for the WITHOUT HOLD case, the restriction that the cursor's SELECT might have is *not* applied to what's cached. But the restriction *is* applied when the WITH HOLD cache is populated.

And it's this that I'm asking about.

Forget that I ever said "volatile". I just edited the code that I included in my previous post. I globally replaced "rndm_series" with "series". And I globally replaced "rndm_filter" with "filter". I also removed the "create procedure init_rndm()" statement and removed the calls of the procedure. Here are the new implementations of "series()" and "filter()""

create function series()
  returns table(v int)
  set search_path = s1, pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  raise info 'series() invoked';

  for v in (select generate_series(1, 10))loop
    return next;
  end loop;
end;
$body$;

and

create function filter()
  returns boolean
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
begin
  raise info 'filter() invoked';
  return true;
end;
$body$;

Then I ran the four tests by hand because I don't know how to spool the "raise info" output to a file. In all cases, the "cursor_rows()" invocation just reports the ten rows with values in 1 through 10 — of course.

Here's what I saw:

* (1) open_holdable_cursor(holdable=>false, filter_series=>false, caption=>'') *

The "open_holdable_cursor()" call completes silently.

The first "cursor_rows()" invocation reports "series() invoked" once.

Subsequent "cursor_rows()" invocations produce their rows without that message.

* (2) call open_holdable_cursor(holdable=>false, filter_series=>true, caption=>'') *

The "open_holdable_cursor()" call completes silently again.

The first "cursor_rows()" invocation again reports "series() invoked" once. And then it reports "filter() invoked" ten times.

The second "cursor_rows()" invocation again does *not* report "series() invoked". But it *does* report "filter() invoked" ten times. This tells me that its the *unrestricted* results that are cached.

It's the same for the third invocation (and any more that I care to do).

*  (3) open_holdable_cursor(holdable=>true, filter_series=>false, caption=>'') *

The "open_holdable_cursor()" call now reports "series() invoked".

The first, and all subsequent, "cursor_rows()" invocations do not say "series() invoked".

* (4) open_holdable_cursor(holdable=>true, filter_series=>true, caption=>'') *

The "open_holdable_cursor()" call now reports "series() invoked" followed by "filter() invoked" ten times.

The first, and all subsequent, "cursor_rows()" invocations do not bring any "raise info" output because the *restricted* results are cached.

I hope that my question is clearer now.