Обсуждение: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from insidepl/pgsql

Поиск
Список
Период
Сортировка

[BUGS] pg_logical_slot_peek_changes crashes postgres when called from insidepl/pgsql

От
Ben Chobot
Дата:
I'm probably doing something dumb, but even something dumb at this high level probably shouldn't result in a crash. I've tried with multiple decoders and get the same result. I also have a stack trace from 9.5.5, if that helps: https://paste.depesz.com/s/Bu

postgres@c61-pg509:~$ psql
psql (9.5.9)
Type "help" for help.

postgres=# select version();
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit
(1 row)

postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
    slot_name    | xlog_position
-----------------+---------------
 regression_slot | 541/180342D0
(1 row)

postgres=# SELECT * FROM pg_replication_slots;
    slot_name    |    plugin     | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn
-----------------+---------------+-----------+--------+----------+--------+------------+------+--------------+--------------
 regression_slot | test_decoding | logical   |  12379 | postgres | f      |            |      |    287608852 | 541/18034298
(1 row)

postgres=# CREATE TABLE public.foo(i int);
CREATE TABLE

postgres=# insert into public.foo(i) values(1);
INSERT 0 1
postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);
   location   |    xid    |                  data
--------------+-----------+----------------------------------------
 541/18034360 | 287608852 | BEGIN 287608852
 541/180438D0 | 287608852 | COMMIT 287608852
 541/180438D0 | 287608853 | BEGIN 287608853
 541/180438D0 | 287608853 | table public.foo: INSERT: i[integer]:1
 541/18043940 | 287608853 | COMMIT 287608853
(5 rows)

postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, 1);
   location   |    xid    |       data
--------------+-----------+------------------
 541/18034360 | 287608852 | BEGIN 287608852
 541/180438D0 | 287608852 | COMMIT 287608852
(2 rows)

postgres=# SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, 1) limit 1;
   location   |    xid    |      data
--------------+-----------+-----------------
 541/18034360 | 287608852 | BEGIN 287608852
(1 row)

postgres=# CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS pg_lsn AS
postgres-# $$
postgres$# BEGIN
postgres$#     return location from pg_logical_slot_peek_changes(slot,null,1) limit 1;
postgres$# END
postgres$# $$ language plpgsql;
CREATE FUNCTION
postgres=# select logical_replication_slot_lsn_delta('regression_slot');   # crash!
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
->


#...and to cleanup
postgres@c61-pg509:~$ psql
psql (9.5.9)
Type "help" for help.

postgres=# select pg_drop_replication_slot('regression_slot');
 pg_drop_replication_slot
--------------------------

(1 row)

postgres=# drop table public.foo;
DROP TABLE



Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Michael Paquier
Дата:
On Thu, Oct 5, 2017 at 5:59 AM, Ben Chobot <bench@silentmedia.com> wrote:
> I'm probably doing something dumb, but even something dumb at this high
> level probably shouldn't result in a crash. I've tried with multiple
> decoders and get the same result.

Yes, I can reproduce the problem easily. With assertions enabled, this
blows up differently:   frame #3: 0x000000010278ef20
postgres`ExceptionalCondition(conditionName="!(_SPI_current->tuptable
== ((void*)0))", errorType="FailedAssertion", fileName="spi.c",
lineNumber=297) + 128 at assert.c:54   frame #4: 0x000000010240c88b
postgres`AtEOSubXact_SPI(isCommit='\0', mySubid=2) + 811 at spi.c:297   frame #5: 0x00000001021fb4d9
postgres`AbortSubTransaction+ 553 at
 
xact.c:4813   frame #6: 0x00000001021fb8d2 postgres`AbortCurrentTransaction +
306 at xact.c:3113   frame #7: 0x0000000102544315
postgres`ReorderBufferCommit(rb=0x00007fba41845a40, xid=556,
commit_lsn=23043200, end_lsn=23043608, commit_time=560497405054853,
origin_id=0, origin_lsn=0) + 2501 at reorderbuffer.c:1624   frame #8: 0x0000000102537311
postgres`DecodeCommit(ctx=0x00007fba41835840, buf=0x00007fff5daf2168,
parsed=0x00007fff5daf20b0, xid=556) + 545 at decode.c:611   frame #9: 0x000000010253682c
postgres`DecodeXactOp(ctx=0x00007fba41835840, buf=0x00007fff5daf2168)
+ 364 at decode.c:241   frame #10: 0x00000001025363fe
postgres`LogicalDecodingProcessRecord(ctx=0x00007fba41835840,
record=0x00007fba41835b00) + 142 at decode.c:113   frame #11: 0x000000010253cc42
postgres`pg_logical_slot_get_changes_guts(fcinfo=0x00007fff5daf2530,
confirm='\0', binary='\0') + 2562 at logicalfuncs.c:308   frame #12: 0x000000010253cdbb
postgres`pg_logical_slot_peek_changes(fcinfo=0x00007fff5daf2530) + 27
at logicalfuncs.c:381
(lldb) up 1
frame #4: 0x000000010240c88b postgres`AtEOSubXact_SPI(isCommit='\0',
mySubid=2) + 811 at spi.c:297  294                 }  295             }  296             /* in particular we should
havegotten rid of any
 
in-progress table */
-> 297             Assert(_SPI_current->tuptable == NULL);  298         }  299     }  300

This looks like a legit bug to me. Andres, any opinions?
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> This looks like a legit bug to me. Andres, any opinions?

I wonder why ReorderBufferCommit does this:
    if (using_subtxn)        BeginInternalSubTransaction("replay");    else        StartTransactionCommand();

and then tries to clean that up with this brain-dead-looking sequence:
    AbortCurrentTransaction();
    /* make sure there's no cache pollution */    ReorderBufferExecuteInvalidations(rb, txn);
    if (using_subtxn)        RollbackAndReleaseCurrentSubTransaction();

Shouldn't that be something like
    if (using_subtxn)        RollbackAndReleaseCurrentSubTransaction();    else        AbortCurrentTransaction();

?  Although by this theory, the using_subtxn path has never worked,
not even a little bit, which seems somewhat unlikely.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
On 2017-10-05 11:38:40 -0400, Tom Lane wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > This looks like a legit bug to me. Andres, any opinions?
> 
> I wonder why ReorderBufferCommit does this:
> 
>         if (using_subtxn)
>             BeginInternalSubTransaction("replay");
>         else
>             StartTransactionCommand();
> 
> and then tries to clean that up with this brain-dead-looking sequence:
> 
>         AbortCurrentTransaction();
> 
>         /* make sure there's no cache pollution */
>         ReorderBufferExecuteInvalidations(rb, txn);
> 
>         if (using_subtxn)
>             RollbackAndReleaseCurrentSubTransaction();
> 
> Shouldn't that be something like
> 
>         if (using_subtxn)
>             RollbackAndReleaseCurrentSubTransaction();
>         else
>             AbortCurrentTransaction();
> 
> ?  Although by this theory, the using_subtxn path has never worked,
> not even a little bit, which seems somewhat unlikely.

I'm not sure what the problem is that you're seeing? The separation of
AbortCurrentTransaction() and RollbackAndReleaseCurrentSubTransaction()
is so that invalidations get executed outside an xact. The AbortCurrent
will move from TBLOCK_SUBINPROGRESS to TBLOCK_SUBABORT, the release then
from TBLOCK_SUBABORT to the containing transaction's state?

Afaict the exactly same crashing codepath would be reached if
RollbackAndReleaseCurrentSubTransaction() were immediately called,
without a preceding AbortCurrentTransaction().

I don't think that's really the problem. I don't know SPI very well, but
from a quick look it looks to me that the problem is that
AtEOSubXact_SPI() enters the/* * If we are aborting a subtransaction and there is an open SPI context * surrounding the
subxact,clean up to prevent memory leakage. */
 
block even if 'found' is false.  I need to look more at this, but just
adding a 'found &&' to that if makes things pass.

What we have here is that plpgsql uses SPI to execute
pg_logical_slot_peek_changes(), which internally does *not* use SPI. For
every replayed transaction it starts / finishes an internal
subtransaction, and when called from SPI the SPI subxabort handler acts
on the surrounding spi context.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-10-05 11:38:40 -0400, Tom Lane wrote:
>> I wonder why ReorderBufferCommit does this:
>> ...

> I'm not sure what the problem is that you're seeing?

Nah, I take that back.  The AbortCurrentTransaction call looks funny
(and is sadly underdocumented) but it's not invalid to call it and
then call RollbackAndReleaseCurrentSubTransaction.

> What we have here is that plpgsql uses SPI to execute
> pg_logical_slot_peek_changes(), which internally does *not* use SPI.

Yeah.  I think there are two separate issues:

1. The Assert that's crashing is just wrong and should be removed.
It's a hangover from the previous behavior (before 3d13623d7)
of unconditionally setting _SPI_current->tuptable = NULL there.

2. The "MemoryContextResetAndDeleteChildren(_SPI_current->execCxt)"
call, further up, is deleting a still-live executor state tree,
as well as the logical-decoding context that is a child of that
executor query context.  So if you get past the Assert you'll still
crash later on.

I'm not sure about a good way to fix #2 without re-introducing the memory
leaks that call was added to fix (cf 7ec1c5a86).  It's slightly annoying
at this point that we got rid of the SPI_push/SPI_pop mechanism, because
we could perhaps have used a check of the _SPI_curid value to distinguish
a SPI context we should clean up from one we shouldn't.  But that ship has
sailed, and even if we wanted to undo that change there'd still be the
matter of how to fix the bugs that prompted removing SPI_push/SPI_pop.

The best idea I have at the moment is to introduce a new SPI API
function along the lines of "SPI_cleanup_execution()" which would
do the execCxt cleanup, expect SPI-using callers of internal
subtransactions to do that as part of error cleanup, and drop the
execCxt cleanup from AtEOSubXact_SPI.  This is kind of annoying
because it's an API change that probably affects external PLs.
A PL that failed to absorb the change would have a memory leak,
although likely not a very bad one since leakage would only
accumulate in the case of a lot of failed SPI_executes in a row
(each in a subtransaction) with never a success.

Since the SPI_push-ectomy only happened in v10, conceivably we could avoid
the API break in prior branches by solving this with the _SPI_curid check
idea in those branches.  I'm not really enamored of that though, since it
means designing and testing two independent fixes.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
Hi,

On 2017-10-05 17:43:39 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-10-05 11:38:40 -0400, Tom Lane wrote:
> >> I wonder why ReorderBufferCommit does this:
> >> ...
>
> > I'm not sure what the problem is that you're seeing?
>
> Nah, I take that back.  The AbortCurrentTransaction call looks funny
> (and is sadly underdocumented) but it's not invalid to call it and
> then call RollbackAndReleaseCurrentSubTransaction.

It's not perfectly documented, but there's some:/* * Decoding needs access to syscaches et al., which in turn use *
heavyweightlocks and such. Thus we need to have enough state around to * keep track of those.  The easiest way is to
simplyuse a transaction * internally.  That also allows us to easily enforce that nothing writes * to the database by
checkingfor xid assignments. * * When we're called via the SQL SRF there's already a transaction * started, so start an
explicitsubtransaction there. */
 
...    /*     * Aborting the current (sub-)transaction as a whole has the right     * semantics. We want all locks
acquiredin here to be released, not     * reassigned to the parent and we do not want any database access     * have
persistenteffects.     */
 


> 2. The "MemoryContextResetAndDeleteChildren(_SPI_current->execCxt)"
> call, further up, is deleting a still-live executor state tree,
> as well as the logical-decoding context that is a child of that
> executor query context.  So if you get past the Assert you'll still
> crash later on.

Right. That's why just adding found && to the entire if "works", as it
avoids that part as well.


> I'm not sure about a good way to fix #2 without re-introducing the memory
> leaks that call was added to fix (cf 7ec1c5a86).  It's slightly annoying
> at this point that we got rid of the SPI_push/SPI_pop mechanism, because
> we could perhaps have used a check of the _SPI_curid value to distinguish
> a SPI context we should clean up from one we shouldn't.  But that ship has
> sailed, and even if we wanted to undo that change there'd still be the
> matter of how to fix the bugs that prompted removing SPI_push/SPI_pop.

I think I don't fully understand what 7ec1c5a86 is trying to
achieve. Unfortunately reading the commit message and comment hasn't
cleared it up much so far. Why do we want to clean up memory in a
subtransaction that's above the one aborted?  I can't yet meaningfully
comment on your proposals before fully understanding, sorry.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-10-05 17:43:39 -0400, Tom Lane wrote:
>> Nah, I take that back.  The AbortCurrentTransaction call looks funny
>> (and is sadly underdocumented) but it's not invalid to call it and
>> then call RollbackAndReleaseCurrentSubTransaction.

> It's not perfectly documented, but there's some:

What I'm on about is that I think the AbortCurrentTransaction call needs a
comment along the lines of
 * Roll back the current (sub)transaction.  In the using_subtxn * case, we could leave it to
RollbackAndReleaseCurrentSubTransaction* to do this, but we do it separately anyway because XYZ.
 

It's the fact that XYZ isn't very obvious that makes this comment
necessary.  I'm guessing it's because ReorderBufferExecuteInvalidations
either doesn't work or is less efficient if inside a still-valid
transaction, but the fact that I need to guess is the problem.


>> I'm not sure about a good way to fix #2 without re-introducing the memory
>> leaks that call was added to fix (cf 7ec1c5a86).

> I think I don't fully understand what 7ec1c5a86 is trying to
> achieve.

I dug around in the archives and found

https://www.postgresql.org/message-id/26365.1162532453%40sss.pgh.pa.us

The function shown there doesn't appear to leak any memory at all in HEAD,
but if you dike out the memory context reset in question, it leaks like
crazy.  I didn't try to reconfirm my old estimate of 16KB per iteration,
but it seemed to be in that ballpark still.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
On 2017-10-05 19:42:30 -0400, Tom Lane wrote:
> I dug around in the archives and found

Ah thanks.

> https://www.postgresql.org/message-id/26365.1162532453%40sss.pgh.pa.us
> 
> The function shown there doesn't appear to leak any memory at all in HEAD,
> but if you dike out the memory context reset in question, it leaks like
> crazy.  I didn't try to reconfirm my old estimate of 16KB per iteration,
> but it seemed to be in that ballpark still.

Just ran this, got out-of-memory error, and then another out-of-memory
error, ...  I wonder if we should exclude out_of_memory from OTHERS,
like we do QUERY_CANCELED and ASSERT_FAILURE.

(looking)

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-10-05 19:42:30 -0400, Tom Lane wrote:
>> https://www.postgresql.org/message-id/26365.1162532453%40sss.pgh.pa.us

> Just ran this, got out-of-memory error, and then another out-of-memory
> error, ...  I wonder if we should exclude out_of_memory from OTHERS,
> like we do QUERY_CANCELED and ASSERT_FAILURE.

Nah; that presupposes that if a subquery runs out of memory then the
surrounding function is necessarily broken.  I don't buy that.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
On 2017-10-05 19:42:30 -0400, Tom Lane wrote:
> >> I'm not sure about a good way to fix #2 without re-introducing the memory
> >> leaks that call was added to fix (cf 7ec1c5a86).
> 
> > I think I don't fully understand what 7ec1c5a86 is trying to
> > achieve.
> 
> I dug around in the archives and found
> 
> https://www.postgresql.org/message-id/26365.1162532453%40sss.pgh.pa.us
> 
> The function shown there doesn't appear to leak any memory at all in HEAD,
> but if you dike out the memory context reset in question, it leaks like
> crazy.  I didn't try to reconfirm my old estimate of 16KB per iteration,
> but it seemed to be in that ballpark still.

I knew that I disliked SPI, but I forgot how much I disliked it :(. I
think one of these years we should really replace it - it quite
frequently comes up as problematic.

I was wondering if the appropriate fix here wouldn't be to just always
do an SPI_connect()/finish() inside such subtransactions - that feels
more correct. But it's not easy due to the way memory management's done
in plpgsql (and presumably elsewhere) :(.

So far your option of allowing to opt in into additional cleanup in the
CATCH seems the least ugly

I'm kinda surprised that this only causes problems with logical decoding
and not elsewhere, this isn't a new issue.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I'm kinda surprised that this only causes problems with logical decoding
> and not elsewhere, this isn't a new issue.

The reason for that seems to be that ReorderBufferCommit and
ReorderBufferImmediateInvalidation are the only users of
BeginInternalSubTransaction other than the PLs.  So otherwise we
never see a case of subtransaction abort where we don't want
to clean up the current operation in the surrounding SPI context.

> So far your option of allowing to opt in into additional cleanup in the
> CATCH seems the least ugly

I thought of a better way, as attached.  The core problem is that we
need to know whether the SPI context's current executor operation is
"inside" or "outside" the subtransaction being abandoned.  That info
is not tracked at the moment, but we can track it, at a cost that's
entirely trivial compared to the other stuff that a SPI operation
will do.

This seems back-patchable without much angst, although we might want
to add the new _SPI_connection field at the end in the back branches.
I'm not sure if anything outside core is looking at spi_priv.h.

            regards, tom lane

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index afe231f..a2de087 100644
*** a/src/backend/executor/spi.c
--- b/src/backend/executor/spi.c
*************** static void _SPI_cursor_operation(Portal
*** 71,78 ****
  static SPIPlanPtr _SPI_make_plan_non_temp(SPIPlanPtr plan);
  static SPIPlanPtr _SPI_save_plan(SPIPlanPtr plan);

! static int    _SPI_begin_call(bool execmem);
! static int    _SPI_end_call(bool procmem);
  static MemoryContext _SPI_execmem(void);
  static MemoryContext _SPI_procmem(void);
  static bool _SPI_checktuples(void);
--- 71,78 ----
  static SPIPlanPtr _SPI_make_plan_non_temp(SPIPlanPtr plan);
  static SPIPlanPtr _SPI_save_plan(SPIPlanPtr plan);

! static int    _SPI_begin_call(bool use_exec);
! static int    _SPI_end_call(bool use_exec);
  static MemoryContext _SPI_execmem(void);
  static MemoryContext _SPI_procmem(void);
  static bool _SPI_checktuples(void);
*************** SPI_connect(void)
*** 118,123 ****
--- 118,124 ----
      _SPI_current->processed = 0;
      _SPI_current->lastoid = InvalidOid;
      _SPI_current->tuptable = NULL;
+     _SPI_current->execSubid = InvalidSubTransactionId;
      slist_init(&_SPI_current->tuptables);
      _SPI_current->procCxt = NULL;    /* in case we fail to create 'em */
      _SPI_current->execCxt = NULL;
*************** SPI_finish(void)
*** 149,155 ****
  {
      int            res;

!     res = _SPI_begin_call(false);    /* live in procedure memory */
      if (res < 0)
          return res;

--- 150,156 ----
  {
      int            res;

!     res = _SPI_begin_call(false);    /* just check we're connected */
      if (res < 0)
          return res;

*************** AtEOSubXact_SPI(bool isCommit, SubTransa
*** 268,275 ****
      {
          slist_mutable_iter siter;

!         /* free Executor memory the same as _SPI_end_call would do */
!         MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);

          /* throw away any tuple tables created within current subxact */
          slist_foreach_modify(siter, &_SPI_current->tuptables)
--- 269,283 ----
      {
          slist_mutable_iter siter;

!         /*
!          * Throw away executor state if current executor operation was started
!          * within current subxact (essentially, force a _SPI_end_call).
!          */
!         if (_SPI_current->execSubid >= mySubid)
!         {
!             _SPI_current->execSubid = InvalidSubTransactionId;
!             MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
!         }

          /* throw away any tuple tables created within current subxact */
          slist_foreach_modify(siter, &_SPI_current->tuptables)
*************** AtEOSubXact_SPI(bool isCommit, SubTransa
*** 293,300 ****
                  MemoryContextDelete(tuptable->tuptabcxt);
              }
          }
-         /* in particular we should have gotten rid of any in-progress table */
-         Assert(_SPI_current->tuptable == NULL);
      }
  }

--- 301,306 ----
*************** _SPI_procmem(void)
*** 2446,2460 ****

  /*
   * _SPI_begin_call: begin a SPI operation within a connected procedure
   */
  static int
! _SPI_begin_call(bool execmem)
  {
      if (_SPI_current == NULL)
          return SPI_ERROR_UNCONNECTED;

!     if (execmem)                /* switch to the Executor memory context */
          _SPI_execmem();

      return 0;
  }
--- 2452,2475 ----

  /*
   * _SPI_begin_call: begin a SPI operation within a connected procedure
+  *
+  * use_exec is true if we intend to make use of the procedure's execCxt
+  * during this SPI operation.  We'll switch into that context, and arrange
+  * for it to be cleaned up at _SPI_end_call or if an error occurs.
   */
  static int
! _SPI_begin_call(bool use_exec)
  {
      if (_SPI_current == NULL)
          return SPI_ERROR_UNCONNECTED;

!     if (use_exec)
!     {
!         /* remember when the Executor operation started */
!         _SPI_current->execSubid = GetCurrentSubTransactionId();
!         /* switch to the Executor memory context */
          _SPI_execmem();
+     }

      return 0;
  }
*************** _SPI_begin_call(bool execmem)
*** 2462,2475 ****
  /*
   * _SPI_end_call: end a SPI operation within a connected procedure
   *
   * Note: this currently has no failure return cases, so callers don't check
   */
  static int
! _SPI_end_call(bool procmem)
  {
!     if (procmem)                /* switch to the procedure memory context */
      {
          _SPI_procmem();
          /* and free Executor memory */
          MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
      }
--- 2477,2495 ----
  /*
   * _SPI_end_call: end a SPI operation within a connected procedure
   *
+  * use_exec must be the same as in the previous _SPI_begin_call
+  *
   * Note: this currently has no failure return cases, so callers don't check
   */
  static int
! _SPI_end_call(bool use_exec)
  {
!     if (use_exec)
      {
+         /* switch to the procedure memory context */
          _SPI_procmem();
+         /* mark Executor context no longer in use */
+         _SPI_current->execSubid = InvalidSubTransactionId;
          /* and free Executor memory */
          MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
      }
diff --git a/src/include/executor/spi_priv.h b/src/include/executor/spi_priv.h
index ba7fb98..8fae755 100644
*** a/src/include/executor/spi_priv.h
--- b/src/include/executor/spi_priv.h
*************** typedef struct
*** 26,31 ****
--- 26,34 ----
      Oid            lastoid;
      SPITupleTable *tuptable;    /* tuptable currently being built */

+     /* subtransaction in which current Executor call was started */
+     SubTransactionId execSubid;
+
      /* resources of this execution context */
      slist_head    tuptables;        /* list of all live SPITupleTables */
      MemoryContext procCxt;        /* procedure context */

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

От
Tom Lane
Дата:
I wrote:
> I thought of a better way, as attached.

Pushed.  Ben, could you confirm that the committed patch fixes your
original use-case?  The 9.5 version of the patch is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=13d2ed921035f2d88adf87d796373e920bdd56ee
        regards, tom lane


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Ben Chobot
Дата:
> On Oct 6, 2017, at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> I thought of a better way, as attached.
>
> Pushed.  Ben, could you confirm that the committed patch fixes your
> original use-case?  The 9.5 version of the patch is at
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=13d2ed921035f2d88adf87d796373e920bdd56ee

Not quickly, to be honest. But the test case is not hard. If you can do this without crashing, I'm convinced you've
fixedthe problem as I've seen it: 


SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');

CREATE TABLE public.foo(i int);

insert into public.foo(i) values(1);

CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS pg_lsn AS
$$BEGIN    return location from pg_logical_slot_peek_changes(slot,null,1) limit 1;END
$$ language plpgsql;

select logical_replication_slot_lsn_delta('regression_slot');



Thanks for the fast response, and for the amazing code that is Postgres!

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
On 2017-10-06 21:24:34 -0700, Ben Chobot wrote:
> 
> > On Oct 6, 2017, at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > I wrote:
> >> I thought of a better way, as attached.
> > 
> > Pushed.  Ben, could you confirm that the committed patch fixes your
> > original use-case?  The 9.5 version of the patch is at
> > 
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=13d2ed921035f2d88adf87d796373e920bdd56ee
> 
> Not quickly, to be honest. But the test case is not hard. If you can do this without crashing, I'm convinced you've
fixedthe problem as I've seen it:
 
> 
> 
> SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
> 
> CREATE TABLE public.foo(i int);
> 
> insert into public.foo(i) values(1);
> 
> CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS pg_lsn AS
> $$
>  BEGIN
>      return location from pg_logical_slot_peek_changes(slot,null,1) limit 1;
>  END
> $$ language plpgsql;
> 
> select logical_replication_slot_lsn_delta('regression_slot');

Yep, that one's definitely fixed now.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
Hi,

On 2017-10-04 13:59:20 -0700, Ben Chobot wrote:
> postgres=# CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS pg_lsn AS
> postgres-# $$
> postgres$# BEGIN
> postgres$#     return location from pg_logical_slot_peek_changes(slot,null,1) limit 1;
> postgres$# END
> postgres$# $$ language plpgsql;

As the issue is fixed now, I just want to mention that looking at
logical decoding output via the SQL interface, especially when doing it
in very small increments as you're suggesting here, is way much more
expensive than continually streaming changes via the replication
protocol. In a lot of cases it'll be orders of magnitude more expensive.
So if you can change your usecase to use that, you'll benefit. It also
avoids having to change between peek/get, because you can just send back
messages specifying up to where you've processed changes safely.

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Ben Chobot
Дата:
On Oct 6, 2017, at 9:31 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-10-04 13:59:20 -0700, Ben Chobot wrote:
postgres=# CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS pg_lsn AS
postgres-# $$
postgres$# BEGIN
postgres$#     return location from pg_logical_slot_peek_changes(slot,null,1) limit 1;
postgres$# END
postgres$# $$ language plpgsql;

As the issue is fixed now, I just want to mention that looking at
logical decoding output via the SQL interface, especially when doing it
in very small increments as you're suggesting here, is way much more
expensive than continually streaming changes via the replication
protocol. In a lot of cases it'll be orders of magnitude more expensive.
So if you can change your usecase to use that, you'll benefit. It also
avoids having to change between peek/get, because you can just send back
messages specifying up to where you've processed changes safely.

Oh, for sure, and understood. When we actually pull data from the slot, we'll be doing it via the streaming interface. This function is reduced from what it was originally intended to be, which was an infrequent check to an alerting system to make sure nobody had stopped consuming data from their logical replication slot. FWIW, what we ended up with was this SQL function, which would have been a little easier to follow in pl/pgsql, but works just fine in this form:

CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS numeric AS
$$
  select pg_current_xlog_location()-
  case when active then
    (select flush_location from pg_stat_replication where pid=active_pid)
  else
    (select location from pg_logical_slot_peek_changes($1,null,1) union
    select pg_current_xlog_location() order by location limit 1)
  end
  from pg_replication_slots where slot_name=$1;
$$ language sql security definer;

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
On 2017-10-06 21:40:06 -0700, Ben Chobot wrote:
> Oh, for sure, and understood. When we actually pull data from the
> slot, we'll be doing it via the streaming interface.

Ah, good ;)

> This function is reduced from what it was originally intended to be, which was an infrequent check to an alerting
systemto make sure nobody had stopped consuming data from their logical replication slot. FWIW, what we ended up with
wasthis SQL function, which would have been a little easier to follow in pl/pgsql, but works just fine in this form:
 
> 
> CREATE OR REPLACE FUNCTION logical_replication_slot_lsn_delta(slot text) RETURNS numeric AS
> $$
>   select pg_current_xlog_location()-
>   case when active then
>     (select flush_location from pg_stat_replication where pid=active_pid)
>   else
>     (select location from pg_logical_slot_peek_changes($1,null,1) union
>     select pg_current_xlog_location() order by location limit 1)
>   end
>   from pg_replication_slots where slot_name=$1;
> $$ language sql security definer;

Why don't you just look at pg_replication_slots.confirmed_flush_lsn?

Greetings,

Andres Freund


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Ben Chobot
Дата:


On Oct 6, 2017, at 9:43 PM, Andres Freund <andres@anarazel.de> wrote:

Why don't you just look at pg_replication_slots.confirmed_flush_lsn?

Because I don't seem to have that in 9.5? :)

Re: [BUGS] pg_logical_slot_peek_changes crashes postgres when calledfrom inside pl/pgsql

От
Andres Freund
Дата:
On 2017-10-06 21:47:08 -0700, Ben Chobot wrote:
> > On Oct 6, 2017, at 9:43 PM, Andres Freund <andres@anarazel.de> wrote:
> > Why don't you just look at pg_replication_slots.confirmed_flush_lsn?
> 
> Because I don't seem to have that in 9.5? :)

Isn't that release, like, from 1997? :)


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs