Thread: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24


[BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: marko@joh.to
Date:
The following bug has been logged on the website:

Bug reference:      14830
Logged by:          Marko Tiikkaja
Email address:      marko@joh.to
PostgreSQL version: Unsupported/Unknown
Operating system:   Ubuntu 14.04
Description:

Hey,

I understand this is not much information to go on (but the problem is
extremely difficult to reproduce), and that 9.1 is technically out of
support (but I don't think the relevant code has changed significantly,
either), so I fully expect that nobody will be able to figure out what's
wrong based on that.  But I thought I'd post anyway.

For the past two days I've been tracking down a bug where it would appear
that some NOTIFications are simply lost.  Then a minute later when the
notification is resent by a different transaction, it comes through just
fine.  We have a single program connected to the database all the time,
which LISTENs on around 800 channels and delivers the notifications to its
own clients.  The problem seems to only start happening, or perhaps to get
worse, the longer this application is connected to the database.

I'm attaching two excerpts from the strace which, if I'm reading this
correctly, would suggest that there's a bug in postgres here.  Here's how I
read this:
 1) In strace2.txt, the send on line #1 corresponds to 28:3486 in
    strace1.txt.  I know this because notification payloads on that channel
    are unique.
 2) In strace2.txt, on line #5 something slightly out of the ordinary
    happens.  We have around 75 semop calls compared to 5400 semop calls in
    the full strace, so no biggie, but perhaps noteworthy.  Contention with
    another backend, perhaps.
 3) The send on line #6 seems to correspond to 28:3600 in strace1.txt.
 4) Then here's where the problem seems to occur: the next send, on line
    #25, corresponds to 28:4458 in strace1.txt.

Within that ~850 bytes that the sending backend seemingly jumped over, we
have multiple notifications on channels we know the backend was listening
on.  That's including a notification on channel "workerid48101842", which is
the one our application was desperately missing in this case.  PostgreSQL's
logs and the state of the database indicate that at least the transaction
which wrote the "workerid48101842" notification committed, and I have no
reason to believe that any of the other ones near it did not commit.

So.. any ideas?  Unfortunately I can't reproduce this in an isolated
environment, and in production this seems to be taking some time before it
builds up into a proper issue.


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Marko Tiikkaja
Date:
Here are said attachments.


.m
Attachments

Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Michael Paquier
Date:
On Wed, Sep 27, 2017 at 3:29 AM,  <marko@joh.to> wrote:
> So.. any ideas?  Unfortunately I can't reproduce this in an isolated
> environment, and in production this seems to be taking some time before it
> builds up into a proper issue.

Hm. Could it be some side-effect from 2bbe8a68? This has been
backpatched on all branches and it is part of 9.1.24.
-- 
Michael



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Marko Tiikkaja
Date:
On Tue, Oct 3, 2017 at 5:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Sep 27, 2017 at 3:29 AM,  <marko@joh.to> wrote:
> So.. any ideas?  Unfortunately I can't reproduce this in an isolated
> environment, and in production this seems to be taking some time before it
> builds up into a proper issue.

Hm. Could it be some side-effect from 2bbe8a68? This has been
backpatched on all branches and it is part of 9.1.24.

Unsure.  I can at least reproduce this with only one session ever listening on anything.

I've produced a test case[1] for this which roughly matches what the code does in production.  The code isn't the prettiest out there, but basically it has one session LISTENing on a channel, and 24 sessions sending messages with a prefix in order, so for example:

   session 1 sends A_1, A_2, A_3, etc.
   session 2 sends B_1, B_2, B_3, ...

and the listener keeps a map recording the last number received for each prefix, checking that all notifications arrive, and in the right order.
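The per-prefix bookkeeping described above can be sketched as follows (a minimal Python model for illustration only; the real harness[1] talks to the database through a driver, and all names here are hypothetical):

```python
# Hypothetical sketch of the listener's ordering check.  Payloads look
# like "Q_97882353"; we remember the last number seen per prefix and
# report any gap, which would indicate a lost or reordered NOTIFY.
last_seen = {}

def check_notification(payload):
    prefix, num = payload.split("_")
    num = int(num)
    prev = last_seen.get(prefix)
    if prev is not None and num != prev + 1:
        print("out of order notification %s: %d != %d + 1 (prefix %s)"
              % (payload, num, prev, prefix))
    last_seen[prefix] = num

for p in ["A_1", "B_1", "A_2", "A_3", "B_2"]:
    check_notification(p)   # in-order stream: nothing reported
check_notification("A_5")   # A_4 went missing: a gap is reported
```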

After running it for a few days I start getting logged messages such as:

  out of order notification Q_97882353: 97882353 != 97882349 + 1 (prefix Q)
  out of order notification F_97947433: 97947433 != 97947429 + 1 (prefix F)
  out of order notification F_97947439: 97947439 != 97947436 + 1 (prefix F)

I ran it on both 9.1.24 and 9.6.5, and they both exhibit the same behavior: it takes days to get into this state, but then notifications are missed all the time.  I currently have both systems in this state, so any ideas on what to look at to debug this further?


.m

Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
Marko Tiikkaja <marko@joh.to> writes:
> After running it for a few days I start getting logged messages such as:

>   out of order notification Q_97882353: 97882353 != 97882349 + 1 (prefix Q)
>   out of order notification F_97947433: 97947433 != 97947429 + 1 (prefix F)
>   out of order notification F_97947439: 97947439 != 97947436 + 1 (prefix F)

> I did it on both 9.1.24 and 9.6.5 and they both exhibit the same behavior:
> it takes days to get into this state, but then notifications are missed all
> the time.  I currently have both systems in this state, so any idea what to
> look at to try and debug this further?

You might try gdb'ing the recipient and stepping through
asyncQueueProcessPageEntries to see what happens.  Are the missing
entries present in the queue but it decides to ignore them for some
reason, or are they just not there?

An interesting black-box test might be to do this with two receiver
processes and see if they miss identical sets of messages.  That
would be a different way of triangulating on question number 1,
which is whether the sender or the recipient is at fault.

I wonder whether the long ramp-up time indicates that you have to
wrap around some counter somewhere before things go south.  Although
the only obvious candidate is wrapping the pg_notify SLRU queue,
and I'd think that would have happened many times already.
        regards, tom lane



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Marko Tiikkaja
Date:
So I managed to accidentally kill and/or restart both servers while trying to install debug symbols, but I'm doing a new run now and I noticed something interesting: the listening backend's RecentXmin doesn't seem to ever go forward.  By my reading of this code, that would mean trouble for this piece of code in TransactionIdIsInProgress:

  if (TransactionIdPrecedes(xid, RecentXmin))
      return false;

Is this not an obvious bug?  async.c would think the transaction is not running and did not commit, and would therefore throw away the notification.  That would also explain why it takes days for this to happen (since we need to run through enough xids to wrap around).
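The wraparound hazard can be modeled with the circular xid comparison itself (a rough Python sketch of TransactionIdPrecedes' signed 32-bit modular compare; it ignores the permanent-xid special cases the real function handles, and the numbers are made up):

```python
# Rough model of TransactionIdPrecedes: xids are 32-bit counters compared
# circularly, i.e. "id1 precedes id2" iff the signed 32-bit difference
# (int32)(id1 - id2) is negative.
def xid_precedes(id1, id2):
    diff = (id1 - id2) & 0xFFFFFFFF
    return diff >= 0x80000000

# A fresh xid just past a stale RecentXmin is correctly seen as newer:
stale_recent_xmin = 1000
assert not xid_precedes(stale_recent_xmin + 1, stale_recent_xmin)

# But once ~2^31 xids have been consumed while RecentXmin stood still,
# newly assigned xids wrap around and "precede" it again, so the
# quick-exit test in TransactionIdIsInProgress would wrongly report
# them as long finished, and async.c would discard the notification.
wrapped_xid = (stale_recent_xmin + 0x80000000 + 5) & 0xFFFFFFFF
assert xid_precedes(wrapped_xid, stale_recent_xmin)
```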


.m

Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
Marko Tiikkaja <marko@joh.to> writes:
> So I managed to accidentally kill and/or restart both servers while trying
> to install debug symbols, but I'm doing a new run now and I noticed
> something interesting: the listening backend's RecentXmin doesn't seem to
> ever go forward.  By my reading of this code, that would mean trouble for
> this piece of code in TransactionIdIsInProgress:

>   if (TransactionIdPrecedes(xid, RecentXmin))
>       return false;

Hmm ... I suppose it's possible that that happens if the listening
backend isn't executing any SQL commands but is just sitting.
While that might describe your test harness, does it describe any
real-world application?
        regards, tom lane



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Alvaro Herrera
Date:
Tom Lane wrote:
> Marko Tiikkaja <marko@joh.to> writes:
> > So I managed to accidentally kill and/or restart both servers while trying
> > to install debug symbols, but I'm doing a new run now and I noticed
> > something interesting: the listening backend's RecentXmin doesn't seem to
> > ever go forward.  By my reading of this code, that would mean trouble for
> > this piece of code in TransactionIdIsInProgress:
> 
> >   if (TransactionIdPrecedes(xid, RecentXmin))
> >       return false;
> 
> Hmm ... I suppose it's possible that that happens if the listening
> backend isn't executing any SQL commands but is just sitting.
> While that might describe your test harness, does it describe any
> real-world application?

I think it's not totally unreasonable to have processes sitting idle for
long periods of time.  One example: a pooler configured to have more
connections than are actually needed most of the time (I'm fairly sure
I've seen this).  Would they not recompute RecentXmin if they did a
sinval reset?  Also, a listener daemon for which notifications are very
infrequent.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Marko Tiikkaja
Date:
On Tue, Oct 10, 2017 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
> So I managed to accidentally kill and/or restart both servers while trying
> to install debug symbols, but I'm doing a new run now and I noticed
> something interesting: the listening backend's RecentXmin doesn't seem to
> ever go forward.  By my reading of this code, that would mean trouble for
> this piece of code in TransactionIdIsInProgress:

>   if (TransactionIdPrecedes(xid, RecentXmin))
>       return false;

Hmm ... I suppose it's possible that that happens if the listening
backend isn't executing any SQL commands but is just sitting.
While that might describe your test harness, does it describe any
real-world application?

Yes.  It's a service which only does LISTEN and UNLISTEN and forwards notifications to its clients.


.m

Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Hmm ... I suppose it's possible that that happens if the listening
>> backend isn't executing any SQL commands but is just sitting.
>> While that might describe your test harness, does it describe any
>> real-world application?

> I think it's not totally unreasonable to have processes sitting idle for
> long periods of time.

Sure, but if they're truly idle, they aren't looking at RecentXmin ;-)

> Would they not recompute RecentXmin if they did a
> sinval reset?

Not sure, but if not, maybe making that code path update it would be
a suitable fix in practice?  I'm a bit hesitant to put a GetSnapshotData
call into ProcessIncomingNotify, because of the possibility of adding a
lot of contention on the ProcArray if there's very heavy notify traffic.
        regards, tom lane



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
I wrote:
> ... I'm a bit hesitant to put a GetSnapshotData
> call into ProcessIncomingNotify, because of the possibility of adding a
> lot of contention on the ProcArray if there's very heavy notify traffic.

Wait, I have an idea.  Let's fetch a snapshot at the top of
ProcessIncomingNotify, and then use the snapshot to decide whether
xids are running, instead of calling TransactionIdIsInProgress.

GetSnapshotData is probably a bit heavier than a single
TransactionIdIsInProgress call, but if we have to process multiple
messages then this should be a net savings in ProcArray traffic.
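The snapshot test being proposed boils down to logic along these lines (a simplified Python model of XidInMVCCSnapshot; it ignores subtransactions and uses plain rather than circular xid comparisons):

```python
# Simplified model of XidInMVCCSnapshot: relative to a snapshot, an xid
# counts as "still in progress" if it was assigned at or after the
# snapshot's xmax, or if it was captured in the running-xid list (xip).
def xid_in_mvcc_snapshot(xid, snap_xmin, snap_xmax, snap_xip):
    if xid < snap_xmin:        # finished before the snapshot was taken
        return False
    if xid >= snap_xmax:       # assigned after the snapshot was taken
        return True
    return xid in snap_xip     # in the window: running iff listed

# Snapshot taken while xids 105 and 107 were still running:
snap = dict(snap_xmin=105, snap_xmax=110, snap_xip={105, 107})
assert xid_in_mvcc_snapshot(100, **snap) is False   # long committed
assert xid_in_mvcc_snapshot(107, **snap) is True    # still running
assert xid_in_mvcc_snapshot(106, **snap) is False   # committed meanwhile
assert xid_in_mvcc_snapshot(112, **snap) is True    # not yet started
```

The key property for this bug: the snapshot is taken fresh each time notifications are read, so the decision no longer depends on a possibly stale RecentXmin.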
        regards, tom lane



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
I wrote:
> Wait, I have an idea.  Let's fetch a snapshot at the top of
> ProcessIncomingNotify, and then use the snapshot to decide whether
> xids are running, instead of calling TransactionIdIsInProgress.

Concretely, I suggest the attached patch.  I tried Marko's testbed
against this, and it seems indeed a bit faster than before ---
running it for 100000 notifies takes about 19.25 seconds, rather than
19.5 seconds with HEAD.  But the testbed is probably about the best
case because it has a bunch of threads sending notifies to one
receiver, so that the receiver is quite likely to have multiple
messages to read at a time.  OTOH, in a situation where the NOTIFY
traffic isn't so high, it probably hardly matters anyway.

            regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index bacc08e..bec5185 100644
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
***************
*** 137,143 ****
--- 137,145 ----
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
+ #include "utils/snapmgr.h"
  #include "utils/timestamp.h"
+ #include "utils/tqual.h"


  /*
*************** static bool SignalBackends(void);
*** 387,393 ****
  static void asyncQueueReadAllNotifications(void);
  static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer);
  static void asyncQueueAdvanceTail(void);
  static void ProcessIncomingNotify(void);
  static bool AsyncExistsPendingNotify(const char *channel, const char *payload);
--- 389,396 ----
  static void asyncQueueReadAllNotifications(void);
  static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer,
!                              Snapshot snapshot);
  static void asyncQueueAdvanceTail(void);
  static void ProcessIncomingNotify(void);
  static bool AsyncExistsPendingNotify(const char *channel, const char *payload);
*************** asyncQueueReadAllNotifications(void)
*** 1744,1749 ****
--- 1747,1753 ----
      volatile QueuePosition pos;
      QueuePosition oldpos;
      QueuePosition head;
+     Snapshot    snapshot;
      bool        advanceTail;

      /* page_buffer must be adequately aligned, so use a union */
*************** asyncQueueReadAllNotifications(void)
*** 1767,1772 ****
--- 1771,1779 ----
          return;
      }

+     /* Get snapshot we'll use to decide which xacts are still in progress */
+     snapshot = RegisterSnapshot(GetLatestSnapshot());
+
      /*----------
       * Note that we deliver everything that we see in the queue and that
       * matches our _current_ listening state.
*************** asyncQueueReadAllNotifications(void)
*** 1854,1860 ****
               * while sending the notifications to the frontend.
               */
              reachedStop = asyncQueueProcessPageEntries(&pos, head,
!                                                        page_buffer.buf);
          } while (!reachedStop);
      }
      PG_CATCH();
--- 1861,1868 ----
               * while sending the notifications to the frontend.
               */
              reachedStop = asyncQueueProcessPageEntries(&pos, head,
!                                                        page_buffer.buf,
!                                                        snapshot);
          } while (!reachedStop);
      }
      PG_CATCH();
*************** asyncQueueReadAllNotifications(void)
*** 1882,1887 ****
--- 1890,1898 ----
      /* If we were the laziest backend, try to advance the tail pointer */
      if (advanceTail)
          asyncQueueAdvanceTail();
+
+     /* Done with snapshot */
+     UnregisterSnapshot(snapshot);
  }

  /*
*************** asyncQueueReadAllNotifications(void)
*** 1903,1909 ****
  static bool
  asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer)
  {
      bool        reachedStop = false;
      bool        reachedEndOfPage;
--- 1914,1921 ----
  static bool
  asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer,
!                              Snapshot snapshot)
  {
      bool        reachedStop = false;
      bool        reachedEndOfPage;
*************** asyncQueueProcessPageEntries(volatile Qu
*** 1928,1934 ****
          /* Ignore messages destined for other databases */
          if (qe->dboid == MyDatabaseId)
          {
!             if (TransactionIdIsInProgress(qe->xid))
              {
                  /*
                   * The source transaction is still in progress, so we can't
--- 1940,1947 ----
          /* Ignore messages destined for other databases */
          if (qe->dboid == MyDatabaseId)
          {
!             if (TransactionIdIsCurrentTransactionId(qe->xid) ||
!                 XidInMVCCSnapshot(qe->xid, snapshot))
              {
                  /*
                   * The source transaction is still in progress, so we can't
*************** asyncQueueProcessPageEntries(volatile Qu
*** 1939,1948 ****
                   * this advance-then-back-up behavior when dealing with an
                   * uncommitted message.)
                   *
!                  * Note that we must test TransactionIdIsInProgress before we
!                  * test TransactionIdDidCommit, else we might return a message
!                  * from a transaction that is not yet visible to snapshots;
!                  * compare the comments at the head of tqual.c.
                   */
                  *current = thisentry;
                  reachedStop = true;
--- 1952,1965 ----
                   * this advance-then-back-up behavior when dealing with an
                   * uncommitted message.)
                   *
!                  * Note that we must test XidInMVCCSnapshot before we test
!                  * TransactionIdDidCommit, else we might return a message from
!                  * a transaction that is not yet visible to snapshots; compare
!                  * the comments at the head of tqual.c.  Furthermore, because
!                  * our own xact won't be listed in the snapshot, we must test
!                  * TransactionIdIsCurrentTransactionId as well.  (It is
!                  * possible to get here inside a transaction that has sent
!                  * notifications, see Exec_ListenPreCommit.)
                   */
                  *current = thisentry;
                  reachedStop = true;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index bbac408..28060c5 100644
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 81,88 ****
  SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};

- /* local functions */
- static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

  /*
   * SetHintBits()
--- 81,86 ----
*************** HeapTupleIsSurelyDead(HeapTuple htup, Tr
*** 1482,1488 ****
   * TransactionIdIsCurrentTransactionId first, except for known-committed
   * XIDs which could not be ours anyway.
   */
! static bool
  XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
  {
      uint32        i;
--- 1480,1486 ----
   * TransactionIdIsCurrentTransactionId first, except for known-committed
   * XIDs which could not be ours anyway.
   */
! bool
  XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
  {
      uint32        i;
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 9a3b56e..96eaf01 100644
*** a/src/include/utils/tqual.h
--- b/src/include/utils/tqual.h
*************** extern HTSV_Result HeapTupleSatisfiesVac
*** 78,83 ****
--- 78,84 ----
                           TransactionId OldestXmin, Buffer buffer);
  extern bool HeapTupleIsSurelyDead(HeapTuple htup,
                        TransactionId OldestXmin);
+ extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

  extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
                       uint16 infomask, TransactionId xid);


Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
I wrote:
> Concretely, I suggest the attached patch.  I tried Marko's testbed
> against this, and it seems indeed a bit faster than before ---
> running it for 100000 notifies takes about 19.25 seconds, rather than
> 19.5 seconds with HEAD.  But the testbed is probably about the best
> case because it has a bunch of threads sending notifies to one
> receiver, so that the receiver is quite likely to have multiple
> messages to read at a time.  OTOH, in a situation where the NOTIFY
> traffic isn't so high, it probably hardly matters anyway.

It occurred to me that it's easy to measure the worst case, ie always
one message per notify interrupt:

$ cat bnch.txt
LISTEN foo\; NOTIFY foo;
$ pgbench -n -c 1 -T 10 -f bnch.txt

In this case it seems that the patch is circa 2% slower than HEAD,
though that's only barely above the noise level in my measurements.
I think that's probably acceptable.
        regards, tom lane



Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane
Date:
I wrote:
> It occurred to me that it's easy to measure the worst case, ie always
> one message per notify interrupt:
> 
> $ cat bnch.txt
> LISTEN foo\; NOTIFY foo;
> $ pgbench -n -c 1 -T 10 -f bnch.txt
> 
> In this case it seems that the patch is circa 2% slower than HEAD,
> though that's only barely above the noise level in my measurements.

On looking at the patch again, I realized that it's not really necessary
to check TransactionIdIsCurrentTransactionId in the per-message loop.
Although it is possible to run the message-scanning code in a transaction
that's executed NOTIFY (if it also executed the session's first LISTEN),
at that point we will not yet have pushed the transaction's messages
into the queue.  Therefore there is no case where the loop can see
messages queued by its own transaction.

Although TransactionIdIsCurrentTransactionId isn't really all that
expensive, removing it brings the above test case to a statistical
tie: median of 3 60-second runs is 26054 tps in HEAD or 26034 tps
with patch.  (The cross-run variation is a couple hundred tps.)

So I'm now satisfied with the attached form of the patch, and
will proceed to make back-ported versions.

            regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index bacc08e..a93c81b 100644
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
***************
*** 137,143 ****
--- 137,145 ----
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/ps_status.h"
+ #include "utils/snapmgr.h"
  #include "utils/timestamp.h"
+ #include "utils/tqual.h"


  /*
*************** static bool SignalBackends(void);
*** 387,393 ****
  static void asyncQueueReadAllNotifications(void);
  static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer);
  static void asyncQueueAdvanceTail(void);
  static void ProcessIncomingNotify(void);
  static bool AsyncExistsPendingNotify(const char *channel, const char *payload);
--- 389,396 ----
  static void asyncQueueReadAllNotifications(void);
  static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer,
!                              Snapshot snapshot);
  static void asyncQueueAdvanceTail(void);
  static void ProcessIncomingNotify(void);
  static bool AsyncExistsPendingNotify(const char *channel, const char *payload);
*************** PreCommit_Notify(void)
*** 798,804 ****
          }
      }

!     /* Queue any pending notifies */
      if (pendingNotifies)
      {
          ListCell   *nextNotify;
--- 801,807 ----
          }
      }

!     /* Queue any pending notifies (must happen after the above) */
      if (pendingNotifies)
      {
          ListCell   *nextNotify;
*************** Exec_ListenPreCommit(void)
*** 987,993 ****
       * have already committed before we started to LISTEN.
       *
       * Note that we are not yet listening on anything, so we won't deliver any
!      * notification to the frontend.
       *
       * This will also advance the global tail pointer if possible.
       */
--- 990,998 ----
       * have already committed before we started to LISTEN.
       *
       * Note that we are not yet listening on anything, so we won't deliver any
!      * notification to the frontend.  Also, although our transaction might
!      * have executed NOTIFY, those message(s) aren't queued yet so we can't
!      * see them in the queue.
       *
       * This will also advance the global tail pointer if possible.
       */
*************** asyncQueueReadAllNotifications(void)
*** 1744,1749 ****
--- 1749,1755 ----
      volatile QueuePosition pos;
      QueuePosition oldpos;
      QueuePosition head;
+     Snapshot    snapshot;
      bool        advanceTail;

      /* page_buffer must be adequately aligned, so use a union */
*************** asyncQueueReadAllNotifications(void)
*** 1767,1772 ****
--- 1773,1781 ----
          return;
      }

+     /* Get snapshot we'll use to decide which xacts are still in progress */
+     snapshot = RegisterSnapshot(GetLatestSnapshot());
+
      /*----------
       * Note that we deliver everything that we see in the queue and that
       * matches our _current_ listening state.
*************** asyncQueueReadAllNotifications(void)
*** 1854,1860 ****
               * while sending the notifications to the frontend.
               */
              reachedStop = asyncQueueProcessPageEntries(&pos, head,
!                                                        page_buffer.buf);
          } while (!reachedStop);
      }
      PG_CATCH();
--- 1863,1870 ----
               * while sending the notifications to the frontend.
               */
              reachedStop = asyncQueueProcessPageEntries(&pos, head,
!                                                        page_buffer.buf,
!                                                        snapshot);
          } while (!reachedStop);
      }
      PG_CATCH();
*************** asyncQueueReadAllNotifications(void)
*** 1882,1887 ****
--- 1892,1900 ----
      /* If we were the laziest backend, try to advance the tail pointer */
      if (advanceTail)
          asyncQueueAdvanceTail();
+
+     /* Done with snapshot */
+     UnregisterSnapshot(snapshot);
  }

  /*
*************** asyncQueueReadAllNotifications(void)
*** 1903,1909 ****
  static bool
  asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer)
  {
      bool        reachedStop = false;
      bool        reachedEndOfPage;
--- 1916,1923 ----
  static bool
  asyncQueueProcessPageEntries(volatile QueuePosition *current,
                               QueuePosition stop,
!                              char *page_buffer,
!                              Snapshot snapshot)
  {
      bool        reachedStop = false;
      bool        reachedEndOfPage;
*************** asyncQueueProcessPageEntries(volatile Qu
*** 1928,1934 ****
          /* Ignore messages destined for other databases */
          if (qe->dboid == MyDatabaseId)
          {
!             if (TransactionIdIsInProgress(qe->xid))
              {
                  /*
                   * The source transaction is still in progress, so we can't
--- 1942,1948 ----
          /* Ignore messages destined for other databases */
          if (qe->dboid == MyDatabaseId)
          {
!             if (XidInMVCCSnapshot(qe->xid, snapshot))
              {
                  /*
                   * The source transaction is still in progress, so we can't
*************** asyncQueueProcessPageEntries(volatile Qu
*** 1939,1948 ****
                   * this advance-then-back-up behavior when dealing with an
                   * uncommitted message.)
                   *
!                  * Note that we must test TransactionIdIsInProgress before we
!                  * test TransactionIdDidCommit, else we might return a message
!                  * from a transaction that is not yet visible to snapshots;
!                  * compare the comments at the head of tqual.c.
                   */
                  *current = thisentry;
                  reachedStop = true;
--- 1953,1967 ----
                   * this advance-then-back-up behavior when dealing with an
                   * uncommitted message.)
                   *
!                  * Note that we must test XidInMVCCSnapshot before we test
!                  * TransactionIdDidCommit, else we might return a message from
!                  * a transaction that is not yet visible to snapshots; compare
!                  * the comments at the head of tqual.c.
!                  *
!                  * Also, while our own xact won't be listed in the snapshot,
!                  * we need not check for TransactionIdIsCurrentTransactionId
!                  * because our transaction cannot (yet) have queued any
!                  * messages.
                   */
                  *current = thisentry;
                  reachedStop = true;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index bbac408..28060c5 100644
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 81,88 ****
  SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
  SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};

- /* local functions */
- static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

  /*
   * SetHintBits()
--- 81,86 ----
*************** HeapTupleIsSurelyDead(HeapTuple htup, Tr
*** 1482,1488 ****
   * TransactionIdIsCurrentTransactionId first, except for known-committed
   * XIDs which could not be ours anyway.
   */
! static bool
  XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
  {
      uint32        i;
--- 1480,1486 ----
   * TransactionIdIsCurrentTransactionId first, except for known-committed
   * XIDs which could not be ours anyway.
   */
! bool
  XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
  {
      uint32        i;
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 9a3b56e..96eaf01 100644
*** a/src/include/utils/tqual.h
--- b/src/include/utils/tqual.h
*************** extern HTSV_Result HeapTupleSatisfiesVac
*** 78,83 ****
--- 78,84 ----
                           TransactionId OldestXmin, Buffer buffer);
  extern bool HeapTupleIsSurelyDead(HeapTuple htup,
                        TransactionId OldestXmin);
+ extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

  extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
                       uint16 infomask, TransactionId xid);


Re: [BUGS] BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Marko Tiikkaja
Date:
On Wed, Oct 11, 2017 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So I'm now satisfied with the attached form of the patch, and
will proceed to make back-ported versions.

Thank you very much!  I'll let the test case run against patched master for a few days just to make sure we didn't miss anything obvious.


.m