Обсуждение: NOTIFY does not work as expected

Поиск
Список
Период
Сортировка

NOTIFY does not work as expected

От
Andrey
Дата:
PostgreSQL 9.6.9, 10.4 (broken):
A: listen test;
A: select pg_sleep(5);
    1
    2
B: notify test, 'test1';
    3
    4
    5
A: done
    6
    7
    8
    9
B: notify test, 'test2';
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test
* notification received:
  server process id: 2837
  channel: test
  payload: test2

PostgreSQL 9.6.2 and earlier (workds as expected)
A: listen test;
A: select pg_sleep(5);
    1
    2
B: notify test, 'test1';
    3
    4
    5
A: done
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test
    6
    7
    8
    9
B: notify test, 'test2';
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test2

----------------
* This is not client-side problem - I've verified it via Wireshark.
** I've caught the bug on Postres Pro Std builds, but according to the fact that these builds are mostly identical, the bug is related to both of them.

Re: NOTIFY does not work as expected

От
Jeff Janes
Дата:
On Mon, Jul 2, 2018 at 4:33 PM, Andrey <parihaaraka@gmail.com> wrote:
PostgreSQL 9.6.9, 10.4 (broken):
A: listen test;
A: select pg_sleep(5);
    1
    2
B: notify test, 'test1';
    3
    4
    5
A: done
    6
    7
    8
    9
B: notify test, 'test2';
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test
* notification received:
  server process id: 2837
  channel: test
  payload: test2

PostgreSQL 9.6.2 and earlier (workds as expected)
A: listen test;
A: select pg_sleep(5);
    1
    2
B: notify test, 'test1';
    3
    4
    5
A: done
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test
    6
    7
    8
    9
B: notify test, 'test2';
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test2

I don't think this is a bug.   I don't see that the docs promise one behavior over the other, so it is really a dealer's choice.  Also, I can't reliably reproduce the reported 9.6.2 behavior on my own 9.6.2 server.

Cheers,

Jeff

Re: NOTIFY does not work as expected

От
Andrey
Дата:
man: "... if a listening session receives a notification signal while it is within a transaction, the notification event will not be delivered to its connected client until _just after_ the transaction is completed (either committed or aborted)."
Expected behavior is to deliver notification after pg_sleep is finished. Currently the one may hold opened connection (being idle and listening socket) for a long time, close it and never deliver the notification (if was busy while being notified).
All delivering problems were about overflowed queue, merging notifications and transactions handling. If we must not rely on delivering at all, then NOTIFY makes no sense.
вт, 3 июля 2018 г. в 3:37, Jeff Janes <jeff.janes@gmail.com>:
On Mon, Jul 2, 2018 at 4:33 PM, Andrey <parihaaraka@gmail.com> wrote:
PostgreSQL 9.6.9, 10.4 (broken):
A: listen test;
A: select pg_sleep(5);
    1
    2
B: notify test, 'test1';
    3
    4
    5
A: done
    6
    7
    8
    9
B: notify test, 'test2';
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test
* notification received:
  server process id: 2837
  channel: test
  payload: test2

PostgreSQL 9.6.2 and earlier (workds as expected)
A: listen test;
A: select pg_sleep(5);
    1
    2
B: notify test, 'test1';
    3
    4
    5
A: done
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test
    6
    7
    8
    9
B: notify test, 'test2';
A:
* notification received:
  server process id: 2837
  channel: test
  payload: test2

I don't think this is a bug.   I don't see that the docs promise one behavior over the other, so it is really a dealer's choice.  Also, I can't reliably reproduce the reported 9.6.2 behavior on my own 9.6.2 server.

Cheers,

Jeff

Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Mon, Jul 2, 2018 at 4:33 PM, Andrey <parihaaraka@gmail.com> wrote:
>> [ delayed receipt of notifications ]

> I don't think this is a bug.   I don't see that the docs promise one
> behavior over the other, so it is really a dealer's choice.  Also, I can't
> reliably reproduce the reported 9.6.2 behavior on my own 9.6.2 server.

FWIW, it looks like a bug to me.  I don't have time to investigate
further right now, though.  It's also not at all clear whether the
issue is in the server or libpq or psql ...

            regards, tom lane


Re: NOTIFY does not work as expected

От
Jeff Janes
Дата:
On Tue, Jul 3, 2018 at 1:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Mon, Jul 2, 2018 at 4:33 PM, Andrey <parihaaraka@gmail.com> wrote:
>> [ delayed receipt of notifications ]

> I don't think this is a bug.   I don't see that the docs promise one
> behavior over the other, so it is really a dealer's choice.  Also, I can't
> reliably reproduce the reported 9.6.2 behavior on my own 9.6.2 server.

FWIW, it looks like a bug to me.  I don't have time to investigate
further right now, though.  It's also not at all clear whether the
issue is in the server or libpq or psql ...

In my hands, it bisects down to this:

commit 4f85fde8eb860f263384fffdca660e16e77c7f76
Author: Andres Freund <andres@anarazel.de>
Date:   Tue Feb 3 22:25:20 2015 +0100

    Introduce and use infrastructure for interrupt processing during client reads.

But that was committed in 9.5.dev, not between 9.6.2 and 9.6.9.

It is on the server side.  This is testing with psql, and it doesn't seem to matter which version of it.  Maybe there is something between 9.6.2 and 9.6.9 that shows up with another client or more.

Cheers,

Jeff

Re: NOTIFY does not work as expected

От
Jeff Janes
Дата:
On Tue, Jul 3, 2018 at 12:30 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 1:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Mon, Jul 2, 2018 at 4:33 PM, Andrey <parihaaraka@gmail.com> wrote:
>> [ delayed receipt of notifications ]

> I don't think this is a bug.   I don't see that the docs promise one
> behavior over the other, so it is really a dealer's choice.  Also, I can't
> reliably reproduce the reported 9.6.2 behavior on my own 9.6.2 server.

FWIW, it looks like a bug to me.  I don't have time to investigate
further right now, though.  It's also not at all clear whether the
issue is in the server or libpq or psql ...

In my hands, it bisects down to this:

commit 4f85fde8eb860f263384fffdca660e16e77c7f76
Author: Andres Freund <andres@anarazel.de>
Date:   Tue Feb 3 22:25:20 2015 +0100

    Introduce and use infrastructure for interrupt processing during client reads.

But that was committed in 9.5.dev, not between 9.6.2 and 9.6.9.

It is on the server side.  This is testing with psql, and it doesn't seem to matter which version of it.  Maybe there is something between 9.6.2 and 9.6.9 that shows up with another client or more.

Further diagnosis here is that in the "working" case the client receives a single packet from the server containing both the pg_sleep response, and async response, in that order, and the client processes both of them.  In the "broken" case, the client receives a single packet from the server containing the pg_sleep response, and processes it, and then blocks on user input.  The async response is immediately available in the next packet if the client would ask for it, but the client doesn't do so. 

If I am diagnosing the right problem, this still doesn't seem like a bug to me.

Andey, what is it that you saw in Wireshark?

Cheers,

Jeff

Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> Further diagnosis here is that in the "working" case the client receives a
> single packet from the server containing both the pg_sleep response, and
> async response, in that order, and the client processes both of them.  In
> the "broken" case, the client receives a single packet from the server
> containing the pg_sleep response, and processes it, and then blocks on user
> input.  The async response is immediately available in the next packet if
> the client would ask for it, but the client doesn't do so.

This suggests that 4f85fde8e introduced an extra output-flush operation
into the code path, ie it must be flushing the output buffer to the client
after ReadyForQuery and then again after emitting the Notify.

> If I am diagnosing the right problem, this still doesn't seem like a bug to
> me.

Well, it seems undesirable to me, both because it implies network traffic
inefficiency and because clients don't seem to be expecting it.

We have another recent complaint that seems to be possibly the
same thing, bug #15255.

            regards, tom lane


Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
On 2018-07-03 14:27:04 -0400, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > Further diagnosis here is that in the "working" case the client receives a
> > single packet from the server containing both the pg_sleep response, and
> > async response, in that order, and the client processes both of them.  In
> > the "broken" case, the client receives a single packet from the server
> > containing the pg_sleep response, and processes it, and then blocks on user
> > input.  The async response is immediately available in the next packet if
> > the client would ask for it, but the client doesn't do so.
> 
> This suggests that 4f85fde8e introduced an extra output-flush operation
> into the code path, ie it must be flushing the output buffer to the client
> after ReadyForQuery and then again after emitting the Notify.

Hm. There's indeed a

    /*
     * Must flush the notify messages to ensure frontend gets them promptly.
     */
    pq_flush();

in ProcessIncomingNotify(). But that was there before, too. And I don't
see any argument why it'd be a good idea to remove it?


> > If I am diagnosing the right problem, this still doesn't seem like a bug to
> > me.
> 
> Well, it seems undesirable to me, both because it implies network traffic
> inefficiency and because clients don't seem to be expecting it.

A report after ~3 years doesn't strike me as a huge argument for that,
and it doesn't seem crazy to believe it'd hurt some users changing
that. And when would you avoid flushing?


> We have another recent complaint that seems to be possibly the
> same thing, bug #15255.

That seems more related to the logical replication apply code than
anything?

Greetings,

Andres Freund


Re: NOTIFY does not work as expected

От
Jeff Janes
Дата:
On Tue, Jul 3, 2018 at 12:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 12:30 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 1:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Mon, Jul 2, 2018 at 4:33 PM, Andrey <parihaaraka@gmail.com> wrote:
>> [ delayed receipt of notifications ]

> I don't think this is a bug.   I don't see that the docs promise one
> behavior over the other, so it is really a dealer's choice.  Also, I can't
> reliably reproduce the reported 9.6.2 behavior on my own 9.6.2 server.

FWIW, it looks like a bug to me.  I don't have time to investigate
further right now, though.  It's also not at all clear whether the
issue is in the server or libpq or psql ...

In my hands, it bisects down to this:

commit 4f85fde8eb860f263384fffdca660e16e77c7f76
Author: Andres Freund <andres@anarazel.de>
Date:   Tue Feb 3 22:25:20 2015 +0100

    Introduce and use infrastructure for interrupt processing during client reads.

But that was committed in 9.5.dev, not between 9.6.2 and 9.6.9.

It is on the server side.  This is testing with psql, and it doesn't seem to matter which version of it.  Maybe there is something between 9.6.2 and 9.6.9 that shows up with another client or more.

Further diagnosis here is that in the "working" case the client receives a single packet from the server containing both the pg_sleep response, and async response, in that order, and the client processes both of them.  In the "broken" case, the client receives a single packet from the server containing the pg_sleep response, and processes it, and then blocks on user input.  The async response is immediately available in the next packet if the client would ask for it, but the client doesn't do so.

It looks like I was wrong here.  The 2nd packet with the async message is not generally sent immediately, the server backend can hold it up until the next time it either hears from the frontend, or until it gets a SIGUSR1 due to another incoming NOTIFY.

But I still see this undesired behavior showing up in 9.5dev in the mentioned commit, not showing up between 9.6.2 and 9.6.9.

Cheers,

Jeff

Re: NOTIFY does not work as expected

От
Jeff Janes
Дата:
On Tue, Jul 3, 2018 at 9:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 12:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 12:30 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

In my hands, it bisects down to this:

commit 4f85fde8eb860f263384fffdca660e16e77c7f76
Author: Andres Freund <andres@anarazel.de>
Date:   Tue Feb 3 22:25:20 2015 +0100

    Introduce and use infrastructure for interrupt processing during client reads.

But that was committed in 9.5.dev, not between 9.6.2 and 9.6.9.

It is on the server side.  This is testing with psql, and it doesn't seem to matter which version of it.  Maybe there is something between 9.6.2 and 9.6.9 that shows up with another client or more.

Further diagnosis here is that in the "working" case the client receives a single packet from the server containing both the pg_sleep response, and async response, in that order, and the client processes both of them.  In the "broken" case, the client receives a single packet from the server containing the pg_sleep response, and processes it, and then blocks on user input.  The async response is immediately available in the next packet if the client would ask for it, but the client doesn't do so.

It looks like I was wrong here.  The 2nd packet with the async message is not generally sent immediately, the server backend can hold it up until the next time it either hears from the frontend, or until it gets a SIGUSR1 due to another incoming NOTIFY.

But I still see this undesired behavior showing up in 9.5dev in the mentioned commit, not showing up between 9.6.2 and 9.6.9.

Andrey confirmed to me off list that he was mistaken about it working the way he expected in 9.6.2, the issue started earlier.

Reading through the comments touched by the commit, it seems obvious what the bug is.  It says "cause the processing to occur just before we next go idle", but also says "This is called just *after* waiting for a frontend command", which is too late to be "before we next go idle"

I've attached a simple proof-of-concept patch which calls ProcessClientReadInterrupt() before the client read is started so it can process pre-existing interrupt flags, not just when it is interrupted during the read or when the read is done.  (That does seem to render the name of the function inapt).  

I don't know if this is the correct way to fix it, it is just a POC.  It looks like cache invalidation catch-up demands might also be involved in the delay.

With this patch, the async arrives right after the pg_sleep response, but it is still in a second packet.  That means it doesn't fix the apparent regression when viewed with psql, as psql doesn't process the 2nd packet right away.  So I instead tested it with this Perl script:

perl -le 'use DBI; my $dbh=DBI->connect("dbi:Pg:port=9999;host=localhost"); $dbh->do("listen test"); $dbh->do("select pg_sleep(5)"); warn "done with sleep"; while (true) { $r=$dbh->pg_notifies(); warn join "\t", @$r if $r}'
 
Without my patch, notices sent during the pg_sleep will not get delivered until another notice is sent after the sleep is over.  With the they patch, they get delivered as soon as the pg_sleep is done.

Cheers

Jeff
Вложения

Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
Hi,


On 2018-07-04 08:50:12 -0400, Jeff Janes wrote:
> Reading through the comments touched by the commit, it seems obvious what
> the bug is.  It says "cause the processing to occur just before we next go
> idle", but also says "This is called just *after* waiting for a frontend
> command", which is too late to be "before we next go idle"

I've not looked at this issue in depth yet. So I might be completely off
base.  But I'm confused by your comment - we're doing it *after*,
because we do a non-blocking read. And the latch will notify us
(event.events & WL_LATCH_SET) if there was a read.

Are you arguing that we should also notify in cases where we actually
never become idle? I'm not sure that's particularly meaningful, given
there's no guarantee that that actually works, because we could just
have read multiple commands from the client?

Greetings,

Andres Freund


Re: NOTIFY does not work as expected

От
Jeff Janes
Дата:
On Wed, Jul 4, 2018 at 2:30 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,


On 2018-07-04 08:50:12 -0400, Jeff Janes wrote:
> Reading through the comments touched by the commit, it seems obvious what
> the bug is.  It says "cause the processing to occur just before we next go
> idle", but also says "This is called just *after* waiting for a frontend
> command", which is too late to be "before we next go idle"

I've not looked at this issue in depth yet. So I might be completely off
base.  But I'm confused by your comment - we're doing it *after*,
because we do a non-blocking read. And the latch will notify us
(event.events & WL_LATCH_SET) if there was a read.

We get a signal while we are busy.  We set the flag and the latch.  The latch 
wakes us up, but since we are busy (in a transaction, specifically) we don't
do anything.  Later, the transaction ends and we are about to go idle, but 
no one checks the flag again.  We start a read using the latch mechanism, but 
the flag notifyInterruptPending being set true from a now-long-gone signal is not on
the list of things the latch wakes us for.  It is technically a non-blocking read but from 
the perspective if the pending notify message it is a blocking read, unless another
signal comes along and rescues us. 

Cheers,

Jeff

Re: NOTIFY does not work as expected

От
Andrey
Дата:
чт, 5 июл. 2018 г. в 1:11, Jeff Janes <jeff.janes@gmail.com>:
On Wed, Jul 4, 2018 at 2:30 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,


On 2018-07-04 08:50:12 -0400, Jeff Janes wrote:
> Reading through the comments touched by the commit, it seems obvious what
> the bug is.  It says "cause the processing to occur just before we next go
> idle", but also says "This is called just *after* waiting for a frontend
> command", which is too late to be "before we next go idle"

I've not looked at this issue in depth yet. So I might be completely off
base.  But I'm confused by your comment - we're doing it *after*,
because we do a non-blocking read. And the latch will notify us
(event.events & WL_LATCH_SET) if there was a read.

We get a signal while we are busy.  We set the flag and the latch.  The latch 
wakes us up, but since we are busy (in a transaction, specifically) we don't
do anything.  Later, the transaction ends and we are about to go idle, but 
no one checks the flag again.  We start a read using the latch mechanism, but 
the flag notifyInterruptPending being set true from a now-long-gone signal is not on
the list of things the latch wakes us for.  It is technically a non-blocking read but from 
the perspective if the pending notify message it is a blocking read, unless another
signal comes along and rescues us. 

Cheers,

Jeff

Hello. I beg your pardon, but the problem is still in 10.5. May we expect it to be fixed in 11?
Thanks.

Regards,
Andrey L 

Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Andrey <parihaaraka@gmail.com> writes:
> Hello. I beg your pardon, but the problem is still in 10.5. May we expect
> it to be fixed in 11?

Nope :-(.  However, I got around to looking at this problem, and I concur
with Jeff's diagnosis: the code around ProcessClientReadInterrupt is
buggy because it does not account for the possibility that the process
latch was cleared some time ago while unhandled interrupt-pending flags
remain set.  There are some other issues too:

1. ProcessClientWriteInterrupt has the same problem.

2. I don't believe the "blocked" vs "not-blocked" distinction one bit.
At best, it creates race-condition-like changes in behavior depending
on exactly when a signal arrives vs when data arrives or is sent.
At worst, I think it creates the same problem it's purporting to solve,
ie failure to respond to ProcDiePending at all.  I think the
before/during/after calls to ProcessClientXXXInterrupt should just all
behave the same and always be willing to execute ProcDiePending.

3. We've got bugs on the client side too.  The documentation is pretty
clear that libpq users ought to call PQconsumeInput before PQnotifies,
but psql had not read the manual at all.  Also, most callers were
calling PQconsumeInput only once and then looping on PQnotifies, which
assumes not-very-safely that we could only see at most one TCP packet
worth of notify messages at a time.  That's even less safe now that
we have "payload" strings than it was before.  So we ought to adjust
the code and documentation to recommend doing another PQconsumeInput
inside the loop.  (Congratulations to dblink for getting this right.)

In short, I think we need something like the attached.  With these
patches, psql consistently reports the notification promptly (for
me anyway).

            regards, tom lane

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index d349d7c..2880736 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,6 +145,16 @@ secure_read(Port *port, void *ptr, size_t len)
     ssize_t        n;
     int            waitfor;

+    /*
+     * We might already have a pending interrupt condition to deal with.  If
+     * the process latch is set, that would cause us to fall through the read
+     * and handle the condition below --- but the latch might have been
+     * cleared since the condition interrupt happened.  This is cheap enough
+     * (if there's nothing to do) that it's not worth being too tense about
+     * avoiding it.
+     */
+    ProcessClientReadInterrupt();
+
 retry:
 #ifdef USE_SSL
     waitfor = 0;
@@ -197,7 +207,7 @@ retry:
         if (event.events & WL_LATCH_SET)
         {
             ResetLatch(MyLatch);
-            ProcessClientReadInterrupt(true);
+            ProcessClientReadInterrupt();

             /*
              * We'll retry the read. Most likely it will return immediately
@@ -209,11 +219,10 @@ retry:
     }

     /*
-     * Process interrupts that happened while (or before) receiving. Note that
-     * we signal that we're not blocking, which will prevent some types of
-     * interrupts from being processed.
+     * Process interrupts that happened during a successful (or hard-failed)
+     * read.
      */
-    ProcessClientReadInterrupt(false);
+    ProcessClientReadInterrupt();

     return n;
 }
@@ -248,6 +257,16 @@ secure_write(Port *port, void *ptr, size_t len)
     ssize_t        n;
     int            waitfor;

+    /*
+     * We might already have a pending interrupt condition to deal with.  If
+     * the process latch is set, that would cause us to fall through the write
+     * and handle the condition below --- but the latch might have been
+     * cleared since the condition interrupt happened.  This is cheap enough
+     * (if there's nothing to do) that it's not worth being too tense about
+     * avoiding it.
+     */
+    ProcessClientWriteInterrupt();
+
 retry:
     waitfor = 0;
 #ifdef USE_SSL
@@ -283,7 +302,7 @@ retry:
         if (event.events & WL_LATCH_SET)
         {
             ResetLatch(MyLatch);
-            ProcessClientWriteInterrupt(true);
+            ProcessClientWriteInterrupt();

             /*
              * We'll retry the write. Most likely it will return immediately
@@ -295,11 +314,10 @@ retry:
     }

     /*
-     * Process interrupts that happened while (or before) sending. Note that
-     * we signal that we're not blocking, which will prevent some types of
-     * interrupts from being processed.
+     * Process interrupts that happened during a successful (or hard-failed)
+     * write.
      */
-    ProcessClientWriteInterrupt(false);
+    ProcessClientWriteInterrupt();

     return n;
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e4c6e3d..57f8075 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -315,7 +315,7 @@ interactive_getc(void)

     c = getc(stdin);

-    ProcessClientReadInterrupt(true);
+    ProcessClientReadInterrupt();

     return c;
 }
@@ -520,13 +520,12 @@ ReadCommand(StringInfo inBuf)
 /*
  * ProcessClientReadInterrupt() - Process interrupts specific to client reads
  *
- * This is called just after low-level reads. That might be after the read
- * finished successfully, or it was interrupted via interrupt.
+ * This is called just before and after low-level reads.
  *
  * Must preserve errno!
  */
 void
-ProcessClientReadInterrupt(bool blocked)
+ProcessClientReadInterrupt(void)
 {
     int            save_errno = errno;

@@ -543,7 +542,7 @@ ProcessClientReadInterrupt(bool blocked)
         if (notifyInterruptPending)
             ProcessNotifyInterrupt();
     }
-    else if (ProcDiePending && blocked)
+    else if (ProcDiePending)
     {
         /*
          * We're dying. It's safe (and sane) to handle that now.
@@ -557,25 +556,16 @@ ProcessClientReadInterrupt(bool blocked)
 /*
  * ProcessClientWriteInterrupt() - Process interrupts specific to client writes
  *
- * This is called just after low-level writes. That might be after the read
- * finished successfully, or it was interrupted via interrupt. 'blocked' tells
- * us whether the
+ * This is called just before and after low-level writes.
  *
  * Must preserve errno!
  */
 void
-ProcessClientWriteInterrupt(bool blocked)
+ProcessClientWriteInterrupt(void)
 {
     int            save_errno = errno;

-    /*
-     * We only want to process the interrupt here if socket writes are
-     * blocking to increase the chance to get an error message to the client.
-     * If we're not blocked there'll soon be a CHECK_FOR_INTERRUPTS(). But if
-     * we're blocked we'll never get out of that situation if the client has
-     * died.
-     */
-    if (ProcDiePending && blocked)
+    if (ProcDiePending)
     {
         /*
          * We're dying. It's safe (and sane) to handle that now. But we don't
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 63b4e48..8051d9a 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -71,8 +71,8 @@ extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
                                                                  * handler */
-extern void ProcessClientReadInterrupt(bool blocked);
-extern void ProcessClientWriteInterrupt(bool blocked);
+extern void ProcessClientReadInterrupt(void);
+extern void ProcessClientWriteInterrupt(void);

 extern void process_postgres_switches(int argc, char *argv[],
                           GucContext ctx, const char **dbname);
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 06d909e..82a4405 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5284,7 +5284,7 @@ typedef struct pgNotify
   <para>
    <function>PQnotifies</function> does not actually read data from the
    server; it just returns messages previously absorbed by another
-   <application>libpq</application> function.  In prior releases of
+   <application>libpq</application> function.  In ancient releases of
    <application>libpq</application>, the only way to ensure timely receipt
    of <command>NOTIFY</command> messages was to constantly submit commands, even
    empty ones, and then check <function>PQnotifies</function> after each
@@ -8711,6 +8711,7 @@ main(int argc, char **argv)
                     notify->relname, notify->be_pid);
             PQfreemem(notify);
             nnotifies++;
+            PQconsumeInput(conn);
         }
     }

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index b569959..62c2928 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -836,7 +836,8 @@ PrintNotifications(void)
 {
     PGnotify   *notify;

-    while ((notify = PQnotifies(pset.db)))
+    PQconsumeInput(pset.db);
+    while ((notify = PQnotifies(pset.db)) != NULL)
     {
         /* for backward compatibility, only show payload if nonempty */
         if (notify->extra[0])
@@ -847,6 +848,7 @@ PrintNotifications(void)
                     notify->relname, notify->be_pid);
         fflush(pset.queryFout);
         PQfreemem(notify);
+        PQconsumeInput(pset.db);
     }
 }

diff --git a/src/interfaces/ecpg/ecpglib/execute.c b/src/interfaces/ecpg/ecpglib/execute.c
index ff24449..42640ba 100644
--- a/src/interfaces/ecpg/ecpglib/execute.c
+++ b/src/interfaces/ecpg/ecpglib/execute.c
@@ -1722,12 +1722,13 @@ ecpg_process_output(struct statement *stmt, bool clear_result)
     }

     /* check for asynchronous returns */
-    notify = PQnotifies(stmt->connection->connection);
-    if (notify)
+    PQconsumeInput(stmt->connection->connection);
+    while ((notify = PQnotifies(stmt->connection->connection)) != NULL)
     {
         ecpg_log("ecpg_process_output on line %d: asynchronous notification of \"%s\" from backend PID %d received\n",
                  stmt->lineno, notify->relname, notify->be_pid);
         PQfreemem(notify);
+        PQconsumeInput(stmt->connection->connection);
     }

     return status;
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index e8b28d9..6aed8c8 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -2265,6 +2265,9 @@ sendFailed:
  * no unhandled async notification from the backend
  *
  * the CALLER is responsible for FREE'ing the structure returned
+ *
+ * Note that this function does not read any new data from the socket;
+ * so usually, caller should call PQconsumeInput() first.
  */
 PGnotify *
 PQnotifies(PGconn *conn)
diff --git a/src/test/examples/testlibpq2.c b/src/test/examples/testlibpq2.c
index 62ecd68..6cdf8c8 100644
--- a/src/test/examples/testlibpq2.c
+++ b/src/test/examples/testlibpq2.c
@@ -140,6 +140,7 @@ main(int argc, char **argv)
                     notify->relname, notify->be_pid);
             PQfreemem(notify);
             nnotifies++;
+            PQconsumeInput(conn);
         }
     }


Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
On 2018-10-18 18:39:34 -0400, Tom Lane wrote:
> Nope :-(.  However, I got around to looking at this problem, and I concur
> with Jeff's diagnosis: the code around ProcessClientReadInterrupt is
> buggy because it does not account for the possibility that the process
> latch was cleared some time ago while unhandled interrupt-pending flags
> remain set.  There are some other issues too:
> 
> 1. ProcessClientWriteInterrupt has the same problem.
> 
> 2. I don't believe the "blocked" vs "not-blocked" distinction one bit.
> At best, it creates race-condition-like changes in behavior depending
> on exactly when a signal arrives vs when data arrives or is sent.
> At worst, I think it creates the same problem it's purporting to solve,
> ie failure to respond to ProcDiePending at all.  I think the
> before/during/after calls to ProcessClientXXXInterrupt should just all
> behave the same and always be willing to execute ProcDiePending.

That distinction was introduced because people (IIRC you actually) were
worried that we'd be less likely to get error messages out to the
client. Especially when you check unconditionally before actually doing
the write, it's going to be far less likely that we are able to send
something out to the client.


Greetings,

Andres Freund


Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-18 18:39:34 -0400, Tom Lane wrote:
>> 2. I don't believe the "blocked" vs "not-blocked" distinction one bit.
>> At best, it creates race-condition-like changes in behavior depending
>> on exactly when a signal arrives vs when data arrives or is sent.
>> At worst, I think it creates the same problem it's purporting to solve,
>> ie failure to respond to ProcDiePending at all.  I think the
>> before/during/after calls to ProcessClientXXXInterrupt should just all
>> behave the same and always be willing to execute ProcDiePending.

> That distinction was introduced because people (IIRC you actually) were
> worried that we'd be less likely to get error messages out to the
> client. Especially when you check unconditionally before actually doing
> the write, it's going to be far less likely that we are able to send
> something out to the client.

Far less likely than what?  If we got a ProcDie signal we'd more often
than not have handled it long before reaching here.  If we hadn't, though,
we could arrive here with ProcDiePending set but the latch clear, in which
case we fail to honor the interrupt until the client does something.
Don't really think that's acceptable :-(.  I'm also not seeing why it's
okay to commit ProcDie hara-kiri immediately if the socket is
write-blocked but not otherwise --- the two cases are going to look about
the same from the client side.

            regards, tom lane


Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> That distinction was introduced because people (IIRC you actually) were
>> worried that we'd be less likely to get error messages out to the
>> client. Especially when you check unconditionally before actually doing
>> the write, it's going to be far less likely that we are able to send
>> something out to the client.

> Far less likely than what?  If we got a ProcDie signal we'd more often
> than not have handled it long before reaching here.  If we hadn't, though,
> we could arrive here with ProcDiePending set but the latch clear, in which
> case we fail to honor the interrupt until the client does something.
> Don't really think that's acceptable :-(.  I'm also not seeing why it's
> okay to commit ProcDie hara-kiri immediately if the socket is
> write-blocked but not otherwise --- the two cases are going to look about
> the same from the client side.

I spent some more time thinking about this.  It's possible to fix the bug
while preserving the current behavior for ProcDiePending if we adjust the
ProcessClientXXXInterrupt code to set the process latch in the cases where
ProcDiePending is true but we're not waiting for the socket to come ready.
See attached variant of 0001.

I am not, however, convinced that this is better rather than just more
complicated.

If we're willing to accept a ProcDie interrupt during secure_read at all,
I don't see why not to do it even if we got some data.  We'll accept the
interrupt anyway the next time something happens to do
CHECK_FOR_INTERRUPTS; and it's unlikely that that would not be till after
we'd completed the query, so the net effect is just going to be that we
waste some cycles first.

Likewise, I see little merit in holding off ProcDie during secure_write.
If we could try to postpone the interrupt until a message boundary, so as
to avoid losing protocol sync, there would be value in that --- but this
code is at the wrong level to implement any such behavior, and it's
not trying to.  So we still have the situation that the interrupt service
is being postponed without any identifiable gain in predictability of
behavior.

In short, I don't think that the existing logic here does anything useful
to meet the concern I had, and so I wouldn't mind throwing it away.

            regards, tom lane

PS: this patch extends the ProcessClientXXXInterrupt API to distinguish
before/during/after calls.  As written, there are only two behaviors
so we could've stuck with the bool API, but I thought this was clearer.

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index d349d7c..66f61ce 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,6 +145,9 @@ secure_read(Port *port, void *ptr, size_t len)
     ssize_t        n;
     int            waitfor;

+    /* Deal with any already-pending interrupt condition. */
+    ProcessClientReadInterrupt(+1);
+
 retry:
 #ifdef USE_SSL
     waitfor = 0;
@@ -197,7 +200,7 @@ retry:
         if (event.events & WL_LATCH_SET)
         {
             ResetLatch(MyLatch);
-            ProcessClientReadInterrupt(true);
+            ProcessClientReadInterrupt(0);

             /*
              * We'll retry the read. Most likely it will return immediately
@@ -209,11 +212,10 @@ retry:
     }

     /*
-     * Process interrupts that happened while (or before) receiving. Note that
-     * we signal that we're not blocking, which will prevent some types of
-     * interrupts from being processed.
+     * Process interrupts that happened during a successful (or non-blocking,
+     * or hard-failed) read.
      */
-    ProcessClientReadInterrupt(false);
+    ProcessClientReadInterrupt(-1);

     return n;
 }
@@ -248,6 +250,9 @@ secure_write(Port *port, void *ptr, size_t len)
     ssize_t        n;
     int            waitfor;

+    /* Deal with any already-pending interrupt condition. */
+    ProcessClientWriteInterrupt(+1);
+
 retry:
     waitfor = 0;
 #ifdef USE_SSL
@@ -283,23 +288,22 @@ retry:
         if (event.events & WL_LATCH_SET)
         {
             ResetLatch(MyLatch);
-            ProcessClientWriteInterrupt(true);
+            ProcessClientWriteInterrupt(0);

             /*
              * We'll retry the write. Most likely it will return immediately
-             * because there's still no data available, and we'll wait for the
-             * socket to become ready again.
+             * because there's still no buffer space available, and we'll wait
+             * for the socket to become ready again.
              */
         }
         goto retry;
     }

     /*
-     * Process interrupts that happened while (or before) sending. Note that
-     * we signal that we're not blocking, which will prevent some types of
-     * interrupts from being processed.
+     * Process interrupts that happened during a successful (or non-blocking,
+     * or hard-failed) write.
      */
-    ProcessClientWriteInterrupt(false);
+    ProcessClientWriteInterrupt(-1);

     return n;
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6e13d14..23c04fa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -315,7 +315,7 @@ interactive_getc(void)

     c = getc(stdin);

-    ProcessClientReadInterrupt(true);
+    ProcessClientReadInterrupt(-1);

     return c;
 }
@@ -520,35 +520,44 @@ ReadCommand(StringInfo inBuf)
 /*
  * ProcessClientReadInterrupt() - Process interrupts specific to client reads
  *
- * This is called just after low-level reads. That might be after the read
- * finished successfully, or it was interrupted via interrupt.
+ * This is called just before and after low-level reads.
+ * 'before' is +1 if about to read, 0 if no data was available to read and
+ * we plan to retry, -1 if done reading.
  *
  * Must preserve errno!
  */
 void
-ProcessClientReadInterrupt(bool blocked)
+ProcessClientReadInterrupt(int before)
 {
     int            save_errno = errno;

     if (DoingCommandRead)
     {
-        /* Check for general interrupts that arrived while reading */
+        /* Check for general interrupts that arrived before/while reading */
         CHECK_FOR_INTERRUPTS();

-        /* Process sinval catchup interrupts that happened while reading */
+        /* Process sinval catchup interrupts, if any */
         if (catchupInterruptPending)
             ProcessCatchupInterrupt();

-        /* Process sinval catchup interrupts that happened while reading */
+        /* Process notify interrupts, if any */
         if (notifyInterruptPending)
             ProcessNotifyInterrupt();
     }
-    else if (ProcDiePending && blocked)
+    else if (ProcDiePending)
     {
         /*
-         * We're dying. It's safe (and sane) to handle that now.
+         * We're dying.  If there is no data available to read, then it's safe
+         * (and sane) to handle that now.  If we haven't tried to read yet,
+         * make sure the process latch is set, so that if there is no data
+         * then we'll come back here and die.  If we're done reading, also
+         * make sure the process latch is set, as we might've undesirably
+         * cleared it while reading.
          */
-        CHECK_FOR_INTERRUPTS();
+        if (before == 0)
+            CHECK_FOR_INTERRUPTS();
+        else
+            SetLatch(MyLatch);
     }

     errno = save_errno;
@@ -557,36 +566,50 @@ ProcessClientReadInterrupt(bool blocked)
 /*
  * ProcessClientWriteInterrupt() - Process interrupts specific to client writes
  *
- * This is called just after low-level writes. That might be after the read
- * finished successfully, or it was interrupted via interrupt. 'blocked' tells
- * us whether the
+ * This is called just before and after low-level writes.
+ * 'before' is +1 if about to write, 0 if no data could be written and we plan
+ * to retry, -1 if done writing.
  *
  * Must preserve errno!
  */
 void
-ProcessClientWriteInterrupt(bool blocked)
+ProcessClientWriteInterrupt(int before)
 {
     int            save_errno = errno;

-    /*
-     * We only want to process the interrupt here if socket writes are
-     * blocking to increase the chance to get an error message to the client.
-     * If we're not blocked there'll soon be a CHECK_FOR_INTERRUPTS(). But if
-     * we're blocked we'll never get out of that situation if the client has
-     * died.
-     */
-    if (ProcDiePending && blocked)
+    if (ProcDiePending)
     {
         /*
-         * We're dying. It's safe (and sane) to handle that now. But we don't
-         * want to send the client the error message as that a) would possibly
-         * block again b) would possibly lead to sending an error message to
-         * the client, while we already started to send something else.
+         * We're dying.  If it's not possible to write, then we should handle
+         * that immediately, else a stuck client could indefinitely delay our
+         * response to the signal.  If we haven't tried to write yet, make
+         * sure the process latch is set, so that if the write would block
+         * then we'll come back here and die.  If we're done writing, also
+         * make sure the process latch is set, as we might've undesirably
+         * cleared it while writing.
          */
-        if (whereToSendOutput == DestRemote)
-            whereToSendOutput = DestNone;
+        if (before == 0)
+        {
+            /*
+             * Don't mess with whereToSendOutput if ProcessInterrupts wouldn't
+             * do anything.
+             */
+            if (InterruptHoldoffCount == 0 && CritSectionCount == 0)
+            {
+                /*
+                 * We don't want to send the client the error message, as a)
+                 * that would possibly block again, and b) it would likely
+                 * lead to loss of protocol sync because we may have already
+                 * sent a partial protocol message.
+                 */
+                if (whereToSendOutput == DestRemote)
+                    whereToSendOutput = DestNone;

-        CHECK_FOR_INTERRUPTS();
+                CHECK_FOR_INTERRUPTS();
+            }
+        }
+        else
+            SetLatch(MyLatch);
     }

     errno = save_errno;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 63b4e48..b46c05c 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -71,8 +71,8 @@ extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
                                                                  * handler */
-extern void ProcessClientReadInterrupt(bool blocked);
-extern void ProcessClientWriteInterrupt(bool blocked);
+extern void ProcessClientReadInterrupt(int before);
+extern void ProcessClientWriteInterrupt(int before);

 extern void process_postgres_switches(int argc, char *argv[],
                           GucContext ctx, const char **dbname);

Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
On 2018-10-19 13:36:31 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> That distinction was introduced because people (IIRC you actually) were
> >> worried that we'd be less likely to get error messages out to the
> >> client. Especially when you check unconditionally before actually doing
> >> the write, it's going to be far less likely that we are able to send
> >> something out to the client.
> 
> > Far less likely than what?  If we got a ProcDie signal we'd more often
> > than not have handled it long before reaching here.  If we hadn't, though,
> > we could arrive here with ProcDiePending set but the latch clear, in which
> > case we fail to honor the interrupt until the client does something.
> > Don't really think that's acceptable :-(.  I'm also not seeing why it's
> > okay to commit ProcDie hara-kiri immediately if the socket is
> > write-blocked but not otherwise --- the two cases are going to look about
> > the same from the client side.

The reason we ended up there is that before the change that made it
behave like that, is that otherwise backends that are trying to write
something to the client, but the client isn't accepting any writes
(hung, forgotten, intentional DOS, whatnot), you have an unkillable
backend as soon as the the network buffers fill up.

But if we always check for procDiePending, even when not blocked, we'd
be able to send out an error message in fewer cases.

There's no problem with being unkillable when writing out data without
blocking, because it's going to take pretty finite time to copy a few
bytes into the kernel buffers.

Obviously it's not perfect to not be able to send a message in cases a
backend is killed while blocked in network, but there's not really a way
around that.

You can pretty easily trigger these cases and observe the difference by
doing something like COPY TO STDOUT of a large table, SIGSTOP the psql,
attach strace to the backend, and then terminate the backend.  Without
checking interrupts while blocked the backend doesn't get terminated.


> If we're willing to accept a ProcDie interrupt during secure_read at all,
> I don't see why not to do it even if we got some data.  We'll accept the
> interrupt anyway the next time something happens to do
> CHECK_FOR_INTERRUPTS; and it's unlikely that that would not be till after
> we'd completed the query, so the net effect is just going to be that we
> waste some cycles first.

I don't immediately see a problem with changing this for reads.


> Likewise, I see little merit in holding off ProcDie during secure_write.
> If we could try to postpone the interrupt until a message boundary, so as
> to avoid losing protocol sync, there would be value in that --- but this
> code is at the wrong level to implement any such behavior, and it's
> not trying to.  So we still have the situation that the interrupt service
> is being postponed without any identifiable gain in predictability of
> behavior.

See earlier explanation.

Greetings,

Andres Freund


Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
Hi,

On 2018-10-19 13:45:42 -0700, Andres Freund wrote:
> On 2018-10-19 13:36:31 -0400, Tom Lane wrote:
> > If we're willing to accept a ProcDie interrupt during secure_read at all,
> > I don't see why not to do it even if we got some data.  We'll accept the
> > interrupt anyway the next time something happens to do
> > CHECK_FOR_INTERRUPTS; and it's unlikely that that would not be till after
> > we'd completed the query, so the net effect is just going to be that we
> > waste some cycles first.
> 
> I don't immediately see a problem with changing this for reads.

One argument against changing it, although not a very strong one, is
that processing a proc die even when non-blocking prevents us from
processing commands like a client's X/terminate even if we already have
the necessary input.

Greetings,

Andres Freund


Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-19 13:45:42 -0700, Andres Freund wrote:
>> I don't immediately see a problem with changing this for reads.

> One argument against changing it, although not a very strong one, is
> that processing a proc die even when non-blocking prevents us from
> processing commands like a client's X/terminate even if we already have
> the necessary input.

I'm pretty skeptical of these arguments, as they depend on assumptions
that there are no CHECK_FOR_INTERRUPTS calls anywhere in the relevant
code paths outside be-secure.c.  Even if that's true today, it doesn't
seem like something to depend on.

However, there's definitely merit in the idea that we shouldn't change
the ProcDie behavior if we don't have to in order to fix the NOTIFY
bug --- especially since I'd like to backpatch this.  So if you're
happy with the revised patch, I can go with that one.

            regards, tom lane


Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
Hi,

On 2018-10-19 19:53:06 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-19 13:45:42 -0700, Andres Freund wrote:
> >> I don't immediately see a problem with changing this for reads.
> 
> > One argument against changing it, although not a very strong one, is
> > that processing a proc die even when non-blocking prevents us from
> > processing commands like a client's X/terminate even if we already have
> > the necessary input.
> 
> I'm pretty skeptical of these arguments, as they depend on assumptions
> that there are no CHECK_FOR_INTERRUPTS calls anywhere in the relevant
> code paths outside be-secure.c.  Even if that's true today, it doesn't
> seem like something to depend on.

I'm not particularly convinced either, if you're just talking about
reads, rather than writes. I think it'd matter more if we had a mode
where we told clients "dear clients please shut down now, I'm about to
do the same" and we wanted to avoid redundant log-spam about client EOF
etc, but we don't.


> However, there's definitely merit in the idea that we shouldn't change
> the ProcDie behavior if we don't have to in order to fix the NOTIFY
> bug --- especially since I'd like to backpatch this.  So if you're
> happy with the revised patch, I can go with that one.

I find the +1, 0, -1 pretty confusing and less clear than what was there
before. If we do something like it, could we introduce an enum or macros
for it?

Greetings,

Andres Freund


Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I find the +1, 0, -1 pretty confusing and less clear than what was there
> before. If we do something like it, could we introduce an enum or macros
> for it?

Meh.  Shall I just go back to the blocked/not blocked approach?  There
was a need for the 3-way argument at one point while I was fooling with
it, but once I decided that it'd be a good idea to re-set the latch on
the way out, there wasn't anymore.

            regards, tom lane


Re: NOTIFY does not work as expected

От
Andres Freund
Дата:
On 2018-10-19 20:27:20 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I find the +1, 0, -1 pretty confusing and less clear than what was there
> > before. If we do something like it, could we introduce an enum or macros
> > for it?
> 
> Meh.  Shall I just go back to the blocked/not blocked approach?

+1

- Andres


Re: NOTIFY does not work as expected

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-10-19 20:27:20 -0400, Tom Lane wrote:
>> Meh.  Shall I just go back to the blocked/not blocked approach?

> +1

OK, I'll make it so.

            regards, tom lane