Re: libpq copy error handling busted

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: libpq copy error handling busted
Дата
Msg-id 1187243.1591234553@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: libpq copy error handling busted  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: libpq copy error handling busted  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
I wrote:
> * pqSendSome() is responsible not only for pushing out data, but for
> calling pqReadData in any situation where it can't get rid of the data
> promptly.  1f39a1c06 overlooked that requirement, and the upshot is
> that we don't necessarily notice that the connection is broken (it's
> pqReadData's job to detect that).  Putting a pqReadData call into
> the early-exit path helps, but doesn't fix things completely.

Ah, it's better if I put the pqReadData call into *both* the paths
where 1f39a1c06 made pqSendSome give up.  The attached patch seems
to fix the issue for the "pgbench -i" scenario, with either fast-
or immediate-mode server stop.  I tried it with and without SSL too,
just to see.  Still, it's not clear to me whether this might worsen
any of the situations we discussed in the lead-up to 1f39a1c06 [1].
Thomas, are you in a position to redo any of that testing?

> * The more longstanding problem is that the PQputCopyData code path
> doesn't have any mechanism for consuming an 'E' (error) message
> once pqReadData has collected it.

At least with pgbench's approach (die immediately on PQputline failure)
this isn't very relevant once we apply the attached.  Perhaps we should
revisit this behavior anyway, but I'd be afraid to back-patch a change
of that nature.

> * As for control-C not getting out of it: there is
>         if (CancelRequested)
>             break;
> in pgbench's loop, but this does nothing in this scenario because
> fe-utils/cancel.c only sets that flag when it successfully sends a
> Cancel ... which it certainly cannot if the postmaster is gone.

I'll send a patch for this later.

            regards, tom lane

[1]
https://www.postgresql.org/message-id/flat/CAEepm%3D2n6Nv%2B5tFfe8YnkUm1fXgvxR0Mm1FoD%2BQKG-vLNGLyKg%40mail.gmail.com
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 9afa0533a6..9273984727 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -823,6 +823,10 @@ definitelyFailed:
  * Return 0 on success, -1 on failure and 1 when not all data could be sent
  * because the socket would block and the connection is non-blocking.
  *
+ * Note that this is also responsible for consuming data from the socket
+ * (putting it in conn->inBuffer) in any situation where we can't send
+ * all the specified data immediately.
+ *
  * Upon write failure, conn->write_failed is set and the error message is
  * saved in conn->write_err_msg, but we clear the output buffer and return
  * zero anyway; this is because callers should soldier on until it's possible
@@ -842,12 +846,20 @@ pqSendSome(PGconn *conn, int len)
      * on that connection.  Even if the kernel would let us, we've probably
      * lost message boundary sync with the server.  conn->write_failed
      * therefore persists until the connection is reset, and we just discard
-     * all data presented to be written.
+     * all data presented to be written.  However, as long as we still have a
+     * valid socket, we should continue to absorb data from the backend, so
+     * that we can collect any final error messages.
      */
     if (conn->write_failed)
     {
         /* conn->write_err_msg should be set up already */
         conn->outCount = 0;
+        /* Absorb input data if any, and detect socket closure */
+        if (conn->sock != PGINVALID_SOCKET)
+        {
+            if (pqReadData(conn) < 0)
+                return -1;
+        }
         return 0;
     }
 
@@ -917,6 +929,13 @@ pqSendSome(PGconn *conn, int len)
 
                     /* Discard queued data; no chance it'll ever be sent */
                     conn->outCount = 0;
+
+                    /* Absorb input data if any, and detect socket closure */
+                    if (conn->sock != PGINVALID_SOCKET)
+                    {
+                        if (pqReadData(conn) < 0)
+                            return -1;
+                    }
                     return 0;
             }
         }

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical ()at walsender.c:2762
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: elog(DEBUG2 in SpinLocked section.