Обсуждение: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

Поиск
Список
Период
Сортировка

PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
The new connection made by PQcancel does not use the tcp_user_timeout, connect_timeout or any of the keepalive settings
thatare provided in the connection string. This means that a call to PQcancel can block for a much longer time than
intendedif there are network issues. This can be especially impactful, because PQcancel is a blocking function an there
isno non blocking version of it.  

I attached a proposed patch to use the tcp_user_timeout from the connection string when connecting to Postgres in
PQcancel.This resolves the issue for me, since this will make connecting timeout after a configurable time. So the
otheroptions are not strictly needed. It might still be nice for completeness to support them too though. I didn't do
thisyet, because I first wanted some feedback and also because implementing connect_timeout would require using non
blockingTCP to connect and then use select to have a timeout. 
Вложения

Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Zhihong Yu
Дата:


On Thu, Sep 30, 2021 at 7:45 AM Jelte Fennema <Jelte.Fennema@microsoft.com> wrote:
The new connection made by PQcancel does not use the tcp_user_timeout, connect_timeout or any of the keepalive settings that are provided in the connection string. This means that a call to PQcancel can block for a much longer time than intended if there are network issues. This can be especially impactful, because PQcancel is a blocking function an there is no non blocking version of it.

I attached a proposed patch to use the tcp_user_timeout from the connection string when connecting to Postgres in PQcancel. This resolves the issue for me, since this will make connecting timeout after a configurable time. So the other options are not strictly needed. It might still be nice for completeness to support them too though. I didn't do this yet, because I first wanted some feedback and also because implementing connect_timeout would require using non blocking TCP to connect and then use select to have a timeout.

Hi,

    int         be_key;         /* key of backend --- needed for cancels */
+   int         pgtcp_user_timeout; /* tcp user timeout */ 

The other field names are quite short. How about naming the field tcp_timeout ?

Cheers

Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
We actually ran into an issue caused by this in production, where a PQcancel connection was open on the client for a 2+
daysbecause the server had restarted at the wrong moment in the cancel handshake. The client was now indefinitely
waitingfor the server to send an EOF back, and because keepalives were not enabled on this socket it was never closed. 

I attached an updated patch which also uses the keepalive settings in PQ. The connect_timeout is a bit harder to get it
towork. As far as I can tell it would require something like this. https://stackoverflow.com/a/2597774/2570866 

> The other field names are quite short. How about naming the field tcp_timeout ?

I kept the same names as in the pg_conn struct for consistency sake.



Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
Ugh forgot to attach the patch. Here it is.

From: Jelte Fennema <Jelte.Fennema@microsoft.com>
Sent: Wednesday, October 6, 2021 21:56
To: Zhihong Yu <zyu@yugabyte.com>
Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings
 
We actually ran into an issue caused by this in production, where a PQcancel connection was open on the client for a 2+ days because the server had restarted at the wrong moment in the cancel handshake. The client was now indefinitely waiting for the server to send an EOF back, and because keepalives were not enabled on this socket it was never closed.

I attached an updated patch which also uses the keepalive settings in PQ. The connect_timeout is a bit harder to get it to work. As far as I can tell it would require something like this. https://stackoverflow.com/a/2597774/2570866

> The other field names are quite short. How about naming the field tcp_timeout ?

I kept the same names as in the pg_conn struct for consistency sake.
Вложения

Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Fujii Masao
Дата:

On 2021/10/07 4:58, Jelte Fennema wrote:
> Ugh forgot to attach the patch. Here it is.

Thanks for working on this patch!


@@ -4546,10 +4684,21 @@ PQrequestCancel(PGconn *conn)
  
          return false;
      }
-
-    r = internal_cancel(&conn->raddr, conn->be_pid, conn->be_key,

Since PQrequestCancel() is marked as deprecated, I don't think that
we need to add the feature into it.


+        if (cancel->pgtcp_user_timeout >= 0) {
+            if (setsockopt(tmpsock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+                           (char *) &cancel->pgtcp_user_timeout,
+                           sizeof(cancel->pgtcp_user_timeout)) < 0) {
+                goto cancel_errReturn;
+            }
+        }

libpq has already setKeepalivesXXX() functions to do the almost same thing.
Isn't it better to modify and reuse them instead of adding the almost
same code, to simplify the code?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> Since PQrequestCancel() is marked as deprecated, I don't think that
> we need to add the feature into it.

Not absolutely necessary, agreed, but it looks like it's pretty
easy to make that happen, so why not?  I'd suggest dropping the
separate implementation and turning PQrequestCancel() into a
thin wrapper around PQgetCancel, PQcancel, PQfreeCancel.

> libpq has already setKeepalivesXXX() functions to do the almost same thing.
> Isn't it better to modify and reuse them instead of adding the almost
> same code, to simplify the code?

I find this patch fairly scary, because it's apparently been coded
with little regard for the expectation that PQcancel can be called
within a signal handler.

I see that setsockopt(2) is specified to be async signal safe by
POSIX, so at least in principle it is okay to add those calls to
PQcancel.  But you've got to be really careful what else you do in
there.  You can NOT use appendPQExpBuffer.  You can NOT access the
PGconn (I don't think the Windows part of this even compiles ...
nope, it doesn't, per the cfbot).  I'm not sure that WSAIoctl
is safe to use in a signal handler, so on the whole I think
I'd drop the Windows-specific chunk altogether.  But in any case,
I'm very strongly against calling out to other libpq code from here,
because then the signal-safety restrictions start applying to that
other code too, and that's a recipe for trouble in future.

The patch could use a little attention to conforming to PG coding
conventions (comment style, brace style, C99 declarations are all
wrong --- pgindent would fix much of that, but maybe not in a way
you like).  The lack of comments about why it's doing what it's doing
needs to be rectified, too.  Why are these specific options important
and not any others?

            regards, tom lane



Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
> I'd suggest dropping the separate implementation and turning 
> PQrequestCancel() into a thin wrapper around PQgetCancel, 
> PQcancel, PQfreeCancel.

Fine by me. I didn't want to change behavior of a deprecated 
function.

> I find this patch fairly scary, because it's apparently been coded
> with little regard for the expectation that PQcancel can be called
> within a signal handler.
> You can NOT use appendPQExpBuffer.  You can NOT access the
> PGconn (I don't think the Windows part of this even compiles ...
> nope, it doesn't, per the cfbot).

I guess I was tired or at least inattentive when I added the windows part at the end
of my coding session. It really was the Linux part that I cared about. For that part 
I definitely took care to make the code signal safe. Which is also why I did not call out to 
any of the existing functions, like setKeepalivesXXX(). I don't think I'm the right person
to write the windows code for this (I have zero C windows experience). So, if it's not 
required for this patch to be accepted I'll happily remove it.

> The patch could use a little attention to conforming to PG coding
> conventions (comment style, brace style, C99 declarations are all
> wrong --- pgindent would fix much of that, but maybe not in a way
> you like).  

Sure, I'll run pgindent for my next version of the patch.

> The lack of comments about why it's doing what it's doing
> needs to be rectified, too.  Why are these specific options important
> and not any others?

I'll make sure to add comments before the final version of this patch. This 
patch was more meant as a draft to gauge if this was even the correct way of fixing 
this problem. 

To be honest I think it would make sense to add a new PQcancel function that is not 
required to be signal safe and reuses regular connection setup code. This would make sure
options like this are supported automatically in the future. Another advantage is that it would 
allow for sending cancel messages in a non-blocking way. So, you would be able to easily 
send multiple cancels in a concurrent way. It looks to me like PQcancel is mostly designed 
the way it is to keep it easy for psql to send cancelations. I think many other uses of PQcancel
don't require it to be signal safe at all (at least for Citus its usage signal safety is not required).

Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
I was able to spend some time on this again. I attached two patches to this email:

The first patch is a cleaned up version of my previous patch. I think I addressed
all feedback on the previous version in that patch (e.g. removed windows code,
fixed formatting).

The second patch is a new one, it implements honouring of the connect_timeout
connection option in PQcancel. This patch requires the first patch to also be applied,
but since it seemed fairly separate and the code is not trivial I didn't want the first
patch to be blocked on this.

Finally, I would love it if once these fixes are merged the would also be backpatched to
previous versions of libpq. Does that seem possible? As far as I can tell it would be fine,
since it doesn't really change any of the public APIs. The only change is that the pg_cancel
struct now has a few additional fields. But since that struct is defined in libpq-int.h, so that
struct should not be used by users of libpq directly, right?.
Вложения

Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Andres Freund
Дата:
Hi,

On 2021-12-28 15:49:00 +0000, Jelte Fennema wrote:
> The first patch is a cleaned up version of my previous patch. I think I addressed
> all feedback on the previous version in that patch (e.g. removed windows code, 
> fixed formatting).

To me it seems a bit problematic to introduce a divergence between windows /
everything else here. Isn't that just going to lead to other complaints just
like this thread, where somebody discovered the hard way that there's platform
dependent behaviour here?


> The second patch is a new one, it implements honouring of the connect_timeout 
> connection option in PQcancel. This patch requires the first patch to also be applied,
> but since it seemed fairly separate and the code is not trivial I didn't want the first
> patch to be blocked on this.
> 
> Finally, I would love it if once these fixes are merged the would also be backpatched to 
> previous versions of libpq. Does that seem possible? As far as I can tell it would be fine, 
> since it doesn't really change any of the public APIs. The only change is that the pg_cancel 
> struct now has a few additional fields. But since that struct is defined in libpq-int.h, so that 
> struct should not be used by users of libpq directly, right?.

I'm not really convinced this is a good patch to backpatch. There does seem to
be some potential for subtle breakage - code in signal handlers is notoriously
finnicky, it's a rarely exercised code path, etc. It's also not fixing
something that previously worked.


> +     * NOTE: These socket options are currently not set for Windows. The
> +     * reason is that signal safety in this function is very important, and it
> +     * was not clear to if the functions required to set the socket options on
> +     * Windows were signal-safe.
> +     */
> +#ifndef WIN32
> +    if (!IS_AF_UNIX(cancel->raddr.addr.ss_family))
> +    {
> +#ifdef TCP_USER_TIMEOUT
> +        if (cancel->pgtcp_user_timeout >= 0)
> +        {
> +            if (setsockopt(tmpsock, IPPROTO_TCP, TCP_USER_TIMEOUT,
> +                           (char *) &cancel->pgtcp_user_timeout,
> +                           sizeof(cancel->pgtcp_user_timeout)) < 0)
> +            {
> +                strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_USER_TIMEOUT) failed: ", errbufsize);
> +                goto cancel_errReturn;
> +            }
> +        }
> +#endif
> +
> +        if (cancel->keepalives != 0)
> +        {
> +            int            on = 1;
> +
> +            if (setsockopt(tmpsock,
> +                           SOL_SOCKET, SO_KEEPALIVE,
> +                           (char *) &on, sizeof(on)) < 0)
> +            {
> +                strlcpy(errbuf, "PQcancel() -- setsockopt(SO_KEEPALIVE) failed: ", errbufsize);
> +                goto cancel_errReturn;
> +            }
> +        }

This is very repetitive - how about introducing a helper function for this?



> @@ -4467,8 +4601,8 @@ retry3:
>  
>      crp.packetlen = pg_hton32((uint32) sizeof(crp));
>      crp.cp.cancelRequestCode = (MsgType) pg_hton32(CANCEL_REQUEST_CODE);
> -    crp.cp.backendPID = pg_hton32(be_pid);
> -    crp.cp.cancelAuthCode = pg_hton32(be_key);
> +    crp.cp.backendPID = pg_hton32(cancel->be_pid);
> +    crp.cp.cancelAuthCode = pg_hton32(cancel->be_key);


Others might differ, but I'd separate changing the type passed to
internal_cancel() into its own commit.


Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2021-12-28 15:49:00 +0000, Jelte Fennema wrote:
>> Finally, I would love it if once these fixes are merged the would also be backpatched to
>> previous versions of libpq.

> I'm not really convinced this is a good patch to backpatch. There does seem to
> be some potential for subtle breakage - code in signal handlers is notoriously
> finnicky, it's a rarely exercised code path, etc. It's also not fixing
> something that previously worked.

IMO, this is a new feature not a bug fix, and as such it's not backpatch
material ... especially if we don't have 100.00% confidence in it.

            regards, tom lane



Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
Attached are 3 patches that address the feedback from Andres about code duplication
and splitting up commits. I completely removed internal_cancel now, since it only had
one caller at this point.

> IMO, this is a new feature not a bug fix

IMO this is definitely a bugfix. Nowhere in the libpq docs it stated that the connection
options in question do not apply to connections that are opened for cancellations. So
as a user I definitely expect that any connections that libpq opens would use these options.
Which is why I definitely consider it a bug that they are currently not honoured for cancel
requests.

However, even though I think it's a bugfix, I can understand the being hesitant to
backport this. IMHO in that case at least the docs should be updated to explain this
discrepancy. I attached a patch to do so against the docs on the REL_14_STABLE branch.

> To me it seems a bit problematic to introduce a divergence between windows /
> everything else here. Isn't that just going to lead to other complaints just
> like this thread, where somebody discovered the hard way that there's platform
> dependent behaviour here?

Of course, fixing this also for windows would be much better. There's two problems:
1. I cannot find any clear documentation on which functions are signal safe in Windows
   and which are not. The only reference I can find is this:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170 
   However, this says that you should not use any function that generates a system call.
   PQcancel is obviously already violating that when calling "connect", so this is not very helpful.
2. My Windows C experience is non existent, so I don't think I would be the right person to write this code.

IMO blocking this bugfix, because it does not fix it for Windows, would be an example of perfect
becoming the enemy of good. One thing I could do is add a note to the docs that these options
are not supported on Windows for cancellation requests (similar to my proposed doc change
for PG14 and below).
Вложения
Jelte Fennema <Jelte.Fennema@microsoft.com> writes:
> Attached are 3 patches that address the feedback from Andres about code duplication
> and splitting up commits. I completely removed internal_cancel now, since it only had
> one caller at this point.

Here's some cleaned-up versions of 0001 and 0002.  I have not bothered
with 0003 because I for one will not be convinced to commit it.  The
risk-vs-reward ratio looks far too high on that, and it's not obvious
why 0002 doesn't already solve whatever problem there is.

A couple of notes:

0001 introduces malloc/free into PQrequestCancel, which promotes it
from being "probably unsafe in a signal handler" to "definitely unsafe
in a signal handler".  Before, maybe you could have gotten away with
it if you were sure the PGconn object was idle, but now it's no-go for
sure.  I don't have a big problem with that, given that it's been
deprecated for decades, but I made the warning text in libpq.sgml a
little stronger.

As for 0002, I don't see a really good reason why we shouldn't try
to do it on Windows too.  If connect() will work, then it seems
likely that setsockopt() and WSAIOCtl() will too.  Moreover, note
that at least in our own programs, PQcancel doesn't *really* run in a
signal handler on Windows: see commentary in src/fe_utils/cancel.c.
(The fact that we now have test coverage for PQcancel makes me a lot
more willing to try this than I might otherwise be.  Will be
interested to see the cfbot's results.)

Also, I was somewhat surprised while working on this to realize
that PQconnectPoll doesn't call setTCPUserTimeout if keepalives
are disabled per useKeepalives().  I made PQcancel act the same,
but I wonder if that was intentional or a bug.  I'd personally
have thought that those settings were independent.

            regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 14f35d37f6..e0ab7cd555 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5707,8 +5707,8 @@ int PQrequestCancel(PGconn *conn);
        <structname>PGconn</structname> object, and in case of failure stores the
        error message in the <structname>PGconn</structname> object (whence it can
        be retrieved by <xref linkend="libpq-PQerrorMessage"/>).  Although
-       the functionality is the same, this approach creates hazards for
-       multiple-thread programs and signal handlers, since it is possible
+       the functionality is the same, this approach is not safe within
+       multiple-thread programs or signal handlers, since it is possible
        that overwriting the <structname>PGconn</structname>'s error message will
        mess up the operation currently in progress on the connection.
       </para>
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 5fc16be849..9e75abd304 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -4394,14 +4394,17 @@ PQfreeCancel(PGcancel *cancel)


 /*
- * PQcancel and PQrequestCancel: attempt to request cancellation of the
- * current operation.
+ * PQcancel: request query cancel
  *
  * The return value is true if the cancel request was successfully
  * dispatched, false if not (in which case an error message is available).
  * Note: successful dispatch is no guarantee that there will be any effect at
  * the backend.  The application must read the operation result as usual.
  *
+ * On failure, an error message is stored in *errbuf, which must be of size
+ * errbufsize (recommended size is 256 bytes).  *errbuf is not changed on
+ * success return.
+ *
  * CAUTION: we want this routine to be safely callable from a signal handler
  * (for example, an application might want to call it in a SIGINT handler).
  * This means we cannot use any C library routine that might be non-reentrant.
@@ -4409,13 +4412,9 @@ PQfreeCancel(PGcancel *cancel)
  * just as dangerous.  We avoid sprintf here for that reason.  Building up
  * error messages with strcpy/strcat is tedious but should be quite safe.
  * We also save/restore errno in case the signal handler support doesn't.
- *
- * internal_cancel() is an internal helper function to make code-sharing
- * between the two versions of the cancel function possible.
  */
-static int
-internal_cancel(SockAddr *raddr, int be_pid, int be_key,
-                char *errbuf, int errbufsize)
+int
+PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
 {
     int            save_errno = SOCK_ERRNO;
     pgsocket    tmpsock = PGINVALID_SOCKET;
@@ -4427,18 +4426,26 @@ internal_cancel(SockAddr *raddr, int be_pid, int be_key,
         CancelRequestPacket cp;
     }            crp;

+    if (!cancel)
+    {
+        strlcpy(errbuf, "PQcancel() -- no cancel object supplied", errbufsize);
+        /* strlcpy probably doesn't change errno, but be paranoid */
+        SOCK_ERRNO_SET(save_errno);
+        return false;
+    }
+
     /*
      * We need to open a temporary connection to the postmaster. Do this with
      * only kernel calls.
      */
-    if ((tmpsock = socket(raddr->addr.ss_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+    if ((tmpsock = socket(cancel->raddr.addr.ss_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
     {
         strlcpy(errbuf, "PQcancel() -- socket() failed: ", errbufsize);
         goto cancel_errReturn;
     }
 retry3:
-    if (connect(tmpsock, (struct sockaddr *) &raddr->addr,
-                raddr->salen) < 0)
+    if (connect(tmpsock, (struct sockaddr *) &cancel->raddr.addr,
+                cancel->raddr.salen) < 0)
     {
         if (SOCK_ERRNO == EINTR)
             /* Interrupted system call - we'll just try again */
@@ -4455,8 +4462,8 @@ retry3:

     crp.packetlen = pg_hton32((uint32) sizeof(crp));
     crp.cp.cancelRequestCode = (MsgType) pg_hton32(CANCEL_REQUEST_CODE);
-    crp.cp.backendPID = pg_hton32(be_pid);
-    crp.cp.cancelAuthCode = pg_hton32(be_key);
+    crp.cp.backendPID = pg_hton32(cancel->be_pid);
+    crp.cp.cancelAuthCode = pg_hton32(cancel->be_key);

 retry4:
     if (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp))
@@ -4508,27 +4515,6 @@ cancel_errReturn:
     return false;
 }

-/*
- * PQcancel: request query cancel
- *
- * Returns true if able to send the cancel request, false if not.
- *
- * On failure, an error message is stored in *errbuf, which must be of size
- * errbufsize (recommended size is 256 bytes).  *errbuf is not changed on
- * success return.
- */
-int
-PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
-{
-    if (!cancel)
-    {
-        strlcpy(errbuf, "PQcancel() -- no cancel object supplied", errbufsize);
-        return false;
-    }
-
-    return internal_cancel(&cancel->raddr, cancel->be_pid, cancel->be_key,
-                           errbuf, errbufsize);
-}

 /*
  * PQrequestCancel: old, not thread-safe function for requesting query cancel
@@ -4546,6 +4532,7 @@ int
 PQrequestCancel(PGconn *conn)
 {
     int            r;
+    PGcancel   *cancel;

     /* Check we have an open connection */
     if (!conn)
@@ -4561,8 +4548,19 @@ PQrequestCancel(PGconn *conn)
         return false;
     }

-    r = internal_cancel(&conn->raddr, conn->be_pid, conn->be_key,
-                        conn->errorMessage.data, conn->errorMessage.maxlen);
+    cancel = PQgetCancel(conn);
+    if (cancel)
+    {
+        r = PQcancel(cancel, conn->errorMessage.data,
+                     conn->errorMessage.maxlen);
+        PQfreeCancel(cancel);
+    }
+    else
+    {
+        strlcpy(conn->errorMessage.data, "out of memory",
+                conn->errorMessage.maxlen);
+        r = false;
+    }

     if (!r)
         conn->errorMessage.len = strlen(conn->errorMessage.data);
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 9e75abd304..a4a559904d 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1944,26 +1944,17 @@ setKeepalivesCount(PGconn *conn)
 /*
  * Enable keepalives and set the keepalive values on Win32,
  * where they are always set in one batch.
+ *
+ * setKeepalivesWin32 is used by PQcancel, and had better be signal safe.
  */
 static int
-setKeepalivesWin32(PGconn *conn)
+setKeepalivesWin32(pgsocket sock, int idle, int interval)
 {
     struct tcp_keepalive ka;
     DWORD        retsize;
-    int            idle = 0;
-    int            interval = 0;

-    if (conn->keepalives_idle &&
-        !parse_int_param(conn->keepalives_idle, &idle, conn,
-                         "keepalives_idle"))
-        return 0;
     if (idle <= 0)
         idle = 2 * 60 * 60;        /* 2 hours = default */
-
-    if (conn->keepalives_interval &&
-        !parse_int_param(conn->keepalives_interval, &interval, conn,
-                         "keepalives_interval"))
-        return 0;
     if (interval <= 0)
         interval = 1;            /* 1 second = default */

@@ -1971,7 +1962,7 @@ setKeepalivesWin32(PGconn *conn)
     ka.keepalivetime = idle * 1000;
     ka.keepaliveinterval = interval * 1000;

-    if (WSAIoctl(conn->sock,
+    if (WSAIoctl(sock,
                  SIO_KEEPALIVE_VALS,
                  (LPVOID) &ka,
                  sizeof(ka),
@@ -1981,6 +1972,26 @@ setKeepalivesWin32(PGconn *conn)
                  NULL,
                  NULL)
         != 0)
+        return 0;
+    return 1;
+}
+
+static int
+prepKeepalivesWin32(PGconn *conn)
+{
+    int            idle = -1;
+    int            interval = -1;
+
+    if (conn->keepalives_idle &&
+        !parse_int_param(conn->keepalives_idle, &idle, conn,
+                         "keepalives_idle"))
+        return 0;
+    if (conn->keepalives_interval &&
+        !parse_int_param(conn->keepalives_interval, &interval, conn,
+                         "keepalives_interval"))
+        return 0;
+
+    if (!setKeepalivesWin32(conn->sock, idle, interval))
     {
         appendPQExpBuffer(&conn->errorMessage,
                           libpq_gettext("%s(%s) failed: error code %d\n"),
@@ -2644,7 +2655,7 @@ keep_going:                        /* We will come back to here until there is
                             err = 1;
 #else                            /* WIN32 */
 #ifdef SIO_KEEPALIVE_VALS
-                        else if (!setKeepalivesWin32(conn))
+                        else if (!prepKeepalivesWin32(conn))
                             err = 1;
 #endif                            /* SIO_KEEPALIVE_VALS */
 #endif                            /* WIN32 */
@@ -4380,8 +4391,53 @@ PQgetCancel(PGconn *conn)
     memcpy(&cancel->raddr, &conn->raddr, sizeof(SockAddr));
     cancel->be_pid = conn->be_pid;
     cancel->be_key = conn->be_key;
+    /* We use -1 to indicate an unset connection option */
+    cancel->pgtcp_user_timeout = -1;
+    cancel->keepalives = -1;
+    cancel->keepalives_idle = -1;
+    cancel->keepalives_interval = -1;
+    cancel->keepalives_count = -1;
+    if (conn->pgtcp_user_timeout != NULL)
+    {
+        if (!parse_int_param(conn->pgtcp_user_timeout,
+                             &cancel->pgtcp_user_timeout,
+                             conn, "tcp_user_timeout"))
+            goto fail;
+    }
+    if (conn->keepalives != NULL)
+    {
+        if (!parse_int_param(conn->keepalives,
+                             &cancel->keepalives,
+                             conn, "keepalives"))
+            goto fail;
+    }
+    if (conn->keepalives_idle != NULL)
+    {
+        if (!parse_int_param(conn->keepalives_idle,
+                             &cancel->keepalives_idle,
+                             conn, "keepalives_idle"))
+            goto fail;
+    }
+    if (conn->keepalives_interval != NULL)
+    {
+        if (!parse_int_param(conn->keepalives_interval,
+                             &cancel->keepalives_interval,
+                             conn, "keepalives_interval"))
+            goto fail;
+    }
+    if (conn->keepalives_count != NULL)
+    {
+        if (!parse_int_param(conn->keepalives_count,
+                             &cancel->keepalives_count,
+                             conn, "keepalives_count"))
+            goto fail;
+    }

     return cancel;
+
+fail:
+    free(cancel);
+    return NULL;
 }

 /* PQfreeCancel: free a cancel structure */
@@ -4393,6 +4449,23 @@ PQfreeCancel(PGcancel *cancel)
 }


+/*
+ * Sets an integer socket option on a TCP socket, if the provided value is
+ * not negative.  Returns false if setsockopt fails for some reason.
+ *
+ * CAUTION: This needs to be signal safe, since it's used by PQcancel.
+ */
+static bool
+optional_setsockopt(int fd, int protoid, int optid, int value)
+{
+    if (value < 0)
+        return true;
+    if (setsockopt(fd, protoid, optid, (char *) &value, sizeof(value)) < 0)
+        return false;
+    return true;
+}
+
+
 /*
  * PQcancel: request query cancel
  *
@@ -4443,6 +4516,78 @@ PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
         strlcpy(errbuf, "PQcancel() -- socket() failed: ", errbufsize);
         goto cancel_errReturn;
     }
+
+    /*
+     * Since this connection will only be used to send a tiny bit of data, we
+     * don't need NODELAY.  We also don't set the socket to nonblocking mode,
+     * because the API definition of PQcancel requires the cancel to be sent
+     * in a blocking way.
+     *
+     * We do set socket options related to keepalives and other TCP timeouts.
+     * This ensures that this function does not block indefinitely when
+     * reasonable keepalive and timeout settings have been provided.
+     */
+    if (!IS_AF_UNIX(cancel->raddr.addr.ss_family) &&
+        cancel->keepalives != 0)
+    {
+#ifndef WIN32
+        if (!optional_setsockopt(tmpsock, SOL_SOCKET, SO_KEEPALIVE, 1))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(SO_KEEPALIVE) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+
+#ifdef PG_TCP_KEEPALIVE_IDLE
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, PG_TCP_KEEPALIVE_IDLE,
+                                 cancel->keepalives_idle))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(" PG_TCP_KEEPALIVE_IDLE_STR ") failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+
+#ifdef TCP_KEEPINTVL
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, TCP_KEEPINTVL,
+                                 cancel->keepalives_interval))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_KEEPINTVL) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+
+#ifdef TCP_KEEPCNT
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, TCP_KEEPCNT,
+                                 cancel->keepalives_count))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_KEEPCNT) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+
+#else                            /* WIN32 */
+
+#ifdef SIO_KEEPALIVE_VALS
+        if (!setKeepalivesWin32(tmpsock,
+                                cancel->keepalives_idle,
+                                cancel->keepalives_interval))
+        {
+            strlcpy(errbuf, "PQcancel() -- WSAIoctl(SIO_KEEPALIVE_VALS) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif                            /* SIO_KEEPALIVE_VALS */
+#endif                            /* WIN32 */
+
+        /* TCP_USER_TIMEOUT works the same way on Unix and Windows */
+#ifdef TCP_USER_TIMEOUT
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+                                 cancel->pgtcp_user_timeout))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_USER_TIMEOUT) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+    }
+
 retry3:
     if (connect(tmpsock, (struct sockaddr *) &cancel->raddr.addr,
                 cancel->raddr.salen) < 0)
@@ -4454,10 +4599,6 @@ retry3:
         goto cancel_errReturn;
     }

-    /*
-     * We needn't set nonblocking I/O or NODELAY options here.
-     */
-
     /* Create and send the cancel request packet. */

     crp.packetlen = pg_hton32((uint32) sizeof(crp));
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index fcce13843e..4290553482 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -583,6 +583,13 @@ struct pg_cancel
     SockAddr    raddr;            /* Remote address */
     int            be_pid;            /* PID of backend --- needed for cancels */
     int            be_key;            /* key of backend --- needed for cancels */
+    int            pgtcp_user_timeout; /* tcp user timeout */
+    int            keepalives;        /* use TCP keepalives? */
+    int            keepalives_idle;    /* time between TCP keepalives */
+    int            keepalives_interval;    /* time between TCP keepalive
+                                         * retransmits */
+    int            keepalives_count;    /* maximum number of TCP keepalive
+                                     * retransmits */
 };



I wrote:
> (The fact that we now have test coverage for PQcancel makes me a lot
> more willing to try this than I might otherwise be.  Will be
> interested to see the cfbot's results.)

On closer look, I'm not sure that psql/t/020_cancel.pl is actually doing
anything on Windows; the cfbot's test transcript says it ran without
skipping, but looking at the test itself, it seems like it would skip on
Windows.

Meanwhile, the warnings build noticed that we need to #ifdef
out the optional_setsockopt function in some cases.  Revised
0002 attached.

            regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 14f35d37f6..e0ab7cd555 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5707,8 +5707,8 @@ int PQrequestCancel(PGconn *conn);
        <structname>PGconn</structname> object, and in case of failure stores the
        error message in the <structname>PGconn</structname> object (whence it can
        be retrieved by <xref linkend="libpq-PQerrorMessage"/>).  Although
-       the functionality is the same, this approach creates hazards for
-       multiple-thread programs and signal handlers, since it is possible
+       the functionality is the same, this approach is not safe within
+       multiple-thread programs or signal handlers, since it is possible
        that overwriting the <structname>PGconn</structname>'s error message will
        mess up the operation currently in progress on the connection.
       </para>
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 5fc16be849..9e75abd304 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -4394,14 +4394,17 @@ PQfreeCancel(PGcancel *cancel)


 /*
- * PQcancel and PQrequestCancel: attempt to request cancellation of the
- * current operation.
+ * PQcancel: request query cancel
  *
  * The return value is true if the cancel request was successfully
  * dispatched, false if not (in which case an error message is available).
  * Note: successful dispatch is no guarantee that there will be any effect at
  * the backend.  The application must read the operation result as usual.
  *
+ * On failure, an error message is stored in *errbuf, which must be of size
+ * errbufsize (recommended size is 256 bytes).  *errbuf is not changed on
+ * success return.
+ *
  * CAUTION: we want this routine to be safely callable from a signal handler
  * (for example, an application might want to call it in a SIGINT handler).
  * This means we cannot use any C library routine that might be non-reentrant.
@@ -4409,13 +4412,9 @@ PQfreeCancel(PGcancel *cancel)
  * just as dangerous.  We avoid sprintf here for that reason.  Building up
  * error messages with strcpy/strcat is tedious but should be quite safe.
  * We also save/restore errno in case the signal handler support doesn't.
- *
- * internal_cancel() is an internal helper function to make code-sharing
- * between the two versions of the cancel function possible.
  */
-static int
-internal_cancel(SockAddr *raddr, int be_pid, int be_key,
-                char *errbuf, int errbufsize)
+int
+PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
 {
     int            save_errno = SOCK_ERRNO;
     pgsocket    tmpsock = PGINVALID_SOCKET;
@@ -4427,18 +4426,26 @@ internal_cancel(SockAddr *raddr, int be_pid, int be_key,
         CancelRequestPacket cp;
     }            crp;

+    if (!cancel)
+    {
+        strlcpy(errbuf, "PQcancel() -- no cancel object supplied", errbufsize);
+        /* strlcpy probably doesn't change errno, but be paranoid */
+        SOCK_ERRNO_SET(save_errno);
+        return false;
+    }
+
     /*
      * We need to open a temporary connection to the postmaster. Do this with
      * only kernel calls.
      */
-    if ((tmpsock = socket(raddr->addr.ss_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
+    if ((tmpsock = socket(cancel->raddr.addr.ss_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
     {
         strlcpy(errbuf, "PQcancel() -- socket() failed: ", errbufsize);
         goto cancel_errReturn;
     }
 retry3:
-    if (connect(tmpsock, (struct sockaddr *) &raddr->addr,
-                raddr->salen) < 0)
+    if (connect(tmpsock, (struct sockaddr *) &cancel->raddr.addr,
+                cancel->raddr.salen) < 0)
     {
         if (SOCK_ERRNO == EINTR)
             /* Interrupted system call - we'll just try again */
@@ -4455,8 +4462,8 @@ retry3:

     crp.packetlen = pg_hton32((uint32) sizeof(crp));
     crp.cp.cancelRequestCode = (MsgType) pg_hton32(CANCEL_REQUEST_CODE);
-    crp.cp.backendPID = pg_hton32(be_pid);
-    crp.cp.cancelAuthCode = pg_hton32(be_key);
+    crp.cp.backendPID = pg_hton32(cancel->be_pid);
+    crp.cp.cancelAuthCode = pg_hton32(cancel->be_key);

 retry4:
     if (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp))
@@ -4508,27 +4515,6 @@ cancel_errReturn:
     return false;
 }

-/*
- * PQcancel: request query cancel
- *
- * Returns true if able to send the cancel request, false if not.
- *
- * On failure, an error message is stored in *errbuf, which must be of size
- * errbufsize (recommended size is 256 bytes).  *errbuf is not changed on
- * success return.
- */
-int
-PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
-{
-    if (!cancel)
-    {
-        strlcpy(errbuf, "PQcancel() -- no cancel object supplied", errbufsize);
-        return false;
-    }
-
-    return internal_cancel(&cancel->raddr, cancel->be_pid, cancel->be_key,
-                           errbuf, errbufsize);
-}

 /*
  * PQrequestCancel: old, not thread-safe function for requesting query cancel
@@ -4546,6 +4532,7 @@ int
 PQrequestCancel(PGconn *conn)
 {
     int            r;
+    PGcancel   *cancel;

     /* Check we have an open connection */
     if (!conn)
@@ -4561,8 +4548,19 @@ PQrequestCancel(PGconn *conn)
         return false;
     }

-    r = internal_cancel(&conn->raddr, conn->be_pid, conn->be_key,
-                        conn->errorMessage.data, conn->errorMessage.maxlen);
+    cancel = PQgetCancel(conn);
+    if (cancel)
+    {
+        r = PQcancel(cancel, conn->errorMessage.data,
+                     conn->errorMessage.maxlen);
+        PQfreeCancel(cancel);
+    }
+    else
+    {
+        strlcpy(conn->errorMessage.data, "out of memory",
+                conn->errorMessage.maxlen);
+        r = false;
+    }

     if (!r)
         conn->errorMessage.len = strlen(conn->errorMessage.data);
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 9e75abd304..fb3b1764fa 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1944,26 +1944,17 @@ setKeepalivesCount(PGconn *conn)
 /*
  * Enable keepalives and set the keepalive values on Win32,
  * where they are always set in one batch.
+ *
+ * setKeepalivesWin32 is used by PQcancel, and had better be signal safe.
  */
 static int
-setKeepalivesWin32(PGconn *conn)
+setKeepalivesWin32(pgsocket sock, int idle, int interval)
 {
     struct tcp_keepalive ka;
     DWORD        retsize;
-    int            idle = 0;
-    int            interval = 0;

-    if (conn->keepalives_idle &&
-        !parse_int_param(conn->keepalives_idle, &idle, conn,
-                         "keepalives_idle"))
-        return 0;
     if (idle <= 0)
         idle = 2 * 60 * 60;        /* 2 hours = default */
-
-    if (conn->keepalives_interval &&
-        !parse_int_param(conn->keepalives_interval, &interval, conn,
-                         "keepalives_interval"))
-        return 0;
     if (interval <= 0)
         interval = 1;            /* 1 second = default */

@@ -1971,7 +1962,7 @@ setKeepalivesWin32(PGconn *conn)
     ka.keepalivetime = idle * 1000;
     ka.keepaliveinterval = interval * 1000;

-    if (WSAIoctl(conn->sock,
+    if (WSAIoctl(sock,
                  SIO_KEEPALIVE_VALS,
                  (LPVOID) &ka,
                  sizeof(ka),
@@ -1981,6 +1972,26 @@ setKeepalivesWin32(PGconn *conn)
                  NULL,
                  NULL)
         != 0)
+        return 0;
+    return 1;
+}
+
+static int
+prepKeepalivesWin32(PGconn *conn)
+{
+    int            idle = -1;
+    int            interval = -1;
+
+    if (conn->keepalives_idle &&
+        !parse_int_param(conn->keepalives_idle, &idle, conn,
+                         "keepalives_idle"))
+        return 0;
+    if (conn->keepalives_interval &&
+        !parse_int_param(conn->keepalives_interval, &interval, conn,
+                         "keepalives_interval"))
+        return 0;
+
+    if (!setKeepalivesWin32(conn->sock, idle, interval))
     {
         appendPQExpBuffer(&conn->errorMessage,
                           libpq_gettext("%s(%s) failed: error code %d\n"),
@@ -2644,7 +2655,7 @@ keep_going:                        /* We will come back to here until there is
                             err = 1;
 #else                            /* WIN32 */
 #ifdef SIO_KEEPALIVE_VALS
-                        else if (!setKeepalivesWin32(conn))
+                        else if (!prepKeepalivesWin32(conn))
                             err = 1;
 #endif                            /* SIO_KEEPALIVE_VALS */
 #endif                            /* WIN32 */
@@ -4380,8 +4391,53 @@ PQgetCancel(PGconn *conn)
     memcpy(&cancel->raddr, &conn->raddr, sizeof(SockAddr));
     cancel->be_pid = conn->be_pid;
     cancel->be_key = conn->be_key;
+    /* We use -1 to indicate an unset connection option */
+    cancel->pgtcp_user_timeout = -1;
+    cancel->keepalives = -1;
+    cancel->keepalives_idle = -1;
+    cancel->keepalives_interval = -1;
+    cancel->keepalives_count = -1;
+    if (conn->pgtcp_user_timeout != NULL)
+    {
+        if (!parse_int_param(conn->pgtcp_user_timeout,
+                             &cancel->pgtcp_user_timeout,
+                             conn, "tcp_user_timeout"))
+            goto fail;
+    }
+    if (conn->keepalives != NULL)
+    {
+        if (!parse_int_param(conn->keepalives,
+                             &cancel->keepalives,
+                             conn, "keepalives"))
+            goto fail;
+    }
+    if (conn->keepalives_idle != NULL)
+    {
+        if (!parse_int_param(conn->keepalives_idle,
+                             &cancel->keepalives_idle,
+                             conn, "keepalives_idle"))
+            goto fail;
+    }
+    if (conn->keepalives_interval != NULL)
+    {
+        if (!parse_int_param(conn->keepalives_interval,
+                             &cancel->keepalives_interval,
+                             conn, "keepalives_interval"))
+            goto fail;
+    }
+    if (conn->keepalives_count != NULL)
+    {
+        if (!parse_int_param(conn->keepalives_count,
+                             &cancel->keepalives_count,
+                             conn, "keepalives_count"))
+            goto fail;
+    }

     return cancel;
+
+fail:
+    free(cancel);
+    return NULL;
 }

 /* PQfreeCancel: free a cancel structure */
@@ -4393,6 +4449,25 @@ PQfreeCancel(PGcancel *cancel)
 }


+/*
+ * Sets an integer socket option on a TCP socket, if the provided value is
+ * not negative.  Returns false if setsockopt fails for some reason.
+ *
+ * CAUTION: This needs to be signal safe, since it's used by PQcancel.
+ */
+#if defined(TCP_USER_TIMEOUT) || !defined(WIN32)
+static bool
+optional_setsockopt(int fd, int protoid, int optid, int value)
+{
+    if (value < 0)
+        return true;
+    if (setsockopt(fd, protoid, optid, (char *) &value, sizeof(value)) < 0)
+        return false;
+    return true;
+}
+#endif
+
+
 /*
  * PQcancel: request query cancel
  *
@@ -4443,6 +4518,78 @@ PQcancel(PGcancel *cancel, char *errbuf, int errbufsize)
         strlcpy(errbuf, "PQcancel() -- socket() failed: ", errbufsize);
         goto cancel_errReturn;
     }
+
+    /*
+     * Since this connection will only be used to send a tiny bit of data, we
+     * don't need NODELAY.  We also don't set the socket to nonblocking mode,
+     * because the API definition of PQcancel requires the cancel to be sent
+     * in a blocking way.
+     *
+     * We do set socket options related to keepalives and other TCP timeouts.
+     * This ensures that this function does not block indefinitely when
+     * reasonable keepalive and timeout settings have been provided.
+     */
+    if (!IS_AF_UNIX(cancel->raddr.addr.ss_family) &&
+        cancel->keepalives != 0)
+    {
+#ifndef WIN32
+        if (!optional_setsockopt(tmpsock, SOL_SOCKET, SO_KEEPALIVE, 1))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(SO_KEEPALIVE) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+
+#ifdef PG_TCP_KEEPALIVE_IDLE
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, PG_TCP_KEEPALIVE_IDLE,
+                                 cancel->keepalives_idle))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(" PG_TCP_KEEPALIVE_IDLE_STR ") failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+
+#ifdef TCP_KEEPINTVL
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, TCP_KEEPINTVL,
+                                 cancel->keepalives_interval))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_KEEPINTVL) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+
+#ifdef TCP_KEEPCNT
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, TCP_KEEPCNT,
+                                 cancel->keepalives_count))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_KEEPCNT) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+
+#else                            /* WIN32 */
+
+#ifdef SIO_KEEPALIVE_VALS
+        if (!setKeepalivesWin32(tmpsock,
+                                cancel->keepalives_idle,
+                                cancel->keepalives_interval))
+        {
+            strlcpy(errbuf, "PQcancel() -- WSAIoctl(SIO_KEEPALIVE_VALS) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif                            /* SIO_KEEPALIVE_VALS */
+#endif                            /* WIN32 */
+
+        /* TCP_USER_TIMEOUT works the same way on Unix and Windows */
+#ifdef TCP_USER_TIMEOUT
+        if (!optional_setsockopt(tmpsock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+                                 cancel->pgtcp_user_timeout))
+        {
+            strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_USER_TIMEOUT) failed: ", errbufsize);
+            goto cancel_errReturn;
+        }
+#endif
+    }
+
 retry3:
     if (connect(tmpsock, (struct sockaddr *) &cancel->raddr.addr,
                 cancel->raddr.salen) < 0)
@@ -4454,10 +4601,6 @@ retry3:
         goto cancel_errReturn;
     }

-    /*
-     * We needn't set nonblocking I/O or NODELAY options here.
-     */
-
     /* Create and send the cancel request packet. */

     crp.packetlen = pg_hton32((uint32) sizeof(crp));
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index fcce13843e..4290553482 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -583,6 +583,13 @@ struct pg_cancel
     SockAddr    raddr;            /* Remote address */
     int            be_pid;            /* PID of backend --- needed for cancels */
     int            be_key;            /* key of backend --- needed for cancels */
+    int            pgtcp_user_timeout; /* tcp user timeout */
+    int            keepalives;        /* use TCP keepalives? */
+    int            keepalives_idle;    /* time between TCP keepalives */
+    int            keepalives_interval;    /* time between TCP keepalive
+                                         * retransmits */
+    int            keepalives_count;    /* maximum number of TCP keepalive
+                                     * retransmits */
 };



... btw, speaking of signal-safe functions: I am dismayed to
notice that strerror (and strerror_r) are *not* in POSIX's
list of async-signal-safe functions.  This is really quite
unsurprising, considering that they are chartered to return
locale-dependent strings.  Unless the data has already been
collected in the current process, that'd imply reading something
from the locale definition files, allocating memory to hold it,
etc.

So I'm now thinking this bit in PQcancel is completely unsafe:

        strncat(errbuf, SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)),
                maxlen);

Seems we have to give up on providing any details beyond the
name of the function that failed.

            regards, tom lane



Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
Thanks for all the cleanup and adding of windows support. To me it now looks good to merge.

Meanwhile I've created another patch that adds, a non-blocking version of PQcancel to libpq.
Which doesn't have this problem by design, because it simply reuses the normal code for
connection establishement. And it also includes tests for PQcancel itself.


Jelte Fennema <Jelte.Fennema@microsoft.com> writes:
> Thanks for all the cleanup and adding of windows support. To me it now looks good to merge.

I was about to commit this when I started to wonder if it actually does
anything useful.  In particular, I read in the Linux tcp(7) man page

       TCP_USER_TIMEOUT (since Linux 2.6.37)
              ...
              This option can be set during any state of a TCP connection, but
              is effective only during the synchronized states of a connection
              (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT,  CLOSING,  and
              LAST-ACK).

ISTM that the case we care about is where the server fails to respond
to the TCP connection request.  If it does so, it seems pretty unlikely
that it wouldn't then eat the small amount of data we're going to send.

While the keepalive options aren't so explicitly documented, I'd bet that
they too don't have any effect until the connection is known established.

So I'm unconvinced that setting these values really has much effect
to make PQcancel more robust (or more accurately, make it fail faster).
I would like to know what scenarios you tested to convince you that
this is worth doing.

            regards, tom lane



Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

От
Jelte Fennema
Дата:
It seems the man page of TCP_USER_TIMEOUT does not align with
reality then. When I use it on my local machine it is effectively used
as a connection timeout too. The second command times out after
two seconds:

sudo iptables -A INPUT -p tcp --destination-port 5432 -j DROP
psql 'host=localhost tcp_user_timeout=2000'

The keepalive settings only apply once you get to the recv however. And yes,
it is pretty unlikely for the connection to break right when it is waiting for data.
But it has happened for us. And when it happens it is really bad, because
the process will be blocked forever. Since it is a blocking call.

After investigation when this happened it seemed to be a combination of a few
things making this happen:
1. The way citus uses cancelation requests: A Citus query on the coordinator creates
   multiple connections to a worker and with 2PC for distributed transactions. If one
   connection receives an error it sends a cancel request for all others.
2. When a machine is under heavy CPU or memory pressure things don't work
   well:
   i. errors can occur pretty frequently, causing lots of cancels to be sent by Citus.
   ii. postmaster can be slow in handling new cancelation requests.
   iii. Our failover system can think the node is down, because health checks are
      failing.
3. Our failover system effectively cuts the power and the network of the primary
   when it triggers a fail over to the secondary

This all together can result in a cancel request being interrupted right at that
wrong moment. And when it happens a distributed query on the Citus
coordinator, becomes blocked forever. We've had queries stuck in this state
for multiple days. The only way to get out of it at that point is either by restarting
postgres or manually closing the blocked socket (either with ss or gdb).

Jelte


Jelte Fennema <Jelte.Fennema@microsoft.com> writes:
> It seems the man page of TCP_USER_TIMEOUT does not align with 
> reality then. When I use it on my local machine it is effectively used
> as a connection timeout too.

Huh, I should have thought to try that.  I confirm this behavior
on RHEL8 (kernel 4.18.0).  Not the first mistake I've seen in
Linux man pages :-(.

Doesn't seem to help on macOS, but AFAICT that platform doesn't
have TCP_USER_TIMEOUT, so no surprise there.

Anyway, that removes my objection, so I'll proceed with committing.
Thanks for working on this patch!

            regards, tom lane