Обсуждение: [HACKERS] [doc fix] PG10: wroing description on connect_timeout when multiplehosts are specified
[HACKERS] [doc fix] PG10: wroing description on connect_timeout when multiplehosts are specified
От
 
		    	"Tsunakawa, Takayuki"
		    Дата:
		        Hello, Robert I found a wrong sentence here in the doc. I'm sorry, this is what I asked you to add... https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq-paramkeywords connect_timeout Maximum wait for connection, in seconds (write as a decimal integer string). Zero or not specified means wait indefinitely.It is not recommended to use a timeout of less than 2 seconds. This timeout applies separately to each connectionattempt. For example, if you specify two hosts and both of them are unreachable, and connect_timeout is 5, thetotal time spent waiting for a connection might be up to 10 seconds. The program behavior is that libpq times out after connect_timeout seconds regardless of how many hosts are specified. Iconfirmed it like this: $ export PGOPTIONS="-c post_auth_delay=30" $ psql -d "dbname=postgres connect_timeout=5" -h localhost,localhost -p 5432,5433 (psql erros out after 5 seconds) Could you fix the doc with something like this? "This timeout applies across all the connection attempts. For example, if you specify two hosts and both of them are unreachable,and connect_timeout is 5, the total time spent waiting for a connection is up to 5 seconds." Should we also change the minimum "2 seconds" part to be longer, according to the number of hosts? Regards Takayuki Tsunakawa
[HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout whenmultiple hosts are specified
От
 
		    	"Tsunakawa, Takayuki"
		    Дата:
		        From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa, > Takayuki > I found a wrong sentence here in the doc. I'm sorry, this is what I asked > you to add... > > https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq- > paramkeywords > > connect_timeout > Maximum wait for connection, in seconds (write as a decimal integer string). > Zero or not specified means wait indefinitely. It is not recommended to > use a timeout of less than 2 seconds. This timeout applies separately to > each connection attempt. For example, if you specify two hosts and both > of them are unreachable, and connect_timeout is 5, the total time spent > waiting for a connection might be up to 10 seconds. > > > The program behavior is that libpq times out after connect_timeout seconds > regardless of how many hosts are specified. I confirmed it like this: > > $ export PGOPTIONS="-c post_auth_delay=30" > $ psql -d "dbname=postgres connect_timeout=5" -h localhost,localhost -p > 5432,5433 > (psql erros out after 5 seconds) > > Could you fix the doc with something like this? > > "This timeout applies across all the connection attempts. For example, if > you specify two hosts and both of them are unreachable, and connect_timeout > is 5, the total time spent waiting for a connection is up to 5 seconds." > > Should we also change the minimum "2 seconds" part to be longer, according > to the number of hosts? Instead, I think we should fix the program to match the documented behavior. Otherwise, if the first database machine isdown, libpq might wait for about 2 hours (depending on the OS's TCP keepalive setting), during which it tims out afterconnect_timeout and does not attempt to connect to other hosts. I'll add this item in the PostgreSQL 10 Open Items. Regards Takayuki Tsunakawa
[HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout whenmultiple hosts are specified
От
 
		    	"Tsunakawa, Takayuki"
		    Дата:
		        From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa,> Takayuki > Instead, I think we should fix the program to match the documented behavior. > Otherwise, if the first database machine is down, libpq might wait for about > 2 hours (depending on the OS's TCP keepalive setting), during which it tims > out after connect_timeout and does not attempt to connect to other hosts. > > I'll add this item in the PostgreSQL 10 Open Items. Please use the attached patch to fix the problem. I confirmed the success as follows: $ add "post_auth_delay = 10" in postgresql.conf on host1 $ start database servers on host1 and host2 $ psql -h host1,host2 -p 5432,5433 -d "dbname=postgres connect_timeout=3" (psql connected to host2 after 3 seconds, which I checked with \conninfo) Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Fri, May 12, 2017 at 08:54:13AM +0000, Tsunakawa, Takayuki wrote: > From: pgsql-hackers-owner@postgresql.org > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tsunakawa, > > Takayuki > > I found a wrong sentence here in the doc. I'm sorry, this is what I asked > > you to add... > > > > https://www.postgresql.org/docs/devel/static/libpq-connect.html#libpq- > > paramkeywords > > > > connect_timeout > > Maximum wait for connection, in seconds (write as a decimal integer string). > > Zero or not specified means wait indefinitely. It is not recommended to > > use a timeout of less than 2 seconds. This timeout applies separately to > > each connection attempt. For example, if you specify two hosts and both > > of them are unreachable, and connect_timeout is 5, the total time spent > > waiting for a connection might be up to 10 seconds. > > > > > > The program behavior is that libpq times out after connect_timeout seconds > > regardless of how many hosts are specified. I confirmed it like this: > > > > $ export PGOPTIONS="-c post_auth_delay=30" > > $ psql -d "dbname=postgres connect_timeout=5" -h localhost,localhost -p > > 5432,5433 > > (psql erros out after 5 seconds) > > > > Could you fix the doc with something like this? > > > > "This timeout applies across all the connection attempts. For example, if > > you specify two hosts and both of them are unreachable, and connect_timeout > > is 5, the total time spent waiting for a connection is up to 5 seconds." > > > > Should we also change the minimum "2 seconds" part to be longer, according > > to the number of hosts? > > Instead, I think we should fix the program to match the documented behavior. Otherwise, if the first database machineis down, libpq might wait for about 2 hours (depending on the OS's TCP keepalive setting), during which it tims outafter connect_timeout and does not attempt to connect to other hosts. > > I'll add this item in the PostgreSQL 10 Open Items. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Sun, May 14, 2017 at 11:45 PM, Noah Misch <noah@leadboat.com> wrote: >> I'll add this item in the PostgreSQL 10 Open Items. > > [Action required within three days. This is a generic notification.] I think there is a good argument that the existing behavior is as per the documentation, but I think we may want to change it anyway. What the documentation is saying - or at least what I believe I intended for it to say - is that connect_timeout is restarted for each new host, so you could end up waiting longer than connect_timeout - but not forever - if you specify multiple hosts. And I believe that statement to be correct. Takayuki Tsunakawa is saying something different. He's saying that when connect_timeout expires, we should try the next host instead of giving up. That may or may not be a good idea, but it doesn't contradict the passage from the documentation which he quoted. That passage from the documentation doesn't say anything at all about what happens when connect_timeout expires. It only talks about how much time might pass before that happens. Takayuki Tsunakawa raised a very similar issue in another thread related to another open item, namely https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05 in which he argued that libpq ought to try then next host after a connection failure regardless of the reason for the connection failure. Tom, Michael Paquier, and I all disagreed; none of us believe that this feature was intended to retry the connection to a different host after an arbitrary error reported by the remote server. This thread is essentially the same issue, except here the question isn't what should happen after we connect to a server and it returns an error, but rather what happens when we time out waiting to connect to a server. When that happens, should we give up, or try the next server? Despite the chorus of support for the opposite conclusion on the other thread, I'm inclined to think that it would be best to change the behavior here as per the proposed patch. The point of being able to specify multiple hosts is to be able to have multiple database servers (or perhaps, multiple ways to access the same database server) and use whichever one of those servers is currently up. I think that when the server fails with a complaint like "I've never heard of the database to which you want to connect" that's not a case of the server being down, but some other kind of trouble that the administrator really ought to fix; thus it's best to stop and report the error. But if connect_timeout expires, that sounds a whole lot like the server being down. It sounds morally equivalent to socket() or connect() failing outright, which *would* trigger advancing to the next host. So I'm inclined to accept the patch, but as a definitional change rather than a bug fix. However, I'd like to hear some other opinions. I'll wait until Friday for such opinions to arrive, and then update on next steps. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Takayuki Tsunakawa raised a very similar issue in another thread > related to another open item, namely > https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05 > in which he argued that libpq ought to try then next host after a > connection failure regardless of the reason for the connection > failure. Tom, Michael Paquier, and I all disagreed; none of us > believe that this feature was intended to retry the connection to a > different host after an arbitrary error reported by the remote server. > This thread is essentially the same issue, except here the question > isn't what should happen after we connect to a server and it returns > an error, but rather what happens when we time out waiting to connect > to a server. When that happens, should we give up, or try the next > server? FWIW, I think the position most of us were taking is that this feature is meant to retry transport-level connection failures, not cases where we successfully make a connection to a server and then it rejects our login attempt. I would classify a timeout as a transport-level failure as long as it occurs before we got any server response --- if it happens during the authentication protocol, that's less clear. But it might not be very practical to distinguish those two cases. In short, +1 for retrying on timeout during connection, and I'm okay with retrying a timeout during authentication if it's not practical to treat that differently. regards, tom lane
On Tue, May 16, 2017 at 3:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > FWIW, I think the position most of us were taking is that this feature > is meant to retry transport-level connection failures, not cases where > we successfully make a connection to a server and then it rejects our > login attempt. I would classify a timeout as a transport-level failure > as long as it occurs before we got any server response --- if it happens > during the authentication protocol, that's less clear. But it might not > be very practical to distinguish those two cases. > > In short, +1 for retrying on timeout during connection, and I'm okay with > retrying a timeout during authentication if it's not practical to treat > that differently. Sensible argument here. It could happen that a server is unresponsive, for example in split-brains (?). I have been playing a bit with the patch. + * + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if it timed out. */ pqWait is internal to libpq, so we are free to set up what we want here. Still I think that we should be consistent with what pqSocketCheck returns: * >0 means that the socket is readable/writable, counting things. * 0 is for timeout. * -1 on failure. + int ret = 0; + int timeout = 0; The declaration of "ret" should be internal in the for(;;) loop. + /* Attempt connection to the next host, starting the connect_timeout timer */ + pqDropConnection(conn, true); + conn->addr_cur = conn->connhost[conn->whichhost].addrlist; + conn->status = CONNECTION_NEEDED; + finish_time = time(NULL) + timeout; + } I think that it would be safer to not set finish_time if conn->connect_timeout is NULL. I agree that your code works because pqWaitTimed() will never complain on timeout reached if finish_time is -1. That's for robustness sake. The docs say that for connect_timeout: <para> Maximum wait for connection, in seconds (write as a decimal integer string). Zero or not specified means wait indefinitely. It is not recommended to use a timeout of lessthan 2 seconds. This timeout applies separately to each connection attempt. For example, if you specify twohosts and both of them are unreachable, and <literal>connect_timeout</> is 5, the total time spent waiting for a connection might be up to 10 seconds. </para> It seems to me that this implies that if a timeout occurs on the first connection, the counter is reset, which is what this patch is doing. So we are all set. -- Michael
Re: [HACKERS] Re: [doc fix] PG10: wroing description onconnect_timeout when multiple hosts are specified
От
 
		    	"Tsunakawa, Takayuki"
		    Дата:
		        Hello Robert, Tom, Michael, Thanks a lot for checking my patch. Sorry, let me check Michael's review comments and reply tomorrow. Regards Takayuki Tsunakawa
On Mon, May 15, 2017 at 9:59 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > + * > + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if > it timed out. > */ > pqWait is internal to libpq, so we are free to set up what we want > here. Still I think that we should be consistent with what > pqSocketCheck returns: > * >0 means that the socket is readable/writable, counting things. > * 0 is for timeout. > * -1 on failure. That would imply a lot more change, though. The way that the patch currently does it, none of the other callers of pqWait() or pqWaitTimed() need to be adjusted. So I prefer the way that Tsunakawa Takayuki currently has this over your proposal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, May 17, 2017 at 12:47 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, May 15, 2017 at 9:59 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> + * >> + * Returns -1 on failure, 0 if the socket is readable/writable, 1 if >> it timed out. >> */ >> pqWait is internal to libpq, so we are free to set up what we want >> here. Still I think that we should be consistent with what >> pqSocketCheck returns: >> * >0 means that the socket is readable/writable, counting things. >> * 0 is for timeout. >> * -1 on failure. > > That would imply a lot more change, though. The way that the patch > currently does it, none of the other callers of pqWait() or > pqWaitTimed() need to be adjusted. So I prefer the way that Tsunakawa > Takayuki currently has this over your proposal. Consistency in APIs matters, but I won't fight hard in favor of this item either. In short I am fine to discard this comment. -- Michael
Re: [HACKERS] Re: [doc fix] PG10: wroing description onconnect_timeout when multiple hosts are specified
От
 
		    	"Tsunakawa, Takayuki"
		    Дата:
		        From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier > pqWait is internal to libpq, so we are free to set up what we want here. > Still I think that we should be consistent with what pqSocketCheck returns: Please let this what it is now for the same reason Robert mentioned. > + int ret = 0; > + int timeout = 0; > The declaration of "ret" should be internal in the for(;;) loop. Done. > + /* Attempt connection to the next host, starting the > connect_timeout timer */ > + pqDropConnection(conn, true); > + conn->addr_cur = conn->connhost[conn->whichhost].addrlist; > + conn->status = CONNECTION_NEEDED; > + finish_time = time(NULL) + timeout; > + } > I think that it would be safer to not set finish_time if > conn->connect_timeout is NULL. I agree that your code works because > pqWaitTimed() will never complain on timeout reached if finish_time is -1. > That's for robustness sake. Done, but I'm not sure how this contributes to the robustness. I guess you were concerned just in case pqWaitTimed() returned0 (timeout) even when it should not. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Wed, May 17, 2017 at 11:49 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier >> pqWait is internal to libpq, so we are free to set up what we want here. >> Still I think that we should be consistent with what pqSocketCheck returns: > > Please let this what it is now for the same reason Robert mentioned. > >> + int ret = 0; >> + int timeout = 0; >> The declaration of "ret" should be internal in the for(;;) loop. > > Done. > >> + /* Attempt connection to the next host, starting the >> connect_timeout timer */ >> + pqDropConnection(conn, true); >> + conn->addr_cur = conn->connhost[conn->whichhost].addrlist; >> + conn->status = CONNECTION_NEEDED; >> + finish_time = time(NULL) + timeout; >> + } >> I think that it would be safer to not set finish_time if >> conn->connect_timeout is NULL. I agree that your code works because >> pqWaitTimed() will never complain on timeout reached if finish_time is -1. >> That's for robustness sake. > > Done, but I'm not sure how this contributes to the robustness. I guess you were concerned just in case pqWaitTimed() returned0 (timeout) even when it should not. Thanks for the updated patch. This looks good to me. -- Michael
On Tue, May 16, 2017 at 11:58 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > Thanks for the updated patch. This looks good to me. Committed. I also added a slight tweak to the wording of the documentation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [doc fix] PG10: wroing description onconnect_timeout when multiple hosts are specified
От
 
		    	"Tsunakawa, Takayuki"
		    Дата:
		        From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas > Committed. I also added a slight tweak to the wording of the documentation. Thank you, the doc looks clearer. Regards Takayuki Tsunakawa