Обсуждение: [HACKERS] PQhost may return socket dir for network connection
Hello. As the subject, PQhost() seems to be forgeting about the case where only hostaddr is specified in a connection string. PQhost() returns /var/run/postgresql or such for a connection made from the connection string "hostaddr=127.0.0.1 port=5432 dbname=postgres". I don't believe this behavior is useful. The first attached file is a fix for master. connhost[]->pghost is properly set by connectOptions2() so we don't need to handle the default case in PQhost. The second is a fix for 9.6. connectOption2 was setting pgunixsocket then clearing pghost reagardless of the value of pghostaddr. This behavior inhibited PQhost() from getting the right name to return. The behavior for several connection strings after this fix are the follows. The first line is the case fixed by this patch. 1. PQhost = 127.0.0.1 <= "hostaddr=127.0.0.1 port=5432 dbname=postgres" 2. PQhost = /tmp <= "port=5432 dbname=postgres" 3. PQhost = /tmp <= "host=/tmp port=5432 dbname=postgres" 4. PQhost = /tmp <= "host=/tmp hostaddr=127.0.0.1 port=5432 dbname=postgres" 5. PQhost = localhost <= "host=localhost port=5432 dbname=postgres" 6. PQhost = localhost <= "host=localhost hostaddr=127.0.0.1 port=5432 dbname=postgres" 7. PQhost = hoge <= "host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres" The third case is somewhat confusing but it would be the user's fault. regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *************** *** 5836,5854 **** PQhost(const PGconn *conn) { if (!conn) return NULL; ! if (conn->connhost != NULL && ! conn->connhost[conn->whichhost].type != CHT_HOST_ADDRESS) ! return conn->connhost[conn->whichhost].host; ! else if (conn->pghost != NULL && conn->pghost[0] != '\0') return conn->pghost; ! else ! { ! #ifdef HAVE_UNIX_SOCKETS ! return DEFAULT_PGSOCKET_DIR; ! #else ! return DefaultHost; ! #endif ! } } char * --- 5836,5857 ---- { if (!conn) return NULL; ! ! /* ! * We should try to avoid returning ip address. connhost[]->host stores IP ! * address in the case of CHT_HOST_ADDRESS so try to use conn->pghost ! * instead. ! */ ! if (conn->connhost == NULL || ! (conn->connhost[conn->whichhost].type == CHT_HOST_ADDRESS && ! conn->pghost != NULL && conn->pghost[0] != '\0')) return conn->pghost; ! ! /* ! * Otherwise we use this as host name even though it might be an IP ! * address. ! */ ! return conn->connhost[conn->whichhost].host; } char * *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *************** *** 819,827 **** connectOptions2(PGconn *conn) } /* ! * Allow unix socket specification in the host name */ ! if (conn->pghost && is_absolute_path(conn->pghost)) { if (conn->pgunixsocket) free(conn->pgunixsocket); --- 819,828 ---- } /* ! * Allow unix socket specification in the host name if hostaddr is not set */ ! if ((conn->pghostaddr == NULL || conn->pghostaddr[0] == '\0') && ! conn->pghost && is_absolute_path(conn->pghost)) { if (conn->pgunixsocket) free(conn->pgunixsocket); *************** *** 5354,5370 **** PQhost(const PGconn *conn) return NULL; if (conn->pghost != NULL && conn->pghost[0] != '\0') return conn->pghost; ! else ! { #ifdef HAVE_UNIX_SOCKETS ! if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0') ! return conn->pgunixsocket; ! else ! return DEFAULT_PGSOCKET_DIR; #else ! return DefaultHost; #endif - } } char * --- 5355,5369 ---- return NULL; if (conn->pghost != NULL && conn->pghost[0] != '\0') return conn->pghost; ! if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0') ! return conn->pghostaddr; #ifdef HAVE_UNIX_SOCKETS ! if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0') ! return conn->pgunixsocket; ! return DEFAULT_PGSOCKET_DIR; #else ! return DefaultHost; #endif } char *
On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > As the subject, PQhost() seems to be forgeting about the case > where only hostaddr is specified in a connection string. I suspect that may have been intentional. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> As the subject, PQhost() seems to be forgeting about the case >> where only hostaddr is specified in a connection string. > I suspect that may have been intentional. See commits 11003eb55 and 40cb21f70 for recent history in this area. Further back there's more history around host vs. hostaddr. We've gone back and forth on this in the past, including an ultimately reverted attempt to add "PQhostaddr()", so it'd be a good idea to study those past threads before proposing a change here. Having said that, the behavior stated in $subject does sound wrong. regards, tom lane
On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>> As the subject, PQhost() seems to be forgeting about the case >>> where only hostaddr is specified in a connection string. > >> I suspect that may have been intentional. > > See commits 11003eb55 and 40cb21f70 for recent history in this area. > Further back there's more history around host vs. hostaddr. We've > gone back and forth on this in the past, including an ultimately > reverted attempt to add "PQhostaddr()", so it'd be a good idea to > study those past threads before proposing a change here. > > Having said that, the behavior stated in $subject does sound wrong. I'm not sure. My understanding of the relationship between host and hostaddr is that hostaddr overrides our notion of where to find host, but not our notion of the host to which we're connecting. Under that definition, the current behavior as described by Kyotaro sounds correct. When you say host=X hostaddr=Y, we act as though we're connecting to X but try to connect to IP Y. When you just specify hostaddr=Y, we act as if we're trying to connect to the default host (which happens to be a socket address in that example) but actually use the specified IP address. That's consistent. Now, against that, https://www.postgresql.org/docs/9.6/static/libpq-connect.html says: -- If hostaddr is specified without host, the value for hostaddr gives the server network address. The connection attempt will fail if the authentication method requires a host name. -- And PQhost() asks for the host name, so you could argue that it too should "fail" in this situation. But it has no way to report failure, so that's kind of problematic. https://www.postgresql.org/docs/9.6/static/libpq-status.html says: -- Without either a host name or host address, libpq will connect using a local Unix-domain socket; or on machines without Unix-domain sockets, it will attempt to connect to localhost. -- But that's just about where to make the connection, not what we should consider the hostname to be for purposes other than calling connect(2). Kyotaro Horiguchi argues that the current behavior is "not useful" and that's probably true, but I don't think it's the job of an API to try as hard as possible to do something useful in every case. It's more important that the behavior is predictable and understandable. In short, if we're going to change the behavior of PQhost() here, then there should be a documentation change to go with it, because the current documentation around the interaction between host and hostaddr does not make it at all clear that the current behavior is wrong, at least not as far as I can see. To the contrary, it suggests that if you use hostaddr without host, you may find the results surprising or even unfortunate: -- Using hostaddr instead of host allows the application to avoid a host name look-up, which might be important in applications with time constraints. However, a host name is required for GSSAPI or SSPI authentication methods, as well as for verify-full SSL certificate verification. ... Note that authentication is likely to fail if host is not the name of the server at network address hostaddr. -- The overall impression that the documentation leaves me with is that you are expected to use only host unless you care about saving name lookups; then, use both host and hostaddr; if you want to use just hostaddr you can try it, but it'll fail to work properly if you then try to do something that needs a host name. Calling PQhost() is, perhaps, one such thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Having said that, the behavior stated in $subject does sound wrong. > I'm not sure. My understanding of the relationship between host and > hostaddr is that hostaddr overrides our notion of where to find host, > but not our notion of the host to which we're connecting. Under that > definition, the current behavior as described by Kyotaro sounds > correct. Perhaps. But hostaddr also forces us to believe that we're making an IP connection, so it still seems pretty dubious to return a socket path. The true situation is that we're connecting to an IP host that we do not know the name of. I notice that one of the recent changes was made to avoid situations where PQhost() would return NULL and thereby provoke a crash if the application wasn't expecting that (which is not unreasonable of it, since the PQhost() documentation mentions no such hazard). So I would not want to see us return NULL in this case. And I believe we already considered and rejected the idea of having it return the hostaddr string, back in some of the older discussions. (We could revisit that decision, no doubt, but let's go back and see what the reasoning was first.) But maybe returning an empty string ("") would be OK? regards, tom lane
On Mon, May 1, 2017 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Having said that, the behavior stated in $subject does sound wrong. > >> I'm not sure. My understanding of the relationship between host and >> hostaddr is that hostaddr overrides our notion of where to find host, >> but not our notion of the host to which we're connecting. Under that >> definition, the current behavior as described by Kyotaro sounds >> correct. > > Perhaps. But hostaddr also forces us to believe that we're making an > IP connection, so it still seems pretty dubious to return a socket > path. The true situation is that we're connecting to an IP host that > we do not know the name of. Yes, I think that's a reasonable interpretation. > I notice that one of the recent changes was made to avoid situations where > PQhost() would return NULL and thereby provoke a crash if the application > wasn't expecting that (which is not unreasonable of it, since the PQhost() > documentation mentions no such hazard). So I would not want to see us > return NULL in this case. > > And I believe we already considered and rejected the idea of having it > return the hostaddr string, back in some of the older discussions. > (We could revisit that decision, no doubt, but let's go back and see > what the reasoning was first.) > > But maybe returning an empty string ("") would be OK? Yeah, that might be OK. But I'd be inclined not to back-patch any behavior changes we make in this area unless it's clear that 9.6 regressed relative to previous releases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
At Mon, 1 May 2017 15:48:17 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZcyGsb621i35AC9TJar__F9fQbmRPHfCOnE4aNE227HA@mail.gmail.com> > On Mon, May 1, 2017 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Having said that, the behavior stated in $subject does sound wrong. > > > >> I'm not sure. My understanding of the relationship between host and > >> hostaddr is that hostaddr overrides our notion of where to find host, > >> but not our notion of the host to which we're connecting. Under that > >> definition, the current behavior as described by Kyotaro sounds > >> correct. > > > > Perhaps. But hostaddr also forces us to believe that we're making an > > IP connection, so it still seems pretty dubious to return a socket > > path. The true situation is that we're connecting to an IP host that > > we do not know the name of. > > Yes, I think that's a reasonable interpretation. > > > I notice that one of the recent changes was made to avoid situations where > > PQhost() would return NULL and thereby provoke a crash if the application > > wasn't expecting that (which is not unreasonable of it, since the PQhost() > > documentation mentions no such hazard). So I would not want to see us > > return NULL in this case. > > And I believe we already considered and rejected the idea of having it > > return the hostaddr string, back in some of the older discussions. > > (We could revisit that decision, no doubt, but let's go back and see > > what the reasoning was first.) > > > > But maybe returning an empty string ("") would be OK? > > Yeah, that might be OK. But I'd be inclined not to back-patch any > behavior changes we make in this area unless it's clear that 9.6 > regressed relative to previous releases. I personally don't have a specific wish on this since I don't have a specific usage of PQhost. (I think that users are reposible for the result of contradicting host and hostaddr.) However, it might be a problem that the documentation doesn't mention what the returned value from PQhost is. https://www.postgresql.org/docs/9.6/static/libpq-status.html > Returns the server host name of the connection. This can be a > host name, an IP address, or a directory path if the connection > is via Unix socket. (The path case can be distinguished because > it will always be an absolute path, beginning with /.) I don't think more strict definition is required but the above should describe the expected usage or obvious limitation. Anyway it is contradicting to the current behavior. It can return a socket path for a IP connection. regards, -- Kyotaro Horiguchi NTT Open Source Software Center