Re: Patch: Implement failover on libpq connect level.
От | Korry Douglas |
---|---|
Тема | Re: Patch: Implement failover on libpq connect level. |
Дата | |
Msg-id | 5665EB79.9070805@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Patch: Implement failover on libpq connect level. (Victor Wagner <vitus@wagner.pp.ru>) |
Ответы |
Re: Patch: Implement failover on libpq connect level.
(Korry Douglas <korry.douglas@enterprisedb.com>)
Re: Patch: Implement failover on libpq connect level. (Victor Wagner <vitus@wagner.pp.ru>) |
Список | pgsql-hackers |
> I've tried to deal with some of these problems. > > My patch have support for following things: > > 1. Check whether database instance is in the recovery/standby mode and > try to find another one if so. > 2. Let cluster management software to have some time to promote one of > the standbys to master. I.e. there can be failover timeout specified to > allow retry after some time if no working master found. > > Really there is room for some improvements in handling of connect > timeout (which became much more important thing when ability to try > next host appears). Now it is handled only by blocking-mode connect > functions, not by async state machine. But I decided to publish patch > without these improvements to get feedback from community. A bit of testing on this turns up a problem. Consider a connection string that specifies two hosts and a read/write connection: postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0 If the first host is a healthy standby (meaning that I can connect to it but pg_is_in_recovery() returns 't'), the state machine will never move on to the second host. The problem seems to be in PQconnectPoll() in the case for CONNECTION_AUTH_OK, specifically this code: /* We can release the address list now. */ pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist); conn->addrlist= NULL; conn->addr_cur = NULL; That frees up the list of alternative host addresses. The state machine then progresses to CONNECTION_CHECK_RO (which invokes pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the response from the server). Since we just connected to a standby replica, pg_is_in_recovery() returns 't' and the state changes to CONNECTION_NEEDED. The next call to try_next_address() will fail to find a next address because we freed the list in the case for CONNECTION_AUTH_OK. A related issue: when the above sequence occurs, no error message is returned (because the case for CONNECTION_NEEDED thinks "an appropriate error message is already set up"). In short, if you successfully connect to standby replica (and specify readonly=0), the remaining hosts are ignored, even though one of those hosts is a master. And one comment about the code itself - in connectDBStart(), you've added quite a bit of code to parse multiple hosts/ports. I would recommend adding a comment that shows the expected format, and then choosing better variable names (better than 'p', 'q', and 'r'); perhaps the variable names could refer to components of the connection string that you are parsing (like 'host', 'port', 'delimiter', ...). That would make the code much easier to read/maintain. Thanks. -- Korry
В списке pgsql-hackers по дате отправления: