Discussion: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch

Hi hackers,
This is my first time submitting a patch to PostgreSQL, so please bear with me if I've missed anything in the process.
We've been running into an issue with "target_session_attrs" when using DNS-based service discovery. Currently, when libpq connects to a host with multiple A-records and the connection succeeds but is rejected due to target_session_attrs mismatch (e.g., connecting to a read-only server with target_session_attrs=read-write), it skips all remaining addresses for that hostname and moves directly to the next host in the connection string.

Looking at git history, I found this was a deliberate choice by Robert Haas in commit 721f7bd3cbc (2016), where he noted "I changed Mithun's patch to skip all remaining IPs for a host if we reject a connection based on this new parameter." The original mailing list discussion is at [1], though I wasn't able to find a clear explanation of why this approach was preferred over trying all addresses.

This makes it impractical to use a single multi-A-record DNS name pointing to all cluster members with target_session_attrs=read-write to find the primary - only the first responding IP is tried before giving up on that hostname.
The attached patch changes the behavior to try all addresses for a hostname before moving to the next host, matching the existing behavior for connection failures. This would enable simpler DNS-based service discovery without requiring external tools like Consul or explicit multi-host connection strings. 
If there was a specific reason for the original design that I'm missing, I'd be happy to learn more.
Happy to address any feedback or rework the patch as needed.
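
To make the difference concrete, here is a toy model of the address loop in Python. It is only a sketch: the Server class, the find_read_write function and the 10.0.0.x addresses are made up for illustration and do not correspond to anything in libpq's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for one resolved address of a host name."""
    address: str
    reachable: bool = True
    read_write: bool = False


def find_read_write(hosts, try_all_addresses):
    """Return (chosen_address, attempted_addresses).

    hosts is a list of address lists, one list per host name in the
    connection string, in the order the resolver returned them.
    """
    attempts = []
    for addresses in hosts:
        for server in addresses:
            attempts.append(server.address)
            if not server.reachable:
                continue  # connection failure: try next address of same host
            if server.read_write:
                return server.address, attempts
            if not try_all_addresses:
                break  # behavior described above: give up on this host name
    return None, attempts


# One DNS name resolving to three cluster members (assumed layout).
cluster = [[
    Server("10.0.0.1"),                   # standby
    Server("10.0.0.2", read_write=True),  # primary
    Server("10.0.0.3"),                   # standby
]]

current = find_read_write(cluster, try_all_addresses=False)
proposed = find_read_write(cluster, try_all_addresses=True)
print(current)
print(proposed)
```

With try_all_addresses=False the model stops after the first standby and finds nothing; with True it carries on to the second address and finds the primary.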



Thanks,
Evgeny
Evgeny Kuzin <evgeny.kuzin@outlook.com> writes:
> We've been running into an issue with "target_session_attrs" when using dns-based service discovery. Currently, when
> libpq connects to a host with multiple A-records and the connection succeeds but is rejected due to target_session_attrs
> mismatch (e.g., connecting to a read-only server with target_session_attrs=read-write), it skips all remaining addresses
> for that hostname and moves directly to the next host in the connection string.

> Looking at git history, I found this was a deliberate choice by Robert Haas in commit 721f7bd3cbc (2016), where he
> noted "I changed Mithun's patch to skip all remaining IPs for a host if we reject a connection based on this new
> parameter." The original mailing list discussion is at [1], though I wasn't able to find a clear explanation of why this
> approach was preferred over trying all addresses.

> This makes it impractical to use a single multi-A-record DNS name pointing to all cluster members with
> target_session_attrs=read-write to find the primary - only the first responding IP is tried before giving up on that
> hostname.

> The attached patch changes the behavior to try all addresses for a hostname before moving to the next host, matching
> the existing behavior for connection failures. This would enable simpler DNS-based service discovery without requiring
> external tools like Consul or explicit multi-host connection strings.

TBH, I'd say that your DNS setup is broken and you should fix it.
It makes no sense to have the same DNS entry pointing to both
read-write and read-only hosts.  The proposed patch will mainly
result in useless connection attempts in more-sanely-constructed
setups.

            regards, tom lane



Hi Tom,
Thanks for the feedback. I should clarify the use case - we're not mixing read-write and read-only hosts under one DNS name by accident. This is intentional for HA failover.
We run PostgreSQL clusters with streaming replication. After a failover, the old primary becomes a standby and vice versa. The challenge is: how do clients find the new primary?
Current options:
  1. Update DNS on every failover - operationally complex, TTL delays, requires automation
  2. Consul/etcd - adds operational complexity and another failure domain
  3. Multiple hosts in connection string - requires application changes when cluster topology changes (e.g., adding a new standby)
The proposed approach:
  • Single A-record (db.internal) pointing to all cluster member IPs
  • Clients connect with host=db.internal target_session_attrs=read-write
  • libpq tries each IP until it finds the primary
IIUC this is how JDBC's targetServerType=primary works - it iterates through all resolved addresses. The "useless connection attempts" are actually the feature: it's probing to find the right server, same as when you specify multiple hosts explicitly.
The only difference from host=pg1,pg2,pg3 is that DNS provides the list instead of the connection string. From libpq's perspective, why should it matter where the address list came from?
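
For illustration, this is roughly what "DNS provides the list" looks like from a client's point of view. It is a minimal Python sketch using the standard getaddrinfo wrapper, not libpq code; resolve_all is a name invented here, and db.internal would be substituted for the numeric address in a real setup.

```python
import socket


def resolve_all(host, port=5432):
    """Return every distinct (family, address) pair getaddrinfo reports."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        entry = (family.name, sockaddr[0])
        if entry not in seen:  # keep resolver order, drop duplicates
            seen.append(entry)
    return seen


# A numeric address is used here so the sketch runs without any DNS setup.
print(resolve_all("127.0.0.1"))
```

Each entry in the returned list is a candidate that a client could probe in turn, just as it would probe pg1, pg2 and pg3 from an explicit host list.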




> On 5 Mar 2026, at 19:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> TBH, I'd say that your DNS setup is broken and you should fix it.
> It makes no sense to have the same DNS entry pointing to both
> read-write and read-only hosts.  The proposed patch will mainly
> result in useless connection attempts in more-sanely-constructed
> setups.

This is a feature that cloud providers want very much.

We sell PGaaS clusters, which are just a bunch of hosts. Any of
these hosts can become primary at any time.
Currently, when a user adds more hosts, they have to redeploy or
reconfigure their app.

The exception is users of pgx, which already works this way: we can
just give them one FQDN for the whole cluster and update DNS records.

This was proposed before [0], and I think Andrew and Evgeny could join
efforts. Certainly, this can be implemented without affecting those
who do not need it.


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/patch/5396/



Thanks for the pointer to patch 5396 - I wasn't aware of Andrew Jackson's prior work on this.
I'd also like to add another argument from that thread. Artem Navrotskiy pointed out [1] that the current behavior actually contradicts the documentation. The libpq docs [2] state:
"When multiple hosts are specified, or when a single host name is translated to multiple addresses, all the hosts and addresses will be tried in order, until one succeeds."
The current behavior where target_session_attrs mismatch skips remaining addresses doesn't match this. A standby successfully responding but not matching target_session_attrs isn't a "connection failure" per se, but it does prevent finding a "successful" connection according to the user's requirements.
This suggests the simpler fix might actually be correcting a deviation from documented behavior, rather than introducing new behavior requiring a new parameter (as in 5396).
Happy to coordinate with Andrew on this - perhaps the question is whether this should be:
  1. A behavior fix (this patch) - brings libpq in line with the documented behavior
  2. An opt-in feature (5396's check_all_addrs parameter) - preserves backward compatibility
Given the documentation wording, I'd lean toward (1), but curious what others think.
[1] https://www.postgresql.org/message-id/235381750793454@mail.yandex.ru
[2] https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-MULTIPLE-HOSTS
Thanks,
Evgeny


On Sat, 7 Mar 2026 at 14:08, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 5 Mar 2026, at 19:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > TBH, I'd say that your DNS setup is broken and you should fix it.
> > It makes no sense to have the same DNS entry pointing to both
> > read-write and read-only hosts.  The proposed patch will mainly
> > result in useless connection attempts in more-sanely-constructed
> > setups.
>
> This is a very desired feature by cloud providers.
>
> We sell PGaaS clusters which are just a bunch of hosts. Each of
> these hosts can become primary any time.
> Currently, when user adds more hosts they have to redeploy/reconfigure
> their app.

Somewhat related, we're using dynamic DNS to track the primary, but we
want a backup in case the dynamic DNS fails. We're using multi-host
connection strings for this, with a hostname like
"foo,foo1,foo2,foo3,foo4", where "foo" is the dynamic hostname and
"foo1"..."foo4" are CNAMEs to individual hosts. By updating the
CNAMEs, we can bring hosts in and out without reconfiguring clients.

Managing that is more complex than using a single fallback hostname
with an IP address for each host. It's annoying that we need an upper
bound on the number of potential primaries when configuring the
client. We could do better if libpq tried each IP address of a host
until it got an acceptable connection.
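
As a rough illustration of the upper-bound problem, the client-side host list could be generated like this. This is a Python sketch only: fallback_conninfo is a made-up helper, and adding target_session_attrs=read-write is an assumption based on the goal of tracking the primary.

```python
def fallback_conninfo(dynamic_name, slots, attrs="read-write"):
    """Build a conninfo string with one dynamic name plus N fallback names.

    The number of slots has to be fixed when the client is configured,
    which is the upper bound described above.
    """
    hosts = [dynamic_name] + [f"{dynamic_name}{i}" for i in range(1, slots + 1)]
    return f"host={','.join(hosts)} target_session_attrs={attrs}"


print(fallback_conninfo("foo", 4))
```

A fifth potential primary would require every client to be reconfigured with slots=5, whereas a single fallback name with one address per host would not.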



On Thu, Mar 5, 2026 at 10:31 AM Evgeny Kuzin <evgeny.kuzin@outlook.com> wrote:
> This suggests the simpler fix might actually be correcting a deviation from documented behavior, rather than introducing new behavior requiring a new parameter (as in 5396).

+1, I think the docs have the right idea here which is "try until we get exactly what we want"

Cheers,
Greg

On Thu, 2026-03-05 at 14:59 +0000, Evgeny Kuzin wrote:
> We run PostgreSQL clusters with streaming replication. After a failover, the old primary
> becomes a standby and vice versa. The challenge is: how do clients find the new primary?
>
> Current options:
>    1. Update DNS on every failover - operationally complex, TTL delays, requires automation

Your proposal would also suffer from TTL delays in the case of a cluster reconfiguration.

>    2. Consul/etcd - adds operational complexity and another failure domain
>    3. Multiple hosts in connection string - requires application changes when cluster
>       topology changes (e.g., adding a new standby)
>
> The proposed approach:
>  * Single A-record (db.internal) pointing to all cluster member IPs
>  * Clients connect with
>    host=db.internal target_session_attrs=read-write
>  * libpq tries each IP until it finds the primary
>
> IIUC this is how JDBC's targetServerType=primary works - it iterates through all resolved
> addresses. The "useless connection attempts" are actually the feature: it's probing to
> find the right server, same as when you specify multiple hosts explicitly.
> The only difference from host=pg1,pg2,pg3 is that DNS provides the list instead of the
> connection string. From libpq's perspective, why should it matter where the address list came from?

I see the point of your proposal.

One example of what Tom worries about is "localhost" resolving to both "127.0.0.1" and "::1",
a very common case.  With the proposed change, any connection attempt to "localhost" that fails
would now take twice as long to fail.  Also, if the problem is authentication, the server would
perform two authentication attempts.  That is a clear regression that may affect many people.

The question is whether the overall benefits of your proposal (which certainly makes sense
in a setup like you describe) would be worth a performance and resource usage regression like
the one I described above.  Or can you see a way to modify your approach so that that problem
can be avoided?

Yours,
Laurenz Albe




> On 11 Mar 2026, at 03:18, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> The question is whether the overall benefits of your proposal (which certainly makes sense
> in a setup like you describe) would be worth a performance and resource usage regression like
> the one I described above.  Or can you see a way to modify your approach so that that problem
> can be avoided?

The version proposed by Andrew Jackson [0] adds a connection option, check_all_addrs, which is off by default.
That avoids any potential problems for existing users.


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/patch/5396/


> One example of what Tom worries about is "localhost" resolving to both "127.0.0.1" and "::1",
> a very common case.  With the proposed change, any connection attempt to "localhost" that fails
> would now take twice as long to fail.  Also, if the problem is authentication, the server would
> perform two authentication attempts.  That is a clear regression that may affect many people.
>
> The question is whether the overall benefits of your proposal (which certainly makes sense
> in a setup like you describe) would be worth a performance and resource usage regression like
> the one I described above.  Or can you see a way to modify your approach so that that problem
> can be avoided?


Good point about the localhost regression. I agree that changing default behavior might not be the right approach.
A refinement: what if we only change behavior when target_session_attrs is explicitly set to something other than any? The logic would be:
  • target_session_attrs=any (default): current behavior unchanged
  • target_session_attrs=read-write/primary/standby/etc: iterate all addresses on mismatch
In the explicit role-aware routing case, the user is already saying "I need a specific type of server" - so probing multiple addresses is the expected behavior. It's similar to specifying host=pg1,pg2,pg3 manually.
This would address the localhost concern while enabling the HA use case for those who explicitly opt in via target_session_attrs.
The question becomes: is this a cleaner approach than a separate check_all_addrs parameter (patch 5396)? It's opt-in either way, but this ties the behavior to the feature that actually needs it.
That said, I'm happy either way - if the consensus is that 5396's explicit parameter is the better path, that works for me too. It solves the same problem. I just want to find whichever approach has the best chance of actually getting accepted, rather than having a good feature sit in review for another year.
Best regards,
Evgeny
On Wed, Mar 11, 2026 at 10:29 AM Evgeny Kuzin <evgeny.kuzin@outlook.com> wrote:
> A refinement: what if we only change behavior when target_session_attrs is explicitly set to something other than any?

-1. That seems to complicate things further, and still doesn't make our code match our docs.

> resolving to both "127.0.0.1" and "::1",

Bit of a contrived case, but I'm not really seeing the problem here. It's certainly possible for Postgres to be listening on one and not the other, and if you want to connect to a specific one, then call it out by name. Otherwise, they all get tried, which is the whole reason the Internet has those one-to-many mappings. To give up after the first one fails seems inherently wrong.

Cheers,
Greg

On Wed, 2026-03-11 at 15:01 +0500, Andrey Borodin wrote:
> > On 11 Mar 2026, at 03:18, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> >
> > The question is whether the overall benefits of your proposal (which certainly makes sense
> > in a setup like you describe) would be worth a performance and resource usage regression like
> > the one I described above.  Or can you see a way to modify your approach so that that problem
> > can be avoided?
>
> Version proposed by Andrew Jackson [0] adds a connection option check_all_addrs. Off by default.
> This resolves potential problems of existing users.

Ah, ok, I didn't read the patch.  If resolving all addresses is disabled by default and
has to be enabled explicitly, I have no objection.

Yours,
Laurenz Albe



Hi Evgeny,

(Evgeny asked me to weigh in on the patch. Careful what you wish for...)

I would like to, as kindly as possible, say that I don't like *either*
of these approaches, on this thread or the other. General concerns up
front:

- A read-only host and a read-write host aren't the "same host".
`target_session_attrs=any` doesn't work for your case *because*
they're not. Our protocol, and the applications on top of it, do not
consider them interchangeable. (You can maybe argue that multiple
read-only hosts could be treated as one, and I think I'd agree -- but
the proof of that is, round-robin DNS already works in that case.
Right?)

- Is POSIX getaddrinfo *guaranteed* to return every record, on all
platforms? It's not a DNS-specific API, so what's preventing a libc
from omitting the single read-write IP address you need out of a group
of twenty because [insert POSIX-allowed or IETF-mandated reason]?

- I'm no DNS expert, but I can't shake the feeling that you're
(mis)using round-robin A records to reimplement, say, SRV records [1]
(or SVCB, which dovetails with recently-standardized ECH).

On Wed, Mar 11, 2026 at 7:29 AM Evgeny Kuzin <evgeny.kuzin@outlook.com> wrote:
> A refinement: what if we only change behavior when target_session_attrs is explicitly set to something other than
> any? The logic would be:
>
> target_session_attrs=any (default): current behavior unchanged
> target_session_attrs=read-write/primary/standby/etc: iterate all addresses on mismatch

Users should not have to choose between a) target_session_attrs
fallback and b) reasonable and performant behavior with modern
hybrid-stack/multi-NIC/multihomed/etc. setups.

I think you've tangled a Postgres-level concern (find me a host with
these characteristics) with a socket-level concern (find me the
addresses for a host), and the main reason you were able to do that
was because PQconnectPoll() currently puts all those concerns into one
impossibly complex function. If someone later wanted to replace
getaddrinfo/connect with a Happy Eyeballs library, to cut down on
connection times, this proposal would prevent them from doing that.
(Both your patch, and the other thread's.) Personally I think we
should reserve the ability to use any API that says "connect me to
this hostname as fast as possible; I do not care how."

> I just want to find whichever approach has the best chance of actually getting accepted, rather than having a good
> feature sit in review for another year.

The bar for getting something into a release can (sometimes? often?)
be too high, for the wrong reasons, especially for a new contributor.
I don't want to make that problem worse; I'm very glad you're here and
focusing on this use case. But I don't think you should expect either
patch to make it into PG19 in the middle of March, unless you've
already found another committer who's willing to maintain them.

I understand why it's appealing, I think, but the discussions so far
on both threads don't convince me that this is an overall reduction of
complexity. It exposes more implementation details, which makes it
harder to improve our network connection behavior in the future. It
potentially collides with attempts to encode network topology within
the Postgres protocol. I don't think we're likely to be happy with it
in a few years.

But I do want you to be able to point libpq at a cluster and have it
Just Work. It's a good conversation to have, even if this doesn't make
it in.

--Jacob

[1] https://postgr.es/m/CAK_s-G2_3S09_EA%2BnRxxefMW%2B0-UwKE%3DUj6bCdBpPncPVRpM_g%40mail.gmail.com



> I would like to, as kindly as possible, say that I don't like *either*
> of these approaches, on this thread or the other.

I appreciate the careful pushback. A week into this discussion, I'm realizing why postgres takes this approach - a "simple" change touches millions of connections across every imaginable setup. It's worth getting right.


> I'm no DNS expert, but I can't shake the feeling that you're
> (mis)using round-robin A records to reimplement, say, SRV records

The SRV thread you mentioned seems promising - same use case (Patroni/HA + target_session_attrs), clean separation of concerns. Would reviving SRV support be a direction you'd consider architecturally sound?


> I think you've tangled a Postgres-level concern (find me a host with
> these characteristics) with a socket-level concern (find me the
> addresses for a host), and the main reason you were able to do that
> was because PQconnectPoll() currently puts all those concerns into one
> impossibly complex function. If someone later wanted to replace
> getaddrinfo/connect with a Happy Eyeballs library, to cut down on
> connection times, this proposal would prevent them from doing that.
> (Both your patch, and the other thread's.) Personally I think we
> should reserve the ability to use any API that says "connect me to
> this hostname as fast as possible; I do not care how."


Another thought - what about cluster-aware routing at the protocol level? A standby could redirect to the primary - similar to HTTP 302. The cluster knows its own topology, libpq stays fast and dumb about it. That would preserve the "connect me as fast as possible" ability you mentioned. Though that feels like a bigger architectural lift compared to SRV.


Would either of these be worth exploring further?

On Wed, 11 Mar 2026 at 20:57, Jacob Champion <jacob.champion@enterprisedb.com> wrote:
> Hi Evgeny,
>
> (Evgeny asked me to weigh in on the patch. Careful what you wish for...)
>
> I would like to, as kindly as possible, say that I don't like *either*
> of these approaches, on this thread or the other. General concerns up
> front:

<snip>

> - I'm no DNS expert, but I can't shake the feeling that you're
> (mis)using round-robin A records to reimplement, say, SRV records [1]
> (or SVCB, which dovetails with recently-standardized ECH).

Neither an A record with multiple IP addresses, nor SRV, nor SVCB (which builds on SRV) is a perfect fit here, but an A record with multiple addresses feels to me like the better fit. SRV and SVCB are intended to be used at the domain level, which works well for services like LDAP that cover full domains. So _postgres._tcp.appone.prod.example.com implies a subdomain for appone.prod.example.com, and may actually require the creation of that subdomain hierarchy in some DNS tooling. An A record is not necessarily a hostname, but that's generally how A records are used, so having read-only and read-write services behind one record doesn't feel quite right, as you say. Viewed a bit more broadly, as an Address (or Addresses) for a resource, we end up with much the same outcome as the SRV solution: a list of addresses. Administering A records with multiple IP addresses is also a simpler, flat process.
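
For comparison, here is a sketch of how a client might consume SRV data for the name above. It is Python, purely illustrative: the record values and pgN.example.com targets are invented, parse_srv and order_srv are hypothetical helpers, and the weighted selection is a simplification of the procedure in RFC 2782.

```python
import random
from typing import NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata):
    """Parse SRV RDATA in presentation form: 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    return Srv(int(priority), int(weight), int(port), target.rstrip("."))


def order_srv(records, rng=random):
    """Lowest priority value first; weighted random order within a priority."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            # +1 is a simplification so that weight-0 records can be picked.
            weights = [r.weight + 1 for r in group]
            pick = rng.choices(group, weights=weights)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Invented answers for _postgres._tcp.appone.prod.example.com
records = [parse_srv(r) for r in (
    "10 50 5432 pg1.example.com.",
    "10 50 5432 pg2.example.com.",
    "20 0 5432 pg3.example.com.",
)]
for record in order_srv(records):
    print(record.target, record.port)
```

Either way the client ends up with an ordered list of endpoints to probe; SRV adds a port and an ordering hint per entry, while multiple A records give only the addresses.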
 
<snip>

> I think you've tangled a Postgres-level concern (find me a host with
> these characteristics) with a socket-level concern (find me the
> addresses for a host), and the main reason you were able to do that
> was because PQconnectPoll() currently puts all those concerns into one
> impossibly complex function. If someone later wanted to replace
> getaddrinfo/connect with a Happy Eyeballs library, to cut down on
> connection times, this proposal would prevent them from doing that.
> (Both your patch, and the other thread's.) Personally I think we
> should reserve the ability to use any API that says "connect me to
> this hostname as fast as possible; I do not care how."

I'd say that the boundary has moved - from "find me an endpoint from this list of hosts with these characteristics" to "find me an endpoint from this list of IPs with these characteristics" - rather than that they've become tangled. "Connect me to this list of addresses as fast as possible" still sounds like a good place to be.

<snip>

> I understand why it's appealing, I think, but the discussions so far
> on both threads don't convince me that this is an overall reduction of
> complexity. It exposes more implementation details, which makes it
> harder to improve our network connection behavior in the future. It
> potentially collides with attempts to encode network topology within
> the Postgres protocol. I don't think we're likely to be happy with it
> in a few years.

I can see a situation where the client's internal view of the topology could be populated by polling (which would work for any version of server) or from what was encoded in the protocol (for versions of the server which can provide it) as the features to discover topology roll out.
 
> But I do want you to be able to point libpq at a cluster and have it
> Just Work. It's a good conversation to have, even if this doesn't make
> it in.

Regards

Alastair

> Another thought - what about cluster-aware routing at the protocol level? A standby could redirect to the primary -
> similar to HTTP 302. The cluster knows its own topology, libpq stays fast and dumb about it.
 

The cluster knows its topology, from its own viewpoint. A standby
saying "primary is at 10.0.0.42:5432" isn't necessarily helpful to the
client, because proxies exist. This is a nice solution in a configuration
where everything uses public IPs and no proxies, as it solves the
connection issue in at most 2 connections, but it doesn't seem to be a
100% generic, always-working solution.

> If someone later wanted to replace
> getaddrinfo/connect with a Happy Eyeballs library, to cut down on
> connection times, this proposal would prevent them from doing that.
> (Both your patch, and the other thread's.) Personally I think we
> should reserve the ability to use any API that says "connect me to
> this hostname as fast as possible; I do not care how."

Aren't these just variations on the same question? Which IPs to try to
connect to, and in which order or with how much parallelism?

In a happy eyeballs analogy, one approach might want to connect to all
listed IPs at the same time, and return the first that responds and is
read write.
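
A minimal Python sketch of that idea, with the probes simulated rather than real: the FAKE_CLUSTER table, its latencies and the probe function are assumptions standing in for connect + authenticate + status check, and nothing here reflects how libpq is structured.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical probe results: address -> (latency in seconds, is_read_write)
FAKE_CLUSTER = {
    "10.0.0.1": (0.01, False),
    "10.0.0.2": (0.05, True),
    "10.0.0.3": (0.10, False),
}


def probe(address):
    """Stand-in for connecting and asking whether the server is read-write."""
    latency, read_write = FAKE_CLUSTER[address]
    time.sleep(latency)
    return address, read_write


def first_read_write(addresses):
    """Probe all addresses in parallel; return the first read-write responder."""
    if not addresses:
        return None
    with ThreadPoolExecutor(max_workers=len(addresses)) as pool:
        futures = [pool.submit(probe, a) for a in addresses]
        for future in as_completed(futures):
            address, read_write = future.result()
            if read_write:
                # A real client would also close the losing connections;
                # this sketch simply lets the remaining probes finish.
                return address
    return None


print(first_read_write(list(FAKE_CLUSTER)))
```

The fastest responder here is a standby, so it is skipped, and the result is the read-write server even though it answered second.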



On Thu, Mar 12, 2026 at 5:22 PM Zsolt Parragi <zsolt.parragi@percona.com> wrote:
> In a happy eyeballs analogy, one approach might want to connect to all listed IPs at the same time, and return the first that responds and is read write.

I would hope that "the first" read write is also "the only" read write. If you have a multi-leader situation, you almost certainly want to be quite precise about who connects to what, and not leave that up to the whims of the network gods.

I'm still a big +1 to the original proposal in this thread, and don't think it would be incompatible with happy eyeballs. Although I would think the latter would be quite wasteful, as we are not simply checking for a response, but doing a whole connect/authenticate/get-status dance. Is a quicker response more important than querying every IP in the list every time? I dunno. Maybe that's a future argument to target_session_attribs.[1]

[1] Yes, I know, but that's what the name should have been. Or even "attributes"



Cheers,
Greg

Jacob,

I appreciate that Evgeny and I are trying to introduce patches
that, while seemingly simple, have stumbled on a bunch of complexities
(what is a host? how does libc handle address lookup? etc.) that may
mean our patches are not the ideal long-term solution for the problem at
hand. I'm thinking that maybe it is worthwhile to take a step back and
look at the problem that we are trying to solve, to see if there are any
alternative approaches that would be easier to merge in the medium
term.

Problem: I would like to make it easier for managed service operators
to allow client-side auto-discovery/failover by providing a single
place where the operator can change connection parameters, so that end
users don't need to update their own connection parameters, which may be
hardcoded, not source controlled, kept in Excel spreadsheets, etc. I.e.,
managed service providers should be able to give customers a connection
string that does not hardcode a list of hosts, as this list is subject
to change as nodes get added, removed, moved, etc.

Currently the closest thing to this that exists in libpq is the
"LDAP Lookup of Connection Parameters" functionality [0]. One issue with
this functionality that I see is that it can only be used with the
pg_service.conf file and cannot be provided in a connection string.
There is a very small patch [1] (37 lines of code) that adds this
ability and would make "LDAP Lookup of Connection Parameters" far more
accessible in my opinion.

One downside of the LDAP functionality is that many database teams
do not have the ability to dynamically change their organization's LDAP
records. An alternative that I have a very rough patch for [2] uses
libcurl to let connection parameters be looked up at an HTTP address.
This is a much larger patch than the LDAP one and is probably more
controversial, so it's understood if this is not going to make it in the
medium term (or ever).

Also thank you for your review in this thread. Definitely learned a
lot from that.

Thanks,
Andrew Jackson

[0]: https://www.postgresql.org/docs/current/libpq-ldap.html
[1]: https://commitfest.postgresql.org/patch/6390/
[2]: https://commitfest.postgresql.org/patch/6614/
