Re: [PoC] Delegating pg_ident to a third party

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: [PoC] Delegating pg_ident to a third party
Дата
Msg-id CAOuzzgpFpuroNRabEvB9kST_TSyS2jFicBNoXvW7G2pZFixyBw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] Delegating pg_ident to a third party  (Jacob Champion <pchampion@vmware.com>)
Ответы Re: [PoC] Delegating pg_ident to a third party  (Jacob Champion <pchampion@vmware.com>)
Список pgsql-hackers
Greetings,

On Tue, Jan 4, 2022 at 18:56 Jacob Champion <pchampion@vmware.com> wrote:
On Mon, 2022-01-03 at 19:42 -0500, Stephen Frost wrote:
> * Jacob Champion (pchampion@vmware.com) wrote:
> >
> > That last point was my motivation for the authn_id patch [1] -- so that
> > auditing could see the actual user _and_ the generic role. The
> > information is already there to be used, it's just not exposed to the
> > stats framework yet.
>
> While that helps, and I generally support adding that information to the
> logs, it's certainly not nearly as good or useful as having the actual
> user known to the database.

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob@example.net, logged in as admin" is any worse
than "logged in as jacob".

The above case isn’t what we are talking about, as far as I understand anyway. You’re suggesting “authenticated as jacob@example.net, logged in as sales” where the user in the database is “sales”.  Consider triggers which only have access to “sales”, or a tool like pgaudit which only has access to “sales”.  Who was it in sales that updated that record though?  We don’t know- we would have to go try to figure it out from the logs, but even if we had time stamps on the row update, there could be 50 sales people logged in at overlapping times.

> > Forcing one role per individual end user is wasteful and isn't really
> > making good use of the role-based system that you already have.
> > Generally speaking, when administering hundreds or thousands of users,
> > people start dividing them up into groups as opposed to dealing with
> > them individually. So I don't think new features should be taking away
> > flexibility in this area -- if one role per user already works well for
> > you, great, but don't make everyone do the same.
>
> Using the role system we have to assign privileges certainly is useful
> and sensible, of course, though I don't see where you've actually made
> an argument for why one role per individual is somehow wasteful or
> somehow takes away from the role system that we have for granting
> rights.

I was responding more to your statement that "Being able to have roles
and memberships automatically created is much more the direction that
I'd say we should be going in". It's not that one-role-per-user is
inherently wasteful, but forcing role proliferation where it's not
needed is. If all users have the same set of permissions, there doesn't
need to be more than one role. But see below.

Just saying it’s wasteful isn’t actually saying what is wasteful about it.

> I'm also not suggesting that we make everyone do the same
> thing, indeed, later on I was supportive of having an external system
> provide the mapping.  Here, I'm just making the point that we should
> also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role
privileges externally too, which would be cool. That could be used in
tandem with something like this patchset.

Not sure exactly what you’re referring to here by “administer role privileges externally too”..?  Curious to hear what you are imagining specifically.

> > > I'd go a step further and suggest that the way to do this is with a
> > > background worker that's started up and connects to an LDAP
> > > infrastructure and listens for changes, allowing the system to pick up
> > > on new roles/memberships as soon as they're created in the LDAP
> > > environment.  That would then be controlled by appropriate settings in
> > > postgresql.conf/.auto.conf.
> >
> > This is roughly what you can already do with existing (third-party)
> > tools, and that approach isn't scaling out in practice for some of our
> > existing customers. The load on the central server, for thousands of
> > idle databases dialing in just to see if there are any new users, is
> > huge.
>
> If you're referring specifically to cron-based tools which are
> constantly hammering on the LDAP servers running the same queries over
> and over, sure, I agree that that's creating load on the LDAP
> infrastructure (though, well, it was kind of designed to be very
> scalable for exactly that kind of load, no?  So I'm not really sure why
> that's such an issue..).

I don't have hands-on experience here -- just going on what I've been
told via field/product teams -- but it seems to me that there's a big
difference between asking an LDAP server to give you information on a
user at the time that user logs in, and asking it to give a list of
_all_ users to every single Postgres instance you have on a regular
timer. The latter is what seems to be problematic.

And to be clear, I agree that’s not good (though, again, really, your ldap infrastructure shouldn’t be having all that much trouble with it- you can scale those out verryyyy far, and far more easily than a relational database..).

I’d also point out though that having to do an ldap lookup on every login to PG is *already* an issue in some environments, having to do multiple amplifies that.  Not to mention that when the ldap servers can’t be reached for some reason, no one can log into the database and that’s rather unfortunate too. These are, of course, arguments for moving away from methods that require checking with some other system synchronously during login- which is another reason why it’s better to have the authentication credentials easily map to the PG role, without the need for external checks at login time.  That’s done with today’s pg_ident, but this patch would change that.

Consider the approach I continue to advocate- GSSAPI based authentication, where a user only needs to contact the Kerberos server perhaps every 8 hours or so for an updated ticket but otherwise can authorize directly to PG using their existing ticket and credentials, where their role was previously created and their memberships already exist thanks to a background worker whose job it is to handle that and which deals with transient network failures or other issues. In this world, most logins to PG don’t require any other system to be involved besides the client, the PG server, and the networking between them; perhaps DNS if things aren’t cached on the client. 

On the other hand, to use ldap authentication (which also happens to be demonstrable insecure without any reasonable way to fix that), with an ldap mapping setup, requires two logins to an ldap server every single time a user logs into PG and if the ldap environment is offline or overloaded for whatever reason, the login fails or takes an excessively long amount of time.

> That's also why I specifically wasn't
> suggesting that and was instead suggesting that we have something that's
> connected to one of the (hopefully, many, many) LDAP servers and is
> doing change monitoring, allowing changes to be pushed down to PG,
> rather than cronjobs constantly running the same queries and re-checking
> things over and over.  I appreciate that that's also not free, but I
> don't believe it's nearly as bad as the cron-based approach and it's
> certainly something that an LDAP infrastructure should be really rather
> good at.

I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?

I mean … let’s talk about the one that really matters here: 

> > > All that said, I do see how having the ability to call out to another
> > > system for mappings may be useful, so I'm not sure that we shouldn't
> > > consider this specific change and have it be specifically just for
> > > mappings, in which case pg_ident seems appropriate.
> >
> > Yeah, this PoC was mostly an increment on the functionality that
> > already existed. The division between what goes in pg_hba and what goes
> > in pg_ident is starting to blur with this patchset, though, and I think
> > Peter's point is sound.
>
> This part I tend to disagree with- pg_ident for mappings and for ways to
> call out to other systems to provide those mappings strikes me as
> entirely appropriate and doesn't blur the lines and that's really what
> this patch seems to be primarily about.  Peter noted that there might be
> other things we want to do and argued that those might not be
> appropriate in pg_ident, which I tend to agree with, but I don't think
> we need to invent something entirely new for mappings when we have
> pg_ident already.

The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.

I can agree with the point that it would be nicer to have the ldap host/port/base/filter be in the hba instead, if there is a way to accomplish that reasonably. Did you have a suggestion in mind for how to do that..?  If there’s an alternative approach to consider, it’d be useful to see them next to each other and then we could all contemplate which is better.

> When it comes to the question of "how to connect to an LDAP server for
> $whatever", it seems like it'd be nice to be able to configure that once
> and reuse that configuration.  Not sure I have a great suggestion for
> how to do that. The approach this patch takes of adding options to
> pg_hba for that, just like other options in pg_hba do, strikes me as
> pretty reasonable.

Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.

If the ldap connection info and filters and such could all exist in the hba, then perhaps a way to define those credentials in one place in the hba file and then use them on other lines would be possible..?  Seems like that would be easier than having them also in the ident or having the ident refer to something defined elsewhere. 

Consider in the hba having:

LDAPSERVER[ldap1]=“ldaps://whatever other options go here”

Then later:

hostssl all all ::0/0 ldap ldapserver=ldap1 ldapmapserver=ldap1 map=myldapmap

Clearly needs more thought needed due to different requirements for ldap authentication vs. the map, but still, the general idea being to have all of it in the hba and then a way to define ldap server configuration in the hba once and then reused.

> I would advocate for other methods to work when it comes to
> authenticating to LDAP from PG though (such as GSSAPI, in particular,
> of course...).

I can take a look at the Cyrus requirements for the GSSAPI mechanism.
Might be tricky to add tests for it, though. Any others you're
interested in?

GSSAPI is the main one … I suppose client side certificates would be nice too if that’s possible.  I suspect some would like a way to have username/pw ldap credentials in some other file besides the hba, but that isn’t as interesting to me, at least.

> > > I certainly don't think we should have this be limited to LDAP auth-
> > > such an external mapping ability is suitable for any authentication
> > > method that supports a mapping (thinking specifically of GSSAPI, of
> > > course..).  Not sure if that's what was meant above but did want to
> > > make sure that was clear.
> >
> > You can't use usermaps with LDAP auth yet, so no, that's not what I
> > meant. (I have another patch for that feature in commitfest, which
> > would allow these two things to be used together.)
>
> Yes, I'm aware of the other patch, just wanted to make sure the intent
> is for this to work for all map-supporting auth methods.  Figured that
> was the case but the examples in the prior email had me concerned and
> just wanted to make sure.

Correct. The new tests use cert auth, for instance.

Great.

Thanks!

Stephen

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Corey Huinker
Дата:
Сообщение: Re: Suggestion: optionally return default value instead of error on failed cast
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side