Обсуждение: RADIUS tests and improvements

Поиск
Список
Период
Сортировка

RADIUS tests and improvements

От
Thomas Munro
Дата:
Hi,

Here's a draft patch to tackle a couple of TODOs in the RADIUS code in auth.c.

The first change is to replace select() with a standard latch loop
that responds to interrupts, postmaster death etc promptly.  It's not
really too much of a big deal because the timeout was only 3 seconds
(hardcoded), but it's not good to have places that ignore ProcSignal,
and it's good to move code to our modern pattern for I/O multiplexing.

We know from experience that we have to crank timeouts up to be able
to run tests reliably on slow/valgrind/etc systems, so the second
change is to change the timeout to a GUC, as also requested by a
comment.  One good side-effect is that it becomes easy and fast to
test the timed-out code path too, with a small value.  While adding
the GUC I couldn't help wondering why RADIUS even needs a timeout
separate from authentication_timeout; another way to go here would be
to remove it completely, but that'd be a policy change (removing the 3
second timeout we always had).  Thoughts?

The patch looks bigger than it really is because it changes the
indentation level.

But first, some basic tests to show that it works.  We can test before
and after the change and have a non-zero level of confidence about
whacking the code around.  Like existing similar tests, you need to
install an extra package (FreeRADIUS) and opt in with
PG_EXTRA_TESTS=radius.  I figured out how to do that for our 3 CI
Unixen, so cfbot should run the tests and pass once I add this to the
March commitfest.  FreeRADIUS claims to work on Windows too, but I
don't know how to set that up; maybe someday someone will fix that for
all the PG_EXTRA_TESTS tests.  I've also seen this work on a Mac with
MacPorts.  There's only one pathname in there that's a wild guess:
non-Debianoid Linux systems; if you know the answer there please LMK.

Вложения

Re: RADIUS tests and improvements

От
Andreas Karlsson
Дата:
On 1/3/23 04:11, Thomas Munro wrote:
> Here's a draft patch to tackle a couple of TODOs in the RADIUS code in auth.c.

Nice to see someone working on this! I know of one company which could 
have used the configurable timeout for radius because the 3 second 
timeout is too short for 2FA. I think they ended up using PAM or some 
other solution in the end, but I am not 100% sure.

> [...] While adding
> the GUC I couldn't help wondering why RADIUS even needs a timeout
> separate from authentication_timeout; another way to go here would be
> to remove it completely, but that'd be a policy change (removing the 3
> second timeout we always had).  Thoughts?

It was some time since I last looked at the code but my impression was 
that the reason for having a separate timeout is that you can try the 
next server after the first one timed out (multiple radius servers are 
allowed). But I wonder if that really is a useful feature or if someone 
just was too clever or it just was an accidental feature.

Andreas



Re: RADIUS tests and improvements

От
Andreas Karlsson
Дата:
On 1/3/23 22:03, Andreas Karlsson wrote:
> On 1/3/23 04:11, Thomas Munro wrote:
>> Here's a draft patch to tackle a couple of TODOs in the RADIUS code in 
>> auth.c.
> 
> Nice to see someone working on this!.

Another thing: shouldn't we set some wait event to indicate that we are 
waiting the RADIUS server or is that pointless during authentication 
since there are no queries running anyway?

Andreas



Re: RADIUS tests and improvements

От
Thomas Munro
Дата:
On Wed, Jan 4, 2023 at 10:07 AM Andreas Karlsson <andreas@proxel.se> wrote:
> Another thing: shouldn't we set some wait event to indicate that we are
> waiting the RADIUS server or is that pointless during authentication
> since there are no queries running anyway?

I initially added a wait_event value, but I couldn't see it
anywhere... there is no entry in pg_stat_activity for a backend that
is in that phase of authentication, so I just set it to zero.



Re: RADIUS tests and improvements

От
Thomas Munro
Дата:
On Wed, Jan 4, 2023 at 10:03 AM Andreas Karlsson <andreas@proxel.se> wrote:
> On 1/3/23 04:11, Thomas Munro wrote:
> > [...] While adding
> > the GUC I couldn't help wondering why RADIUS even needs a timeout
> > separate from authentication_timeout; another way to go here would be
> > to remove it completely, but that'd be a policy change (removing the 3
> > second timeout we always had).  Thoughts?
>
> It was some time since I last looked at the code but my impression was
> that the reason for having a separate timeout is that you can try the
> next server after the first one timed out (multiple radius servers are
> allowed). But I wonder if that really is a useful feature or if someone
> just was too clever or it just was an accidental feature.

Ah!  Thanks, now that makes sense.



Re: RADIUS tests and improvements

От
Andreas Karlsson
Дата:
On 1/3/23 22:16, Thomas Munro wrote:
> On Wed, Jan 4, 2023 at 10:07 AM Andreas Karlsson <andreas@proxel.se> wrote:
>> Another thing: shouldn't we set some wait event to indicate that we are
>> waiting the RADIUS server or is that pointless during authentication
>> since there are no queries running anyway?
> 
> I initially added a wait_event value, but I couldn't see it
> anywhere... there is no entry in pg_stat_activity for a backend that
> is in that phase of authentication, so I just set it to zero.

Thanks for the explanation, that makes a lot of sense!

Andreas




Re: RADIUS tests and improvements

От
Thomas Munro
Дата:
New improved version:

* fixed stupid misuse of PG_FINALLY() (oops, must have been thinking
of another language)
* realised that it was strange to have a GUC for the timeout, and made
a new HBA parameter instead
* added documentation for that
* used TimestampDifferenceMilliseconds() instead of open-coded TimestampTz maths

I don't exactly love the PG_TRY()/PG_CATCH() around the
CHECK_FOR_INTERRUPTS().  In fact this kind of CFI-with-cleanup problem
has been haunting me across several projects.  For cases that memory
contexts and resource owners can't help with, I don't currently know
what else to do here.  Better ideas welcome.  If I just let that
socket leak because I know this backend will soon exit, I'd expect a
knock at the door from the programming police.

I don't actually know why we have
src/test/authentication/t/...{password,sasl,peer}..., but then
src/test/{kerberos,ldap,ssl}/t/001_auth.pl.  For this one, I just
copied the second style, creating src/test/radius/t/001_auth.pl.  I
can't explain why it should be like that, though.  If I propose
another test for PAM, where should it go?

Вложения

Re: RADIUS tests and improvements

От
Michael Paquier
Дата:
On Sat, Mar 04, 2023 at 02:23:10PM +1300, Thomas Munro wrote:
> I don't exactly love the PG_TRY()/PG_CATCH() around the
> CHECK_FOR_INTERRUPTS().

> In fact this kind of CFI-with-cleanup problem
> has been haunting me across several projects. For cases that memory
> contexts and resource owners can't help with, I don't currently know
> what else to do here.  Better ideas welcome.

Like adding a Open/CloseSocket() in fd.c to control the leaks?

> If I just let that
> socket leak because I know this backend will soon exit, I'd expect a
> knock at the door from the programming police.

Hmm.  It seems to me that you'd better have two patches instead of one
here?  First, one to introduce the new parameter to control the
timeout, and a second to improve the responsiveness with a
WaitLatch()?  If the CFI proves to be an issue, it would be sad to
have to revert the configuration part, which is worth on its own.

> I don't actually know why we have
> src/test/authentication/t/...{password,sasl,peer}..., but then
> src/test/{kerberos,ldap,ssl}/t/001_auth.pl.  For this one, I just
> copied the second style, creating src/test/radius/t/001_auth.pl.  I
> can't explain why it should be like that, though.  If I propose
> another test for PAM, where should it go?

My take would be to keep the number of directories in src/test/ to a
minimum in the long run.  Still, this is a case-by-case, as it depends
on if a set of tests needs an expanded set of modules, configuration
files and/or multiple scripts.  ssl has its own set of configuration
files with its module, so it makes sense to be independent.  ldap has
its LdapServer.pm with a configuration file, again I'm OK with a
separate case.  Kerberos has its own README, but IMO it could also be
moved to src/test/authentication/ as it has a simple structure, with
its requirements moved into a different README.

What this patch set does for the RADIUS test is simple enough in
structure that I would also add it in src/test/authentication/.  That
means less Make-fu and less Meson-fu.

In 0001, PG_TEST_EXTRA requires radius for the test.  This needs an
update of regress.sgml where the values available are listed.  I think
that you'd better document that freeradius is required for the test in
one of the README (either create a new one in radius/, or add this
information to the one in authentication, as you feel).
--
Michael

Вложения

Re: RADIUS tests and improvements

От
Daniel Gustafsson
Дата:
Hi Thomas,

Have you have a chance to look at and address the feedback given in this
thread?

--
Daniel Gustafsson