Re: [PATCH] Exponential backoff for auth_delay

Поиск
Список
Период
Сортировка
От Michael Banck
Тема Re: [PATCH] Exponential backoff for auth_delay
Дата
Msg-id 65aa8265.170a0220.1ecef.91df@mx.google.com
обсуждение исходный текст
Ответ на Re: [PATCH] Exponential backoff for auth_delay  (Abhijit Menon-Sen <ams@toroid.org>)
Ответы Re: [PATCH] Exponential backoff for auth_delay  (Michael Banck <mbanck@gmx.net>)
Re: [PATCH] Exponential backoff for auth_delay  (Abhijit Menon-Sen <ams@toroid.org>)
Список pgsql-hackers
Hi,

many thanks for the review!

I went through your comments (a lot of which pertained to the original
larger patch I took code from), attached is a reworked version 2.

Other changes:

1. Ignore STATUS_EOF, this led to auth_delay being applied twice (maybe
due to the gss/kerberos auth psql is trying by default? Is that legit
and should this change be reverted?) - i.e. handle STATUS_OK and
STATUS_ERROR explicitly.

2. Guard ShmemInitStruct() with LWLockAcquire(AddinShmemInitLock,
LW_EXCLUSIVE) / LWLockRelease(AddinShmemInitLock), as is done in
pg_prewarm and pg_stat_statements as well.

3. Added an additional paragraph discussing the value of
auth_delay.milliseconds when auth_delay.exponential_backoff is on, see
below.

I wonder whether maybe auth_delay.max_seconds should either be renamed
to auth_delay.exponential_backoff_max_seconds (but then it is rather
long) in order to make it clearer it only applies in that context or
alternatively just apply to auth_delay.milliseconds as well (though that
would be somewhat weird).

Further comments to your comments:

On Tue, Jan 16, 2024 at 12:00:49PM +0530, Abhijit Menon-Sen wrote:
> At 2024-01-04 08:30:36 +0100, mbanck@gmx.net wrote:
> >
> > +typedef struct AuthConnRecord
> > +{
> > +    char        remote_host[NI_MAXHOST];
> > +    bool        used;
> > +    double        sleep_time;        /* in milliseconds */
> > +} AuthConnRecord;
> 
> Do we really need a "used" field here? You could avoid it by setting
> remote_host[0] = '\0' in cleanup_conn_record.

Ok, removed.

> >  static void
> >  auth_delay_checks(Port *port, int status)
> >  {
> > +    double        delay;
> 
> I would just initialise this to auth_delay_milliseconds here, instead of
> the harder-to-notice "else" below.
 
Done.

> > @@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status)
> >       */
> >      if (status != STATUS_OK)
> >      {
> > -        pg_usleep(1000L * auth_delay_milliseconds);
> > +        if (auth_delay_exp_backoff)
> > +        {
> > +            /*
> > +             * Exponential backoff per remote host.
> > +             */
> > +            delay = record_failed_conn_auth(port);
> > +            if (auth_delay_max_seconds > 0)
> > +                delay = Min(delay, 1000L * auth_delay_max_seconds);
> > +        }
> 
> I would make this comment more specific, maybe something like "Delay by
> 2^n seconds after each authentication failure from a particular host,
> where n is the number of consecutive authentication failures".

Done.
 
> It's slightly discordant to see the new delay being returned by a
> function named "record_<something>" (rather than "calculate_delay" or
> similar). Maybe a name like "backoff_after_failed_auth" would be better?
> Or "increase_delay_on_auth_fail"?

I called it increase_delay_after_failed_conn_auth() now.
 
> > +static double
> > +record_failed_conn_auth(Port *port)
> > +{
> > +    AuthConnRecord *acr = NULL;
> > +    int            j = -1;
> > +
> > +    acr = find_conn_record(port->remote_host, &j);
> > +
> > +    if (!acr)
> > +    {
> > +        if (j == -1)
> > +
> > +            /*
> > +             * No free space, MAX_CONN_RECORDS reached. Wait as long as the
> > +             * largest delay for any remote host.
> > +             */
> > +            return find_conn_max_delay();
> 
> In this extraordinary situation (where there are lots of hosts with lots
> of authentication failures), why not delay by auth_delay_max_seconds
> straightaway, especially when the default is only 10s? I don't see much
> point in coordinating the delay between fifty known hosts and an unknown
> number of others.

I was a bit worried about legitimate users suffering here if (for some
reason) a lot of different hosts try to guess passwords, but only once
or twice or something. But I have changed it now as you suggested as
that makes it simpler and I guess the problem I mentioned above is
rather contrived.

> > +        elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, j);
> 
> I think this should be removed, but if you want to leave it in, the
> message should be more specific about what this is actually about, and
> where the message is from, so as to not confuse debug-log readers.

I left it in but mentioned auth_delay in it now. I wonder though whether
this might be a useful message to have at some more standard level like
INFO?
 
> (The same applies to the other debug messages.)

Those are all gone.
 
> > +static AuthConnRecord *
> > +find_conn_record(char *remote_host, int *free_index)
> > +{
> > +    int            i;
> > +
> > +    *free_index = -1;
> > +    for (i = 0; i < MAX_CONN_RECORDS; i++)
> > +    {
> > +        if (!acr_array[i].used)
> > +        {
> > +            if (*free_index == -1)
> > +                /* record unused element */
> > +                *free_index = i;
> > +            continue;
> > +        }
> > +        if (strcmp(acr_array[i].remote_host, remote_host) == 0)
> > +            return &acr_array[i];
> > +    }
> > +
> > +    return NULL;
> > +}
> 
> It's not a big deal, but to be honest, I would much prefer to (get rid
> of used, as suggested earlier, and) have separate find_acr_for_host()
> and find_free_acr() functions.

Done.
 
> > +static void
> > +record_conn_failure(AuthConnRecord *acr)
> > +{
> > +    if (acr->sleep_time == 0)
> > +        acr->sleep_time = (double) auth_delay_milliseconds;
> > +    else
> > +        acr->sleep_time *= 2;
> > +}
> 
> I would just roll this function into record_failed_conn_auth (or
> whatever it's named), 

Done.

> In terms of the logic, it would have been slightly clearer to store the
> number of failures and calculate the delay, but it's easier to double
> the sleep time that way you've written it. I think it's fine.

I kept it as-is for now.
 
> It's worth noting that there is no time-based reset of the delay with
> this feature, which I think is something that people might expect to go
> hand-in-hand with exponential backoff. I think that's probably OK too.

You mean something like "after 5 minutes, reset the delay to 0 again"? I
agree that this would probably be useful, but would also make the change
more complex.

> > +static void
> > +auth_delay_shmem_startup(void)
> > +{
> > +    Size        required;
> > +    bool        found;
> > +
> > +    if (shmem_startup_next)
> > +        shmem_startup_next();
> > +
> > +    required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS;
> > +    acr_array = ShmemInitStruct("Array of AuthConnRecord", required, &found);
> > +    if (found)
> > +        /* this should not happen ? */
> > +        elog(DEBUG1, "variable acr_array already exists");
> > +    /* all fileds are set to 0 */
> > +    memset(acr_array, 0, required);
> >  }
> 
> I think you can remove the elog and just do the memset if (!found). Also
> if you're changing it anyway, I'd suggest something like "total_size"
> instead of "required".

Done.
 
> > +    DefineCustomBoolVariable("auth_delay.exp_backoff",
> > +                             "Exponential backoff for failed connections, per remote host",
> > +                             NULL,
> > +                             &auth_delay_exp_backoff,
> > +                             false,
> > +                             PGC_SIGHUP,
> > +                             0,
> > +                             NULL,
> > +                             NULL,
> > +                             NULL);
> 
> Maybe "Double the delay after each authentication failure from a
> particular host". (Note: authentication failed, not connection.)

Done.
 
> I would also mildly prefer to spell out "exponential_backoff" (but leave
> auth_delay_exp_backoff as-is).

Done.

> > +    DefineCustomIntVariable("auth_delay.max_seconds",
> > +                            "Maximum seconds to wait when login fails during exponential backoff",
> > +                            NULL,
> > +                            &auth_delay_max_seconds,
> > +                            10,
> > +                            0, INT_MAX,
> > +                            PGC_SIGHUP,
> > +                            GUC_UNIT_S,
> > +                            NULL, NULL, NULL);
> > +
> 
> Maybe just "Maximum delay when exponential backoff is enabled".

Done.
 
> (Parameter indentation doesn't match the earlier block.)

I noticed that as well, but pgindent keeps changing it back to this, not
sure why...
 
> I'm not able to make up my mind if I think 10s is a good default or not.
> In practice, it means that after the first three consecutive failures,
> we'll delay by 10s for every subsequent failure. That sounds OK. But is
> is much more useful than, for example, delaying the first three failures
> by auth_delay_milliseconds and then jumping straight to max_seconds?

What I had in mind is that admins would lower auth_delay.milliseconds to
something like 100 or 125 when exponential_backoff is on, so that the
first few (possibley honest) auth failures do not get an annoying 1
seconds penalty, but later ones then wil. In that case, 10 seconds is
probably ok cause you'd need more than a handful of auth failures.

I added a paragraph to the documentation to this end.
 
> I can't really imagine wanting to increase max_seconds to, say, 128 and
> keep a bunch of backends sleeping while someone's trying to brute-force
> a password. And with a reasonably short max_seconds, I'm not sure if
> having the backoff be _exponential_ is particularly important.
> 
> Or maybe because this is a contrib module, we don't have to think about
> it to that extent?

Well, not sure. I think something like 10 seconds should be fine for
most brute-force attacks in practise, and it is configurable (and turned
off by default).
 
> > diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml
> > index 0571f2a99d..2ca9528011 100644
> > --- a/doc/src/sgml/auth-delay.sgml
> > +++ b/doc/src/sgml/auth-delay.sgml
> > @@ -16,6 +16,17 @@
> >    connection slots.
> >   </para>
> >  
> > + <para>
> > +  It is optionally possible to let <filename>auth_delay</filename> wait longer
> > +  for each successive authentication failure from a particular remote host, if
> > +  the configuration parameter <varname>auth_delay.exp_backoff</varname> is
> > +  active.  Once an authentication succeeded from a remote host, the
> > +  authentication delay is reset to the value of
> > +  <varname>auth_delay.milliseconds</varname> for this host.  The parameter
> > +  <varname>auth_delay.max_seconds</varname> sets an upper bound for the delay
> > +  in this case.
> > + </para>
> 
> How about something like this…
> 
>     If you enable exponential_backoff, auth_delay will double the delay
>     after each consecutive authentication failure from a particular
>     host, up to the given max_seconds (default: 10s). If the host
>     authenticates successfully, the delay is reset.

Done, mostly.
 
> > +   <varlistentry>
> > +    <term>
> > +     <varname>auth_delay.max_seconds</varname> (<type>integer</type>)
> > +     <indexterm>
> > +      <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary>
> > +     </indexterm>
> > +    </term>
> > +    <listitem>
> > +     <para>
> > +      How many seconds to wait at most if exponential backoff is active.
> > +      Setting this parameter to 0 disables it.  The default is 10 seconds.
> > +     </para>
> > +    </listitem>
> > +   </varlistentry>
> 
> I suggest "The maximum delay, in seconds, when exponential backoff is
> enabled."

Done.


Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: generate syscache info automatically
Следующее
От: Michael Banck
Дата:
Сообщение: Re: [PATCH] Exponential backoff for auth_delay