Discussion: Connection slots reserved for replication

Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hello hackers,

At the moment it is possible to reserve some connection slots for superusers; this behavior is controlled by the superuser_reserved_connections configuration parameter, with a default value of 3.

If all non-reserved connection slots are busy, a replica fails to open a new connection and start streaming from the primary. Such behavior is very bad if you want to run PostgreSQL HA clusters.

Initially (in 9.0), replication connections required superuser privileges, and therefore they were deliberately excluded from superuser_reserved_connections. Basically that means it has never been possible to reserve some connection slots for replication connections.

Later (9.1) it became possible to create a user with the REPLICATION and NOSUPERUSER options, but the comment in postinit.c still says that superuser privileges are required.

Now I think it is time to go further: we should make it possible to reserve some connection slots for replication in a manner similar to superuser connections.

How should it work:
1. If we know that we got a replication connection, we just have to make sure that at least superuser_reserved_connections free connection slots are available.
2. If we know that this is neither a superuser nor a replication connection, we should check that at least (superuser_reserved_connections + NumWalSenders() - max_wal_senders) connection slots are available.

And the last question is how to control the number of reserved slots for replication. There are two options:
1. We can introduce a new GUC for that: replication_reserved_connections
2. Or we can just use the value of max_wal_senders

Personally, I like the second option more.


The attached patch implements the functionality described above.
Feedback is very much appreciated.


Regards,
--
Alexander Kukushkin
Attachments

Re: Connection slots reserved for replication

From
Masahiko Sawada
Date:
On Wed, Aug 1, 2018 at 9:30 PM, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> Hello hackers,
>
> at the moment it is possible to reserve some amount of connection slots for
> superusers and this behavior is controlled by superuser_reserved_connections
> configuration parameter with the default value = 3.
>
> In case if all non-reserved connection slots are busy, replica fails to open
> a new connection and start streaming from the primary. Such behavior is very
> bad if you want to run postgresql HA clusters

Yeah, that's also bad if we want to use pg_basebackup in that situation.

>
> Initially, replication connections required superuser privileges (in 9.0)
> and therefore they were deliberately excluded from
> superuser_reserved_connections. Basically that means it has never been
> possible to reserve come connection slots for replication connections.
>
> Later (9.1) it became possible to create a user with REPLICATION and
> NOSUPERUSER options, but comment in the postinit.c still tells that
> superuser is required.
>
> Now I think now it is a time to go further, and we should make it possible
> to reserve some connection slots for replication in a manner similar to
> superuser connections.
>

+1

> How should it work:
> 1. If we know that we got the replication connection, we just should make
> sure that there are at least superuser_reserved_connections free connection
> slots are available.
> 2. If we know that this is neither superuser nor replication connection, we
> should check that there are at least (superuser_reserved_connections +
> NumWalSenders() - max_wal_senders) connection slots are available.

Did you mean (superuser_reserved_connections + max_wal_senders -
NumWalSenders()) in the second point?

>
> And the last question how to control the number of reserved slots for
> replication. There are two options:
> 1. We can introduce a new GUC for that: replication_reserved_connections
> 2. Or we can just use the value of max_wal_senders
>
> Personally, I more like the second option.
>

One arguable point about the second option could be that it breaks
backward compatibility of the parameter configurations. That is,
existing systems would need to reconfigure max_connections. So it might
be better to take the first option, with
replication_reserved_connections = 0 by default.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hi,

2018-09-14 12:23 GMT+02:00 Masahiko Sawada <sawada.mshk@gmail.com>:

>> 2. If we know that this is neither superuser nor replication connection, we
>> should check that there are at least (superuser_reserved_connections +
>> NumWalSenders() - max_wal_senders) connection slots are available.
>
> You wanted to mean (superuser_reserved_connections + max_wal_senders -
> NumWalSenders()) in the second point?

Sure, my bad. I made a mistake when writing the email, but in the
attached file it looks good.
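
For reference, with the corrected expression the check for a regular connection would look roughly like this (only a sketch, not the attached patch; NumWalSenders() stands for whatever helper returns the number of active walsenders):

    /* Keep slots free for superusers and for walsenders that have not
     * connected yet (sketch of the corrected condition). */
    if (!am_superuser && !am_walsender &&
        !HaveNFreeProcs(ReservedBackends +
                        (max_wal_senders - NumWalSenders())))
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("remaining connection slots are reserved for "
                        "superuser and replication connections")));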

>
> One argrable point of the second option could be that it breaks
> backward compatibility of the parameter configurations. That is, the
> existing systems need to re-configure the max_connections. So it might
> be better to take the first option with
> replication_reservd_connections = 0 by default.

Please find attached the new version of the patch, which introduces
a replication_reserved_connections GUC.

Regards,
--
Alexander Kukushkin

Attachments

Re: Connection slots reserved for replication

From
Kyotaro HORIGUCHI
Date:
Hello.

I agree with the objective.

At Mon, 17 Sep 2018 14:25:56 +0200, Alexander Kukushkin <cyberdemn@gmail.com> wrote in
<CAFh8B=nbh4gbFCiT-jpjth60QJC1pKoWkvgke+7di-FgAduGLQ@mail.gmail.com>
> Hi,
> 
> 2018-09-14 12:23 GMT+02:00 Masahiko Sawada <sawada.mshk@gmail.com>:
> 
> >> 2. If we know that this is neither superuser nor replication connection, we
> >> should check that there are at least (superuser_reserved_connections +
> >> NumWalSenders() - max_wal_senders) connection slots are available.
> >
> > You wanted to mean (superuser_reserved_connections + max_wal_senders -
> > NumWalSenders()) in the second point?
> 
> Sure, my bad. Did a mistake when writing an email, but in the attached
> file it looks good.
> 
> >
> > One argrable point of the second option could be that it breaks
> > backward compatibility of the parameter configurations. That is, the
> > existing systems need to re-configure the max_connections. So it might
> > be better to take the first option with
> > replication_reservd_connections = 0 by default.
> 
> Please find attached the new version of the patch, which introduces
> replication_reservd_connections GUC

With this patch, a non-superuser is rejected if fewer than
super_res_conn + (the remainder of repl_res_conn) slots are free. For
walsenders, that has the same effect as simply sharing
superuser_reserved_connections between superusers and walsenders.

Instead, we can literally "reserve" connection slots for this
specific use by providing ProcGlobal->walsenderFreeProcs. Those
slots are never stolen for other usage. Additional walsenders are
allowed to grab an available "normal" slot only after all the
reserved slots are filled, which works the same as the current
behavior when replication_reserved_connections = 0.
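
A rough sketch of how InitProcess() would pick the list under this scheme (illustration only; walsenderFreeProcs is the new list proposed here, the other lists already exist, and the GUC name follows the earlier proposal):

    /* InitProcess() sketch: choose the PGPROC free list to draw from. */
    if (IsAnyAutoVacuumProcess())
        procgloballist = &ProcGlobal->autovacFreeProcs;
    else if (IsBackgroundWorker)
        procgloballist = &ProcGlobal->bgworkerFreeProcs;
    else if (am_walsender)
        procgloballist = &ProcGlobal->walsenderFreeProcs;
    else
        procgloballist = &ProcGlobal->freeProcs;

    /* Fall back to a normal slot once the reserved walsender slots are
     * exhausted, if fewer than max_wal_senders slots were reserved. */
    if (am_walsender && replication_reserved_connections < max_wal_senders &&
        *procgloballist == NULL)
        procgloballist = &ProcGlobal->freeProcs;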

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hi,

On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

>
> Instaed, we can iterally "reserve" connection slots for the
> specific use by providing ProcGlobal->walsenderFreeProcs. The
> slots are never stolen for other usage. Allowing additional
> walsenders comes after all the slots are filled to grab an
> available "normal" slot, it works as the same as the current
> behavior when walsender_reserved_connectsions = 0.
>
> What do you think about this?

Sounds reasonable, please see the updated patch.

Regards,
--
Alexander Kukushkin

Attachments

Re: Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hi,

Attached is the patch rebased onto the current HEAD; I have also created a commitfest entry.
On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> Hi,
>
> On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> >
> > Instaed, we can iterally "reserve" connection slots for the
> > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > slots are never stolen for other usage. Allowing additional
> > walsenders comes after all the slots are filled to grab an
> > available "normal" slot, it works as the same as the current
> > behavior when walsender_reserved_connectsions = 0.
> >
> > What do you think about this?
>
> Sounds reasonable, please see the updated patch.
>
> Regards,
> --
> Alexander Kukushkin

Attachments

Re: Connection slots reserved for replication

From
Kyotaro HORIGUCHI
Date:
Hello. Thank you for the new version.

At Thu, 1 Nov 2018 11:58:52 +0100, Alexander Kukushkin <cyberdemn@gmail.com> wrote in
<CAFh8B=kaX0uWdyZXn3xZPgRqhHrbiOWwFhWStdG0fvJ4is21iA@mail.gmail.com>
> Hi,
> 
> Attached rebased version patch to the current HEAD and created commit fest entry
> On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> >
> > Hi,
> >
> > On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >
> > >
> > > Instaed, we can iterally "reserve" connection slots for the
> > > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > > slots are never stolen for other usage. Allowing additional
> > > walsenders comes after all the slots are filled to grab an
> > > available "normal" slot, it works as the same as the current
> > > behavior when walsender_reserved_connectsions = 0.
> > >
> > > What do you think about this?
> >
> > Sounds reasonable, please see the updated patch.

InitializeMaxBackends()
     MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-        max_worker_processes;
+        max_worker_processes + replication_reserved_connections;

This means a walsender doesn't consume a connection, which is
different from the current behavior. We should reserve a part of
MaxConnections for walsenders.  (In PostmasterMain,
max_wal_senders is counted as a part of MaxConnections.)
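
For reference, the existing sanity check in PostmasterMain is roughly of this shape (paraphrased from memory, not part of the patch):

    /* PostmasterMain(): walsender slots must fit inside max_connections */
    if (max_wal_senders >= MaxConnections)
    {
        write_stderr("%s: max_wal_senders must be less than max_connections\n",
                     progname);
        ExitPostmaster(1);
    }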

+    if (am_walsender && replication_reserved_connections < max_wal_senders
+            && *procgloballist == NULL)
+        procgloballist = &ProcGlobal->freeProcs;

Currently an excessive number of walsenders is rejected in
InitWalSenderSlot, which emits the following error:

>    ereport(FATAL,
>        (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
>         errmsg("number of requested standby connections "
>            "exceeds max_wal_senders (currently %d)",
>            max_wal_senders)));

With this patch, if max_wal_senders =
replication_reserved_connections = 3 and the fourth walreceiver
comes, we will get "FATAL: sorry, too many clients already"
instead. It should be fixed.

When r_r_conn = 2 and max_wal_senders = 3 and three
walsenders are active, in an extreme case where a new replication
connection comes in at the same time another is exiting, we could
end up using two normal slots even though one of the reserved
slots is vacant. If we want to avoid that case, we need to limit
the number of normal slots walsenders may use. I don't think it is
acceptable as is, but basically something like the attached would do that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3b6c636077..f86c05e8e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2250,6 +2250,7 @@ static void
 InitWalSenderSlot(void)
 {
     int            i;
+    bool        reject = false;
 
     /*
      * WalSndCtl should be set up already (we inherit this by fork() or
@@ -2258,41 +2259,63 @@ InitWalSenderSlot(void)
     Assert(WalSndCtl != NULL);
     Assert(MyWalSnd == NULL);
 
+    /* limit the maximum number of non-reserved slots to use */
+    if (MyProc->procgloballist == &ProcGlobal->freeProcs)
+    {
+        int    max_normal_slots
+            = max_wal_senders - replication_reserved_connections;
+
+        if (max_normal_slots <= 0)
+        {
+            /* we mustn't use a normal slot */
+            reject = true;
+        }
+        else if (pg_atomic_add_fetch_u32(&WalSndCtl->n_using_normal_slots, 1)
+            > max_normal_slots)
+        {
+            pg_atomic_sub_fetch_u32(&WalSndCtl->n_using_normal_slots, 1);
+            reject = true;
+        }
+    }
+
     /*
      * Find a free walsender slot and reserve it. If this fails, we must be
      * out of WalSnd structures.
      */
-    for (i = 0; i < max_wal_senders; i++)
+    if (!reject)
     {
-        WalSnd       *walsnd = &WalSndCtl->walsnds[i];
-
-        SpinLockAcquire(&walsnd->mutex);
-
-        if (walsnd->pid != 0)
+        for (i = 0; i < max_wal_senders; i++)
         {
-            SpinLockRelease(&walsnd->mutex);
-            continue;
-        }
-        else
-        {
-            /*
-             * Found a free slot. Reserve it for us.
-             */
-            walsnd->pid = MyProcPid;
-            walsnd->sentPtr = InvalidXLogRecPtr;
-            walsnd->write = InvalidXLogRecPtr;
-            walsnd->flush = InvalidXLogRecPtr;
-            walsnd->apply = InvalidXLogRecPtr;
-            walsnd->writeLag = -1;
-            walsnd->flushLag = -1;
-            walsnd->applyLag = -1;
-            walsnd->state = WALSNDSTATE_STARTUP;
-            walsnd->latch = &MyProc->procLatch;
-            SpinLockRelease(&walsnd->mutex);
-            /* don't need the lock anymore */
-            MyWalSnd = (WalSnd *) walsnd;
+            WalSnd       *walsnd = &WalSndCtl->walsnds[i];
 
-            break;
+            SpinLockAcquire(&walsnd->mutex);
+
+            if (walsnd->pid != 0)
+            {
+                SpinLockRelease(&walsnd->mutex);
+                continue;
+            }
+            else
+            {
+                /*
+                 * Found a free slot. Reserve it for us.
+                 */
+                walsnd->pid = MyProcPid;
+                walsnd->sentPtr = InvalidXLogRecPtr;
+                walsnd->write = InvalidXLogRecPtr;
+                walsnd->flush = InvalidXLogRecPtr;
+                walsnd->apply = InvalidXLogRecPtr;
+                walsnd->writeLag = -1;
+                walsnd->flushLag = -1;
+                walsnd->applyLag = -1;
+                walsnd->state = WALSNDSTATE_STARTUP;
+                walsnd->latch = &MyProc->procLatch;
+                SpinLockRelease(&walsnd->mutex);
+                /* don't need the lock anymore */
+                MyWalSnd = (WalSnd *) walsnd;
+
+                break;
+            }
         }
     }
     if (MyWalSnd == NULL)
@@ -2322,6 +2345,10 @@ WalSndKill(int code, Datum arg)
     /* Mark WalSnd struct as no longer being in use. */
     walsnd->pid = 0;
     SpinLockRelease(&walsnd->mutex);
+
+    /* decrement usage count of normal connection slots if needed */
+    if (MyProc->procgloballist == &ProcGlobal->freeProcs)
+        pg_atomic_sub_fetch_u32(&WalSndCtl->n_using_normal_slots, 1);
 }
 
 /*
@@ -3047,6 +3074,9 @@ WalSndShmemInit(void)
 
             SpinLockInit(&walsnd->mutex);
         }
+
+        /* set the number of non-reserved slots walsenders can use */
+        pg_atomic_init_u32(&WalSndCtl->n_using_normal_slots, 0);
     }
 }
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 2d04a8204a..c461cf2f12 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -333,8 +333,7 @@ InitProcess(void)
     * Try to use ProcGlobal->freeProcs as a fallback when
     * all reserved walsender slots are already busy.
     */
-    if (am_walsender && replication_reserved_connections < max_wal_senders
-            && *procgloballist == NULL)
+    if (am_walsender && *procgloballist == NULL)
         procgloballist = &ProcGlobal->freeProcs;
 
     MyProc = *procgloballist;
diff --git a/src/include/replication/walsender_private.h b/src/include/replication/walsender_private.h
index 4b90477936..bc84c287a4 100644
--- a/src/include/replication/walsender_private.h
+++ b/src/include/replication/walsender_private.h
@@ -14,6 +14,7 @@
 
 #include "access/xlog.h"
 #include "nodes/nodes.h"
+#include "port/atomics.h"
 #include "replication/syncrep.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
@@ -101,6 +102,8 @@ typedef struct
      */
     bool        sync_standbys_defined;
 
+    pg_atomic_uint32 n_using_normal_slots;
+
     WalSnd        walsnds[FLEXIBLE_ARRAY_MEMBER];
 } WalSndCtlData;


Re: Connection slots reserved for replication

From
Masahiko Sawada
Date:
On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello. Thank you for the new version.
>
> At Thu, 1 Nov 2018 11:58:52 +0100, Alexander Kukushkin <cyberdemn@gmail.com> wrote in
<CAFh8B=kaX0uWdyZXn3xZPgRqhHrbiOWwFhWStdG0fvJ4is21iA@mail.gmail.com>
> > Hi,
> >
> > Attached rebased version patch to the current HEAD and created commit fest entry
> > On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
> > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >
> > > >
> > > > Instaed, we can iterally "reserve" connection slots for the
> > > > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > > > slots are never stolen for other usage. Allowing additional
> > > > walsenders comes after all the slots are filled to grab an
> > > > available "normal" slot, it works as the same as the current
> > > > behavior when walsender_reserved_connectsions = 0.
> > > >
> > > > What do you think about this?
> > >
> > > Sounds reasonable, please see the updated patch.
>
> InitializeMaxBackends()
>         MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> -               max_worker_processes;
> +               max_worker_processes + replication_reserved_connections;
>
> This means walsender doesn't comsume a connection, which is
> different from the current behavior. We should reserve a part of
> MaxConnections for walsenders.  (in PostmasterMain,
> max_wal_senders is counted as a part of MaxConnections)

Yes. We can force replication_reserved_connections <= max_wal_senders,
and then the connections reserved for replication would be a part of
MaxConnections.

>
> +       if (am_walsender && replication_reserved_connections < max_wal_senders
> +                       && *procgloballist == NULL)
> +               procgloballist = &ProcGlobal->freeProcs;
>
> Currently exccesive number of walsenders are rejected in
> InitWalSenderSlot and emit the following error.
>
> >    ereport(FATAL,
> >        (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> >         errmsg("number of requested standby connections "
> >            "exceeds max_wal_senders (currently %d)",
> >            max_wal_senders)));
>
> With this patch, if max_wal_senders =
> replication_reserved_connections = 3 and the fourth walreceiver
> comes, we will get "FATAL: sorry, too many clients already"
> instead. It should be fixed.
>
> When r_r_conn = 2 and max_wal_senders = 3 and the three
> walsenders are active, in an exreme case where a new replication
> connection comes at the same time another is exiting, we could
> end up using two normal slots despite that one slot is vacant in
> reserved slots.

Doesn't max_wal_senders prevent that case?

WAL senders can get a connection if we have more free procs than
(MaxConnections - reserved for superusers). So I think for normal
users the connection must be refused if (MaxConnections - (reserved
for superuser and replication)) > # of free procs, and for WAL senders
the connection must also be refused if (MaxConnections - (reserved for
superuser)) > # of free procs. I'm not sure we need such a trick in
InitWalSenderSlot().
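
Expressed as code, those rules would look roughly like this (a sketch of an equivalent formulation using the existing HaveNFreeProcs() helper; ReservedBackends is the superuser reservation, replication_reserved_connections the proposed GUC):

    /* Normal backends keep slots free for superusers and replication. */
    if (!am_superuser && !am_walsender &&
        !HaveNFreeProcs(ReservedBackends + replication_reserved_connections))
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("remaining connection slots are reserved")));

    /* Walsenders only need to keep the superuser slots free. */
    if (am_walsender && !HaveNFreeProcs(ReservedBackends))
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("remaining connection slots are reserved")));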

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Connection slots reserved for replication

From
Kyotaro HORIGUCHI
Date:
Hello.

At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig@mail.gmail.com>
> On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > InitializeMaxBackends()
> >         MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > -               max_worker_processes;
> > +               max_worker_processes + replication_reserved_connections;
> >
> > This means walsender doesn't comsume a connection, which is
> > different from the current behavior. We should reserve a part of
> > MaxConnections for walsenders.  (in PostmasterMain,
> > max_wal_senders is counted as a part of MaxConnections)
> 
> Yes. We can force replication_reserved_connections <= max_wal_senders
> and then reserved connections for replication should be a part of
> MaxConnections.
> 
> >
> > +       if (am_walsender && replication_reserved_connections < max_wal_senders
> > +                       && *procgloballist == NULL)
> > +               procgloballist = &ProcGlobal->freeProcs;
> >
> > Currently exccesive number of walsenders are rejected in
> > InitWalSenderSlot and emit the following error.
> >
> > >    ereport(FATAL,
> > >        (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > >         errmsg("number of requested standby connections "
> > >            "exceeds max_wal_senders (currently %d)",
> > >            max_wal_senders)));
> >
> > With this patch, if max_wal_senders =
> > replication_reserved_connections = 3 and the fourth walreceiver
> > comes, we will get "FATAL: sorry, too many clients already"
> > instead. It should be fixed.
> >
> > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > walsenders are active, in an exreme case where a new replication
> > connection comes at the same time another is exiting, we could
> > end up using two normal slots despite that one slot is vacant in
> > reserved slots.
> 
> Doesn't the max_wal_senders prevent the case?

Currently the variable doesn't work that way. We first accept the
connection request, search for a vacant slot in
InitWalSenderSlot, and reject the connection if no room is
available. Even with this patch, we don't count the exact number
of active walsenders (for performance reasons). If the reserved
slots are filled, there's no way other than to accept the
connection using a non-reserved slot while r_r_conn <
max_wal_senders. One of the active walsenders may go away between
the moment we allocate the non-reserved connection slot and the
moment InitWalSenderSlot starts searching the walsnds[] array.
In the end the new walsender on the unreserved slot is activated,
and one reserved slot is left empty. So this is "an extreme
case". We could ignore it.

I doubt that we should allow a setting where r_r_conn <
max_wal_senders, or even r_r_conn != max_wal_senders. We don't
have a problem like this if we don't allow those cases.

> Wal senders can get connection if we have free procs more than
> (MaxConnections - reserved for superusers). So I think for normal
> users the connection must be refused if (MaxConnections - (reserved
> for superuser and replication) > # of freeprocs) and for wal senders
> the connection also must be refused if (MaxConnections - (reserved for
> superuser) > # of freeprocs). I'm not sure we need such trick in
> InitWalSenderSlot().

(For clarity, I don't mean my previous patch is a good solution.)

It works as long as we accept that some reserved slots can be left
unused even though some walsenders are using normal slots. (The
exit of a walsender that uses a reserved slot causes this, but the
slot is usually taken again by a walsender that comes later.)

Another idea is to acquire a walsnds[] slot before getting a
connection slot.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Connection slots reserved for replication

From
Masahiko Sawada
Date:
On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello.
>
> At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig@mail.gmail.com>
> > On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > > InitializeMaxBackends()
> > >         MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > > -               max_worker_processes;
> > > +               max_worker_processes + replication_reserved_connections;
> > >
> > > This means walsender doesn't comsume a connection, which is
> > > different from the current behavior. We should reserve a part of
> > > MaxConnections for walsenders.  (in PostmasterMain,
> > > max_wal_senders is counted as a part of MaxConnections)
> >
> > Yes. We can force replication_reserved_connections <= max_wal_senders
> > and then reserved connections for replication should be a part of
> > MaxConnections.
> >
> > >
> > > +       if (am_walsender && replication_reserved_connections < max_wal_senders
> > > +                       && *procgloballist == NULL)
> > > +               procgloballist = &ProcGlobal->freeProcs;
> > >
> > > Currently exccesive number of walsenders are rejected in
> > > InitWalSenderSlot and emit the following error.
> > >
> > > >    ereport(FATAL,
> > > >        (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > >         errmsg("number of requested standby connections "
> > > >            "exceeds max_wal_senders (currently %d)",
> > > >            max_wal_senders)));
> > >
> > > With this patch, if max_wal_senders =
> > > replication_reserved_connections = 3 and the fourth walreceiver
> > > comes, we will get "FATAL: sorry, too many clients already"
> > > instead. It should be fixed.
> > >
> > > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > > walsenders are active, in an exreme case where a new replication
> > > connection comes at the same time another is exiting, we could
> > > end up using two normal slots despite that one slot is vacant in
> > > reserved slots.
> >
> > Doesn't the max_wal_senders prevent the case?
>
> Currently the variable doesn't work as so. We once accept the
> connection request and searches for a vacant slot in
> InitWalSenderSlot and reject the connection if it found that no
> room is available. Even with this patch, we don't count the
> accurate number of active walsenders (for performance reason). If
> reserved slot are filled, there's no way other than to accept the
> connection using non-reserved slot if r_r_conn <
> max_wal_senders. If one of active walsenders went away since we
> allocated non-reserved connection slot until InitWalSenderSlot
> starts searching sendnds[] array. Finally the new walsender on
> the unreserved slot is activated, and one reserved slot is left
> empty. So this is "an extreme case". We could ignore the case.
>
> I'm doubt that we should allow the setting where r_r_conn <
> max_wal_senders, or even r_r_conn != max_wal_senders. We don't
> have a problem like this if we don't allow the cases.
>
> > Wal senders can get connection if we have free procs more than
> > (MaxConnections - reserved for superusers). So I think for normal
> > users the connection must be refused if (MaxConnections - (reserved
> > for superuser and replication) > # of freeprocs) and for wal senders
> > the connection also must be refused if (MaxConnections - (reserved for
> > superuser) > # of freeprocs). I'm not sure we need such trick in
> > InitWalSenderSlot().
>
> (For clarity, I don't mean my previous patch is good solution.)
>
> It works as far as we accept that some reserved slots can be left
> unused despite of some walsenders are using normal slots. (Just
> exiting a walsender using reserved slot causes this but it is
> usually occupied by walsenders comes later)
>
> Another idea is we acquire a walsnd[] slot before getting a
> connection slot..

After more thought, I'm inclined to agree with reserving max_wal_senders
slots and not having a replication_reserved_connections parameter.

superuser_reserved_connections actually works so that we always
reserve slots for superusers when the slots are almost full,
regardless of who is using the other slots, including superusers
themselves. But replication connections require a different behaviour,
as they have another limit (max_wal_senders). If we have
replication_reserved_connections < max_wal_senders, it could end up
with the same issue as originally reported on this thread.
Therefore many users would set replication_reserved_connections =
max_wal_senders.

On the other hand, if we always reserve max_wal_senders slots, the
slots available for normal backends will decrease in the next
release, which requires users to reconfigure max_connections.
But this behavior seems more natural to me than the current one, so I
think the reconfiguration is acceptable for users.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Connection slots reserved for replication

From
Magnus Hagander
Date:
On Fri, Nov 9, 2018 at 2:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> Hello.
>
> At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig@mail.gmail.com>
> > On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > > InitializeMaxBackends()
> > >         MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > > -               max_worker_processes;
> > > +               max_worker_processes + replication_reserved_connections;
> > >
> > > This means walsender doesn't comsume a connection, which is
> > > different from the current behavior. We should reserve a part of
> > > MaxConnections for walsenders.  (in PostmasterMain,
> > > max_wal_senders is counted as a part of MaxConnections)
> >
> > Yes. We can force replication_reserved_connections <= max_wal_senders
> > and then reserved connections for replication should be a part of
> > MaxConnections.
> >
> > >
> > > +       if (am_walsender && replication_reserved_connections < max_wal_senders
> > > +                       && *procgloballist == NULL)
> > > +               procgloballist = &ProcGlobal->freeProcs;
> > >
> > > Currently exccesive number of walsenders are rejected in
> > > InitWalSenderSlot and emit the following error.
> > >
> > > >    ereport(FATAL,
> > > >        (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > >         errmsg("number of requested standby connections "
> > > >            "exceeds max_wal_senders (currently %d)",
> > > >            max_wal_senders)));
> > >
> > > With this patch, if max_wal_senders =
> > > replication_reserved_connections = 3 and the fourth walreceiver
> > > comes, we will get "FATAL: sorry, too many clients already"
> > > instead. It should be fixed.
> > >
> > > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > > walsenders are active, in an exreme case where a new replication
> > > connection comes at the same time another is exiting, we could
> > > end up using two normal slots despite that one slot is vacant in
> > > reserved slots.
> >
> > Doesn't the max_wal_senders prevent the case?
>
> Currently the variable doesn't work as so. We once accept the
> connection request and searches for a vacant slot in
> InitWalSenderSlot and reject the connection if it found that no
> room is available. Even with this patch, we don't count the
> accurate number of active walsenders (for performance reason). If
> reserved slot are filled, there's no way other than to accept the
> connection using non-reserved slot if r_r_conn <
> max_wal_senders. If one of active walsenders went away since we
> allocated non-reserved connection slot until InitWalSenderSlot
> starts searching sendnds[] array. Finally the new walsender on
> the unreserved slot is activated, and one reserved slot is left
> empty. So this is "an extreme case". We could ignore the case.
>
> I'm doubt that we should allow the setting where r_r_conn <
> max_wal_senders, or even r_r_conn != max_wal_senders. We don't
> have a problem like this if we don't allow the cases.
>
> > Wal senders can get connection if we have free procs more than
> > (MaxConnections - reserved for superusers). So I think for normal
> > users the connection must be refused if (MaxConnections - (reserved
> > for superuser and replication) > # of freeprocs) and for wal senders
> > the connection also must be refused if (MaxConnections - (reserved for
> > superuser) > # of freeprocs). I'm not sure we need such trick in
> > InitWalSenderSlot().
>
> (For clarity, I don't mean my previous patch is good solution.)
>
> It works as far as we accept that some reserved slots can be left
> unused despite of some walsenders are using normal slots. (Just
> exiting a walsender using reserved slot causes this but it is
> usually occupied by walsenders comes later)
>
> Another idea is we acquire a walsnd[] slot before getting a
> connection slot..

After more thought, I'm inclined to agree to reserve max_wal_senders
slots and not to have replication_reserved_connections parameter.

For superuser_reserved_connection, actually it works so that we
certainly reserve slots for superuser in case where slots are almost
full regardless of who is using other slots incluing superusers
themselves. But replication connections requires different behaviour
as it has the another limit (max_wal_senders). If we have
replication_reserved_connections < max_wal_senders, it could end up
with the same issue as what originally reported on this thread.
Therefore many users would set replication_reserved_connections =
max_wal_senders.

On the other hand, If we always reserve max_wal_senders slots
available slots for normal backend will get decreased in the next
release, which require for users to re-confiugre the max_connection.
But I felt this behavior seems more natural than the current one, so I
think the re-configuration can be acceptable for users.


Maybe what we should do instead is not consider max_wal_senders a part of the total number of connections, and instead size the things that need to be sized by them by max_connections + max_wal_senders. That seems more logical given how the parameters are named as well.
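
For illustration, the sizing change would look roughly like this in InitializeMaxBackends() (a sketch of the idea, not a hunk from any attached patch):

    /* Walsenders no longer draw from max_connections (sketch). */
    MaxBackends = MaxConnections + max_wal_senders +
        autovacuum_max_workers + 1 + max_worker_processes;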

--

Re: Connection slots reserved for replication

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> On Fri, Nov 9, 2018 at 2:02 AM Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
> > On the other hand, If we always reserve max_wal_senders slots
> > available slots for normal backend will get decreased in the next
> > release, which require for users to re-confiugre the max_connection.
> > But I felt this behavior seems more natural than the current one, so I
> > think the re-configuration can be acceptable for users.
>
> Maybe what we should do instead is not consider max_wal_senders a part of
> the total number of connections, and instead size the things that needs to
> be sized by them by max_connections + max_wal_senders. That seems more
> logical given how the parameters are named as well.

I tend to agree with having max_connections + max_wal_senders.

Thanks!

Stephen

Attachments

Re: Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hi,

Attaching the new version of the patch.
Now it simply reserves max_wal_senders slots in ProcGlobal, which
guarantees that only walsender processes can use them.
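
Roughly, the idea is to carve the PGPROC array into per-purpose free lists in InitProcGlobal(); the sketch below only illustrates that layout (list linking is omitted, and the attached patch may differ in detail):

    /* InitProcGlobal() sketch: assign each PGPROC to its free list. */
    if (i < MaxConnections)
        procs[i].procgloballist = &ProcGlobal->freeProcs;
    else if (i < MaxConnections + max_wal_senders)
        procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
    else if (i < MaxConnections + max_wal_senders + autovacuum_max_workers + 1)
        procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
    else
        procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;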

Regards,
--
Alexander Kukushkin

Attachments

Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:
> On 30. Nov 2018, at 13:58, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>
> attaching the new version of the patch.
> Now it simply reserves max_wal_senders slots in the ProcGlobal, what
> guarantees that only walsender process could use them.

With this patch it looks like InitProcess will trigger the generic error about 'too many clients' before the more
specific error message in InitWalSenderSlot when exceeding the number of max_wal_senders.

Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply that
we need to add max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the replica
to be not lower than the one on the primary?

Cheers,
Oleksii

Re: Connection slots reserved for replication

From
Petr Jelinek
Date:
On 05/12/2018 15:33, Oleksii Kliukin wrote:
> 
>> On 30. Nov 2018, at 13:58, Alexander Kukushkin <cyberdemn@gmail.com> wrote:
>>
>> attaching the new version of the patch.
>> Now it simply reserves max_wal_senders slots in the ProcGlobal, what
>> guarantees that only walsender process could use them.
> 
> With this patch It looks like InitProcess will trigger the generic error about 'too many clients' before the more
> specific error message in InitWalSenderSlot when exceeding the number of max_wal_senders.
> 
> Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply
> that we need to add max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the
> replica to be not lower than the one on the primary?
> 

I think it does; we need the proc slots for walsenders on the standby
the same way we do for normal backends.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hi,

On Thu, 6 Dec 2018 at 00:55, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> > Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply
> > that we need to add max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the
> > replica to be not lower than the one on the primary?
> >
>
> I think it does, we need the proc slots for walsenders on the standby
> same way we do for normal backends.

You are absolutely right. Attaching the new version of the patch.
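
For illustration, the check would presumably follow the existing pattern in CheckRequiredParameterValues() (a sketch under that assumption; ControlFile->max_wal_senders is the new control-file field added by the patch):

    /* xlog.c: CheckRequiredParameterValues() sketch */
    RecoveryRequiresIntParameter("max_connections",
                                 MaxConnections,
                                 ControlFile->MaxConnections);
    RecoveryRequiresIntParameter("max_wal_senders",
                                 max_wal_senders,
                                 ControlFile->max_wal_senders);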

Regards,
--
Alexander Kukushkin

Attachments

Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:
On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
> Hi,
> 
> On Thu, 6 Dec 2018 at 00:55, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> > > Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply
> > > that we need to add max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the
> > > replica to be not lower than the one on the primary?
> > >
> >
> > I think it does, we need the proc slots for walsenders on the standby
> > same way we do for normal backends.
> 
> You are absolutely right. Attaching the new version of the patch.

Thank you. I've checked that, with v6, the replica correctly complains when its value of max_wal_senders is lower than
the one on the primary.
 

As stated in my previous comment, I think we should retain the specific error message on exceeding max_wal_senders,
instead of showing the generic "too many clients already".  Attached is the patch that fixes this small thing. I've also
rebased it against master and took the liberty of naming it v7.  It makes me wonder why we don't apply the same
level of detail to the regular out-of-connections message and show the actual value of max_connections in the
error text?
 

The code diff to the previous version is rather small:


diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d1a8113cb6..df073673f5 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2273,8 +2273,8 @@ InitWalSenderSlot(void)
     Assert(MyWalSnd == NULL);
 
     /*
-     * Find a free walsender slot and reserve it. If this fails, we must be
-     * out of WalSnd structures.
+     * Find a free walsender slot and reserve it. This must not fail due
+     * to the prior check for free walsenders at InitProcess.
      */
     for (i = 0; i < max_wal_senders; i++)
     {
@@ -2310,13 +2310,7 @@ InitWalSenderSlot(void)
             break;
         }
     }
-    if (MyWalSnd == NULL)
-        ereport(FATAL,
-                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
-                 errmsg("number of requested standby connections "
-                        "exceeds max_wal_senders (currently %d)",
-                        max_wal_senders)));
-
+    Assert(MyWalSnd != NULL);
     /* Arrange to clean up at walsender exit */
     on_shmem_exit(WalSndKill, 0);
 }

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 33387fb71b..fe33fc42e3 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -341,6 +353,12 @@ InitProcess(void)
          * in the autovacuum case?
          */
         SpinLockRelease(ProcStructLock);
+        if (am_walsender)
+            ereport(FATAL,
+                    (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
+                      errmsg("number of requested standby connections "
+                            "exceeds max_wal_senders (currently %d)",
+                            max_wal_senders)));
         ereport(FATAL,
                 (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                  errmsg("sorry, too many clients already")));


Cheers,
Oleksii


Attachments

Re: Connection slots reserved for replication

From
Petr Jelinek
Date:
Hi,

On 02/01/2019 10:21, Oleksii Kliukin wrote:
> On Mon, Dec 17, 2018, at 14:10, Alexander Kukushkin wrote:
>> Hi,
>>
>> On Thu, 6 Dec 2018 at 00:55, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>>>> Does excluding WAL senders from the max_connections limit and including max_wal_senders in MaxBackends also imply
>>>> that we need to add max_wal_senders to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the
>>>> replica to be not lower than the one on the primary?
>>>>
>>>
>>> I think it does, we need the proc slots for walsenders on the standby
>>> same way we do for normal backends.
>>
>> You are absolutely right. Attaching the new version of the patch.
> 
> Thank you. I've checked that the replica correctly complains when its value of max_wal_senders is lower than the one
> on the primary at v6.
> 
> As stated in my previous comment, I think we should retain the specific error message on exceeding max_wal_senders,
> instead of showing the generic "too many clients already".  Attached is the patch that fixes this small thing. I've also
> rebased it against the master and took a liberty of naming it v7.  It makes me wondering why don't we apply the same
> level of details to the regular out of connection message and don't show the actual value of max_connections in the
> error text?
> 

+1

The patch generally looks good, but the documentation of max_wal_senders
needs updating. In config.sgml we say:

> WAL sender processes count towards the total number
> of connections, so this parameter's value must be less than
> <xref linkend="guc-max-connections"/> minus
> <xref linkend="guc-superuser-reserved-connections"/>.

This is now misleading.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:

On Wed, Jan 2, 2019, at 11:02, Petr Jelinek wrote:

> The patch generally looks good, but the documentation of max_wal_senders
> needs updating. In config.sgml we say:
> 
> > WAL sender processes count towards the total number
> > of connections, so this parameter's value must be less than
> > <xref linkend="guc-max-connections"/> minus
> > <xref linkend="guc-superuser-reserved-connections"/>.
> 
> This is now misleading.

Thank you. Attached is the new version (called v8) with the following changes to the documentation:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305add..6634ce8cdc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -697,8 +697,7 @@ include_dir 'conf.d'
 
        <para>
         The default value is three connections. The value must be less
-        than <varname>max_connections</varname> minus
-        <xref linkend="guc-max-wal-senders"/>.
+        than <varname>max_connections</varname>.
         This parameter can only be set at server start.
        </para>
       </listitem>
@@ -3453,24 +3452,25 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
        </term>
        <listitem>
        <para>
-        Specifies the maximum number of concurrent connections from
-        standby servers or streaming base backup clients (i.e., the
-        maximum number of simultaneously running WAL sender
-        processes). The default is 10. The value 0 means replication is
-        disabled. WAL sender processes count towards the total number
-        of connections, so this parameter's value must be less than
-        <xref linkend="guc-max-connections"/> minus
-        <xref linkend="guc-superuser-reserved-connections"/>.
-        Abrupt streaming client disconnection might leave an orphaned
-        connection slot behind until
-        a timeout is reached, so this parameter should be set slightly
-        higher than the maximum number of expected clients so disconnected
-        clients can immediately reconnect.  This parameter can only
-        be set at server start.
+        Specifies the maximum number of concurrent connections from standby
+        servers or streaming base backup clients (i.e., the maximum number of
+        simultaneously running WAL sender processes). The default is 10. The
+        value 0 means replication is disabled.  Abrupt streaming client
+        disconnection might leave an orphaned connection slot behind until a
+        timeout is reached, so this parameter should be set slightly higher
+        than the maximum number of expected clients so disconnected clients
+        can immediately reconnect.  This parameter can only be set at server
+        start.
         Also, <varname>wal_level</varname> must be set to
         <literal>replica</literal> or higher to allow connections from standby
         servers.
        </para>
+
+       <para>
+         When running a standby server, you must set this parameter to the
+         same or higher value than on the master server. Otherwise, queries
+         will not be allowed in the standby server.
+        </para>
        </listitem>
       </varlistentry>
 
Cheers,
Oleksii

Attachments

Re: Connection slots reserved for replication

From
Alexander Kukushkin
Date:
Hi,

On Wed, 2 Jan 2019 at 12:17, Oleksii Kliukin <alexk@hintbits.com> wrote:

> Thank you. Attached the new version(called it v8) with the following changes to the documentation:

Thank you for jumping on it. Your changes look good to me.


I was also thinking about changing the value of PG_CONTROL_VERSION,
because we added a new field to the control file, but decided to
leave this change to the committer.

Regards,
--
Alexander Kukushkin


Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:
On Wed, Jan 2, 2019, at 12:36, Alexander Kukushkin wrote:
> Hi,
> 
> On Wed, 2 Jan 2019 at 12:17, Oleksii Kliukin <alexk@hintbits.com> wrote:
> 
> > Thank you. Attached the new version(called it v8) with the following changes to the documentation:
> 
> Thank you for jumping on it. Your changes look good to me.
> 
> 
> I was also thinking about changing the value in PG_CONTROL_VERSION,
> because we added the new field into the control file, but decided to
> leave this change to committer.

Sounds reasonable, not sure what the 'official policy' is on this.

As there have been no further reviews or issues found for almost 2 weeks since the last one, should we move it to RFC?

Cheers,
Oleksii


Re: Connection slots reserved for replication

From
Robert Haas
Date:
On Wed, Jan 2, 2019 at 6:36 AM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> I was also thinking about changing the value in PG_CONTROL_VERSION,
> because we added the new field into the control file, but decided to
> leave this change to committer.

We typically omit catversion bumps from submitted patches to avoid
unnecessary conflicts, but I think PG_CONTROL_VERSION doesn't change
enough to cause a problem.  Also, it's not date-dependent the way
catversion is.  So I would include the bump in the patch, if it were
me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Connection slots reserved for replication

From
Kyotaro HORIGUCHI
Date:
Hello.

The documentation looks fine to me. By the way, the comment at the
call site of CheckRequiredParameterValues() in xlog.c looks
somewhat stale.

> /* Check to see if any changes to max_connections give problems */
> CheckRequiredParameterValues();

Maybe something like this would be better?:

> /* Check to see if any parameter change gives a problem on replication */


In postinit.c:
>    /*
>     * The last few connection slots are reserved for superusers.
>     */
>    if ((!am_superuser && !am_walsender) &&
>        ReservedBackends > 0 &&

This forgets to explain about walsenders.

> The last few connection slots are reserved for
> superusers. Replication connections don't share the same slot
> pool.

Or something?

And the parentheses enclosing "!am_superuser..walsender" seem no
longer needed.


In guc.c:
-        /* see max_connections and max_wal_senders */
+        /* see max_connections */

I don't understand why we should see max_connections
just above (or even max_wal_senders). Do we need this comment?
(There are some other instances, but they won't be for this patch.)


In pg_controldata.c:
+    printf(_("max_wal_senders setting:         %d\n"),
+           ControlFile->max_wal_senders);
     printf(_("max_worker_processes setting:         %d\n"),
            ControlFile->max_worker_processes);
     printf(_("max_prepared_xacts setting:           %d\n"),

The added item seems to want some additional spaces.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:
Hello,

> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Thank you for the review. I took the liberty of addressing it in v9.

>
> Documentation looks fine for me. By the way, a comment for the
> caller-site of CheckRequreParameterValues() in xlog.c looks
> somewhat stale.
>
>> /* Check to see if any changes to max_connections give problems */
>> CheckRequiredParameterValues();
>
> may be better something like this?:
>
>> /* Check to see if any parameter change gives a problem on replication */

I changed it to "Check if any parameter change gives a problem on recovery", as I think it is independent of
[streaming] replication, but I still don't like the wording, as it just duplicates the comment at the definition of
CheckRequiredParameterValues. Maybe remove the comment altogether?

>
>
> In postinit.c:
>>   /*
>>    * The last few connection slots are reserved for superusers.
>>    */
>>   if ((!am_superuser && !am_walsender) &&
>>       ReservedBackends > 0 &&
>
> This is forgetting about explaing about walsenders.
>
>> The last few connection slots are reserved for
>> superusers. Replication connections don't share the same slot
>> pool.
>
> Or something?

I changed it to

+        * The last few connection slots are reserved for superusers.
+        * Replication connections are drawn from a separate pool and
+        * not limited by max_connections or superuser_reserved_connections.


>
> And the parentheses enclosing "!am_superuser..walsender" seems no
> longer needed.
>
>
> In guc.c:
> -        /* see max_connections and max_wal_senders */
> +        /* see max_connections */
>
> I don't understand for what reason we should see max_connections
> just above. (Or even max_wal_senders) Do we need this comment?
> (There're some other instances, but they wont'be for this patch.)

I don't understand what it is pointing to either, so I've removed it.

>
>
> In pg_controldata.c:
> +    printf(_("max_wal_senders setting:         %d\n"),
> +           ControlFile->max_wal_senders);
>     printf(_("max_worker_processes setting:         %d\n"),
>            ControlFile->max_worker_processes);
>     printf(_("max_prepared_xacts setting:           %d\n"),
>
> The added item seems to want some additional spaces.

Good catch, fixed.

Attached is v9. I also bumped up the PG_CONTROL_VERSION to 1200 per the prior comment by Robert.

Cheers,
Oleksii


Attachments

Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:


On 25. Jan 2019, at 18:37, Oleksii Kliukin <alexk@hintbits.com> wrote:

Hello,

On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Thank you for the review.I took a liberty to address it with v9.


So, given that there is no further feedback and no new suggestions, can we set it to RFC?

Cheers,
Oleksii

Re: Connection slots reserved for replication

From
Petr Jelinek
Date:
On 31/01/2019 11:25, Oleksii Kliukin wrote:
> 
> 
>> On 25. Jan 2019, at 18:37, Oleksii Kliukin <alexk@hintbits.com> wrote:
>>
>> Hello,
>>
>>> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> Thank you for the review.I took a liberty to address it with v9.
> 
> 
> So, given there are no further feedback or suggestions, can we set it to
> RFC?

I vote yes.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: Connection slots reserved for replication

From
Oleksii Kliukin
Date:

> On 31. Jan 2019, at 11:50, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
>
> On 31/01/2019 11:25, Oleksii Kliukin wrote:
>>
>>
>>> On 25. Jan 2019, at 18:37, Oleksii Kliukin <alexk@hintbits.com> wrote:
>>>
>>> Hello,
>>>
>>>> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>
>>> Thank you for the review.I took a liberty to address it with v9.
>>
>>
>> So, given there are no further feedback or suggestions, can we set it to
>> RFC?
>
> I vote yes.

Thanks, set it to RFC.

Cheers,
Oleksii

Re: Connection slots reserved for replication

From
Michael Paquier
Date:
On Thu, Jan 31, 2019 at 12:08:18PM +0100, Oleksii Kliukin wrote:
> Thanks, set it to RFC.

Oh, I have just noticed this patch while doing my vacuum homework.  I
have just added my name as committer; just wait a bit until the CF is
closed before I begin looking at it.
--
Michael

Attachments

Re: Connection slots reserved for replication

From
Michael Paquier
Date:
On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
> Oh, I have just noticed this patch when doing my vacuum homework.  I
> have just added my name as committer, just wait a bit until the CF is
> closed before I begin looking at it..

So, I have looked at this thread and the latest patch, and the thing
looks in good shape.  Nice job by the authors and the reviewers.  It
is a bit of a pain for users to have to hope that max_connections will
be able to handle replication connections when needed; when it cannot,
backups may not happen.  Using a separate GUC while we have
max_wal_senders here to do the job is also not a good idea, so the
approach of the patch is sound.  And on top of that, dependencies
between GUCs get reduced.

I have spotted one error though: the patch does not check whether changing
max_wal_senders overflows MAX_BACKENDS, and we need to reflect a proper
calculation into check_autovacuum_max_workers,
check_max_worker_processes and check_maxconnections.
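
For illustration, the adjusted hooks in guc.c would need to include max_wal_senders in the sum, roughly along these lines (a sketch of the idea, not the committed hunk):

static bool
check_maxconnections(int *newval, void **extra, GucSource source)
{
    /*
     * Include max_wal_senders, since walsenders no longer come out of
     * max_connections (sketch).
     */
    if (*newval + max_wal_senders + autovacuum_max_workers + 1 +
        max_worker_processes > MAX_BACKENDS)
        return false;
    return true;
}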

One thing is that the slot position calculation gets a bit more
complicated with the new slot set for WAL senders, but the code is
straightforward to follow, so that's not really an issue in my
opinion.  The potential backward-incompatible issue of creating
a standby with a lower max_wal_senders value than the primary is also
something we can live with, as that's simple enough to address, and I'd
like to think that most users prefer inheriting the parameter from the
primary to ease failover flows.

It seems to me that it would be a good idea to have an
autovacuum-worker-related message in the context of InitProcess() for
a failed backend creation, but we can deal with that later if
necessary.

With all that reviewed, I finish with the attached version, which I
am comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well,
as a reminder, because xl_parameter_change gets an upgrade and I am
most likely going to forget it otherwise.
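
For context, the record payload that gets the upgrade looks roughly
like this with the new field added (reconstructed from the fields
printed by the desc routine quoted later in this thread; the exact
placement of the new member is an assumption):

    /* Approximate shape of xl_parameter_change after the patch. */
    typedef struct xl_parameter_change
    {
        int         MaxConnections;
        int         max_worker_processes;
        int         max_wal_senders;    /* new: logged so standbys can check it */
        int         max_prepared_xacts;
        int         max_locks_per_xact;
        int         wal_level;
        bool        wal_log_hints;
        bool        track_commit_timestamp;
    } xl_parameter_change;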

Please let me know if you have comments.  I am still planning to do an
extra pass on it.
--
Michael

Attachments

Re: Connection slots reserved for replication

From: Kyotaro HORIGUCHI
Date:
Hello.

At Thu, 7 Feb 2019 15:51:46 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190207065146.GN4074@paquier.xyz>
> On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
> > Oh, I have just noticed this patch when doing my vacuum homework.  I
> > have just added my name as committer, just wait a bit until the CF is
> > closed before I begin looking at it..
> 
> So, I have looked at this thread and the latest patch, and the thing
> looks in good shape.  Nice job by the authors and the reviewers.  It
> is a bit of a pain for users to hope that max_connections would be
> able to handle replication connections when needed, which can cause
> backups to not happen.  Using a separate GUC while we have
> max_wal_senders here to do the job is also not a good idea, so the
> approach of the patch is sound.  And on top of that, dependencies
> between GUCs get reduced.
> 
> I have spotted one error though: the patch does not check if changing
> max_wal_senders overflows MAX_BACKEND, and we need to reflect a proper
> calculation into check_autovacuum_max_workers,
> check_max_worker_processes and check_maxconnections.

I don't think that is a good thing, and that includes the existing
checker functions.  But as you seem to have written below, it can be
treated as a separate issue, so I agree with it.

> One thing is that the slot position calculation gets a bit more
> complicated with the new slot set for WAL senders, still the code is
> straight-forward to follow so that's not really an issue in my

I was hesitant to propose changing it (in InitProcGlobal), but I
like the reworked style.

> opinion.  The potential backward-incompatible issue issue of creating
> a standby with lower max_wal_senders value than the primary is also
> something we can live with as that's simple enough to address, and I'd
> like to think that most users prefer inheriting the parameter from the
> primary to ease failover flows.

I believe so.

> It seems to me that it would be a good idea to have an
> autovacuum-worker-related message in the context of InitProcess() for
> a failed backend creation, but we can deal with that later if
> necessary.

(Maybe I'm missing the point, but) the GUC validator functions for
max_connections and friends seem harmful rather than merely useless,
since we only see an error for max_connections while errors for the
other parameters are never shown, which conceals what is actually
happening.  I think we should remove the validators and have
InitializeMaxBackends complain instead, providing more meaningful
information.  In another patch?
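
A minimal sketch of that alternative, purely to illustrate the
suggestion (the error wording and the exact shape of
InitializeMaxBackends() here are hypothetical, not part of the patch):

    /* Hypothetical: let InitializeMaxBackends() report the whole picture. */
    void
    InitializeMaxBackends(void)
    {
        MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
            max_worker_processes + max_wal_senders;

        if (MaxBackends > MAX_BACKENDS)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("too many backends configured"),
                     errdetail("max_connections (%d) plus autovacuum_max_workers (%d) plus one "
                               "plus max_worker_processes (%d) plus max_wal_senders (%d) "
                               "exceeds the limit of %d.",
                               MaxConnections, autovacuum_max_workers,
                               max_worker_processes, max_wal_senders,
                               MAX_BACKENDS)));
    }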

> With all that reviewed, I finish with the attached that I am
> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
> reminder because xl_parameter_change gets an upgrade, and I am most
> likely going to forget it.

Some previously removed comments are revived, but I'm fine with
that.  Adding the tags in the documentation also seems fine.

> Please let me know if you have comments.  I am still planning to do an
> extra pass on it.

All in all, I (though not the author) am fine with it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Connection slots reserved for replication

From: Oleksii Kliukin
Date:
> On 7. Feb 2019, at 07:51, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Feb 02, 2019 at 10:35:20AM +0900, Michael Paquier wrote:
>> Oh, I have just noticed this patch when doing my vacuum homework.  I
>> have just added my name as committer, just wait a bit until the CF is
>> closed before I begin looking at it..
>
> So, I have looked at this thread and the latest patch, and the thing
> looks in good shape.  Nice job by the authors and the reviewers.  It
> is a bit of a pain for users to hope that max_connections would be
> able to handle replication connections when needed, which can cause
> backups to not happen.  Using a separate GUC while we have
> max_wal_senders here to do the job is also not a good idea, so the
> approach of the patch is sound.  And on top of that, dependencies
> between GUCs get reduced.

Thank you for picking it up!

>
> I have spotted one error though: the patch does not check if changing
> max_wal_senders overflows MAX_BACKEND, and we need to reflect a proper
> calculation into check_autovacuum_max_workers,
> check_max_worker_processes and check_maxconnections.

Good catch, thank you.

I’ve noticed you brought back the 'see max_connections’ comment there. As noted
previously in this thread by Kyotaro Horiguchi, it’s not clear what exactly we
are supposed to see there; perhaps it makes sense to elaborate (besides, the
comment on max_wal_senders at guc.c:382 hints that max_wal_senders depends on
max_connections, which is no longer true).

>
> One thing is that the slot position calculation gets a bit more
> complicated with the new slot set for WAL senders, still the code is
> straight-forward to follow so that's not really an issue in my
> opinion.  The potential backward-incompatible issue issue of creating
> a standby with lower max_wal_senders value than the primary is also
> something we can live with as that's simple enough to address, and I'd
> like to think that most users prefer inheriting the parameter from the
> primary to ease failover flows.

+1

>
> It seems to me that it would be a good idea to have an
> autovacuum-worker-related message in the context of InitProcess() for
> a failed backend creation, but we can deal with that later if
> necessary.

Hm.. I am wondering how the autovacuum workers can run out of slots there?

>
> With all that reviewed, I finish with the attached that I am
> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
> reminder because xl_parameter_change gets an upgrade, and I am most
> likely going to forget it.
>
> Please let me know if you have comments.  I am still planning to do an
> extra pass on it.

Apart from the comment-related notes above your changes look good to me, thank
you.

Cheers,
Oleksii

Re: Connection slots reserved for replication

From: Michael Paquier
Date:
On Thu, Feb 07, 2019 at 04:16:22PM +0100, Oleksii Kliukin wrote:
> On 7. Feb 2019, at 07:51, Michael Paquier <michael@paquier.xyz> wrote:
> I’ve noticed you returned the 'see max_connections’ parameter there. As noticed
> previously in this thread by Kyotaro Horiguchi, it’s not clear what exactly we
> are supposed to see there, perhaps it makes sense to elaborate (besides, the
> comment on max_wal_senders at guc.c:382 hints that max_wal_senders depends on
> max_connections, which is not true anymore).

After sleeping on it, it seems that I overdid this part.  I think
that I was trying to document the relationship between the multiple
parameters.  It is true that it is easy enough to grep for the GUC
strings and find them.  It may also be better to avoid spreading the
calculations in the check functions across multiple lines.

>> It seems to me that it would be a good idea to have an
>> autovacuum-worker-related message in the context of InitProcess() for
>> a failed backend creation, but we can deal with that later if
>> necessary.
>
> Hm.. I am wondering how the autovacuum workers can run out of slots
> there?

Normally they cannot, if you look at the way the launcher decides to
start new workers.  Still, if one begins to hack on the autovacuum
code for a fork or a new feature, I think that it would be useful to
display a context-related message instead of the confusing "too many
clients" in this case.  This way it is possible to debug things
properly.
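
Something along these lines in InitProcess() would do it, purely as
an illustration of the idea (the IsAnyAutoVacuumProcess() branch and
the wording are hypothetical, not committed code):

    /* Hypothetical context-aware message when no PGPROC slot is free. */
    if (MyProc == NULL)
    {
        SpinLockRelease(ProcStructLock);
        if (IsAnyAutoVacuumProcess())
            ereport(FATAL,
                    (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                     errmsg("out of autovacuum worker slots"),
                     errhint("Consider increasing autovacuum_max_workers.")));
        else
            ereport(FATAL,
                    (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                     errmsg("sorry, too many clients already")));
    }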

>> With all that reviewed, I finish with the attached that I am
>> comfortable enough to commit.  XLOG_PAGE_MAGIC is bumped as well, as a
>> reminder because xl_parameter_change gets an upgrade, and I am most
>> likely going to forget it.
>>
>> Please let me know if you have comments.  I am still planning to do an
>> extra pass on it.
>
> Apart from the comment-related notes above your changes look good to
> me, thank you.

Thanks for the confirmation.  I am not planning to finish wrapping
this one today anyway, and next week should be fine.
--
Michael

Attachments

Re: Connection slots reserved for replication

From: Michael Paquier
Date:
On Fri, Feb 08, 2019 at 09:37:40AM +0900, Michael Paquier wrote:
> Thanks for the confirmation.  I am not planning to finish wrapping
> this one today anyway, and next week should be fine.

And done.  I have done an extra pass on it, reordering parameters
and comments a bit, and removed the extra comments I had previously
added in guc.c.
--
Michael

Attachments

Re: Connection slots reserved for replication

From: Kevin Hale Boyes
Date:
On Mon, 11 Feb 2019 at 18:39, Michael Paquier <michael@paquier.xyz> wrote:
And done.
 
Michael,
I think there's a small problem with the commit.
The position of xlrec.max_wal_senders (line 117) should be below max_worker_processes.

112         appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
113                          "max_wal_senders=%d max_prepared_xacts=%d "
114                          "max_locks_per_xact=%d wal_level=%s "
115                          "wal_log_hints=%s track_commit_timestamp=%s",
116                          xlrec.MaxConnections,
117                          xlrec.max_wal_senders,
118                          xlrec.max_worker_processes,
119                          xlrec.max_prepared_xacts,
120                          xlrec.max_locks_per_xact,
121                          wal_level_str,
122                          xlrec.wal_log_hints ? "on" : "off",
123                          xlrec.track_commit_timestamp ? "on" : "off");
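
For clarity, the reordering described above makes the arguments line
up with the format string like this (a sketch of the fix, not the
exact committed hunk; the call is the XLOG_PARAMETER_CHANGE desc
output quoted above):

    appendStringInfo(buf, "max_connections=%d max_worker_processes=%d "
                     "max_wal_senders=%d max_prepared_xacts=%d "
                     "max_locks_per_xact=%d wal_level=%s "
                     "wal_log_hints=%s track_commit_timestamp=%s",
                     xlrec.MaxConnections,
                     xlrec.max_worker_processes,   /* swapped with the line below */
                     xlrec.max_wal_senders,
                     xlrec.max_prepared_xacts,
                     xlrec.max_locks_per_xact,
                     wal_level_str,
                     xlrec.wal_log_hints ? "on" : "off",
                     xlrec.track_commit_timestamp ? "on" : "off");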

Kevin.

 

Re: Connection slots reserved for replication

From: Michael Paquier
Date:
On Mon, Feb 11, 2019 at 08:57:41PM -0700, Kevin Hale Boyes wrote:
> I think there's a small problem with the commit.
> The position of xlrec.max_wal_senders (line 117) should be below
> max_worker_processes.

Fixed, thanks!
--
Michael

Attachments

Re: Connection slots reserved for replication

From: Oleksii Kliukin
Date:

> On 12. Feb 2019, at 05:12, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Feb 11, 2019 at 08:57:41PM -0700, Kevin Hale Boyes wrote:
>> I think there's a small problem with the commit.
>> The position of xlrec.max_wal_senders (line 117) should be below
>> max_worker_processes.
>
> Fixed, thanks!

Wonderful, thank you very much for taking it to commit, and thanks to everyone involved for getting it through!

Cheers,
Oleksii