Re: Review for GetWALAvailability()

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Review for GetWALAvailability()
Дата
Msg-id 20200617.121031.1692393794056067357.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Review for GetWALAvailability()  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: Review for GetWALAvailability()  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Список pgsql-hackers
At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-Jun-17, Fujii Masao wrote:
> > On 2020/06/17 3:50, Alvaro Herrera wrote:
> 
> > So InvalidateObsoleteReplicationSlots() can terminate normal backends.
> > But do we want to do this? If we want, we should add the note about this
> > case into the docs? Otherwise the users would be surprised at termination
> > of backends by max_slot_wal_keep_size. I guess that it's basically rarely
> > happen, though.
> 
> Well, if we could distinguish a walsender from a non-walsender process,
> then maybe it would make sense to leave backends alive.  But do we want
> that?  I admit I don't know what would be the reason to have a
> non-walsender process with an active slot, so I don't have a good
> opinion on what to do in this case.

The non-walsender backend is actually doing replication work. It
rather should be killed?

> > > > +        /*
> > > > +         * Signal to terminate the process using the replication slot.
> > > > +         *
> > > > +         * Try to signal every 100ms until it succeeds.
> > > > +         */
> > > > +        if (!killed && kill(active_pid, SIGTERM) == 0)
> > > > +            killed = true;
> > > > +        ConditionVariableTimedSleep(&slot->active_cv, 100,
> > > > +                                    WAIT_EVENT_REPLICATION_SLOT_DROP);
> > > > +    } while (ReplicationSlotIsActive(slot, NULL));
> > > 
> > > Note that here you're signalling only once and then sleeping many times
> > > in increments of 100ms -- you're not signalling every 100ms as the
> > > comment claims -- unless the signal fails, but you don't really expect
> > > that.  On the contrary, I'd claim that the logic is reversed: if the
> > > signal fails, *then* you should stop signalling.
> > 
> > You mean; in this code path, signaling fails only when the target process
> > disappears just before signaling. So if it fails, slot->active_pid is
> > expected to become 0 even without signaling more. Right?
> 
> I guess kill() can also fail if the PID now belongs to a process owned
> by a different user.  I think we've disregarded very quick reuse of
> PIDs, so we needn't concern ourselves with it.

The first time call to ConditionVariableTimedSleep doen't actually
sleep, so the loop works as expected.  But we may make an extra call
to kill(2).  Calling ConditionVariablePrepareToSleep beforehand of the
loop would make it better.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Review for GetWALAvailability()
Следующее
От: Noah Misch
Дата:
Сообщение: Re: valgrind versus pg_atomic_init()