Thread: Latches, signals, and waiting

Latches, signals, and waiting

From
Tom Lane
Date:
heikki@postgresql.org (Heikki Linnakangas) writes:
> Log Message:
> -----------
> Use a latch to make startup process wake up and replay immediately when
> new WAL arrives via streaming replication. This reduces the latency, and
> also allows us to use a longer polling interval, which is good for energy
> efficiency.

> We still need to poll to check for the appearance of a trigger file, but
> the interval is now 5 seconds (instead of 100ms), like when waiting for
> a new WAL segment to appear in WAL archive.

This is just speculation at this point, because I haven't taken time
to think through the details, but couldn't we improve on that still
further?

There are always going to be some conditions that we have to poll for,
in particular death of the postmaster (since Unix unaccountably fails
to provide a SIGPARNT signal condition :-().  However postmaster death
isn't really something that needs an instant response IMO.  I would like
to get the wakeup-and-poll interval for our background processes down to
a minute or so; so far as postmaster death goes that doesn't seem like
an unacceptable response time.

So I'm wondering if we couldn't eliminate the five-second sleep
requirement here too.  It's problematic anyhow, since somebody looking
for energy efficiency will still feel it's too short, while somebody
concerned about fast failover will feel it's too long.  Could the
standby triggering protocol be modified so that it involves sending a
signal, not just creating a file?  (One issue is that it's not clear
what that'd translate to on Windows.)
        regards, tom lane


Re: Latches, signals, and waiting

From
Heikki Linnakangas
Date:
On 15/09/10 16:55, Tom Lane wrote:
> So I'm wondering if we couldn't eliminate the five-second sleep
> requirement here too.  It's problematic anyhow, since somebody looking
> for energy efficiency will still feel it's too short, while somebody
> concerned about fast failover will feel it's too long.

Yep.

>  Could the
> standby triggering protocol be modified so that it involves sending a
> signal, not just creating a file?

Seems reasonable, at least if we still provide an option for more
frequent polling with no need to send a signal.

> (One issue is that it's not clear what that'd translate to on Windows.)

pg_ctl failover? At the moment, the location of the trigger file is
configurable, but if we accept a constant location like
"$PGDATA/failover", pg_ctl could do the whole thing: create the file and
send the signal. pg_ctl on Windows already knows how to send the "signal"
via the named pipe signal emulation.

Fujii-san suggested that we might have a user-defined function for 
triggering failover as well. That's also handy, but it's not a 
replacement because it only works in hot standby mode.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
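
As a rough illustration of the "create the file and send signal" idea in the message above, the sender side might look like the sketch below. It is not taken from pg_ctl: the function name `request_failover`, the fixed file name `failover`, the choice of SIGUSR1, and the injectable `send_signal` parameter are all assumptions made for the example.

```python
import os
import signal
from pathlib import Path


def request_failover(data_dir, server_pid, send_signal=os.kill):
    """Create the trigger file, then signal the server so it looks for it.

    Hypothetical helper: the file name and the signal are assumptions.
    """
    trigger = Path(data_dir) / "failover"  # assumed constant location
    trigger.touch()
    # Order matters: the file has to exist before the signal arrives,
    # otherwise the woken process checks for it, finds nothing, and goes
    # back to sleep until its next (long) timeout.
    send_signal(server_pid, signal.SIGUSR1)  # assumed signal
    return trigger
```

Keeping the file as part of the protocol means a server that still polls, or one that is not running when the request is made, can notice the request later; the signal only shortens the delay.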


Re: Latches, signals, and waiting

From
Fujii Masao
Date:
On Wed, Sep 15, 2010 at 11:14 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> (One issue is that it's not clear what that'd translate to on Windows.)
>
> pg_ctl failover ? At the moment, the location of the trigger file is
> configurable, but if we accept a constant location like "$PGDATA/failover"
> pg_ctl could do the whole thing, create the file and send signal. pg_ctl on
> Windows already knows how to send the "signal" via the named pipe signal
> emulation.

Right.

> Fujii-san suggested that we might have a user-defined function for
> triggering failover as well.

The attached patch introduces such a user-defined function. This is
especially useful when clusterware like pgpool-II is located on a remote
server, since it can trigger failover without using something like ssh.

> That's also handy, but it's not a replacement
> because it only works in hot standby mode.

Yep.

Should we also increase the sleep time in walsender's poll loop (i.e.,
increase the default value of wal_sender_delay)? Currently it's
very small, 200ms.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: Latches, signals, and waiting

From
Fujii Masao
Date:
On Wed, Sep 15, 2010 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I'm wondering if we couldn't eliminate the five-second sleep
> requirement here too.

That would make the shutdown time longer, since the startup process
currently cannot respond to SIGTERM and SIGHUP immediately. To avoid this,
I think that we should change the signal handlers of the startup process
so that they call WakeupRecovery.

The attached patch makes StartupProcSigHupHandler and StartupProcShutdownHandler
call WakeupRecovery.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments
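
The point of the message above, that a process sleeping with a long timeout only reacts promptly if its signal handlers also wake it, can be sketched as follows. This is not the attached patch and not PostgreSQL's latch code: a non-blocking pipe stands in for the latch, and `wakeup_recovery`, `wait_latch`, `install_handlers`, and the `pending` flags are hypothetical names chosen for the example. It assumes a Unix-like system and must run in the main thread.

```python
import os
import select
import signal

# Stand-in for the latch: a non-blocking pipe that becomes readable on wake-up.
_latch_r, _latch_w = os.pipe()
os.set_blocking(_latch_r, False)
os.set_blocking(_latch_w, False)

# Flags set by the handlers and examined by the main loop after it wakes.
pending = {"reload": False, "shutdown": False}


def wakeup_recovery():
    """Hypothetical analogue of WakeupRecovery: make the latch readable."""
    try:
        os.write(_latch_w, b"\0")
    except BlockingIOError:
        pass  # pipe is full, so a wake-up is already pending


def _sighup_handler(signum, frame):
    pending["reload"] = True
    wakeup_recovery()


def _sigterm_handler(signum, frame):
    pending["shutdown"] = True
    wakeup_recovery()


def install_handlers():
    signal.signal(signal.SIGHUP, _sighup_handler)
    signal.signal(signal.SIGTERM, _sigterm_handler)


def wait_latch(timeout_s):
    """Sleep until woken or until timeout_s elapses; True means woken."""
    ready, _, _ = select.select([_latch_r], [], [], timeout_s)
    if not ready:
        return False
    try:
        while os.read(_latch_r, 512):  # drain so the next wait really sleeps
            pass
    except BlockingIOError:
        pass
    return True
```

A main loop built on this would call `wait_latch` with a long timeout and check `pending` each time it returns. Without the `wakeup_recovery()` call in the handlers, the flags would still be set, but the loop would not look at them until the timeout expired, which is the longer shutdown time the message describes.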

Re: Latches, signals, and waiting

From
Fujii Masao
Date:
On Thu, Sep 16, 2010 at 1:23 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Fujii-san suggested that we might have a user-defined function for
>> triggering failover as well.
>
> The attached patch introduces such a user-defined function. This is
> especially useful when clusterware like pgpool-II is located on a remote
> server, since it can trigger failover without using something like ssh.

I forgot to check whether the caller of that function has superuser
privileges. Here is the updated version.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments