Thread: Synchronous replication: sleeping

Synchronous replication: sleeping

From
Heikki Linnakangas
Date:
In walsender, in the main loop that waits for backend requests to send 
WAL, there's this comment:

> +         /*
> +          * Nap for the configured time or until a request arrives.
> +          *
> +          * On some platforms, signals won't interrupt the sleep.  To ensure we
> +          * respond reasonably promptly when someone signals us, break down the
> +          * sleep into 1-second increments, and check for interrupts after each
> +          * nap.
> +          */

That's apparently copy-pasted from bgwriter. It's fine for bgwriter, 
where a prompt response is not important, but it seems pretty awful for 
synchronous replication. On such platforms, that would introduce a delay 
of 500ms on average at every commit. I'm not sure if the comment is 
actually accurate, though. bgwriter uses pg_usleep(), while this loop 
uses pq_wait(), which uses secure_poll().
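
To make the 500ms figure concrete, here is a minimal sketch of the chunked-nap pattern the comment describes. It is not the bgwriter or walsender code: nap_in_chunks() and wakeup_requested are hypothetical names, and nanosleep() stands in for whatever sleep primitive the real loop uses. If the sleep is not interrupted by the signal, a request that arrives at a random point inside a chunk is noticed only when that chunk ends, so the expected extra latency is about half the chunk length (500ms for 1-second chunks).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Hypothetical flag; a signal handler would set it to request a wakeup. */
volatile sig_atomic_t wakeup_requested = 0;

/*
 * Sleep for up to total_ms, in slices of at most chunk_ms, checking the
 * flag before each slice.  Returns the nominal number of milliseconds
 * napped.  The flag is only examined between slices, so a request that
 * does not interrupt the sleep waits for the current slice to finish.
 */
long
nap_in_chunks(long total_ms, long chunk_ms)
{
    long napped = 0;

    assert(chunk_ms > 0);       /* a zero chunk would loop forever */

    while (napped < total_ms && !wakeup_requested)
    {
        long slice = total_ms - napped;
        struct timespec ts;

        if (slice > chunk_ms)
            slice = chunk_ms;
        ts.tv_sec = slice / 1000;
        ts.tv_nsec = (slice % 1000) * 1000000L;

        /* Whether a signal cuts this short is the platform question at issue. */
        nanosleep(&ts, NULL);
        napped += slice;
    }
    return napped;
}
```

Calling nap_in_chunks(10000, 1000) mirrors the 1-second increments in the quoted comment; shrinking the chunk reduces the worst-case delay but increases the number of wakeups.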

There's also a small race condition in that loop:

> +         while (remaining > 0)
> +         {
> +             int waitres;
> +             
> +             if (got_SIGHUP || shutdown_requested || replication_requested)
> +                 break;
> +             
> +             /*
> +              * Check whether the data from standby can be read.
> +              */
> +             waitres = pq_wait(true, false, 
> +                               remaining > 1000 ? 1000 : remaining);
> +             
>          ...

If a signal is received just before pq_wait call, after checking 
replication_requested, pq_wait won't be interrupted and will wait up to 
a second before responding to it.
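
One common way to close this kind of check-then-wait race is the "self-pipe trick": the signal handler writes a byte to a pipe, and the loop polls the pipe's read end, so a signal that lands between the flag check and the wait still makes the wait return at once. The sketch below is an illustration under assumptions, not the patch's code: wakeup_init() and wait_for_request() are hypothetical names, and poll() stands in for pq_wait().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t replication_requested = 0;
static int wake_pipe[2] = {-1, -1};

/* Handler: set the flag and make the pipe readable (both async-signal-safe). */
static void
request_handler(int signo)
{
    int save_errno = errno;

    (void) signo;
    replication_requested = 1;
    if (write(wake_pipe[1], "x", 1) < 0)
    {
        /* Pipe full (EAGAIN): a wakeup is already pending, nothing to do. */
    }
    errno = save_errno;
}

/* Create the non-blocking pipe and install the handler for signo. */
int
wakeup_init(int signo)
{
    struct sigaction sa;
    int i;

    if (pipe(wake_pipe) != 0)
        return -1;
    for (i = 0; i < 2; i++)
        if (fcntl(wake_pipe[i], F_SETFL, O_NONBLOCK) != 0)
            return -1;

    sa.sa_handler = request_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* the pipe wakes us even if poll() restarts */
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if a request woke us, 0 on timeout, -1 on error. */
int
wait_for_request(int timeout_ms)
{
    struct pollfd pfd;
    char buf[64];
    int rc;

    assert(wake_pipe[0] >= 0);  /* wakeup_init() must have succeeded */

    pfd.fd = wake_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: an EINTR retry restarts the full timeout. */
    do
        rc = poll(&pfd, 1, timeout_ms);
    while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;

    /* Drain all pending wakeup bytes so the next wait blocks again. */
    while (read(wake_pipe[0], buf, sizeof buf) > 0)
        ;
    return 1;
}
```

In a real walsender loop the pipe's read end would be polled together with the standby's socket; pselect() or ppoll() with a signal mask is an alternative where available.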

BTW, on what platforms signal doesn't interrupt sleep?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Synchronous replication: sleeping

From
Martijn van Oosterhout
Date:
On Mon, Dec 08, 2008 at 01:12:39PM +0200, Heikki Linnakangas wrote:
> If a signal is received just before pq_wait call, after checking
> replication_requested, pq_wait won't be interrupted and will wait up to
> a second before responding to it.
>
> BTW, on what platforms signal doesn't interrupt sleep?

In theory, none. SIGALRM is not set as SA_RESTART so any system call
should be interrupted. This applies to POSIX systems though, not sure
about Windows.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Synchronous replication: sleeping

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Mon, Dec 08, 2008 at 01:12:39PM +0200, Heikki Linnakangas wrote:
>> BTW, on what platforms signal doesn't interrupt sleep?

> In theory, none.

In practice, they exist.  In particular I can demonstrate the issue
on HPUX 10.20.  I also dispute your claim that the behavior is
forbidden by standards.  For example, the Single Unix Spec
http://www.opengroup.org/onlinepubs/007908799/xsh/select.html
saith

       If SA_RESTART has been set for the interrupting signal, it is
       implementation-dependent whether select() restarts or returns with
       [EINTR].

and since we set SA_RESTART for most everything, we are exposed to the
implementation dependency.
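
The dependency can be probed on a given platform with a small test along these lines. This is only a sketch: select_interrupted() is a hypothetical helper, SIGALRM driven by setitimer() is just a convenient way to deliver a signal mid-sleep, and the 50ms delay is arbitrary. Without SA_RESTART, select() is expected to fail with EINTR; with SA_RESTART the result is whatever the implementation chooses, so nothing should be assumed about it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static void
on_alarm(int signo)
{
    (void) signo;               /* the handler only needs to exist */
}

/*
 * Install a SIGALRM handler with the given sa_flags, arm a 50ms one-shot
 * timer, then sleep in select() for timeout_ms.
 * Returns 1 if select() failed with EINTR, 0 if it ran to the timeout,
 * -1 on any other error.
 */
int
select_interrupted(int sa_flags, long timeout_ms)
{
    struct sigaction sa;
    struct itimerval timer;
    struct timeval tv;
    int rc;

    assert(timeout_ms >= 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 50 * 1000;     /* fire once, 50ms from now */
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    rc = select(0, NULL, NULL, NULL, &tv);  /* pure sleep, no descriptors */

    if (rc < 0 && errno == EINTR)
        return 1;
    return rc == 0 ? 0 : -1;
}
```

Comparing select_interrupted(0, 2000) with select_interrupted(SA_RESTART, 2000) shows which behavior a particular platform picked.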

I complained about this previously, but nothing came of it:
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php
        regards, tom lane


Re: Synchronous replication: sleeping

From
Fujii Masao
Date:
Hi,

On Mon, Dec 8, 2008 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Mon, Dec 08, 2008 at 01:12:39PM +0200, Heikki Linnakangas wrote:
>>> BTW, on what platforms signal doesn't interrupt sleep?
>
>> In theory, none.
>
> In practice, they exist.  In particular I can demonstrate the issue
> on HPUX 10.20.  I also dispute your claim that the behavior is
> forbidden by standards.  For example, the Single Unix Spec
> http://www.opengroup.org/onlinepubs/007908799/xsh/select.html
> saith
>
>        If SA_RESTART has been set for the interrupting signal, it is
>        implementation-dependent whether select() restarts or returns with
>        [EINTR].
>
> and since we set SA_RESTART for most everything, we are exposed to the
> implementation dependency.
>
> I complained about this previously, but nothing came of it:
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php

Umm... it's a difficult problem. Would it be OK to remove SA_RESTART from only
the signals that walsender uses, and to add EINTR handling to every system call
that walsender makes? Some of those calls already have EINTR handling; for
example, pq_recvbuf handles EINTR from recv().
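
As a rough illustration of what "EINTR handling" around a system call could look like once SA_RESTART is dropped, here is a minimal sketch. read_retry_eintr() and the shutdown_requested flag as used here are hypothetical and not taken from the patch; read() stands in for whichever call is being wrapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag; a handler installed without SA_RESTART would set it. */
volatile sig_atomic_t shutdown_requested = 0;

/*
 * read() that retries when interrupted by a signal, unless the signal
 * asked us to stop.  Returns what read() returns; on a requested stop it
 * returns -1 with errno left as EINTR so the caller can tell why.
 */
ssize_t
read_retry_eintr(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    for (;;)
    {
        ssize_t n = read(fd, buf, len);

        if (n >= 0 || errno != EINTR)
            return n;           /* data, EOF, or a real error */

        /* Interrupted: let the caller react if a stop was requested. */
        if (shutdown_requested)
            return -1;
    }
}
```

The cost of this approach is that every blocking call in the process needs such a wrapper, which is the burden the question above is weighing.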

Does anyone have a better idea?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center