Re: Sync Rep for 2011CF1

From Robert Haas
Subject Re: Sync Rep for 2011CF1
Date
Msg-id AANLkTikyW6GX3Mh2qTN=SfoQ=N10oWS3FHcKPp9OKCNa@mail.gmail.com
In response to Re: Sync Rep for 2011CF1  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep for 2011CF1  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Wed, Feb 16, 2011 at 11:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2011-02-16 at 17:40 +0200, Heikki Linnakangas wrote:
>> On 16.02.2011 17:36, Simon Riggs wrote:
>> > On Tue, 2011-02-15 at 12:08 -0500, Robert Haas wrote:
>> >> On Mon, Feb 14, 2011 at 12:25 AM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> >>> On Fri, Feb 11, 2011 at 4:06 AM, Heikki Linnakangas
>> >>> <heikki.linnakangas@enterprisedb.com>  wrote:
>> >>>> I added a XLogWalRcvSendReply() call into XLogWalRcvFlush() so that it also
>> >>>> sends a status update every time the WAL is flushed. If the walreceiver is
>> >>>> busy receiving and flushing, that would happen once per WAL segment, which
>> >>>> seems sensible.
>> >>>
>> >>> This change can make the callback function "WalRcvDie()" call ereport(ERROR)
>> >>> via XLogWalRcvFlush(). This looks unsafe.
>> >>
>> >> Good catch.  Is the cleanest solution to pass a boolean parameter to
>> >> XLogWalRcvFlush() indicating whether we're in the midst of dying?
>> >
>> > Surely if you do this then sync rep will fail to respond correctly if
>> > WalReceiver dies.
>> >
>> > Why is it OK to write to disk, but not OK to reply?
>>
>> Because the connection might be dead. A broken connection is a likely
>> cause of walreceiver death.
>
> Would it not be possible to check that?

I'm not actually sure that it matters that much whether we do or not.
ISTM that the WAL receiver is normally going to exit the main loop (in
WalReceiverMain) right here:
       /* Process any requests or signals received recently */
       ProcessWalRcvInterrupts();

But to get to that point, we either have to be making our first pass
through the loop (in which case nothing interesting has happened yet)
or we have to have just completed an iteration through the loop (in
which case we just sent a reply).  I think that the only thing that
can have changed since the last reply is the replay position, which
this version of the sync rep patch doesn't care about anyway.  Even if
it did, I'm not sure it'd be worth complicating the die path to
squeeze in one final reply.
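
To make that argument concrete, here is a toy model of the loop shape I'm
describing. It is only a sketch and not the walreceiver code: ToyWalRcv,
toy_iteration() and toy_run() are names made up for illustration, and the
"flush" and "reply" are plain assignments. The one assumption it encodes is
that the exit check sits at the top of the loop and that every completed
pass ends with a reply.

```c
#include <assert.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct
{
    long received;  /* last WAL position received */
    long flushed;   /* last WAL position flushed to disk */
    long replied;   /* flush position reported in the last reply */
    int  replies;   /* number of replies sent so far */
} ToyWalRcv;

/* One full pass of work: receive, flush, then report the flush position. */
void toy_iteration(ToyWalRcv *rcv, long bytes)
{
    rcv->received += bytes;
    rcv->flushed = rcv->received;   /* stands in for the flush */
    rcv->replied = rcv->flushed;    /* stands in for the status reply */
    rcv->replies++;
}

/*
 * Run the loop until an interrupt is noticed at the top of pass number
 * interrupt_at (0 means the very first pass).  Returns how much flushed
 * WAL had not yet been reported to the sender at the moment of exit.
 */
long toy_run(ToyWalRcv *rcv, int interrupt_at)
{
    assert(interrupt_at >= 0);

    for (int pass = 0;; pass++)
    {
        /* Exit point: stands in for the interrupt check at the loop top. */
        if (pass == interrupt_at)
            return rcv->flushed - rcv->replied;

        toy_iteration(rcv, 100);
    }
}
```

In this model toy_run() returns 0 for any interrupt_at: either no pass has
run yet, or the last thing the previous pass did was reply. That holds only
as long as the exit really happens at the top of the loop; an exit from the
middle of a pass is a different case and the model says nothing about it.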

Actually, on further reflection, I'm not even sure why we bother with
the fsync.  It seems like a useful safeguard but I'm not seeing how we
can get to that point without having fsync'd everything anyway.  Am I
missing something?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

