Re: Sync Rep v17

From: Robert Haas
Subject: Re: Sync Rep v17
Msg-id: AANLkTi=P1f2rwdQ7pOrdB=nzyQ+4Xq_OKD8aVkYr5pqk@mail.gmail.com
In response to: Re: Sync Rep v17  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Sync Rep v17  (Jaime Casanova <jaime@2ndquadrant.com>)
           Re: Sync Rep v17  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
>> On the other hand, I see no particular
>> harm in leaving the option in there either, though I definitely think
>> the default should be changed to -1.
>
> The default setting should be to *not* freeze up if you lose the
> standby. That behaviour unexpectedly leads to an effective server down
> situation, rather than 2 minutes of slow running.

My understanding is that the parameter will wait on every commit, not
just once.  There's no mechanism to do anything else.  But I did some
testing this evening and actually it appears to not work at all.  I
hit walreceiver with a SIGSTOP and the commit never completes, even
after the two minute timeout.  Also, when I restarted walreceiver
after a long time, I got a server crash.

DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
DEBUG:  released 0 procs up to 0/3014690
DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
DEBUG:  released 2 procs up to 0/3027BC8
WARNING:  could not locate ourselves on wait queue
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: DEBUG:
shmem_exit(-1): 0 callbacks to make
DEBUG:  proc_exit(-1): 0 callbacks to make
FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Failed.
!> LOG:  record with zero length at 0/3027BC8
DEBUG:  CommitTransaction
DEBUG:  name: unnamed; blockState:       STARTED; state: INPROGR,
xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 0/3000000
LOG:  streaming replication successfully connected to primary
DEBUG:  standby "standby" is a potential synchronous standby
DEBUG:  write 0/0 flush 0/0 apply 0/3027BC8
DEBUG:  released 0 procs up to 0/0
DEBUG:  standby "standby" has now caught up with primary
DEBUG:  write 0/3027C18 flush 0/0 apply 0/3027BC8
DEBUG:  standby "standby" is now the synchronous replication standby
DEBUG:  released 0 procs up to 0/0
DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
DEBUG:  released 0 procs up to 0/3027C18
DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027C18
DEBUG:  released 0 procs up to 0/3027C18

(lots more copies of those last two messages)
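For reading the positions in the log above, here is a minimal Python sketch, not part of the patch. It assumes the DEBUG line layout shown in this message ("write X flush Y apply Z") and treats each position as two hexadecimal halves separated by a slash. The names parse_lsn, parse_positions and is_monotonic are made up for illustration.

```python
import re

# Matches the DEBUG lines quoted in this message, e.g.
# "DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690"
POSITIONS = re.compile(r"write (\S+) flush (\S+) apply (\S+)")


def parse_lsn(text):
    """Convert an 'hi/lo' hexadecimal WAL position into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_positions(line):
    """Return (write, flush, apply) as integers, or None if no match."""
    match = POSITIONS.search(line)
    if match is None:
        return None
    return tuple(parse_lsn(group) for group in match.groups())


def is_monotonic(positions):
    """True when write >= flush >= apply, the order normally expected."""
    write, flush, apply_ = positions
    return write >= flush >= apply_
```

Fed the first report after the reconnect ("write 0/0 flush 0/0 apply 0/3027BC8"), is_monotonic returns False, since apply is ahead of both write and flush in that line.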

I believe the problem is that the definition of IsOnSyncRepQueue is
bogus, so that the loop in SyncRepWaitOnQueue always takes the first
branch.

When setting this up, I found it a little confusing that setting only
synchronous_replication did nothing; I also had to set
synchronous_standby_names.  We might need a cross-check there.  I
believe the docs for synchronous_replication also need some updating;
this part appears to be out of date:

+        between primary and standby. The commit wait will last until the
+        first reply from any standby. Multiple standby servers allow
+        increased availability and possibly increase performance as well.

The words "on the primary" in the next sentence may not be necessary
any more either, as I believe this parameter now has no effect
anywhere else.
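
The cross-check between synchronous_replication and synchronous_standby_names mentioned earlier could look roughly like the following. This is only a Python sketch of the idea, not server code: check_sync_rep_settings is a hypothetical helper, the settings dict is a stand-in for the real configuration, and the "off" and empty-string defaults are assumptions.

```python
def check_sync_rep_settings(settings):
    """Return a list of warnings for inconsistent sync rep settings.

    `settings` maps parameter names to string values. It is a plain
    dict used for illustration, not a PostgreSQL API.
    """
    warnings = []
    # Assumed defaults: replication off, no standby names configured.
    sync_on = settings.get("synchronous_replication", "off").lower() == "on"
    names = settings.get("synchronous_standby_names", "").strip()
    if sync_on and not names:
        warnings.append(
            "synchronous_replication is on but synchronous_standby_names "
            "is empty, so the setting has no effect"
        )
    return warnings
```

With only synchronous_replication set to "on" the function returns one warning; adding synchronous_standby_names = "standby" makes the list empty.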

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

