Re: Sync Rep for 2011CF1

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Sync Rep for 2011CF1
Дата
Msg-id AANLkTikG8WMhOocX9AYsRHYPc-PgxPaG6miFDD9QH3i1@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Sync Rep for 2011CF1  (Aidan Van Dyk <aidan@highrise.ca>)
Ответы Re: Sync Rep for 2011CF1  (Aidan Van Dyk <aidan@highrise.ca>)
Re: Sync Rep for 2011CF1  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Fri, Jan 21, 2011 at 1:09 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
> On Fri, Jan 21, 2011 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Fri, Jan 21, 2011 at 12:23 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:
>>>> When no sync slave is connected, yes, I want to stop things hard.
>>
>>> What you're proposing is to fail things earlier than absolutely
>>> necessary (when they try to XLOG, rather than at commit) but still
>>> later than what I think Simon is proposing (not even letting them log
>>> in).
>>
>> I can't see a reason to disallow login, because read-only transactions
>> can still run in such a situation --- and, indeed, might be fairly
>> essential if you need to inspect the database state on the way to fixing
>> the replication problem.  (Of course, we've already had the discussion
>> about it being a terrible idea to configure replication from inside the
>> database, but that doesn't mean there might not be views or status you
>> would wish to look at.)
>
> And just disallowing new logins is probably not even enough, because
> it allows current logged in clients "forward progress", leading
> towards an eventual hang (with now committed data on the master).
>
> Again, I'm trying to stop "forward progress" as soon as possible when
> a sync slave isn't replicating.  And I'ld like clients to fail with
> errors sooner (hopefully they get to the commit point) rather than
> accumulate the WAL synced to the master and just wait at the commit.
>
> So I think that's a more complete picture of my quick "not do anything
> with no synchronous slave replicating" that I think was what led to
> the no-login approach.

Well, stopping all WAL activity with an error sounds *more* reasonable
than refusing all logins, but I'm not personally sold on it.  For
example, a brief network disruption on the connection between master
and standby would cause the master to grind to a halt... and then
almost immediately resume operations.  More generally, if you have
short-running transactions, there's not much difference between
wait-at-commit and wait-at-WAL, and if you have long-running
transactions, then wait-at-WAL might be gumming up the works more than
necessary.

One idea might be to wait both before and after commit.  If
allow_standalone_primary is off, and a commit is attempted, we check
whether there's a slave connected, and if not, wait for one to
connect.  Then, we write and sync the commit WAL record.  Next, we
wait for the WAL to be ack'd.  Of course, the standby might disappear
between the first check and the second, but it would greatly reduce
the possibility of the master being ahead of the standby after a
crash, which might be useful for some people.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chris Browne
Дата:
Сообщение: Re: Review: compact fsync request queue on overflow
Следующее
От: Aidan Van Dyk
Дата:
Сообщение: Re: Sync Rep for 2011CF1