Re: postgres in HA constellation

From: Chris Browne
Subject: Re: postgres in HA constellation
Msg-id: 60ac40814w.fsf@dba2.int.libertyrms.com
In reply to: Re: postgres in HA constellation  ("Sebastian Reitenbach" <itlistuser@rapideye.de>)
List: pgsql-admin
bnichols@ca.afilias.info (Brad Nicholson) writes:
> On Wed, 2006-10-11 at 16:12 -0500, Jim C. Nasby wrote:
>> On Wed, Oct 11, 2006 at 10:28:44AM -0400, Andrew Sullivan wrote:
>> > On Thu, Oct 05, 2006 at 08:43:21PM -0500, Jim Nasby wrote:
>> > > Isn't it entirely possible that if the master gets trashed it would
>> > > start sending garbage to the Slony slave as well?
>> >
>> > Well, maybe, but unlikely.  What happens in a shared-disc failover is
>> > that the second machine re-mounts the same partition as the old
>> > machine had open.  The risk is the case where your to-be-removed
>> > machine hasn't actually stopped writing on the partition yet, but
>> > your failover software thinks it's dead, and can fail over.  Two
>> > processes have the same Postgres data and WAL files mounted at the
>> > same time, and blammo.  As nearly as I can tell, it takes
>> > approximately zero time for this arrangement to make such a mess that
>> > you're not committing any transactions.  Slony will only get the data
>> > on COMMIT, so the risk is very small.
>>
>> Hrm... I guess it depends on how quickly the Slony master would stop
>> processing if it was talking to a shared-disk that had become corrupt
>> from another postmaster.
>
> That doesn't depend on Slony, it depends on Postgres.  If transactions
> are committing on the master, Slony will replicate them.  You could have
> a situation where your HA failover trashes some of your database, but the
> database still starts up.  It starts accepting and replicating
> transactions before the corruption is discovered.

There's a bit of "joint responsibility" there.

Let's suppose that the disk has gone bad, zeroing out some index pages
for the Slony-I table sl_log_1.  (The situation will be the same for
just about any kind of corruption of a Slony-I internal table.)

There are two possibilities:
  1.  The PostgreSQL instance may notice that those pages are bad,
      returning an error message, and halting the SYNC.

  2.  The PostgreSQL instance may NOT notice that those pages are bad,
      and, as a result, fail to apply some updates, thereby corrupting
      the subscriber.
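
A toy sketch of the two outcomes, purely for illustration.  This is not
Slony-I code: the names run_sync, CorruptPageError, damaged and
instance_notices are hypothetical, and "damaged" simply marks which log
rows sit behind bad pages.

```python
class CorruptPageError(Exception):
    """Stands in for the error PostgreSQL would report on a bad page."""


def run_sync(log_rows, damaged, instance_notices):
    """Return the rows a subscriber would receive from one SYNC.

    log_rows:         replicated changes, in order (hypothetical model)
    damaged:          set of indexes of rows hidden behind bad pages
    instance_notices: True for outcome 1, False for outcome 2
    """
    applied = []
    for index, row in enumerate(log_rows):
        if index in damaged:
            if instance_notices:
                # Outcome 1: the error halts the SYNC, nothing is applied.
                raise CorruptPageError(f"bad page while reading row {index}")
            # Outcome 2: the row is silently missed.
            continue
        applied.append(row)
    return applied
```

With instance_notices=True the SYNC stops with an error and the
subscriber is left untouched; with instance_notices=False the call
returns fewer rows than were logged, which is the silent subscriber
corruption described in 2).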

I think there's a pretty high probability of 1) happening rather than
2), but there is a risk of corruption of subscribers roughly
proportional to the probability of 2).
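
The "roughly proportional" relationship can be written as a one-line
calculation.  This is a minimal sketch under a strong assumption, namely
that corruption and failure-to-notice are independent events; the
function name and both parameters are hypothetical, and no real
probabilities are known for either.

```python
def subscriber_corruption_risk(p_page_corruption, p_unnoticed):
    """Estimate the chance that corruption reaches a subscriber.

    p_page_corruption: chance that a Slony-I internal table is damaged
    p_unnoticed:       chance that PostgreSQL does NOT notice, i.e. case 2)
    Assumes the two events are independent (an illustrative assumption).
    """
    for p in (p_page_corruption, p_unnoticed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_page_corruption * p_unnoticed
```

Halving p_unnoticed halves the result, which is all that "proportional"
claims here; the absolute numbers would have to come from measurement.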

My "gut feel" is that the probability of 2) is pretty small, but I
don't have anything to point to as a proof of that...
--
output = reverse("gro.mca" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/
"One of the main causes of the fall of the Roman Empire was that,
lacking zero, they had no way to indicate successful termination of
their C programs."  -- Robert Firth
