Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)

From: Joris Dobbelsteen
Subject: Re: Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
Msg-id: 73427AD314CC364C8DF0FFF9C4D693FF037B30@nehemiah.joris2k.local
In reply to: Fault Tolerant Postgresql (two machines, two postmasters, one disk array) (John Gateley <gateley@jriver.com>)
List: pgsql-general
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ron Johnson
>Sent: Thursday 17 May 2007 22:56
>To: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
>
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On 05/17/07 09:35, Andrew Sullivan wrote:
>[snip]
>>
>> The problems come when you get a false detection of machine failure.
>> Consider a case, for instance, where the machine A gets overloaded,
>> goes into swap madness, or has a billion runaway processes that cause
>> it to stagger.  In this case, A might not respond in time on the
>> heartbeat monitor, and then the standby machine B thinks A has failed.
>> But A doesn't know that, of course, because it is working as hard as
>> it can just to stay up.  Now, if B mounts the disk and starts the
>> postmaster, but doesn't have a way to make _sure_ that A is completely
>> disconnected from the disk, then it's entirely possible A will flush
>> buffers out to the still-mounted data area.  Poof!
>> Instant data corruption.
>
>Aren't there PCI heartbeat cards that are independent of the
>load on the host machine?

A commonly used solution is to cut the power to the 'failed' machine
just before the take-over is done, so it can no longer write to the
shared disk. Solutions for that are readily available.
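To make the ordering concrete, here is a minimal Python sketch of a take-over sequence on the standby. It assumes hypothetical callables (power_off_peer, peer_is_off, mount_shared_disk, start_postmaster) that would be wired to a real power-switch/STONITH device and to local mount and start-up scripts; none of them is a Heartbeat API.

```python
from typing import Callable


class FencingFailed(RuntimeError):
    """Raised when the peer cannot be confirmed powered off."""


def take_over(
    power_off_peer: Callable[[], None],     # hypothetical: drives the power switch
    peer_is_off: Callable[[], bool],        # hypothetical: confirms the outlet is off
    mount_shared_disk: Callable[[], None],  # hypothetical: mounts the data area
    start_postmaster: Callable[[], None],   # hypothetical: starts PostgreSQL
) -> None:
    # Step 1: cut power to the node believed to have failed (A).
    power_off_peer()
    # Step 2: refuse to continue unless the power-off is confirmed; a node that
    # is merely overloaded could otherwise still flush buffers to the disk.
    if not peer_is_off():
        raise FencingFailed("peer not confirmed off; aborting take-over")
    # Step 3: only now is it safe to mount the data area and start the postmaster.
    mount_shared_disk()
    start_postmaster()
```

The design choice is that the take-over aborts when fencing cannot be confirmed: an unavailable database is recoverable, while a data area written by two postmasters may not be.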

Besides this, a separate PCI heartbeat card does not tell you whether
your software is actually working. The same applies to a watchdog: you
don't want the watchdog to be reset continuously by something that runs
independently of your software, as you lose the benefit of the watchdog.
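A small Python model of the watchdog point. The Watchdog class and feed_if_healthy() are hypothetical illustrations, not a kernel or hardware watchdog API; the timer is kicked only when a service check passes, so a hung application lets it expire instead of being kept alive by an independent timer.

```python
import time
from typing import Callable


class Watchdog:
    """Hypothetical software model of a watchdog timer.

    It counts as expired unless kick() is called at least once every
    `timeout` seconds.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so the behaviour can be tested
        self.last_kick = clock()

    def kick(self) -> None:
        self.last_kick = self.clock()

    def expired(self) -> bool:
        return self.clock() - self.last_kick > self.timeout


def feed_if_healthy(watchdog: Watchdog, service_is_healthy: Callable[[], bool]) -> bool:
    """Kick the watchdog only when the service check passes.

    A watchdog kicked unconditionally would never expire while the
    software is hung, which defeats its purpose.
    """
    healthy = service_is_healthy()
    if healthy:
        watchdog.kick()
    return healthy
```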

Generally your software should also check whether PostgreSQL is
operating as expected, i.e. that it has not stopped and is not
unresponsive. In those cases the system should fail over. The 'cut
power' solution works.
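A minimal sketch of such a check in Python, assuming the pg_isready utility is installed (it ships with PostgreSQL 9.3 and later, so not with the releases current when this was written; on older versions a trivial query with a connect timeout serves the same role). should_fail_over() is a hypothetical helper, and the threshold of 3 consecutive misses is an arbitrary example value.

```python
import subprocess
from typing import Iterable


def pg_isready_probe(host: str = "localhost", port: int = 5432, timeout_s: int = 3) -> bool:
    """Return True if pg_isready reports the server is accepting connections."""
    try:
        result = subprocess.run(
            ["pg_isready", "-q", "-h", host, "-p", str(port), "-t", str(timeout_s)],
            timeout=timeout_s + 2,  # guard against the probe itself hanging
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # A missing binary or a hung probe both count as "not healthy".
        return False
    # Exit code 0 means "accepting connections"; anything else is a failure.
    return result.returncode == 0


def should_fail_over(probe_results: Iterable[bool], threshold: int = 3) -> bool:
    """Fail over only after `threshold` consecutive failed probes.

    Requiring several misses in a row lowers the chance that a briefly
    overloaded (but still running) primary is declared dead; it does not
    remove the need to fence that primary before taking over.
    """
    consecutive = 0
    for ok in probe_results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= threshold:
            return True
    return False
```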

For details on how to set this up, look at Heartbeat (www.linux-ha.org)
and search for STONITH ("Shoot The Other Node In The Head"). The site
has lots and lots of very useful information about high-availability
solutions. Furthermore, the package is used around the world by large
companies for these solutions and is part of several other software
packages. It supports Linux and BSD...

- Joris

