Re: Data replication through disk replication

From	Joris Dobbelsteen
Subject	Re: Data replication through disk replication
Date	19 May 2007 13:22:17
Msg-id	73427AD314CC364C8DF0FFF9C4D693FF037B32@nehemiah.joris2k.local
In reply to	Data replication through disk replication (Thomas Lopatic <thomas@lopatic.de>)
List	pgsql-general

>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
>Andrew Sullivan
>Sent: Saturday, 19 May 2007 15:28
>To: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] Data replication through disk replication
>
>On Fri, May 18, 2007 at 05:03:30PM -0700, Ben wrote:
>
>> that all changes are replicated, it won't say an fsync is finished
>> until it's finished on the remote host too, and it won't let you
>> mount the block device on the slave system (at least with 0.7x).
>
>How can it guarantee these things?  The web pages say this:
>
>    If the primary node fails, heartbeat is switching the
>    secondary device into primary state and starts the
>    application there. (If you are using it with a non-journaling
>    FS this involves running fsck)
>
>    If the failed node comes up again, it is a new secondary node
>    and has to synchronise its content to the primary. This, of
>    course, will happen whithout interruption of service in the
>    background.
>
>So what happens in those cases where the primary node gets in
>trouble but isn't actually dead yet?  I see a potential for a
>race condition here that is really troubling to me.
>(Especially since it uses the TCP/IP stack, which is
>notoriously subject to DoS on Linux.)  I think you really had
>better have something like STONITH running to use this.

The general advice you see at linux-ha is to use redundant heartbeat paths.
You can use a serial link if you want to. Another option is redundant
networks. This reduces the probability of a split-brain situation.
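
To illustrate why redundant paths help, here is a minimal sketch in Python.
It is not heartbeat's actual implementation; the function name, the path
names and the deadtime value are all made up for the example. The idea is
that the peer is only declared dead when every path has gone silent, so a
single broken link cannot cause a split-brain on its own.

```python
def peer_looks_dead(last_seen, now, deadtime):
    """Return True only if ALL heartbeat paths have been silent too long.

    last_seen: dict mapping a (hypothetical) path name to the timestamp of
               the last heartbeat received over that path, in seconds.
    now:       current timestamp, in seconds.
    deadtime:  silence threshold in seconds (an assumed parameter).
    """
    if not last_seen:
        # Without any heartbeat path we cannot make a sound judgement.
        raise ValueError("at least one heartbeat path is required")
    return all(now - seen > deadtime for seen in last_seen.values())


# The network path is silent, but the serial link still delivers heartbeats,
# so the peer is not declared dead.
paths = {"eth0": 100.0, "serial": 128.0}
print(peer_looks_dead(paths, now=130.0, deadtime=10.0))
```

With only one path in the dictionary, a single cable failure would be enough
to make both nodes believe the other one is gone.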

As you guessed, STONITH is very much required to guard against possible
'race' conditions caused by hanging nodes.

As a note, DRBD will also do a lot of work for you. It prevents you from
making some mistakes.
While starting, it waits if it does not detect the other node, since it
then doesn't know which node has the latest data. This can be overridden
by a timeout if desired. (In practice only a single node will fail, or both
will come up at the same time.) This prevents running out of sync.
It also detects when it is out of sync, which requires administrator
intervention.
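
The startup behaviour described above can be sketched roughly as follows.
This is only a model of the decision, written in Python for illustration;
the names StartupAction and startup_action and the timeout parameter are
hypothetical and are not DRBD's own option names, which should be taken
from the DRBD documentation.

```python
from enum import Enum
from typing import Optional


class StartupAction(Enum):
    KEEP_WAITING = "keep waiting for the peer"
    CONNECT = "peer found, the nodes can work out who has the latest data"
    START_ALONE = "timeout expired, continue without the peer"


def startup_action(peer_detected: bool, waited_s: float,
                   timeout_s: Optional[float]) -> StartupAction:
    """Decide what a starting node does (illustrative model only).

    timeout_s=None models 'wait forever', the safe default in this sketch.
    """
    if peer_detected:
        return StartupAction.CONNECT
    if timeout_s is not None and waited_s >= timeout_s:
        # The administrator chose to accept the risk of stale data.
        return StartupAction.START_ALONE
    return StartupAction.KEEP_WAITING


# No peer seen and no timeout configured: the node keeps waiting.
print(startup_action(peer_detected=False, waited_s=3600, timeout_s=None))
```

The point of waiting by default is that a node starting alone cannot know
whether its own copy of the data is the most recent one.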

Another piece of advice is to take a look at the linux-ha web site and
mailing list. Though the web site might not be the best, the mailing list
is quite active and a lot of knowledge is available there.

In general, high availability is complex and requires a lot of thought to
cover all possible cases.

[snip]

- Joris Dobbelsteen
