Re: Patch for fail-back without fresh backup

From	Sameer Thakur
Subject	Re: Patch for fail-back without fresh backup
Date	20 September 2013, 13:10:49
Msg-id	CABzZFEuJjgc2oPYAJempj2n8WMF5GwdSNwCs__vq4SWzCseOAw@mail.gmail.com
In reply to	Re: Patch for fail-back without fresh backup (Samrat Revagade <revagade.samrat@gmail.com>)
Responses	Re: Patch for fail-back without fresh backup (Samrat Revagade <revagade.samrat@gmail.com>)
List	pgsql-hackers

>Attached patch combines documentation patch and source-code patch.

I have had a stab at reviewing the documentation. Have a look.

--- a/doc/src/sgml/config.sgml

+++ b/doc/src/sgml/config.sgml

@@ -1749,6 +1749,50 @@ include 'filename'

</listitem>

</varlistentry>

+ <varlistentry id="guc-synchronous-transfer" xreflabel="synchronous_transfer">

+ <term><varname>synchronous_transfer</varname> (<type>enum</type>)</term>

+ <indexterm>

+ <primary><varname>synchronous_transfer</> configuration parameter</primary>

+ </indexterm>

+ <listitem>

+ <para>

+ This parameter controls the synchronous nature of WAL transfer and

+ maintains file system level consistency between the master server and

+ the standby server. It specifies whether the master server will delay a file

+ system level change (for example, modifying a data page) until

+ the corresponding WAL records have been replicated to the standby server.

+ </para>

+ <para>

+ Valid values are <literal>commit</>, <literal>data_flush</> and

+ <literal>all</>. The default value is <literal>commit</>, meaning

+ that the master will wait only for transaction commits. This is equivalent

+ to turning off the <literal>synchronous_transfer</> parameter, and the standby

+ server will behave as a <quote>synchronous standby</> in

+ streaming replication. For value <literal>data_flush</>, the master will

+ wait only for data page modifications but not for transaction

+ commits, hence the standby server will act as an <quote>asynchronous

+ failback safe standby</>. For value <literal>all</>, the master will wait

+ for data page modifications as well as for transaction commits, and the

+ resulting standby server will act as a <quote>synchronous failback safe

+ standby</>. The wait occurs in background activities and hence should not create performance overhead.

+ To configure a synchronous failback safe standby,

+ <xref linkend="guc-synchronous-standby-names"> should be set.

+ </para>

+ </listitem>

+ </varlistentry>

@@ -2258,14 +2302,25 @@ include 'filename'</indexterm>

<para>

- Specifies a comma-separated list of standby names that can support

- <firstterm>synchronous replication</>, as described in

- <xref linkend="synchronous-replication">.

- At any one time there will be at most one active synchronous standby;

- transactions waiting for commit will be allowed to proceed after

- this standby server confirms receipt of their data.

- The synchronous standby will be the first standby named in this list

- that is both currently connected and streaming data in real-time

+ Specifies a comma-separated list of standby names. If this parameter

+ is set, the standby will behave as a synchronous standby in replication,

+ as described in <xref linkend="synchronous-replication">, or as a synchronous

+ failback safe standby, as described in <xref linkend="failback-safe">.

+ At any time there will be at most one active standby. When the standby is a

+ synchronous standby in replication, transactions waiting for commit

+ will be allowed to proceed after this standby server confirms receipt

+ of their data. When the standby is a synchronous failback safe standby,

+ data page modifications as well as transaction commits will be allowed

+ to proceed only after this standby server confirms receipt of their data.

+ If this parameter is set to an empty value and

+ <xref linkend="guc-synchronous-transfer"> is set to <literal>data_flush</>,

+ the standby is called an asynchronous failback safe standby, and only

+ data page modifications will wait until the corresponding WAL record is

+ replicated to the standby.

+ </para>

+ <para>

+ The synchronous standby in replication will be the first standby named in

+ this list that is both currently connected and streaming data in real-time

(as shown by a state of <literal>streaming</literal> in the

<literal>pg_stat_replication</></link> view).

--- a/doc/src/sgml/high-availability.sgml

+++ b/doc/src/sgml/high-availability.sgml

+ <sect2 id="failback-safe">

+ <title>Setting up failback safe standby</title>

+ <indexterm zone="high-availability">

+ <primary>Setting up failback safe standby</primary>

+ </indexterm>

+ <para>

+ PostgreSQL streaming replication offers durability, but if the master crashes

+before a particular WAL record reaches the standby server, then that

+WAL record is present on the master server but not on the standby server.

+In such a case the master is ahead of the standby server in terms of WAL records and data in the database.

+This leads to file-system level inconsistency between the master and the standby server.

+For example, a heap page update on the master might not have been reflected on the standby when the master crashes.

+ </para>

+ <para>

+Due to this inconsistency, a fresh backup of the new master onto the new standby is needed to re-prepare the HA cluster.

+Taking a fresh backup can be very time consuming when the database is large. In such a case, disaster recovery

+can take a very long time if streaming replication is used to set up the high availability cluster.

+ </para>

+ <para>

+If the HA cluster is configured with a failback safe standby, this fresh backup can be avoided.

+The <xref linkend="guc-synchronous-transfer"> parameter has control over all WAL transfers, and

+the master will not make any file system level change until it gets a confirmation from the standby server.

+This avoids the need for a fresh backup by maintaining consistency.

+ </para>

+ <sect3 id="Failback-safe-config">

+ <title>Basic Configuration</title>

+ <para>

+ A failback safe standby can be asynchronous or synchronous in nature.

+ This depends on whether the master waits for transaction commits

+ or not. By default the failback safe mechanism is turned off.

+ </para>

+ <para>

+ The first step in configuring HA with a failback safe standby is to set up

+ streaming replication. Configuring a synchronous failback safe standby

+ requires setting <xref linkend="guc-synchronous-transfer"> to

+ <literal>all</> and <xref linkend="guc-synchronous-standby-names">

+ to a non-empty value. This configuration will cause each

+ commit and data page modification to wait for confirmation that the standby

+ has written the corresponding WAL record to durable storage. Configuring an

+ asynchronous failback safe standby requires only setting

+ <xref linkend="guc-synchronous-transfer"> to <literal>data_flush</>.

+ This configuration will cause only data page modifications to wait

+ for confirmation that the standby has written the corresponding WAL record

+ to durable storage.

+ </para>

+ </sect3>

+ </sect2>

</sect1>

</para>

<para>

- So, switching from primary to standby server can be fast but requires

- some time to re-prepare the failover cluster. Regular switching from

- primary to standby is useful, since it allows regular downtime on

- each system for maintenance. This also serves as a test of the

- failover mechanism to ensure that it will really work when you need it.

- Written administration procedures are advised.

+ At the time of failover there is a possibility of file-system level

+ inconsistency between the old primary and the old standby server, and hence

+ a fresh backup from the new master onto the old master is needed to configure

+ the old primary server as a new standby server. Without a fresh

+ backup, even if the new standby starts, streaming replication does not

+ start successfully. Taking a backup can be fast for smaller

+ databases, but for a large database it takes more time to re-prepare the

+ failover cluster in a streaming replication configuration of the HA cluster.

+ This could break the service level agreement for crash

+ recovery. The need for a fresh backup and the problem of long

+ recovery time can be avoided if the HA cluster is configured with a

+ failback safe standby; see <xref linkend="failback-safe">.

+ A failback safe standby allows synchronous WAL transfer at required places

+ while maintaining file-system level consistency between the master and standby

+ server, without requiring a backup to be taken on the old master.

+ </para>

+ <para>

+ Regular switching from primary to standby is useful, since it allows

+ regular downtime on each system for maintenance. This also serves as

+ a test of the failover mechanism to ensure that it will really work

+ when you need it. Written administration procedures are advised.

</para>

<para>

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml

index 2af1738..da3820f 100644

--- a/doc/src/sgml/perform.sgml

+++ b/doc/src/sgml/perform.sgml

</para>

</listitem>

+ <listitem>

+ <para>

+ Set <xref linkend="guc-synchronous-transfer"> to <literal>commit</>; there is no

+ need to guard against database inconsistency between the master and the standby during failover.

+ </para>

+ </listitem>
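
For orientation, the setups described in the documentation text above might look roughly like this in the master's postgresql.conf. This is only a sketch under assumptions: synchronous_transfer is the parameter proposed by this patch and is not available in released PostgreSQL, the standby name standby1 is a hypothetical placeholder, and the values follow the wording of the patch rather than tested behaviour.

```conf
# postgresql.conf on the master (sketch only; synchronous_transfer is the
# parameter proposed in this patch, not an existing PostgreSQL setting)

# Synchronous failback safe standby:
# commits and data page modifications both wait for the standby.
synchronous_transfer = all
synchronous_standby_names = 'standby1'    # hypothetical standby name

# Asynchronous failback safe standby:
# only data page modifications wait; commits do not.
#synchronous_transfer = data_flush
#synchronous_standby_names = ''

# Default: failback safe mechanism turned off.
#synchronous_transfer = commit
```

Only one of the three variants would be active at a time; the commented-out lines show the alternatives the patch text describes.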
