Discussion: PG synchronous replication and unresponsive slave

PG synchronous replication and unresponsive slave

From: Manoj Govindassamy

Date: January 11, 2012, 20:51:09

Hi,

I have a PG 9.1.2 Master <--> Slave setup with synchronous replication.
Everything works as expected. However, I have a case where I want to
flip the master to non-replication mode whenever its slave is not
responding. I have set replication_timeout to 5s, and whenever the slave
is not responding for more than 5s, I see the master detect it. But
transactions on the master are stuck until the slave comes back. To
get around this, I reloaded the config on the master with
synchronous_commit = local. Further transactions on the master go
through fine with local commits turned on.

Here are my questions:

1. The transaction that was stuck right when the slave went away never
went through, even after I reloaded the master's config with local commit
on. I do see all new transactions on the master going through fine,
except the one that was stuck initially. How can I get this stuck
transaction to complete or return with an error?

2. Whenever there is a problem with the slave, I have to manually reload
the master's config with local commit turned on to get the master to move
forward. Is there any automated way to reload the config with local
commit on when the slave becomes unresponsive? TCP connection timeouts
and replication timeouts all detect the failure, but I want to run some
corrective action when the failure is detected.

--
thanks,
Manoj

Re: PG synchronous replication and unresponsive slave

From: Manoj Govindassamy

Date: January 12, 2012, 17:13:09

Any help on this is much appreciated.

thanks,
Manoj


PG synchronous replication and unresponsive slave

From: Manoj Govindassamy

Date: January 16, 2012, 17:51:49

Could anyone with PG synchronous replication knowledge please share
their views on my earlier questions?

thanks,
Manoj


Re: [ADMIN] PG synchronous replication and unresponsive slave

From: Fujii Masao

Date: January 17, 2012, 02:44:32

On Tue, Jan 17, 2012 at 3:51 AM, Manoj Govindassamy
<manoj@nimblestorage.com> wrote:
>>> 1. The transaction that was stuck right when the slave went away never
>>> went through, even after I reloaded the master's config with local commit
>>> on. I do see all new transactions on the master going through fine,
>>> except the one that was stuck initially. How can I get this stuck
>>> transaction to complete or return with an error?

Changing synchronous_commit doesn't affect such a transaction. Instead,
empty synchronous_standby_names and reload the configuration file to
resume that transaction.
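In 9.1, synchronous_standby_names is read from postgresql.conf when the server reloads, so a supervising program that wants to follow this advice automatically has to rewrite the file and then trigger a reload (for example with `pg_ctl reload` or `SELECT pg_reload_conf();`). The C sketch below covers only the rewrite step, on the file's text held in memory. The function names are hypothetical, file I/O and the reload are left out, `include` directives are not followed, and a trailing comment on the replaced line is dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if `line` (which may continue past a '\n')
   is an active, uncommented setting of `name`, in either the
   "name = value" or "name value" form. Matching is case-sensitive here,
   although PostgreSQL itself treats parameter names case-insensitively. */
static int is_setting(const char *line, const char *name)
{
    size_t n = strlen(name);

    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, name, n) != 0)
        return 0;
    /* The name must end here, so "synchronous_standby_names_x" is not matched. */
    return line[n] == ' ' || line[n] == '\t' || line[n] == '=';
}

/* Hypothetical helper: return a malloc'd copy of the postgresql.conf text
   `conf` in which every active synchronous_standby_names line is replaced
   by an empty setting. Commented-out lines are left alone. Returns NULL
   if allocation fails; the caller frees the result. */
char *disable_sync_standbys(const char *conf)
{
    static const char name[] = "synchronous_standby_names";
    static const char repl[] = "synchronous_standby_names = ''";
    size_t repl_len = strlen(repl);
    size_t lines = 1, cap, len = 0;
    const char *p;
    char *out;

    for (p = conf; *p; p++)
        if (*p == '\n')
            lines++;
    /* Each line grows by at most repl_len bytes, plus one byte for the NUL. */
    cap = strlen(conf) + lines * repl_len + 1;
    out = malloc(cap);
    if (out == NULL)
        return NULL;

    for (p = conf; *p; ) {
        const char *eol = strchr(p, '\n');
        size_t line_len = eol ? (size_t)(eol - p) : strlen(p);

        if (is_setting(p, name)) {
            memcpy(out + len, repl, repl_len);   /* swap in the empty setting */
            len += repl_len;
        } else {
            memcpy(out + len, p, line_len);      /* copy the line unchanged */
            len += line_len;
        }
        if (eol == NULL)
            break;
        out[len++] = '\n';
        p = eol + 1;
    }
    assert(len < cap);
    out[len] = '\0';
    return out;
}
```

Writing the result back (ideally to a temporary file that is then renamed over postgresql.conf) and reloading is the step that, per the reply above, releases the transaction that is waiting for the standby.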

>>> 2. Whenever there is a problem with the slave, I have to manually reload
>>> the master's config with local commit turned on to get the master to move
>>> forward. Is there any automated way to reload the config with local
>>> commit on when the slave becomes unresponsive? TCP connection timeouts
>>> and replication timeouts all detect the failure, but I want to run some
>>> corrective action when the failure is detected.

PostgreSQL doesn't have such a capability, but pgpool-II might.
Could you ask on the pgpool-II mailing list?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [ADMIN] PG synchronous replication and unresponsive slave

From: Manoj Govindassamy

Date: January 17, 2012, 20:38:21

Thanks for your views.

(1) I will try emptying synchronous_standby_names on replica failure
and verify that the stuck transaction proceeds.

(2) We are not comfortable moving to PGPool just for automatic failback
on hot-standby failure. Any suggestions on how to build this failback
mechanism for the master in PG 9.1.2? We are using the C interface for
PG. Is there any kind of health check we can do on the master to detect
the hot-standby problem and have the master reload its config with an
empty synchronous_standby_names?

Any help is much appreciated.

thanks,
Manoj

Re: [ADMIN] PG synchronous replication and unresponsive slave

From: Fujii Masao

Date: January 18, 2012, 00:05:13

On Wed, Jan 18, 2012 at 6:37 AM, Manoj Govindassamy
<manoj@nimblestorage.com> wrote:
> (2) We are not comfortable moving to PGPool just for automatic failback mode
> on hot-standby failure.

Hmm, my reply might have been misleading. What I meant was to use
pgpool-II as clusterware for PostgreSQL's built-in replication, not as
the replication mechanism itself. You can use pgpool-II to health-check,
fail over if necessary, and manage PostgreSQL replication. AFAIK
pgpool-II has such an operation mode. Are you still not comfortable
using pgpool-II in that way?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [ADMIN] PG synchronous replication and unresponsive slave

From: Manoj Govindassamy

Date: January 18, 2012, 00:55:29

I am aware of pgpool-II and its features; my requirements are just a
little different. I have a system (PG runs on it) that already has a
failover mechanism to another system, and I want PG to be part of this
cluster rather than clustered on its own. That is, PG has to run on the
master system in synchronous replication mode with another slave
system, but failover is driven from a higher level, not just by PG's
failure.

So, whenever PG's slave node is unresponsive, we would rather cut off
replication and run the master system independently. So we need a
better mechanism to detect when the master PG's synchronous replication
is not working as expected or when the slave PG becomes unresponsive.
Otherwise, the master PG is held back by the slave PG and the whole
clustered system is stuck. I hope I am making some sense here. Let me
know if there are easy ways to detect that the master PG's replication
is not working (preferably via libpq).

thanks,
Manoj

Re: [ADMIN] PG synchronous replication and unresponsive slave

From: Fujii Masao

Date: January 18, 2012, 01:12:47

On Wed, Jan 18, 2012 at 10:54 AM, Manoj Govindassamy
<manoj@nimblestorage.com> wrote:
> I am aware of pgpool-II and its features; my requirements are just a
> little different. I have a system (PG runs on it) that already has a
> failover mechanism to another system, and I want PG to be part of this
> cluster rather than clustered on its own. That is, PG has to run on the
> master system in synchronous replication mode with another slave
> system, but failover is driven from a higher level, not just by PG's
> failure.
>
> So, whenever PG's slave node is unresponsive, we would rather cut off
> replication and run the master system independently. So we need a
> better mechanism to detect when the master PG's synchronous replication
> is not working as expected or when the slave PG becomes unresponsive.
> Otherwise, the master PG is held back by the slave PG and the whole
> clustered system is stuck. I hope I am making some sense here. Let me
> know if there are easy ways to detect that the master PG's replication
> is not working (preferably via libpq).

You can detect that by checking whether the synchronous standby still
appears in pg_stat_replication. But I have no good idea how to
automatically run an action, such as reloading the configuration file,
when the failure is detected. Maybe you need to implement that on your own...
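Fujii's suggestion could become a polling loop in the C program that already talks to the master through libpq. The query might be something like `SELECT count(*) FROM pg_stat_replication WHERE state = 'streaming'`, run with `PQexec()`. The sketch below deliberately leaves libpq out and shows only the decision logic around that result, using a hypothetical `SyncWatchdog` type that tolerates a few consecutive misses so one slow poll does not trigger a cutover. It keys on `state` rather than `sync_state` because, once synchronous_standby_names has been emptied, a returning standby is reported as `async` and would never look `sync` again. It also assumes replication_timeout (5s in the setup above) makes the master drop the walsender of a hung standby, since only then does its row disappear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical watchdog state kept by the monitoring program. */
typedef struct {
    int misses;     /* consecutive polls that found no streaming standby */
    int threshold;  /* consecutive misses tolerated before cutting over */
    bool degraded;  /* true while synchronous_standby_names is emptied */
} SyncWatchdog;

/* What the caller should do after a poll. */
typedef enum {
    WD_NOTHING,       /* leave the configuration alone */
    WD_DISABLE_SYNC,  /* empty synchronous_standby_names and reload */
    WD_RESTORE_SYNC   /* put the standby name back and reload */
} WdAction;

/* Feed one poll result: standby_streaming is true when the query against
   pg_stat_replication found the standby in state 'streaming'. */
WdAction watchdog_poll(SyncWatchdog *wd, bool standby_streaming)
{
    assert(wd->threshold > 0);

    if (standby_streaming) {
        wd->misses = 0;
        if (wd->degraded) {
            wd->degraded = false;
            return WD_RESTORE_SYNC;   /* standby is back: re-enable sync */
        }
        return WD_NOTHING;
    }
    if (wd->degraded)
        return WD_NOTHING;            /* already running without a sync standby */
    if (++wd->misses >= wd->threshold) {
        wd->degraded = true;
        return WD_DISABLE_SYNC;       /* too many misses in a row: cut over */
    }
    return WD_NOTHING;
}
```

Restoring as soon as the standby streams again means commits will once more wait for it; whether to add a hold-down period before `WD_RESTORE_SYNC` is a design choice this sketch leaves open.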

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [ADMIN] PG synchronous replication and unresponsive slave

From: Tatsuo Ishii

Date: January 18, 2012, 01:14:00

> I am aware of pgpool-II and its features; my requirements are just a
> little different. I have a system (PG runs on it) that already has a
> failover mechanism to another system, and I want PG to be part of this
> cluster rather than clustered on its own. That is, PG has to run on the
> master system in synchronous replication mode with another slave
> system, but failover is driven from a higher level, not just by PG's
> failure.
>
> So, whenever PG's slave node is unresponsive, we would rather cut off
> replication and run the master system independently. So we need a
> better mechanism to detect when the master PG's synchronous replication
> is not working as expected or when the slave PG becomes unresponsive.
> Otherwise, the master PG is held back by the slave PG and the whole
> clustered system is stuck. I hope I am making some sense here. Let me
> know if there are easy ways to detect that the master PG's replication
> is not working (preferably via libpq).

I'm not sure I fully understand your requirement but...

Since pgpool-II 3.1, there is a switch that prevents failover from being
triggered, and you can use it to avoid automatic failover of the master
node. To detect that replication is not working, you can use the
replication delay feature of pgpool-II. It monitors the replication
delay between master and standby: if the delay exceeds a threshold, it
stops sending read queries to the standby. In case of standby failure
(server down, etc.) you can use automatic failover as usual.
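A delay check like this compares WAL positions on master and standby. The same measurement could be made from the poster's own C program, for example by comparing `pg_current_xlog_location()` on the master with the `replay_location` column of pg_stat_replication; in 9.1 both come back as text such as `16/B374D848`. The sketch below (hypothetical function names, not pgpool-II's implementation) only converts those strings to byte positions and subtracts them. It assumes a pre-9.3 server with the default 16 MB WAL segments, where each logical xlog file holds 0xFF000000 usable bytes; from 9.2 onward the server's own `pg_xlog_location_diff()` can do this subtraction instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Usable bytes per logical xlog file on pre-9.3 servers with the default
   16 MB segment size: 255 segments, because the last one was never used. */
#define XLOG_FILE_SIZE_PRE93 UINT64_C(0xFF000000)

/* Parse a location such as "16/B374D848" into a linear byte position.
   Returns 0 on success, -1 if the text is not of the form "hex/hex". */
int parse_xlog_location(const char *text, uint64_t *pos)
{
    unsigned int xlogid, xrecoff;
    char trailing;

    assert(text && pos);
    /* Exactly two conversions must succeed; a third means trailing junk. */
    if (sscanf(text, "%X/%X%c", &xlogid, &xrecoff, &trailing) != 2)
        return -1;
    *pos = (uint64_t)xlogid * XLOG_FILE_SIZE_PRE93 + xrecoff;
    return 0;
}

/* Bytes by which `standby` trails `master`, or -1 on malformed input.
   Negative differences are clamped to 0 defensively. */
int64_t replication_lag_bytes(const char *master, const char *standby)
{
    uint64_t m, s;

    if (parse_xlog_location(master, &m) != 0 ||
        parse_xlog_location(standby, &s) != 0)
        return -1;
    return m > s ? (int64_t)(m - s) : 0;
}
```

The caller would then compare the returned lag against a threshold of its own choosing and treat a persistently large value, or a missing standby row, as "replication not working".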
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
