Discussion: PG synchronous replication and unresponsive slave
Hi,

I have a PG 9.1.2 Master <--> Slave setup with synchronous replication, and it is working as expected. However, I want to switch the master to non-replicated mode whenever its slave stops responding. I have set replication_timeout to 5s, and whenever the slave is unresponsive for more than 5s, I see the master detect it. But transactions on the master stay stuck until the slave comes back. To get around this, I reloaded the master's configuration with synchronous_commit = local, and subsequent transactions on the master go through fine.

Here are my questions:

1. The transaction that was stuck at the moment the slave went away never completed, even after I reloaded the master's configuration with synchronous_commit = local. All new transactions on the master go through fine; only the one that was stuck initially does not. How can I get this stuck transaction to complete or return an error?

2. Whenever there is a problem with the slave, I have to manually reload the master's configuration with synchronous_commit = local to let the master proceed. Is there an automated way to apply this setting when the slave becomes unresponsive? TCP connection timeouts and replication timeouts both detect the failure, but I want to run some corrective action when they do.

--
thanks,
Manoj
any help on this is much appreciated.

thanks,
Manoj

On 01/11/2012 01:50 PM, Manoj Govindassamy wrote:
> Hi,
>
> I have a PG 9.1.2 Master <--> Slave with synchronous replication
> setup. [...]
Anyone with PG synchronous replication knowledge, please share your views on the questions above.

thanks,
Manoj

On 01/12/2012 10:12 AM, Manoj Govindassamy wrote:
>
> any help on this is much appreciated.
> [...]
On Tue, Jan 17, 2012 at 3:51 AM, Manoj Govindassamy
<manoj@nimblestorage.com> wrote:
>>> 1. Transaction which was stuck right when slave going away never went
>>> thru even after I reloaded master's config with local commit on. I do see
>>> all new transactions on master are going thru fine, except the one which was
>>> stuck initially. How to get this stuck transaction complete or return with
>>> error.

Changing synchronous_commit doesn't affect a transaction that is already
waiting for the standby. Instead, empty synchronous_standby_names and reload
the configuration file to resume that transaction.

>>> 2. Whenever there is a problem with slave, I have to manually reload
>>> master's config with local commit turned on to get master go forward. Is
>>> there any automated way to reload this config with local commit on on
>>> slave's unresponsiveness ? tcp connection timeouts, replication timeouts all
>>> detect the failures, but i want to run some corrective action on these
>>> failure detection.

PostgreSQL doesn't have such a capability, but pgpool-II might.
Could you ask on the pgpool-II mailing list?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
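A minimal C sketch of the "empty synchronous_standby_names" step, assuming (hypothetically) that the master's postgresql.conf ends with the line include 'sync_override.conf', so a setting written to that file, in the same directory as postgresql.conf, overrides the earlier one on the next reload. The file name and the function write_sync_override() are illustrative, not part of PostgreSQL. In 9.1 an include of a missing file is an error, so the file should already exist (for example holding the normal standby name) before the override is needed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a one-line config fragment that sets
 * synchronous_standby_names. Assumes postgresql.conf ends with
 *     include 'sync_override.conf'
 * so this value wins after the next reload. Passing "" disables
 * synchronous replication. Returns 0 on success, -1 on error.
 */
int write_sync_override(const char *path, const char *standby_names)
{
    char tmp_path[4096];
    FILE *f;
    int n;

    /* Refuse values that would need quoting or break the one-line format. */
    if (strpbrk(standby_names, "'\\\n\r") != NULL)
        return -1;

    /* Write to a temporary file first so a concurrent reload never
     * reads a half-written file. */
    n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof tmp_path)
        return -1;

    f = fopen(tmp_path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "synchronous_standby_names = '%s'\n", standby_names) < 0) {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* On POSIX systems rename() atomically replaces the existing file. */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

After write_sync_override(".../sync_override.conf", "") returns 0, the server still has to re-read its configuration, either with pg_ctl reload or, from a superuser libpq session, SELECT pg_reload_conf().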
Thanks for your views.

(1) I will try emptying synchronous_standby_names on replica failure and verify that the stuck transaction proceeds.

(2) We are not comfortable moving to PGPool just for automatic failback mode on hot-standby failure. Any suggestions on how to build this failback mechanism for the master in PG 9.1.2? We are using the C interface (libpq) for PG. Is there any kind of health check we can run on the master to detect the hot-standby problem and have the master reload its config with an empty synchronous_standby_names?

Any help is much appreciated.

thanks,
Manoj

On 01/16/2012 07:44 PM, Fujii Masao wrote:
> On Tue, Jan 17, 2012 at 3:51 AM, Manoj Govindassamy
> <manoj@nimblestorage.com> wrote:
> [...]
> Changing synchronous_commit doesn't affect such a transaction. Instead,
> empty synchronous_standby_names and reload the configuration file to
> resume that transaction.
> [...]
On Wed, Jan 18, 2012 at 6:37 AM, Manoj Govindassamy
<manoj@nimblestorage.com> wrote:
> (2) We are not comfortable moving to PGPool just for automatic failback mode
> on hot-standby failure.

Hmm, my reply might have been misleading. What I meant was to use pgpool-II
as clusterware for PostgreSQL's built-in replication, not as the replication
mechanism itself. With pgpool-II you can run health checks, perform failover
when necessary, and manage PostgreSQL replication. AFAIK pgpool-II has such an
operation mode. Are you still not comfortable using pgpool-II in that way?

Regards,

--
Fujii Masao
I am aware of pgpool-II and its features; my requirements are just a little different. I have a system (PG runs on it) that already has a failover mechanism to another system, and I want PG to be part of this cluster rather than clustered on its own. That is, PG has to run on the master system in synchronous replication mode with a slave on another system, but failover is driven from a higher level, not just by PG's failure.

So whenever PG's slave node is unresponsive, we would rather cut off replication and run the master system independently. We therefore need a better mechanism to detect when the master PG's synchronous replication is not working as expected, or when the slave PG becomes unresponsive. Otherwise the master PG is held back by the slave PG, and the whole clustered system is stuck. I hope this makes sense. Let me know if there are easy ways to detect that the master PG's replication is not working (preferably via libpq).

thanks,
Manoj

On 01/17/2012 05:04 PM, Fujii Masao wrote:
> On Wed, Jan 18, 2012 at 6:37 AM, Manoj Govindassamy
> <manoj@nimblestorage.com> wrote:
>> (2) We are not comfortable moving to PGPool just for automatic failback mode
>> on hot-standby failure.
> Hmm.. my reply might be misleading. What I meant was to use pgpool-II
> as a clusterware for PostgreSQL built-in replication, not as a replication
> itself. [...]
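One way to turn a periodic health check into the automatic cutoff described above is a small state machine with hysteresis, so that a single slow check does not flip the master's mode. This is a hypothetical sketch, not something PostgreSQL or pgpool-II provides: the type and function names, and the rule of requiring several consecutive results before acting, are assumptions. The caller acts on the returned value itself, for example by rewriting synchronous_standby_names and reloading. One caveat: once synchronous_standby_names is empty, a reconnected standby is reported with sync_state = 'async', so in standalone mode the "healthy" input should come from something like state = 'streaming' in pg_stat_replication. Re-enabling synchronous mode while the standby is far behind may make commits wait until it catches up, which is one reason to require several healthy checks before switching back.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical replication mode tracked by the watchdog. */
typedef enum { MODE_SYNC, MODE_STANDALONE } repl_mode;

/* What the caller should do after an observation. */
typedef enum {
    ACTION_NONE,         /* keep the current configuration */
    ACTION_DISABLE_SYNC, /* e.g. set synchronous_standby_names = '' and reload */
    ACTION_ENABLE_SYNC   /* e.g. restore the standby name and reload */
} watchdog_action;

typedef struct {
    repl_mode mode;
    int consecutive_failures;  /* checks in a row without a healthy standby */
    int consecutive_successes; /* checks in a row with a healthy standby */
    int degrade_after;         /* failures required before going standalone */
    int restore_after;         /* successes required before going back to sync */
} watchdog;

void watchdog_init(watchdog *w, int degrade_after, int restore_after)
{
    w->mode = MODE_SYNC;
    w->consecutive_failures = 0;
    w->consecutive_successes = 0;
    w->degrade_after = degrade_after;
    w->restore_after = restore_after;
}

/* Feed one health-check result; returns the action to take now, if any. */
watchdog_action watchdog_observe(watchdog *w, bool standby_healthy)
{
    if (standby_healthy) {
        w->consecutive_failures = 0;
        if (w->consecutive_successes < INT_MAX)
            w->consecutive_successes++;
        if (w->mode == MODE_STANDALONE &&
            w->consecutive_successes >= w->restore_after) {
            w->mode = MODE_SYNC;
            return ACTION_ENABLE_SYNC;
        }
    } else {
        w->consecutive_successes = 0;
        if (w->consecutive_failures < INT_MAX)
            w->consecutive_failures++;
        if (w->mode == MODE_SYNC &&
            w->consecutive_failures >= w->degrade_after) {
            w->mode = MODE_STANDALONE;
            return ACTION_DISABLE_SYNC;
        }
    }
    return ACTION_NONE;
}
```

For example, after watchdog_init(&w, 3, 2), three failed checks in a row return ACTION_DISABLE_SYNC, and later two healthy checks in a row return ACTION_ENABLE_SYNC.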
On Wed, Jan 18, 2012 at 10:54 AM, Manoj Govindassamy
<manoj@nimblestorage.com> wrote:
> So, whenever PG's slave node is unresponsive, we better let the replication
> cutoff and run the master system independently. [...]
> Let me know if there are easy ways to detect Master
> PG's replication not working (via libpq would be more preferable).

You can detect that by checking whether the row for the synchronous standby
(sync_state = 'sync') is still in pg_stat_replication. But I have no good idea
how to automatically run an action, such as reloading the configuration file,
when the failure is detected. Maybe you need to implement that on your own...

Regards,

--
Fujii Masao
> So, whenever PG's slave node is unresponsive, we better let the
> replication cutoff and run the master system independently. So, we
> need better mechanism to detect when Master PG's synchronous
> replication not working as expected or when the slave PG is going
> unresponsive. [...]

I'm not sure I fully understand your requirement, but...

Since version 3.1, pgpool-II has a switch that prevents a node from
triggering failover, and you can use it to avoid automatic failover of
the master node.

To detect that replication is not working, you can use pgpool-II's
replication delay feature. It monitors the replication delay between
master and standby: if the delay exceeds a threshold, pgpool-II stops
sending read queries to the standby.

In case of standby failure (server down, etc.), you can use automatic
failover as usual.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
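pgpool-II's replication delay check is one option. If you prefer to measure the delay yourself over libpq, one approach (a hypothetical sketch, not pgpool-II's code) is to compare pg_current_xlog_location() on the master with a standby's replay_location or flush_location from pg_stat_replication. Both are returned as text such as "0/3000020", and 9.1 has no built-in function to subtract them (pg_xlog_location_diff() arrived in 9.2). Before 9.3, with the default 16 MB WAL segment size, the last segment of each 4 GB logical log file was never used, so the conversion below counts 0xFF000000 bytes per log ID; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bytes per log ID before 9.3 with 16 MB segments: 255 * 16 MB. */
#define PRE93_XLOG_FILE_SIZE 0xFF000000ULL

/* Parse a location such as "0/3000020" into a byte position. */
static bool parse_xlog_location(const char *text, unsigned long long *pos)
{
    unsigned int xlogid, xrecoff;

    if (sscanf(text, "%X/%X", &xlogid, &xrecoff) != 2)
        return false;
    *pos = (unsigned long long) xlogid * PRE93_XLOG_FILE_SIZE + xrecoff;
    return true;
}

/*
 * Hypothetical helper: bytes by which 'ahead' (e.g. the master's current
 * location) leads 'behind' (e.g. a standby's replay_location). Returns
 * false if either location fails to parse or 'behind' is actually ahead.
 */
bool wal_lag_bytes_pre93(const char *ahead, const char *behind,
                         unsigned long long *lag)
{
    unsigned long long a, b;

    if (!parse_xlog_location(ahead, &a) || !parse_xlog_location(behind, &b))
        return false;
    if (a < b)
        return false;
    *lag = a - b;
    return true;
}
```

For example, wal_lag_bytes_pre93("0/3000000", "0/2000000", &lag) sets lag to 16777216 (16 MB); comparing lag against a threshold gives a delay check in the spirit of the one described above.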