RE: [EXTERNAL] Re: Postgres PAF setup

From: Andrew Edenburn
Subject: RE: [EXTERNAL] Re: Postgres PAF setup
Msg-id: bca8e5426570474b85d60f681b100ac7@gm.com
In reply to: Re: Postgres PAF setup ("Jehan-Guillaume (ioguix) de Rorthais" <ioguix@free.fr>)
List: pgsql-general
Sorry for the delay. Here is my corosync.log file.
I have tried making the changes that you requested, but it is still not working. I know that when I configured the
cluster to use pgsqld instead of pgsqlms, I could at least get the cluster to start. But it was starting the cluster
as a master on both nodes.

Thanks for your help...

Andrew A Edenburn
General Motors
Hyperscale Computing & Core Engineering
Mobile Phone: +01-810-410-6008
30009 Van Dyke Ave
Warren, MI. 48090-9026
Cube: 2w05-21
mailto:andrew.edenburn@gm.com
Web Connect SoftPhone 586-986-4864


-----Original Message-----
From: Jehan-Guillaume (ioguix) de Rorthais [mailto:ioguix@free.fr]
Sent: Tuesday, April 24, 2018 11:09 AM
To: Andrew Edenburn <andrew.edenburn@gm.com>
Cc: pgsql-general@postgresql.org; users@clusterlabs.org
Subject: [EXTERNAL] Re: Postgres PAF setup

On Mon, 23 Apr 2018 18:09:43 +0000
Andrew Edenburn <andrew.edenburn@gm.com> wrote:

> I am having issues with my PAF setup.  I am new to Postgres and have
> setup the cluster as seen below. I am getting this error when trying
> to start my cluster resources.
> [...]
>
> cleanup and clear is not fixing any issues and I am not seeing
> anything in the logs.  Any help would be greatly appreciated.

This lacks a lot of information.

According to the PAF resource agent, your instances are in an "unexpected state" on both nodes while PAF was actually
trying to stop them.

Pacemaker might decide to stop a resource if its start operation fails.
Stopping it after a failed start gives the resource agent a chance to stop the resource gracefully, if that is still
possible.

I suspect you have a setup mistake on both nodes, maybe the exact same one...

You should probably provide your full pacemaker/corosync logs, with timing information, so we can check all the
messages coming from PAF from the very beginning of the startup attempt.


>         have-watchdog=false \

You should probably consider setting up a watchdog in your cluster.

>         stonith-enabled=false \

This is really bad. Your cluster will NOT work as expected. PAF **requires** STONITH to be enabled and working
properly. Without it, sooner or later, you will experience unexpected reactions from the cluster (freezing all actions,
etc.).
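For illustration only, a minimal sketch in the same crm configure syntax as your configuration, assuming your nodes have IPMI-capable management boards. The fence_ipmilan agent, the resource and constraint names, and the BMC addresses and credentials are all hypothetical placeholders; pick the fence agent that matches your hardware or hypervisor and check its documentation for the exact parameter names.

```
# Hypothetical IPMI fencing, one device per node. Agent, addresses and
# credentials are placeholders: adapt them to your actual hardware.
primitive fence-dcmilphlum223 stonith:fence_ipmilan \
        params pcmk_host_list=dcmilphlum223 ipaddr=BMC_ADDRESS_223 \
               login=FENCE_USER passwd=FENCE_PASSWORD
primitive fence-dcmilphlum224 stonith:fence_ipmilan \
        params pcmk_host_list=dcmilphlum224 ipaddr=BMC_ADDRESS_224 \
               login=FENCE_USER passwd=FENCE_PASSWORD
# Common practice: keep each fencing device off the node it is meant to fence.
location fence-dcmilphlum223-avoid fence-dcmilphlum223 -inf: dcmilphlum223
location fence-dcmilphlum224-avoid fence-dcmilphlum224 -inf: dcmilphlum224
# Only enable STONITH once the devices above have been tested and work.
property cib-bootstrap-options: \
        stonith-enabled=true
```

Test that each node can actually be fenced before relying on it; a fencing device that silently fails is nearly as bad as none.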

>         no-quorum-policy=ignore \

You should not ignore quorum, even in a two-node cluster. See the "two_node"
parameter in the corosync.conf manual.
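A minimal sketch of the quorum section of corosync.conf with two_node enabled, assuming the votequorum provider (the usual one with Corosync 2). With this in place, no-quorum-policy can go back to its default value (stop) instead of ignore.

```
quorum {
    # votequorum is the quorum provider that implements two_node
    provider: corosync_votequorum
    # Two-node mode: the surviving node keeps quorum when the other one fails;
    # fencing is what protects you from a split-brain in that situation
    two_node: 1
}
```

Note that two_node also enables wait_for_all by default, so after a full cluster shutdown both nodes must be seen once before the cluster becomes quorate.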

>         migration-threshold=1 \
> rsc_defaults rsc_defaults-options: \
>         migration-threshold=5 \

The latter is the supported way to set migration-threshold. Your "migration-threshold=1" should not be a cluster
property but a default resource option.

> My pcs Config
> Corosync Nodes:
> dcmilphlum223 dcmilphlum224
> Pacemaker Nodes:
> dcmilphlum223 dcmilphlum224
>
> Resources:
> Master: pgsql-ha
>   Meta Attrs: notify=true target-role=Stopped

This target-role might have been set by the cluster because it cannot fence nodes (which might be easier to deal with
in your situation, by the way). It means the cluster will keep this resource down because of previous errors.
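Once fencing works and the underlying errors are fixed, the target-role=Stopped meta attribute has to go before the cluster will start the resource again. A minimal sketch of the master/slave definition without it, in crm configure syntax, assuming the PAF primitive is named pgsqld (the pgsqld-data-status attribute in your output suggests so, but that is a guess):

```
# Assumes the PAF primitive is called "pgsqld" (hypothetical).
# target-role=Stopped is no longer present, so the cluster may start
# and promote the resource again.
ms pgsql-ha pgsqld \
        meta notify=true
```

With pcs, "pcs resource enable pgsql-ha" should have the same effect on target-role.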

> recovery_template=/pgsql/data/pg7000/recovery.conf.pcmk

You should probably not put your recovery.conf.pcmk in your PGDATA. This file is different on each node. As you
might want to rebuild the standby or the old master after a failure, you would have to correct it each time. Keep it
outside of PGDATA to avoid this useless step.
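For example, a sketch of the PAF primitive with the template moved out of PGDATA, in crm configure syntax. It assumes PGDATA is /pgsql/data/pg7000, as your template path suggests; the resource name pgsqld and the /etc/pgsql/recovery.conf.pcmk location are hypothetical, and other parameters you may need (such as bindir) are left out.

```
# Hypothetical resource name and template path; any location outside
# PGDATA that survives a standby rebuild will do.
primitive pgsqld ocf:heartbeat:pgsqlms \
        params pgdata=/pgsql/data/pg7000 \
               recovery_template=/etc/pgsql/recovery.conf.pcmk
```

The template then has to exist at that path on both nodes, each with its own node-specific content.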

> dcmilphlum224: pgsqld-data-status=LATEST

I suppose this comes from the "pgsql" resource agent, definitely not from PAF...

Regards,



Attachments
