Обсуждение: PostgresSQL 9.5 and systemd autorestart but without replication.

Поиск

Список

Период

Сортировка

PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

13 октября 2016 г., 10:00:15

Hi!

This morning I noticed that my 2 PG virtual servers had made an automatic

restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.

Why don't replication follow up!

Do I have to make a new pg_basebackup from primarily to replica?

OS: RHEL 7.2

TIA

Poul

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

"Hunley, Douglas"

Дата:

13 октября 2016 г., 14:30:15

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:

This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",

"phone" : "+1 732 339 3419 x163"

}

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

13 октября 2016 г., 15:45:27

Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.

No errors in the pg_log. I have not earlier experienced this situation.

The feature of autorestart is really nice.

BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

14 октября 2016 г., 11:42:46

Hi !

Hopefully this is a situation in virtuel environments and not on physical

servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Scott Mead

Дата:

14 октября 2016 г., 12:53:47

On Fri, Oct 14, 2016 at 7:42 AM, Poul Kristensen <bcc5226@gmail.com> wrote:

Hi !

Hopefully this is a situation in virtuel environments and not on physical
servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

What OS are you running? recovery.conf gets renamed to 'recovery.done' only when a promotion (failover) occurs. This can happen a number of ways (pg_ctl promote, trigger_file, etc..). Typically, these commands are not executed automatically for you, you would have had to either install a tool that does auto failover, or the startup scripts in your OS did something (highly doubtful).

--Scott

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Scott Mead
Sr. Architect
OpenSCG

http://openscg.com

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

17 октября 2016 г., 18:35:36

RHEL 7.2, which made the postgresql-9.5.service restart on both virtual servers because of some network problem I am almost sure.

The problem now is to make the replication server understand that he is not the master now until next failover. If I can not do this by reestablishing

recovery.conf and then restarting the postgresql.-9.5.service then I am running out of ideas. I have to make sure to be able to handle the failover's and the reestablishment of the "replica" or standby server again when going in production on the physical servers.

Thanks

Poul

2016-10-14 14:53 GMT+02:00 Scott Mead <scottm@openscg.com>:

On Fri, Oct 14, 2016 at 7:42 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
Hi !

Hopefully this is a situation in virtuel environments and not on physical
servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

What OS are you running? recovery.conf gets renamed to 'recovery.done' only when a promotion (failover) occurs. This can happen a number of ways (pg_ctl promote, trigger_file, etc..). Typically, these commands are not executed automatically for you, you would have had to either install a tool that does auto failover, or the startup scripts in your OS did something (highly doubtful).

--Scott

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
--
Scott Mead
Sr. Architect
OpenSCG
http://openscg.com

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

24 октября 2016 г., 09:59:11

I am now running PostgreSQL 9.6 master/replica:

Master - Wal sender process OK!

Standby: Started the server is succesfully but is recovering for several days and the standby database is not accessible!

Wal receiver streaming is running and is in sync master.

select * from pg_locks p1 left join pg_stat_activity psa on p1.pid=psa.pid;

-[ RECORD 9 ]------+----------------------------------------------------------------------------

locktype | relation

database | 0

relation | 2671

page |

tuple |

virtualxid |

transactionid |

classid |

objid |

objsubid |

virtualtransaction | 3/86715

pid | 29001

mode | AccessShareLock

granted | t

fastpath | f

datid | 13322

datname | postgres

pid | 29001

usesysid | 10

usename | postgres

application_name | psql

client_addr |

client_hostname |

client_port | -1

backend_start | 2016-10-24 11:41:20.291612+02

xact_start | 2016-10-24 11:53:57.815105+02

query_start | 2016-10-24 11:53:57.815105+02

state_change | 2016-10-24 11:53:57.815111+02

wait_event_type |

wait_event |

state | active

backend_xid |

backend_xmin | 2312

I can not figure it out!

OS:rhel 7.2

Thanks

Poul

2016-10-17 20:35 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

RHEL 7.2, which made the postgresql-9.5.service restart on both virtual servers because of some network problem I am almost sure.

The problem now is to make the replication server understand that he is not the master now until next failover. If I can not do this by reestablishing
recovery.conf and then restarting the postgresql.-9.5.service then I am running out of ideas. I have to make sure to be able to handle the failover's and the reestablishment of the "replica" or standby server again when going in production on the physical servers.

Thanks
Poul

2016-10-14 14:53 GMT+02:00 Scott Mead <scottm@openscg.com>:
On Fri, Oct 14, 2016 at 7:42 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
Hi !

Hopefully this is a situation in virtuel environments and not on physical
servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

What OS are you running? recovery.conf gets renamed to 'recovery.done' only when a promotion (failover) occurs. This can happen a number of ways (pg_ctl promote, trigger_file, etc..). Typically, these commands are not executed automatically for you, you would have had to either install a tool that does auto failover, or the startup scripts in your OS did something (highly doubtful).

--Scott

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
--
Scott Mead
Sr. Architect
OpenSCG
http://openscg.com

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

24 октября 2016 г., 12:26:38

This is output from strace -p 1449(pid):

The strace keeps repeating this :

read(7, 0x7fff0dc197f7, 1) = -1 EAGAIN (Resource temporarily unavailable)

epoll_create(3) = 36

epoll_ctl(36, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=18051624, u64=18051624}}) = 0

epoll_ctl(36, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=18051648, u64=18051648}}) = 0

epoll_wait(36, {}, 1, 5000) = 0

close(36) = 0

Does anyone have an idea off want is going on.

TIA.

Poul

2016-10-24 11:59 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

I am now running PostgreSQL 9.6 master/replica:

Master - Wal sender process OK!

Standby: Started the server is succesfully but is recovering for several days and the standby database is not accessible!
Wal receiver streaming is running and is in sync master.

select * from pg_locks p1 left join pg_stat_activity psa on p1.pid=psa.pid;
-[ RECORD 9 ]------+----------------------------------------------------------------------------
locktype | relation
database | 0
relation | 2671
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 3/86715
pid | 29001
mode | AccessShareLock
granted | t
fastpath | f
datid | 13322
datname | postgres
pid | 29001
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2016-10-24 11:41:20.291612+02
xact_start | 2016-10-24 11:53:57.815105+02
query_start | 2016-10-24 11:53:57.815105+02
state_change | 2016-10-24 11:53:57.815111+02
wait_event_type |
wait_event |
state | active
backend_xid |
backend_xmin | 2312

I can not figure it out!

OS:rhel 7.2

Thanks

Poul

2016-10-17 20:35 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
RHEL 7.2, which made the postgresql-9.5.service restart on both virtual servers because of some network problem I am almost sure.

The problem now is to make the replication server understand that he is not the master now until next failover. If I can not do this by reestablishing
recovery.conf and then restarting the postgresql.-9.5.service then I am running out of ideas. I have to make sure to be able to handle the failover's and the reestablishment of the "replica" or standby server again when going in production on the physical servers.

Thanks
Poul

2016-10-14 14:53 GMT+02:00 Scott Mead <scottm@openscg.com>:
On Fri, Oct 14, 2016 at 7:42 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
Hi !

Hopefully this is a situation in virtuel environments and not on physical
servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

What OS are you running? recovery.conf gets renamed to 'recovery.done' only when a promotion (failover) occurs. This can happen a number of ways (pg_ctl promote, trigger_file, etc..). Typically, these commands are not executed automatically for you, you would have had to either install a tool that does auto failover, or the startup scripts in your OS did something (highly doubtful).

--Scott

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
--
Scott Mead
Sr. Architect
OpenSCG
http://openscg.com

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

24 октября 2016 г., 13:15:48

Pid 1449 is postgres: startup process recovering 00000001000000010000002C

strace -p 1449 -e write

gives

Process 1449 attached

SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=1488, si_uid=26} ---

write(12, "\0", 1)

Does this mean that write is denied on the master?

TIA

Poul

2016-10-24 14:26 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

This is output from strace -p 1449(pid):

The strace keeps repeating this :

read(7, 0x7fff0dc197f7, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create(3) = 36
epoll_ctl(36, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=18051624, u64=18051624}}) = 0
epoll_ctl(36, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=18051648, u64=18051648}}) = 0
epoll_wait(36, {}, 1, 5000) = 0
close(36) = 0

Does anyone have an idea off want is going on.

TIA.

Poul

2016-10-24 11:59 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

I am now running PostgreSQL 9.6 master/replica:

Master - Wal sender process OK!

Standby: Started the server is succesfully but is recovering for several days and the standby database is not accessible!
Wal receiver streaming is running and is in sync master.

select * from pg_locks p1 left join pg_stat_activity psa on p1.pid=psa.pid;
-[ RECORD 9 ]------+----------------------------------------------------------------------------
locktype | relation
database | 0
relation | 2671
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 3/86715
pid | 29001
mode | AccessShareLock
granted | t
fastpath | f
datid | 13322
datname | postgres
pid | 29001
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2016-10-24 11:41:20.291612+02
xact_start | 2016-10-24 11:53:57.815105+02
query_start | 2016-10-24 11:53:57.815105+02
state_change | 2016-10-24 11:53:57.815111+02
wait_event_type |
wait_event |
state | active
backend_xid |
backend_xmin | 2312

I can not figure it out!

OS:rhel 7.2

Thanks

Poul

2016-10-17 20:35 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
RHEL 7.2, which made the postgresql-9.5.service restart on both virtual servers because of some network problem I am almost sure.

The problem now is to make the replication server understand that he is not the master now until next failover. If I can not do this by reestablishing
recovery.conf and then restarting the postgresql.-9.5.service then I am running out of ideas. I have to make sure to be able to handle the failover's and the reestablishment of the "replica" or standby server again when going in production on the physical servers.

Thanks
Poul

2016-10-14 14:53 GMT+02:00 Scott Mead <scottm@openscg.com>:
On Fri, Oct 14, 2016 at 7:42 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
Hi !

Hopefully this is a situation in virtuel environments and not on physical
servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

What OS are you running? recovery.conf gets renamed to 'recovery.done' only when a promotion (failover) occurs. This can happen a number of ways (pg_ctl promote, trigger_file, etc..). Typically, these commands are not executed automatically for you, you would have had to either install a tool that does auto failover, or the startup scripts in your OS did something (highly doubtful).

--Scott

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
--
Scott Mead
Sr. Architect
OpenSCG
http://openscg.com

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

26 октября 2016 г., 09:57:52

Replica(hot_standby) server:

Is it normal behavior that PostgreSQL 9.6 does a very very slow

recovery restart point at 1/77472E5 in the pg_log

while doing this

postgres: startup process recovering 000000010000000100000077(this process has run several days)

I expect the database to finish startup when recovering is done.

Is this correct?

At the same time streaming works well

postgres: wal receiver process streaming 1/774731D0

This is the autogenerated recovery.conf except of cause xxxxxxxx and 999 when

used pg_basebackup:

standby_mode = 'on'

primary_conninfo = 'user=xxxxxx host=999.999.999.99 port=5432 sslmode=prefer sslcompression=1 krbsrvname=xxxxxxx'

Thanks!

Poul

2016-10-24 15:14 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

Pid 1449 is postgres: startup process recovering 00000001000000010000002C

strace -p 1449 -e write

gives

Process 1449 attached

SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=1488, si_uid=26} ---
write(12, "\0", 1)

Does this mean that write is denied on the master?

TIA

Poul

2016-10-24 14:26 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
This is output from strace -p 1449(pid):

The strace keeps repeating this :

read(7, 0x7fff0dc197f7, 1) = -1 EAGAIN (Resource temporarily unavailable)
epoll_create(3) = 36
epoll_ctl(36, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=18051624, u64=18051624}}) = 0
epoll_ctl(36, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=18051648, u64=18051648}}) = 0
epoll_wait(36, {}, 1, 5000) = 0
close(36) = 0

Does anyone have an idea off want is going on.

TIA.

Poul

2016-10-24 11:59 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:

I am now running PostgreSQL 9.6 master/replica:

Master - Wal sender process OK!

Standby: Started the server is succesfully but is recovering for several days and the standby database is not accessible!
Wal receiver streaming is running and is in sync master.

select * from pg_locks p1 left join pg_stat_activity psa on p1.pid=psa.pid;
-[ RECORD 9 ]------+----------------------------------------------------------------------------
locktype | relation
database | 0
relation | 2671
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 3/86715
pid | 29001
mode | AccessShareLock
granted | t
fastpath | f
datid | 13322
datname | postgres
pid | 29001
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2016-10-24 11:41:20.291612+02
xact_start | 2016-10-24 11:53:57.815105+02
query_start | 2016-10-24 11:53:57.815105+02
state_change | 2016-10-24 11:53:57.815111+02
wait_event_type |
wait_event |
state | active
backend_xid |
backend_xmin | 2312

I can not figure it out!

OS:rhel 7.2

Thanks

Poul

2016-10-17 20:35 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
RHEL 7.2, which made the postgresql-9.5.service restart on both virtual servers because of some network problem I am almost sure.

The problem now is to make the replication server understand that he is not the master now until next failover. If I can not do this by reestablishing
recovery.conf and then restarting the postgresql.-9.5.service then I am running out of ideas. I have to make sure to be able to handle the failover's and the reestablishment of the "replica" or standby server again when going in production on the physical servers.

Thanks
Poul

2016-10-14 14:53 GMT+02:00 Scott Mead <scottm@openscg.com>:
On Fri, Oct 14, 2016 at 7:42 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
Hi !

Hopefully this is a situation in virtuel environments and not on physical
servers with storage on SAN.as replication have to follow up in a autorestart without a manual new basebackup.

What OS are you running? recovery.conf gets renamed to 'recovery.done' only when a promotion (failover) occurs. This can happen a number of ways (pg_ctl promote, trigger_file, etc..). Typically, these commands are not executed automatically for you, you would have had to either install a tool that does auto failover, or the startup scripts in your OS did something (highly doubtful).

--Scott

Does anyone have experience with replication on physical servers(slaves)?

Thanks

Poul

2016-10-13 17:45 GMT+02:00 Poul Kristensen <bcc5226@gmail.com>:
Thank you for fast respons.

No. The recovery.conf has now replace a recovery.done file on the slave.
No errors in the pg_log. I have not earlier experienced this situation.
The feature of autorestart is really nice.
BTW: I do not quite understand why the recovery.conf are replaced with a recovery.done file.

Thanks

/Poul

2016-10-13 16:30 GMT+02:00 Hunley, Douglas <douglas.hunley@openscg.com>:

On Thu, Oct 13, 2016 at 6:00 AM, Poul Kristensen <bcc5226@gmail.com> wrote:
This morning I noticed that my 2 PG virtual servers had made an automatic
restart at exactly the same time 00:00. The 2 servers is running and accepting connections but no WALL sender and no WALL receicver is running.
Why don't replication follow up!
Do I have to make a new pg_basebackup from primarily to replica?
OS: RHEL 7.2

Is the recovery.conf file still present and valid in $PGDATA on the slave? Any errors in the slave's pg_log directory concerning the replication?

--
{
"name" : "douglas j hunley",
"email" : "douglas.hunley@openscg.com",
"title" : "database engineer",
"phone" : "+1 732 339 3419 x163"
}

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
--
Scott Mead
Sr. Architect
OpenSCG
http://openscg.com

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

--
Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Tom Lane

Дата:

26 октября 2016 г., 13:10:56

Poul Kristensen <bcc5226@gmail.com> writes:
> Replica(hot_standby) server:
> Is it normal behavior that PostgreSQL 9.6 does  a very very slow
> recovery restart point at 1/77472E5 in the pg_log
> while doing this
>  postgres: startup process   recovering 000000010000000100000077(this
> process has run several days)

If that's the standby server, that process will persist as long as it's in
standby; it's what's responsible for applying WAL received by the
walreceiver process.  You should have been able to connect long before
now, though.

            regards, tom lane

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

26 октября 2016 г., 13:36:51

It IS the standby server:

Result of pg_isready:

/usr/pgsql-9.6/bin/pg_isready -h /tmp -p 5432

/tmp:5432 - rejecting connections

In the pg_log:

2016-10-26 13:30:16.298 UTC > FATAL: the database system is starting up

Thanks

Poul

2016-10-26 15:10 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Poul Kristensen <bcc5226@gmail.com> writes:
> Replica(hot_standby) server:
> Is it normal behavior that PostgreSQL 9.6 does a very very slow
> recovery restart point at 1/77472E5 in the pg_log
> while doing this
> postgres: startup process recovering 000000010000000100000077(this
> process has run several days)

If that's the standby server, that process will persist as long as it's in
standby; it's what's responsible for applying WAL received by the
walreceiver process. You should have been able to connect long before
now, though.

regards, tom lane

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Tom Lane

Дата:

26 октября 2016 г., 13:39:29

Poul Kristensen <bcc5226@gmail.com> writes:
> It  IS the standby server:
> Result of pg_isready:

> /usr/pgsql-9.6/bin/pg_isready -h /tmp -p 5432
> /tmp:5432 - rejecting connections

Hm, you do have hot_standby turned on in its postgresql.conf?

            regards, tom lane

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

26 октября 2016 г., 15:11:30

No, it is turned off.

Nothing in the "standby section" in postgresql.conf is turned on.

Thanks.

Poul

2016-10-26 15:39 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Poul Kristensen <bcc5226@gmail.com> writes:
> It IS the standby server:
> Result of pg_isready:

> /usr/pgsql-9.6/bin/pg_isready -h /tmp -p 5432
> /tmp:5432 - rejecting connections

Hm, you do have hot_standby turned on in its postgresql.conf?

regards, tom lane

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Tom Lane

Дата:

26 октября 2016 г., 15:49:56

Poul Kristensen <bcc5226@gmail.com> writes:
> No, it is turned off.
> Nothing in the "standby section" in postgresql.conf is turned on.

Then it's behaving as expected for a cold standby server, I think.
It's going to stay in this state until told to promote.

            regards, tom lane

Re: PostgresSQL 9.5 and systemd autorestart but without replication.

От

Poul Kristensen

Дата:

26 октября 2016 г., 21:23:43

Yes.

And a library as the one on the master have to established on the standby server in order to continue WAL archives on

standby.

As far as I remember I experienced the same situation using the "hot_standby" option in the "standby server" section(together with at riggerfile).

There is no solution that makes automatic switching between replication nodes ensuring no dataloss even not a glusterfs just by itself.

A third replicated server has to be added. PostgreSQL - in my opion - ought be able to meet network interruptions as they

seem to quite common making nodes switch several times each day. As one of you kind people said: To be sure against dataloss

use a cluster(at least 2 servers) plus a replication server.

Thanks a lot for helping.

Poul

2016-10-26 17:49 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Poul Kristensen <bcc5226@gmail.com> writes:
> No, it is turned off.
> Nothing in the "standby section" in postgresql.conf is turned on.

Then it's behaving as expected for a cold standby server, I think.
It's going to stay in this state until told to promote.

regards, tom lane

Med venlig hilsen / Best regards
Poul Kristensen
Linux-OS/Virtualizationexpert and Oracle DBA

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: PostgresSQL 9.5 and systemd autorestart but without replication.