Обсуждение: pg_stop_backup running for 10h?

Поиск
Список
Период
Сортировка

pg_stop_backup running for 10h?

От
François Beausoleil
Дата:
Hi all,

I'm using OmniPITR to build a new slave. According to pg_stat_activity, pg_stop_backup has been running for nearly 11 hours. The WAL archive command is running just fine and reporting "Segment X successfully sent to all destinations".

I had the same issue almost a year ago (http://www.postgresql.org/message-id/9CC57302-10F8-4678-BBD3-028EC6B57051@teksol.info), but don't have permission issues this time around.

What could cause a pg_stop_backup() to run for such a long time?

Should I stop, upgrade omnipitr and start again?

Thanks!
François

---

PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
omnipitr-backup-master ver. 1.1.0

# postgresql.conf

# ...
archive_command = '/usr/local/omnipitr/bin/omnipitr-archive --log /var/log/omnipitr/archive-^Y-^m-^d.log --state-dir /var/lib/postgresql/omnipitr-state --dst-remote gzip=root@kalanchoe.ca.seevibes.com:/var/backups/seevibes/wal/dbanalytics.production/ --temp /var/tmp/omnipitr --rsync-path=/usr/local/bin/wal-syncer --verbose "%p" --dst-backup /var/tmp/seevibes/pg.backup'

omnipitr-backup-master was launched like this:

# sudo -u postgres /usr/local/lib/omnipitr/bin/omnipitr-backup-master -D /var/lib/postgresql/9.1/main --dst-direct gzip=root@kalanchoe.ca.seevibes.com:/var/backups/seevibes/dbanalytics.production --log /var/log/omnipitr/backup-master.log --verbose --xlogs /var/tmp/seevibes/pg.backup --ssh-path /usr/local/bin/wal-sender

wal-syncer and wal-sender are simple wrappers over rsync and ssh, respectively, which use the -i option to choose the correct private key file.

# ls -ld /var/tmp/seevibes/pg.backup*
lrwxrwxrwx 1 postgres postgres   45 Jan 10 21:35 /var/tmp/seevibes/pg.backup -> /var/tmp/seevibes/pg.backup.real/main/pg_xlog
drwxrwxr-x 3 postgres postgres 4096 Jan 10 21:35 /var/tmp/seevibes/pg.backup.real

omnipitr-archive's log has many entries like this (and 10 hours ago, it was the same thing):

2014-01-11 22:20:53.142844 +0000 : 32091 : omnipitr-archive : LOG : Called with parameters: --log /var/log/omnipitr/archive-^Y-^m-^d.log --state-dir /var/lib/postgresql/omnipitr-state --dst-remote gzip=root@kalanchoe.ca.seevibes.com:/var/backups/seevibes/wal/dbanalytics.production/ --temp /var/tmp/omnipitr --rsync-path=/usr/local/bin/wal-syncer --verbose pg_xlog/0000000700002476000000A0 --dst-backup /var/tmp/seevibes/pg.backup
2014-01-11 22:20:53.706888 +0000 : 32091 : omnipitr-archive : LOG : Timer [Compressing with gzip] took: 0.315s
2014-01-11 22:20:54.149710 +0000 : 32091 : omnipitr-archive : LOG : Sending ./pg_xlog/0000000700002476000000A0 to /var/tmp/seevibes/pg.backup/0000000700002476000000A0 ended in 0.158532s
2014-01-11 22:20:56.765873 +0000 : 32091 : omnipitr-archive : LOG : Sending /var/tmp/omnipitr/omnipitr-archive/32091/0000000700002476000000A0/0000000700002476000000A0.gz to root@kalanchoe.ca.seevibes.com:/var/backups/seevibes/wal/dbanalytics.production/0000000700002476000000A0.gz ended in 2.485214s
2014-01-11 22:20:57.009521 +0000 : 32091 : omnipitr-archive : LOG : Timer [Segment delivery] took: 3.019s
2014-01-11 22:20:57.105550 +0000 : 32091 : omnipitr-archive : LOG : Segment ./pg_xlog/0000000700002476000000A0 successfully sent to all destinations.

omnipitr-backup-master's log has this:

2014-01-10 21:53:01.534477 +0000 : 22722 : omnipitr-backup-master : LOG : Timer [SELECT pg_start_backup('omnipitr')] took: 1080.943s
2014-01-10 21:53:01.752511 +0000 : 22722 : omnipitr-backup-master : LOG : pg_start_backup('omnipitr') returned 2484/5AFCBB48.
2014-01-10 21:53:01.895868 +0000 : 22722 : omnipitr-backup-master : LOG : Actual command to make tarballs: exec nice tar cf \- \-\-exclude\=pg_log\/\* \-\-exclude\=pg_xlog\/0\* \-\-exclude\=pg_xlog\/archive_status\/\* \-\-exclude\=postmaster\.pid main 2> \/tmp\/omnipitr\-backup\-master\/22722\/tar\.stderr | nice gzip \-\-stdout \- | \/usr\/local\/bin\/wal\-sender root\@kalanchoe\.ca\.seevibes\.com cat\ \-\ \>\ \\\/var\\\/backups\\\/seevibes\\\/dbanalytics\\\.production\\\/skyler\\\.ca\\\.seevibes\\\.com\\\-data\\\-2014\\\-01\\\-10\\\.tar\\\.gz
2014-01-11 11:36:15.305926 +0000 : 22722 : omnipitr-backup-master : LOG : tar stderr:
2014-01-11 11:36:16.161887 +0000 : 22722 : omnipitr-backup-master : LOG : ==============================================
2014-01-11 11:36:16.199998 +0000 : 22722 : omnipitr-backup-master : LOG : tar: main/base/18332/t4_26246478_fsm: File removed before we read it
2014-01-11 11:36:16.199998 +0000 : 22722 : omnipitr-backup-master : LOG : tar: main/base/18332/t23_26246745: File removed before we read it
2014-01-11 11:36:16.199998 +0000 : 22722 : omnipitr-backup-master : LOG : tar: main/base/18332/t20_26246658: File removed before we read it
...
2014-01-11 11:36:16.199998 +0000 : 22722 : omnipitr-backup-master : LOG : tar: main/base/18332/25968250.1: file changed as we read it
2014-01-11 11:36:16.199998 +0000 : 22722 : omnipitr-backup-master : LOG : tar: main/base/18332: file changed as we read it
2014-01-11 11:36:16.325208 +0000 : 22722 : omnipitr-backup-master : LOG : ==============================================
2014-01-11 11:36:16.423534 +0000 : 22722 : omnipitr-backup-master : LOG : Timer [Compressing $PGDATA] took: 49394.562s

Вложения

Re: pg_stop_backup running for 10h?

От
Magnus Hagander
Дата:

On Sat, Jan 11, 2014 at 11:26 PM, François Beausoleil <francois@teksol.info> wrote:
Hi all,

I'm using OmniPITR to build a new slave. According to pg_stat_activity, pg_stop_backup has been running for nearly 11 hours. The WAL archive command is running just fine and reporting "Segment X successfully sent to all destinations".

I had the same issue almost a year ago (http://www.postgresql.org/message-id/9CC57302-10F8-4678-BBD3-028EC6B57051@teksol.info), but don't have permission issues this time around.

What could cause a pg_stop_backup() to run for such a long time?

Can't speak for the OmniPITR specific parts, but typically the archive_command reacting strangely would cause pg_stop_backup() to wait.

You include the logs from omniptr, but do you get anything in the *postgresql* logs? If it's the archive command it should clearly tell you that. It should also tell you if you can safely cancel the pg_stop_backup() command.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_stop_backup running for 10h?

От
François Beausoleil
Дата:

Le 2014-01-12 à 05:38, Magnus Hagander a écrit :


On Sat, Jan 11, 2014 at 11:26 PM, François Beausoleil <francois@teksol.info> wrote:
Hi all,

I'm using OmniPITR to build a new slave. According to pg_stat_activity, pg_stop_backup has been running for nearly 11 hours. The WAL archive command is running just fine and reporting "Segment X successfully sent to all destinations".

I had the same issue almost a year ago (http://www.postgresql.org/message-id/9CC57302-10F8-4678-BBD3-028EC6B57051@teksol.info), but don't have permission issues this time around.

What could cause a pg_stop_backup() to run for such a long time?

Can't speak for the OmniPITR specific parts, but typically the archive_command reacting strangely would cause pg_stop_backup() to wait.

You include the logs from omniptr, but do you get anything in the *postgresql* logs? If it's the archive command it should clearly tell you that. It should also tell you if you can safely cancel the pg_stop_backup() command.

Oh well... pg_stop_backup() eventually finished by itself:

2014-01-12 05:57:06.174590 +0000 : 22722 : omnipitr-backup-master : LOG : Timer [SELECT pg_stop_backup()] took: 66049.684s
2014-01-12 05:57:06.296063 +0000 : 22722 : omnipitr-backup-master : LOG : pg_stop_backup('omnipitr') returned 2489/10004E78.
2014-01-12 05:57:06.409591 +0000 : 22722 : omnipitr-backup-master : LOG : Timer [Making data archive] took: 116525.818s

or 18 hours. That's long...

Postgres' logs themselves didn't have any mention of problems with the archive_command, so that's ruled out as well.

Thanks for taking the time to respond!
François

Вложения