Re: pg_basebackup: return value 1: reason?

Поиск
Список
Период
Сортировка
От Andrej Vanek
Тема Re: pg_basebackup: return value 1: reason?
Дата
Msg-id CAFNFRyHhkqug34cG3dpZwSdb+oZs==bM3f1dM1CkRG_D=w9Xzg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_basebackup: return value 1: reason?  (Adrian Klaver <adrian.klaver@aklaver.com>)
Ответы Re: pg_basebackup: return value 1: reason?  (Adrian Klaver <adrian.klaver@aklaver.com>)
Список pgsql-general
Hello,

my setup is:
1 master
1 synchronous slave (up and running)
2 asynchronous slave (up and running)
Now I'm setting up asynchronous slave: first step of this setup is pg_basebackup connecting to master.

Indirections are because it is encapsulated in a script. This script is aimed to serve for automated replication recovery. This script is launched by crm_mon -d daemon.

You are right to check without indirections: now I tried to run the command directly from command line without variables- it works fine...

This means: no problem in pg_basebackup itself

but some problem in environment of process launched from crm_mon daemon.
Thanks for your hint (remove indirections). Pg_basebackup works fine when launched from command-line..
Unfortenutely I have still no clue how to solve the issue- su works. I have to find out why pg_basebackup cannot fork when launched from crm_mon.
 
Best Regards, Andrej

2016-04-16 1:17 GMT+02:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 04/15/2016 03:28 PM, Andrej Vanek wrote:
Hello,

I tried to run pg_basebackup. Return value is 1.

How to find out its reason?
(I suspect that some wal after backup is missing- but how to find
out the real reason? How to fix it?)

First it is not clear to me where you are taking the backup from, the master or the standby?

Second there is a lot of redirection going on. What happens if you run the pg_basebackup directly (without doing  su - postgres ...) and use hardcoded values instead of shell variables?



thanks, Andrej
--------------details:
environment: CentOS 6.7, postgres 9.5.1
( PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit)

I tried 2 forms of pg_basebackup (-X fetch and -X stream). Both were
issued from a script:
# su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP}
-D ${GEO_STDBY_DATA} -U pgreplic -P -v -X fetch" 2>${LOG_FILE}.stderr
  >> ${LOG_FILE}
# echo $?
1             <--------------pg_basebackup failed!
# cat log.stderr
# cat /var/log/cluster/geo_repair.log.err
transaction log start point: 0/E3000028 on timeline 1
WARNING:  skipping special file "./pg_hba.conf"
WARNING:  skipping special file "./pg_hba.conf.save"
transaction log end point: 0/E30000F8
pg_basebackup: base backup completed            <------------------no
reason for pg_basebackup failure!
# cp /tmp/pg_hba.conf /tmp/postgresql.conf /pg_data/
# su - postgres -c "/usr/pgsql-9.5/bin/pg_ctl -D /pg_data/ start"
# tail /pg_data/pg_log/postgresql-Fri.log
`pg_xlog/0000000100000000000000E2' ->
`../backups/arc/0000000100000000000000E2'
2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING:  skipping
special file "./pg_hba.conf"
2016-04-15 23:15:10 CEST:pgreplic@[unknown]:[10667] WARNING:  skipping
special file "./pg_hba.conf.save"         <---------------recorded in
pg_log on master node and copied by pg_basebackup (note time difference
between two servers)
2016-04-15 23:15:02 CEST:@:[23321] LOG:  database system was
interrupted; last known up at 2016-04-15 23:15:10 CEST
2016-04-15 23:15:02 CEST:postgres@postgres:[23329] FATAL:  the database
system is starting up
2016-04-15 23:15:03 CEST:@:[23321] LOG:  entering standby mode
2016-04-15 23:15:03 CEST:@:[23321] LOG:  database system was not
properly shut down; automatic recovery in progress <---------something
missing from pg_basebackup
2016-04-15 23:15:03 CEST:@:[23321] LOG:  redo starts at 0/E3000028
2016-04-15 23:15:03 CEST:@:[23321] LOG:  consistent recovery state
reached at 0/E4000000
2016-04-15 23:15:03 CEST:@:[23295] LOG:  database system is ready to
accept read only connections
2016-04-15 23:15:03 CEST:@:[23356] LOG:  started streaming WAL from
primary at 0/E4000000 on timeline 1
-------second trial
# su - postgres -c "/usr/pgsql-9.5/bin/pg_basebackup -h ${DB_MASTER_IP}
-D ${GEO_STDBY_DATA} -U pgreplic -P -v -X stream"
# echo $?
1
#  cat /var/log/cluster/geo_repair.log.err
transaction log start point: 0/E5000028 on timeline 1
pg_basebackup: starting background WAL receiver
WARNING:  skipping special file "./pg_hba.conf"
WARNING:  skipping special file "./pg_hba.conf.save"
transaction log end point: 0/E50000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: could not wait for child process: No child processes
    <----what does this mean? I think it failed to start process to
fetching wal logs created during backup: but neither on master node
neither on pg_basebackup output here is any information about reason..
(max_wal_senders on master is 10: I see no reason to fail).

postgres logs:
`pg_xlog/0000000100000000000000E4' ->
`../backups/arc/0000000100000000000000E4'
2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING:  skipping
special file "./pg_hba.conf"
2016-04-15 23:35:09 CEST:pgreplic@[unknown]:[29035] WARNING:  skipping
special file "./pg_hba.conf.save"
2016-04-15 23:35:01 CEST:@:[28926] LOG:  database system was
interrupted; last known up at 2016-04-15 23:35:09 CEST
2016-04-15 23:35:01 CEST:postgres@postgres:[28938] FATAL:  the database
system is starting up
2016-04-15 23:35:02 CEST:@:[28926] LOG:  entering standby mode
2016-04-15 23:35:02 CEST:@:[28926] LOG:  database system was not
properly shut down; automatic recovery in progress  <------------this
means something missing from pg_basebackup
2016-04-15 23:35:02 CEST:@:[28926] LOG:  redo starts at 0/E5000028
2016-04-15 23:35:02 CEST:@:[28926] LOG:  consistent recovery state
reached at 0/E6000000
2016-04-15 23:35:02 CEST:@:[28904] LOG:  database system is ready to
accept read only connections
2016-04-15 23:35:02 CEST:@:[28989] LOG:  started streaming WAL from
primary at 0/E6000000 on timeline 1

postgres params on master node:
log_line_prefix = '%t:%u@%d:[%p] '
logging_collector = on
wal_buffers = 16MB
max_wal_size = 200MB
log_temp_files = 1MB
max_connections = 170
shared_buffers = 512MB
effective_cache_size = 1500MB
work_mem = 48MB
log_lock_waits = on
log_min_duration_statement = 10000
shared_preload_libraries = 'pg_stat_statements'
include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA
wal_level = hot_standby
archive_mode = on
max_wal_senders = 10
hot_standby = on
wal_keep_segments = 128
archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f'
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
restart_after_crash = off
hot_standby_feedback = on



--
Adrian Klaver
adrian.klaver@aklaver.com

В списке pgsql-general по дате отправления:

Предыдущее
От: Sergei Agalakov
Дата:
Сообщение: Enhancement request for pg_dump
Следующее
От: Andrej Vanek
Дата:
Сообщение: Re: pg_basebackup: return value 1: reason?