Обсуждение: Postgres 9.1 - getting a continous archiving database to accept connections taking too long

Поиск
Список
Период
Сортировка

Postgres 9.1 - getting a continous archiving database to accept connections taking too long

От
James Sebastian
Дата:
I am very new to postgresql and have a continous archiving server to recover the database in case of a hardware failure or pg crash. Following http://www.postgresql.org/docs/9.1/static/continuous-archiving.html, I copy the WAL archives to another machine and apply them regularly. Recently due to pg server crash at my main server I tried to restore the database by pushing a trigger (an empty file, expecting smart failover) and the logs are given below.

prodrestore_error.log
======================

WAL file not present yet. Checking for trigger file...
trigger file found: smart failover
Trigger file:         /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CC
WAL file path:        /backup/prod/db_backup/archive/0000000100000001000000CC
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       60 seconds
Max wait interval:    0 forever
Command for restore:  cp "/backup/prod/db_backup/archive/0000000100000001000000CC" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore:      OK

Trigger file:         /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000002.history
WAL file path:        /backup/prod/db_backup/archive/00000002.history
Restoring to:         pg_xlog/RECOVERYHISTORY
Sleep interval:       60 seconds
Max wait interval:    0 forever
Command for restore:  cp "/backup/prod/db_backup/archive/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore:      cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000002.history': No such file or directory
not restored
history file not found
Trigger file:         /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 00000001.history
WAL file path:        /backup/prod/db_backup/archive/00000001.history
Restoring to:         pg_xlog/RECOVERYHISTORY
Sleep interval:       60 seconds
Max wait interval:    0 forever
Command for restore:  cp "/backup/prod/db_backup/archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history: 000000000000000000000000 and later
running restore:      cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
cp: cannot stat `/backup/prod/db_backup/archive/00000001.history': No such file or directory
not restored
history file not found

postgresql-9.1-main.log
========================

2015-08-03 08:08:44 IST::@:[5542]:LOG:  restored log file "0000000100000001000000CC" from archive
2015-08-03 08:11:44 IST::@:[5542]:LOG:  could not open file "pg_xlog/0000000100000001000000CD" (log file 1, segment 205): No such file or directory
2015-08-03 08:11:44 IST::@:[5542]:LOG:  redo done at 1/CC001E10
2015-08-03 08:11:44 IST::@:[5542]:LOG:  last completed transaction was at log time 2015-08-03 07:59:58.121908+05:30
2015-08-03 08:11:44 IST::@:[5542]:LOG:  restored log file "0000000100000001000000CC" from archive
2015-08-03 08:17:44 IST::@:[5542]:LOG:  selected new timeline ID: 2
2015-08-03 08:23:45 IST::@:[5542]:LOG:  archive recovery complete
2015-08-03 08:23:51 IST::@:[6088]:LOG:  autovacuum launcher started
2015-08-03 08:23:51 IST::@:[5541]:LOG:  database system is ready to accept connections


The whole process of looking for two history files 00000002.history and 00000001.history, 4 times and finally start the database took approximately 12 minutes.

1. How I can make this as fast to look for history files and if not present proceed quickly to start database in read/write mode ?  or not to look for history files at all.

2. After some reading, I tested by keeping keyword 'fast' in the trigger file.

I could see below in logs.

"WAL file not present yet. Checking for trigger file...
trigger file found: fast failover
Trigger file:         /app/recovery-prod/trigger/pgsql.trigger.5432
Waiting for WAL file: 0000000100000001000000CF
WAL file path:        /backup/prod/db_backup/archive/0000000100000001000000CF
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       60 seconds
Max wait interval:    0 forever
Command for restore:  cp "/backup/prod/db_backup/archive/0000000100000001000000CF" "pg_xlog/RECOVERYXLOG"
Keep archive history: 000000000000000000000000 and later
trigger file found: smart failover
running restore:      OK

and the restore took same time to check the history files.
In this case, I needed almost 14-15 minutes as it looked for 00000003.history aprt from 00000002.history and  00000001.history, even though I restarted the whole process by replacing the base backup afresh.

Am I doing something wrong here or is this was a fast failover expected to work?


3. Each iteration, I did to see the behaviour by stopping the current postgres server, deleting the previous base backup and restoring afresh added search for one more  non existing history file like 00000004.history and 00000005.history, further slowing the process.

As I restore from the base backup each time, then push the trigger file after applying all WAL archives after the base backup, I would like to either ignore looking for history files altogether or control how many history files are tried.  Also like to speed up starting the database in read write mode, so that I can point my application to this database and continue the business.  What can I do for that ?

Thanks for all the help.
Regards,
James

Re: Postgres 9.1 - getting a continous archiving database to accept connections taking too long

От
James Sebastian
Дата:


Sleep interval:       60 seconds
Max wait interval:    0 forever



The whole process of looking for two history files 00000002.history and 00000001.history, 4 times and finally start the database took approximately 12 minutes.

1. How I can make this as fast to look for history files and if not present proceed quickly to start database in read/write mode ?  or not to look for history files at all.

I could get an answer in another forum explaining how sleep time of 60 seconds, ensures almost 6 minutes for waiting for WAL files and history files.

3. Each iteration, I did to see the behaviour by stopping the current postgres server, deleting the previous base backup and restoring afresh added search for one more  non existing history file like 00000004.history and 00000005.history, further slowing the process.

As I restore from the base backup each time, then push the trigger file after applying all WAL archives after the base backup, I would like to either ignore looking for history files altogether or control how many history files are tried.  A
I am still trying to figure this out on how to tell the restore process to look for only 1 or max 2 history files.

Thanks
James