Обсуждение: Point in Time recovery on PostgreSQL (10.3.1)

Поиск
Список
Период
Сортировка

Point in Time recovery on PostgreSQL (10.3.1)

От
amit tripathi
Дата:
Hi All,

We need some help with the point in recovery test . We have followed the below steps but after creating recovery.conf file and restarting the machine our DB is getting corrupted and not getting started.
Please look at the below steps and help us analyze what is going wrong there. We have used offline postgresql installer to set up database. Required configuration like wal_level=hot_stanby, archive_mode=on, and archive_command='cp %p /mnt/server/archive/%f' in postgresql.conf are also set properly.
 
1. After DB set up, We have created a tablcespace, db and mapped db with the tablespace. Also created some tables to generate transaction files in pg_wal and archive directory (which is mnt/server/archive

2. SELECT pg_start_backup('TestPITR');

3. Taken backup(tar) of Postgresql data directory (opt/PostgreSQL/10/ data)>>tar zcf backup20180810.tar data/

4. SELECT pg_stop_backup();

5. Created some more tables and noted down their timestamp (for PITR, which will be added in recovery.conf file as recovery_target_time )

6. Stopped the DB>>service postgresql-10 stop

7. Created a new directory >>mkdir pgbackup

8. Moved tar file into the pgbackup directory >>mv backup20180810.tar /opt/PostgreSQL/10/pgbackup

9. Copied archive files into the pgbackup directory>>  cp -r /mnt/server/archive/ /opt/PostgreSQL/10/pgbackup/

10. Renamed the old data directory( opt/PostgreSQL/10/ data )to bad.data >>> mv data bad.data (now opt/PostgreSQL/10/ bad.data)

11. untar the  back taken in step 2>>tar -xvf backup20180810.tar

12. Moved the data directory to old path >>>mv /opt/PostgreSQL/10/pgbackup/data/ /opt/PostgreSQL/10/ (Now i have opt/PostgreSQL/10/ data-derived from tar file and with old transaction logs in pg_wal) 

13  Copied the updated pg_wal logs from bad.data to data >>>cp -r /opt/PostgreSQL/10/bad.data/pg_wal/0* /opt/PostgreSQL/10/data/pg_wal

14. Started the database  >> service postgresql-10 start (Our db works fine)

15. Created a recovery.conf file inside data folder >>opt/PostgreSQL/10/ data --given the permission and user as postgres 

Our recovery.conf file has two following parameters. recovery_target_time was taken from step-5

restore_command = 'cp /opt/PostgreSQL/10/pgbackup/archive/%f %p'
recovery_target_time = '2018-08-10 02:56:31'

16. Restarted the db server>> service postgresql-10 restart  

It doesn't restart instead we are getting below error(yesterday's log) on linux host. Also there is no log entry in log folder(opt/PostgreSQL/10/data/log) looks like it has stopped creating logs after restart.

journalctl -xe
Aug 09 06:47:53 scspr0512196001 systemd[1]: Starting PostgreSQL 10 database s
-- Subject: Unit postgresql-10.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit postgresql-10.service has begun starting up.
Aug 09 06:47:53 scspr0512196001 pg_ctl[17697]: waiting for server to start...
Aug 09 06:47:53 scspr0512196001 pg_ctl[17697]: pg_ctl: could not start server
Aug 09 06:47:53 scspr0512196001 pg_ctl[17697]: Examine the log output.
Aug 09 06:47:53 scspr0512196001 systemd[1]: postgresql-10.service: control pr
Aug 09 06:47:53 scspr0512196001 systemd[1]: Failed to start PostgreSQL 10 dat
-- Subject: Unit postgresql-10.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit postgresql-10.service has failed.
--
-- The result is failed.
Aug 09 06:47:53 scspr0512196001 systemd[1]: Unit postgresql-10.service entere
Aug 09 06:47:53 scspr0512196001 systemd[1]: postgresql-10.service failed.
Aug 09 06:47:53 scspr0512196001 polkitd[740]: Unregistered Authentication Age




Re: Point in Time recovery on PostgreSQL (10.3.1)

От
Stephen Frost
Дата:
Greetings,

* amit tripathi (amittripathi1445@gmail.com) wrote:
> We need some help with the point in recovery test . We have followed the
> below steps but after creating recovery.conf file and restarting the
> machine our DB is getting corrupted and not getting started.
> Please look at the below steps and help us analyze what is going wrong
> there. We have used offline postgresql installer to set up database.
> Required configuration like *wal_level=hot_stanby, archive_mode=on, and
> archive_command='cp %p /mnt/server/archive/%f*' in *postgresql.conf* are
> also set properly.

I'd strongly recommend that you consider using a real PostgreSQL backup
tool for your backups and not try to write your own.  As a simple point,
using 'cp' is not sufficient for a proper archive command.

> 1. After DB set up, We have created a tablcespace, db and mapped db with
> the tablespace. Also created some tables to generate transaction files in
> pg_wal and archive directory (which is *mnt/server/archive*)
>
> 2. SELECT pg_start_backup('TestPITR');
>
> 3. Taken backup(tar) of Postgresql data directory (opt/PostgreSQL/10/
> data)>>tar
> zcf backup20180810.tar data/
>
> 4. SELECT pg_stop_backup();

Did you read the documentation where we explicitly point out that the
non-exclusive backup method is recommended over the exclusive method?
There's good reasons for that:

https://www.postgresql.org/docs/current/static/continuous-archiving.html

> 5. Created some more tables and noted down their timestamp (for PITR, which
> will be added in recovery.conf file as *recovery_target_time* )
>
> 6. Stopped the DB>>service postgresql-10 stop
>
> 7. Created a new directory >>mkdir pgbackup
>
> 8. Moved tar file into the pgbackup directory >>mv backup20180810.tar
> /opt/PostgreSQL/10/pgbackup
>
> 9. Copied archive files into the pgbackup directory>>  cp -r /
> *mnt/server/archive*/ /opt/PostgreSQL/10/pgbackup/
>
> 10. Renamed the old data directory( opt/PostgreSQL/10/ data )to bad.data
> >>> mv data bad.data (now opt/PostgreSQL/10/ bad.data)
>
> 11. untar the  back taken in step 2>>tar -xvf backup20180810.tar
>
> 12. Moved the data directory to old path >>>mv
> /opt/PostgreSQL/10/pgbackup/data/ /opt/PostgreSQL/10/ (Now i have
> opt/PostgreSQL/10/ data-derived from tar file and with old transaction logs
> in pg_wal)

Didn't you mention a tablespace..?  Did you handle backing up the
tablespace as part of this process and then restoring it?  It doesn't
look like it from what you've said above, unless perhaps I missed it
(this is pretty difficult to follow...).

> 13  Copied the updated pg_wal logs from bad.data to data >>>cp -r
> /opt/PostgreSQL/10/bad.data/pg_wal/0* /opt/PostgreSQL/10/data/pg_wal

Why would you perform this step..?  The WAL should have been archived
through archive_command and should be restored using restore_command
specified through recovery.conf.

> 14. Started the database  >> service postgresql-10 start (Our db works fine)

What do you mean you started the database and it "works fine"?

> 15. Created a recovery.conf file inside data folder >>opt/PostgreSQL/10/ data
> --given the permission and user as postgres

Wait, you started the database before creating a recovery.conf file?
That's definitely not going to do what you want- PG will end up trying
to do crash recovery instead of restoring from a backup.

Is there a backup_label file?

> Our recovery.conf file has two following parameters. *recovery_target_time
> was taken from step-5*
>
> *restore_command = 'cp /opt/PostgreSQL/10/pgbackup/archive/%f
> %p'recovery_target_time = '2018-08-10 02:56:31'*
>
> 16. Restarted the db server>> service postgresql-10 restart
>
> It doesn't restart instead we are getting below error(yesterday's log) on
> linux host. Also there is no log entry in log
> folder(opt/PostgreSQL/10/data/log) looks like it has stopped creating logs
> after restart.
>
>
> * journalctl -xeAug 09 06:47:53 scspr0512196001 systemd[1]: Starting
> PostgreSQL 10 database s-- Subject: Unit postgresql-10.service has begun
> start-up-- Defined-By: systemd--
> Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>
<https://www.google.com/url?q=http://lists.freedesktop.org/mailman/listinfo/systemd-devel&sa=D&source=hangouts&ust=1533964684221000&usg=AFQjCNFJDTSITdO_SpA_w3Y8ZNn7zwrDqQ>----
> Unit postgresql-10.service has begun starting up.Aug 09 06:47:53
> scspr0512196001 pg_ctl[17697]: waiting for server to start...Aug 09
> 06:47:53 scspr0512196001 pg_ctl[17697]: pg_ctl: could not start serverAug
> 09 06:47:53 scspr0512196001 pg_ctl[17697]: Examine the log output.Aug 09
> 06:47:53 scspr0512196001 systemd[1]: postgresql-10.service: control prAug
> 09 06:47:53 scspr0512196001 systemd[1]: Failed to start PostgreSQL 10 dat--
> Subject: Unit postgresql-10.service has failed-- Defined-By: systemd--
> Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>
<https://www.google.com/url?q=http://lists.freedesktop.org/mailman/listinfo/systemd-devel&sa=D&source=hangouts&ust=1533964684221000&usg=AFQjCNFJDTSITdO_SpA_w3Y8ZNn7zwrDqQ>----
> Unit postgresql-10.service has failed.---- The result is failed.Aug 09
> 06:47:53 scspr0512196001 systemd[1]: Unit postgresql-10.service entereAug
> 09 06:47:53 scspr0512196001 systemd[1]: postgresql-10.service failed.Aug 09
> 06:47:53 scspr0512196001 polkitd[740]: Unregistered Authentication Age *

As mentioned in the above log message, you need to look at the
PostgreSQL logs to see what's actually happened.  These logs aren't
telling us anything useful.

However, what you're seeing here is that it's hard to write a backup
system using the low-level PostgreSQL base backup API, which is why
there are good tools written that handle all of that for you.  I'd
strongly recommend that you look at using one of those tools for your
backups instead of trying to roll your own.  My favorite is pgBackRest
(though I'm a bit biased by that), but there's a few different options
out there (barman, WAL-E are two others).

Thanks!

Stephen

Вложения