Re: Confused about how to enable backups (e.g. Write Ahead Log).

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Confused about how to enable backups (e.g. Write Ahead Log).
Дата
Msg-id 20200221162056.GG3195@tamriel.snowman.net
обсуждение исходный текст
Ответ на Confused about how to enable backups (e.g. Write Ahead Log).  (mimble9@danwin1210.me)
Список pgsql-novice
Greetings,

* mimble9@danwin1210.me (mimble9@danwin1210.me) wrote:
> I have been attempting, without success, to create a Write Ahead Log (WAL)
> backup using PostgreSQL 10.0 under 18.04 Ubuntu VPS.

There isn't really any such thing as a "WAL" backup.  The WAL is where
PG writes before it writes to the actual heap, but you don't typically
keep WAL since the start of the database and so you need to have a copy
of the heap files to start from and then you use WAL to replay up to a
specific point in time.

> It might be that I am doing the wrong thing and my method will never work.

The method you've outlined below won't work, no.

> In which case: what is the best way to achieve my 'Goal' (below)?

There's a few options.

> Goal:
>
> I want to backup a database every 15 minutes. Ideally, I want to backup
> only if the database has changed (perhaps this isn't possible and backups
> happen whether or not the database has changed)?

This is what's known as an 'incremental backup' and there's a few tools
out there which support that- but I suggest you first consider if your
goal is actually to backup every 15 minutes, or if your goal is to be
able to minimize data loss to, at most, 15 minutes, or if your goal is
to recover to any specific point in time.

The reason this matters is that PG supports Point-in-time-Recovery
(PITR), by using file-level backups combined with the write ahead log.
This means that you can restore a backup of PG and then play forward to
*any* point in time- 9:00, 9:15, 9:17, 10:20, whatever.  Taking more
frequent incremental backups will mean that there is less WAL to have to
go through to get to a specific point in time, but unless you have a
very write-heavy system, it won't take long to play through an hour of
WAL, so you might consider taking backups once an hour, or even less
frequently.  Reasonably common setups include daily incremental backups
and then regular differential and/or full backups.

> Attempts:
>
> I edited /etc/postgresql/10/main/postgresql.conf as follows:
>
> wal_level = replica
> archive_mode = on
> archive_command = 'cp %p /test/%f'
> archive_timeout = 900

I strongly recommend against trying to write your own backup software
for PG.  In particular, 'cp' isn't really safe to use as an
archive_command as it doesn't 'fsync' the WAL file after copying it,
meaning you might be missing WAL if the system crashes.

When using archive_command, hopefully with a proper tool, it can make
sense to have an archive_timeout set, which will cause PG to kick out a
WAL segment after that long, if it hasn't already, and there's been
changes since the last WAL segment.  This avoids the risk that a WAL
segment sits with data in it for a long and then the system crashes in
an unrecoverable way and those changes are then lost.  Note that this
primarily makes sense when you're actually archiving the WAL to a
different system.  PG, as part of each commit, will fsync the data to
the WAL and so as long as the filesystem recovers, no data will be lost.

> This should - in my opinion - every 15 minutes backup (cp) the database
> (referenced by %p) to the /test/ directory using the filename (referenced
> by %f) of the database.

No, that's not how archive_command works at all.  The archive_command
simply copies the *WAL* file (each of which are 16MB by default) to
another location (ideally, a remote one, where the file is fsync'd,
prior to returning to PG).  The database itself is *not* copied.

> I then:
>
> /etc/init.d/postgresql stop
> /etc/init.d/postgresql start
>
> And checked that PostgreSQL is running with /etc/init.d/postgresql status:
>
> postgresql.service - PostgreSQL RDBMS
>    Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor
> preset: enabled)
>    Active: active (exited) since Fri 2020-02-21 01:23:59 UTC; 9h ago
>   Process: 60534 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
>  Main PID: 60534 (code=exited, status=0/SUCCESS)
>
> Feb 21 01:23:58 server systemd[1]: postgresql.service: Failed to reset
> devices.list: Operation not permitted
> Feb 21 01:23:58 server systemd[1]: Starting PostgreSQL RDBMS...
> Feb 21 01:23:59 server systemd[1]: Started PostgreSQL RDBMS.
>
> However, the /test/ directory did not show any backup files

You would just end up with WAL files in that directory, per the
discussion above regarding the purpose of archive_command.

> I then read that I needed to use pg_basebackup to make an initial backup
> (before editing postgresql.conf) so I tried pg_basebackup -D /etc/backup.
> This created a duplicate of /var/lib/postgresql/10/main/ in /etc/backup
> but I'm not sure how this relates to or effects (if it does) the Write
> Ahead Log.

pg_basebackup is one tool to create a backup, but it doesn't (currently,
at least) support incremental backups, so each backup is a full backup
of the system.  The WAL can be used to take that backup up to a specific
point in time.

> I still don't have any backups in /test/ (or anywhere).

Based on what you wrote above, the contents of '/etc/backup' should
actually be a backup of PG, with the necessary WAL to bring it up
consistently.  Note that it's absolutely essential that all of the WAL
generated during a backup be kept, otherwise the backup is not valid.  I
strongly recommend using a tool which ensures that all WAL from a backup
is verified to have been stored properly.

Thanks,

Stephen

Вложения

В списке pgsql-novice по дате отправления:

Предыдущее
От: Stefan Fercot
Дата:
Сообщение: Re: Confused about how to enable backups (e.g. Write Ahead Log).
Следующее
От: mimble9@danwin1210.me
Дата:
Сообщение: Re: Confused about how to enable backups (e.g. Write Ahead Log).