Обсуждение: pg_basebackup problem...

Поиск
Список
Период
Сортировка

pg_basebackup problem...

От
John Scalia
Дата:
Hi all,

We've had a problem surface with the current pg_basebackup procedure we were following. Now, I want to explain that this pre-dates me and I recommended some additional flags to use when running it, specifically I told the group to include "-X f -c fast". Here is the old version they were invoking:

pg_basebackup -P -Ft -Z9 -D /var/lib/pgsql/9.2/backups/<date_stamped_backup_filename>

The problem we're having, is the tarball these generated will not successfully restore. That is, the tarball looks OK, but the resultant database cannot be started as the pg_xlog directory is empty and postgres says it can't locate the checkpoint segment to sanely initialize.

This really doesn't make sense to me. Shouldn't pg_basebackup capture the contents of the pg_xlog directory without needing to use a "-X f" flag? Otherwise, why could pg_basebackup create a "backup" that really is not usable? I'm really just trying to determine why pg_xlog is empty , if it really shouldn't be.
--
Jay

Re: pg_basebackup problem...

От
John Scalia
Дата:
OK, the plot thickens. I've been examining the source code to pg_basebackup written by Magnus Hagander. It definitely shows that if you do not issue a "-X f" or "-X s" then the pg_xlogs will NOT be included in the backup. This leads me to a couple more questions:

1) Does anyone know why anything in a WAL segment is required for a database to start? I ask this as the backup_label generated by pg_basebackup indicates a segment where the latest checkpoint is contained. Surely, there has to be some better way to permit the DB to sanely initialize. (Note, I haven't yet examined the sources as to how postgres starts.)

2) If a WAL segment is in fact required for the backed up DB to start, why would pg_basebackup not include those by default? To not do so, doesn't create a backup file, just in this case, a tarball that's worthless. After all, an non-restorable backup, isn't a backup at all.

On Wed, Aug 5, 2015 at 5:00 PM, John Scalia <jayknowsunix@gmail.com> wrote:
Hi all,

We've had a problem surface with the current pg_basebackup procedure we were following. Now, I want to explain that this pre-dates me and I recommended some additional flags to use when running it, specifically I told the group to include "-X f -c fast". Here is the old version they were invoking:

pg_basebackup -P -Ft -Z9 -D /var/lib/pgsql/9.2/backups/<date_stamped_backup_filename>

The problem we're having, is the tarball these generated will not successfully restore. That is, the tarball looks OK, but the resultant database cannot be started as the pg_xlog directory is empty and postgres says it can't locate the checkpoint segment to sanely initialize.

This really doesn't make sense to me. Shouldn't pg_basebackup capture the contents of the pg_xlog directory without needing to use a "-X f" flag? Otherwise, why could pg_basebackup create a "backup" that really is not usable? I'm really just trying to determine why pg_xlog is empty , if it really shouldn't be.
--
Jay

Re: pg_basebackup problem...

От
Kevin Grittner
Дата:
John Scalia <jayknowsunix@gmail.com> wrote:

> I've been examining the source code to pg_basebackup written by
> Magnus Hagander. It definitely shows that if you do not issue a
> "-X f" or "-X s" then the pg_xlogs will NOT be included in the
> backup. This leads me to a couple more questions:
>
> 1) Does anyone know why anything in a WAL segment is required for
> a database to start?

Yes.  This is covered in the docs, but in summary: the files could
be modified during the copy, and without WAL replay from before the
copy started the database would not necessarily be consistent.

> I ask this as the backup_label generated by pg_basebackup
> indicates a segment where the latest checkpoint is contained.

It indicates the location of the last checkpoint before the start
of the backup -- the latest point from which you cab start WAL
replay to achieve a consistent database.

> Surely, there has to be some better way to permit the DB to
> sanely initialize.

A write-ahead log is a well-known and widely used technique for
this.  What would you suggest instead?

> 2) If a WAL segment is in fact required for the backed up DB to
> start, why would pg_basebackup not include those by default? To
> not do so, doesn't create a backup file, just in this case, a
> tarball that's worthless.

... unless you are archiving the WAL to somewhere that it will be
kept long enough to be usable for such purposes.  If you are (and I
highly recommend that you do so), including WAL in the base backup
is a waste of both bandwidth and storage space.

Besides reviewing the documentation's discussions of WAL and
backups, you might want to read this:

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_basebackup problem...

От
John Scalia
Дата:
Thanks, Kevin,

I'm still considering a better way to sanely initialize a DB, but I acknowledge that a WAL replay is one way to do this. The problem I've been trying to solve is this: The old procedure took the tarball pg_basebackup generated, expanded it, then rsync'd it into the data directory after they removed everything in that directory including recursively removing the contents of pg_xlog. Hence, without including the -X f flag, there wouldn't be any WAL files available for initializing the DB. They old operation folks caused this problem, and now they want me to fix it, ie., get a database started from one of these backups where no WAL segments are present. I keep telling them this is impossible, but they're arguing it's been done here before and worked. I've been pointing out the impossibility and stating that there had to be some WAL files remaining in the pg_xlog directory for the restore to succeed.

On Thu, Aug 6, 2015 at 11:49 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
John Scalia <jayknowsunix@gmail.com> wrote:

> I've been examining the source code to pg_basebackup written by
> Magnus Hagander. It definitely shows that if you do not issue a
> "-X f" or "-X s" then the pg_xlogs will NOT be included in the
> backup. This leads me to a couple more questions:
>
> 1) Does anyone know why anything in a WAL segment is required for
> a database to start?

Yes.  This is covered in the docs, but in summary: the files could
be modified during the copy, and without WAL replay from before the
copy started the database would not necessarily be consistent.

> I ask this as the backup_label generated by pg_basebackup
> indicates a segment where the latest checkpoint is contained.

It indicates the location of the last checkpoint before the start
of the backup -- the latest point from which you cab start WAL
replay to achieve a consistent database.

> Surely, there has to be some better way to permit the DB to
> sanely initialize.

A write-ahead log is a well-known and widely used technique for
this.  What would you suggest instead?

> 2) If a WAL segment is in fact required for the backed up DB to
> start, why would pg_basebackup not include those by default? To
> not do so, doesn't create a backup file, just in this case, a
> tarball that's worthless.

... unless you are archiving the WAL to somewhere that it will be
kept long enough to be usable for such purposes.  If you are (and I
highly recommend that you do so), including WAL in the base backup
is a waste of both bandwidth and storage space.

Besides reviewing the documentation's discussions of WAL and
backups, you might want to read this:

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_basebackup problem...

От
Kevin Grittner
Дата:
John Scalia <jayknowsunix@gmail.com> wrote:

> The problem I've been trying to solve is this: The old procedure
> took the tarball pg_basebackup generated, expanded it, then
> rsync'd it into the data directory after they removed everything
> in that directory including recursively removing the contents of
> pg_xlog. Hence, without including the -X f flag, there wouldn't
> be any WAL files available for initializing the DB. They old
> operation folks caused this problem, and now they want me to fix
> it, ie., get a database started from one of these backups where
> no WAL segments are present. I keep telling them this is
> impossible, but they're arguing it's been done here before and
> worked. I've been pointing out the impossibility and stating that
> there had to be some WAL files remaining in the pg_xlog directory
> for the restore to succeed.

The only way to get the base backup started without the WAL records
that were generated during the backup is with pg_resetxlog, and
that provides no guarantees that there is no corruption.  See:

http://www.postgresql.org/docs/current/static/app-pgresetxlog.html

... which says, in part:

| pg_resetxlog clears the write-ahead log (WAL) and optionally
| resets some other control information stored in the pg_control
| file. This function is sometimes needed if these files have
| become corrupted. It should be used only as a last resort, when
| the server will not start due to such corruption.
|
| After running this command, it should be possible to start the
| server, but bear in mind that the database might contain
| inconsistent data due to partially-committed transactions. You
| should immediately dump your data, run initdb, and reload. After
| reload, check for inconsistencies and repair as needed.

Absence of WAL files in pg_xlog, or copies made while changes were
being logged to them (rather than taken from a stopped cluster or
through archiving, streaming replication, or pg_basebackup options)
is a form of database corruption.

There are many ways to get the WAL records; for example maybe they
previously restored before they were removed from the primary
system and they had the recovery.conf file configured to retrieve
them from there.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_basebackup problem...

От
John Scalia
Дата:
Oh, I had forgotten about pgresetxlog, but do recall that is a method of last resort.

On Thu, Aug 6, 2015 at 1:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
John Scalia <jayknowsunix@gmail.com> wrote:

> The problem I've been trying to solve is this: The old procedure
> took the tarball pg_basebackup generated, expanded it, then
> rsync'd it into the data directory after they removed everything
> in that directory including recursively removing the contents of
> pg_xlog. Hence, without including the -X f flag, there wouldn't
> be any WAL files available for initializing the DB. They old
> operation folks caused this problem, and now they want me to fix
> it, ie., get a database started from one of these backups where
> no WAL segments are present. I keep telling them this is
> impossible, but they're arguing it's been done here before and
> worked. I've been pointing out the impossibility and stating that
> there had to be some WAL files remaining in the pg_xlog directory
> for the restore to succeed.

The only way to get the base backup started without the WAL records
that were generated during the backup is with pg_resetxlog, and
that provides no guarantees that there is no corruption.  See:

http://www.postgresql.org/docs/current/static/app-pgresetxlog.html

... which says, in part:

| pg_resetxlog clears the write-ahead log (WAL) and optionally
| resets some other control information stored in the pg_control
| file. This function is sometimes needed if these files have
| become corrupted. It should be used only as a last resort, when
| the server will not start due to such corruption.
|
| After running this command, it should be possible to start the
| server, but bear in mind that the database might contain
| inconsistent data due to partially-committed transactions. You
| should immediately dump your data, run initdb, and reload. After
| reload, check for inconsistencies and repair as needed.

Absence of WAL files in pg_xlog, or copies made while changes were
being logged to them (rather than taken from a stopped cluster or
through archiving, streaming replication, or pg_basebackup options)
is a form of database corruption.

There are many ways to get the WAL records; for example maybe they
previously restored before they were removed from the primary
system and they had the recovery.conf file configured to retrieve
them from there.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pg_basebackup problem...

От
Peter Eisentraut
Дата:
On 8/6/15 12:49 PM, Kevin Grittner wrote:
>> 2) If a WAL segment is in fact required for the backed up DB to
>> > start, why would pg_basebackup not include those by default? To
>> > not do so, doesn't create a backup file, just in this case, a
>> > tarball that's worthless.
> ... unless you are archiving the WAL to somewhere that it will be
> kept long enough to be usable for such purposes.  If you are (and I
> highly recommend that you do so), including WAL in the base backup
> is a waste of both bandwidth and storage space.

This is arguably an artifact of the evolution of replication in
PostgreSQL.  You used to do tar backup + archiving, then you could
switch to pg_basebackup + archiving, and nowadays you could switch to
pg_basebackup without archiving, but the default behavior of
pg_basebackup still caters to the old case.



Re: pg_basebackup problem...

От
John Scalia
Дата:
Well, all is good now, I followed the instructions for pg_resetxlog and one of the most recent backups generated without fetching the WAL files in pg_xlog, did successfully start up after the procedure.

Everybody affected here is now in a far happier mood as a result. Thanks to everyone on this list, and especially to Kevin for reminding of pg_resetxlog.
--
Jay

On Thu, Aug 6, 2015 at 3:15 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 8/6/15 12:49 PM, Kevin Grittner wrote:
>> 2) If a WAL segment is in fact required for the backed up DB to
>> > start, why would pg_basebackup not include those by default? To
>> > not do so, doesn't create a backup file, just in this case, a
>> > tarball that's worthless.
> ... unless you are archiving the WAL to somewhere that it will be
> kept long enough to be usable for such purposes.  If you are (and I
> highly recommend that you do so), including WAL in the base backup
> is a waste of both bandwidth and storage space.

This is arguably an artifact of the evolution of replication in
PostgreSQL.  You used to do tar backup + archiving, then you could
switch to pg_basebackup + archiving, and nowadays you could switch to
pg_basebackup without archiving, but the default behavior of
pg_basebackup still caters to the old case.