Thread: Warm Standby and resetting the primary as a standby

Warm Standby and resetting the primary as a standby

From: Derrick Rice
Date:
I've been reading up on the documentation for WAL shipping and warm standby configuration. One concern that I have (a common one, I'm sure) is that it seems that after bringing a standby server up as primary, other standby servers (including the original primary) need to be rebased before they can read the new primary's WALs in continuous recovery mode.

It seems that the cause of this is a change to the leading digit of the WAL files:

http://archives.postgresql.org/pgsql-general/2010-03/msg00985.php
http://archives.postgresql.org/pgsql-admin/2009-08/msg00179.php

I was hoping that someone would shed some light on this situation with a technical explanation.  It's not clear to me why the WAL files are incompatible or why the digit increases. What does that first digit mean to PostgreSQL?  Is it possible to have the restore_command ignore the leading digit?

I expected the WAL files to be compatible.  If I start two servers from the same "disk image" and then they get the same exact changes recorded in WAL, why should the next created WAL differ depending on which server creates it?  I imagine these two servers to have identical new versions of a "disk image" after consuming the exact same WALs (one generated them, the other read them).

I'm surprised that this question doesn't come up more often or that there's no explanation in the docs about why it's necessary to rebase a primary that went down gracefully (e.g. for planned maintenance).

Thanks

Derrick

Re: Warm Standby and resetting the primary as a standby

From: Derrick Rice
Date:
On Wed, Aug 18, 2010 at 9:48 AM, Derrick Rice <derrick.rice@gmail.com> wrote:
> I've been reading up on the documentation for WAL shipping and warm standby configuration. One concern that I have (a common one, I'm sure) is that it seems that after bringing a standby server up as primary, other standby servers (including the original primary) need to be rebased before they can read the new primary's WALs in continuous recovery mode.
>
> It seems that the cause of this is a change to the leading digit of the WAL files:
>
> http://archives.postgresql.org/pgsql-general/2010-03/msg00985.php
> http://archives.postgresql.org/pgsql-admin/2009-08/msg00179.php
>
> I was hoping that someone would shed some light on this situation with a technical explanation.  It's not clear to me why the WAL files are incompatible or why the digit increases. What does that first digit mean to PostgreSQL?  Is it possible to have the restore_command ignore the leading digit?
>
> I expected the WAL files to be compatible.  If I start two servers from the same "disk image" and then they get the same exact changes recorded in WAL, why should the next created WAL differ depending on which server creates it?  I imagine these two servers to have identical new versions of a "disk image" after consuming the exact same WALs (one generated them, the other read them).
>
> I'm surprised that this question doesn't come up more often or that there's no explanation in the docs about why it's necessary to rebase a primary that went down gracefully (e.g. for planned maintenance).
>
> Thanks
>
> Derrick

Considering the high level of activity on this list, I'm surprised not to have any discussion on this yet.  Please let me know if there is a better discussion area for this topic or if I can clarify / rephrase my question to make it more attractive.

Barring that, I guess I'll dig into the ultimate documentation: source.

Derrick

Re: Warm Standby and resetting the primary as a standby

From: Bruce Momjian
Date:
Derrick Rice wrote:
> I've been reading up on the documentation for WAL shipping and warm standby
> configuration. One concern that I have (a common one, I'm sure) is that it
> seems that after bringing a standby server up as primary, other standby
> servers (including the original primary) need to be rebased before they can
> read the new primary's WALs in continuous recovery mode.
>
> It seems that the cause of this is a change to the leading digit of the WAL
> files:
>
> http://archives.postgresql.org/pgsql-general/2010-03/msg00985.php
> http://archives.postgresql.org/pgsql-admin/2009-08/msg00179.php
>
> I was hoping that someone would shed some light on this situation with a
> technical explanation.  It's not clear to me why the WAL files are
> incompatible or why the digit increases. What does that first digit mean to
> postgresql?  Is it possible to have the restore_command ignore the leading
> digit?

The first digit in the WAL filename is the timeline.

I think we need to figure out a better way to promote slaves when there
is a new master, but no one has done the research yet.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
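
To make the "first digit" concrete: in the PostgreSQL releases discussed here, a WAL segment file name is 24 hexadecimal digits, of which the first 8 are the timeline ID and the remaining 16 identify the position in the WAL stream. The following is a minimal sketch, not taken from the thread, that splits such a name. The names `WalName` and `parse_wal_name`, and the field labels `log` and `segment`, are illustrative assumptions.

```python
import re
from typing import NamedTuple


class WalName(NamedTuple):
    """Components of a WAL segment file name (labels are illustrative)."""
    timeline: int  # first 8 hex digits
    log: int       # middle 8 hex digits
    segment: int   # last 8 hex digits


# A WAL segment file name is exactly 24 upper-case hex digits.
_WAL_NAME_RE = re.compile(r"[0-9A-F]{24}")


def parse_wal_name(name: str) -> WalName:
    """Split a WAL segment file name into timeline, log and segment."""
    if not _WAL_NAME_RE.fullmatch(name):
        raise ValueError(f"not a WAL segment file name: {name!r}")
    return WalName(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )


before = parse_wal_name("000000010000000000000003")
after = parse_wal_name("000000020000000000000003")
print(before.timeline, after.timeline)
```

The two example names differ only in the timeline field, which is the change a standby shows once it has been brought up as primary.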

Re: Warm Standby and resetting the primary as a standby

From: Yaroslav Tykhiy
Date:
On Sat, Aug 21, 2010 at 12:45:44PM -0400, Bruce Momjian wrote:
> Derrick Rice wrote:
> > I've been reading up on the documentation for WAL shipping and warm standby
> > configuration. One concern that I have (a common one, I'm sure) is that it
> > seems that after bringing a standby server up as primary, other standby
> > servers (including the original primary) need to be rebased before they can
> > read the new primary's WALs in continuous recovery mode.
> >
> > It seems that the cause of this is a change to the leading digit of the WAL
> > files:
> >
> > http://archives.postgresql.org/pgsql-general/2010-03/msg00985.php
> > http://archives.postgresql.org/pgsql-admin/2009-08/msg00179.php
> >
> > I was hoping that someone would shed some light on this situation with a
> > technical explanation.  It's not clear to me why the WAL files are
> > incompatible or why the digit increases. What does that first digit mean to
> > postgresql?  Is it possible to have the restore_command ignore the leading
> > digit?
>
> The first digit in the WAL filename is the timeline.
>
> I think we need to figure out a better way to promote slaves when there
> is a new master, but no one has done the research yet.

In PostgreSQL 8.0, I used to rely on what seemed to be a bug: it didn't
switch timelines if restore_command returned a non-zero status, and that
worked like a charm for me more than once.  Could switching timelines be
made optional in recovery.conf, or made to depend on what restore_command
returns?  Sorry if I'm missing any important architectural points here.

Yar
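
For readers unfamiliar with the restore_command contract referred to above: the server runs the command with the requested file name and a destination path, and a non-zero exit status tells it that the file is not available. Below is a minimal sketch in Python, assuming a hypothetical flat archive directory. The function name `restore_wal` and the directory layout are illustrative only, and the sketch says nothing about whether a timeline switch follows.

```python
import os
import shutil


def restore_wal(archive_dir: str, wal_name: str, dest_path: str) -> int:
    """Copy the requested WAL file out of the archive.

    Returns 0 on success and 1 if the file is not in the archive, which
    is the exit status a restore_command script would pass back.
    """
    src = os.path.join(archive_dir, wal_name)
    if not os.path.isfile(src):
        # Requested file (including its timeline prefix) is not archived.
        return 1
    shutil.copyfile(src, dest_path)
    return 0
```

A wrapper script would call `sys.exit(restore_wal(archive_dir, file_name, target_path))`, with the last two arguments supplied through the `%f` and `%p` placeholders of restore_command. Note that the lookup uses the full file name, so a file from another timeline is never substituted.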

Re: Warm Standby and resetting the primary as a standby

From: Bruce Momjian
Date:
Yaroslav Tykhiy wrote:
> On Sat, Aug 21, 2010 at 12:45:44PM -0400, Bruce Momjian wrote:
> > Derrick Rice wrote:
> > > I've been reading up on the documentation for WAL shipping and warm standby
> > > configuration. One concern that I have (a common one, I'm sure) is that it
> > > seems that after bringing a standby server up as primary, other standby
> > > servers (including the original primary) need to be rebased before they can
> > > read the new primary's WALs in continuous recovery mode.
> > >
> > > It seems that the cause of this is a change to the leading digit of the WAL
> > > files:
> > >
> > > http://archives.postgresql.org/pgsql-general/2010-03/msg00985.php
> > > http://archives.postgresql.org/pgsql-admin/2009-08/msg00179.php
> > >
> > > I was hoping that someone would shed some light on this situation with a
> > > technical explanation.  It's not clear to me why the WAL files are
> > > incompatible or why the digit increases. What does that first digit mean to
> > > postgresql?  Is it possible to have the restore_command ignore the leading
> > > digit?
> >
> > The first digit in the WAL filename is the timeline.
> >
> > I think we need to figure out a better way to promote slaves when there
> > is a new master, but no one has done the research yet.
>
> In Postgresql 8.0, I used to rely on what seemed to be a bug in it when
> it didn't switch timelines if restore_command returned a non-zero status,
> and that worked like a charm more than once for me.  Can switching
> timelines be just made optional in recovery.conf or depending on what
> restore_command returns?  Sorry if I'm missing any important architectural
> points here.

Sorry, I don't know.  I think the timelines are only there for safety if
you have to fall back to the previous timeline, and to prevent timeline
mixing.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Warm Standby and resetting the primary as a standby

From: Derrick Rice
Date:


On Mon, Aug 23, 2010 at 5:45 PM, Bruce Momjian <bruce@momjian.us> wrote:


> Sorry, I don't know.  I think the timelines are only there for safety if
> you have to fall back to the previous timeline, and to prevent timeline
> mixing.

Thanks for the helpful answers.

Two follow-up questions that, if answered, would save me some time before I go testing random theories.

Is there a way to bump a database up a timeline version without specifying the exact timeline version of interest?  Apparently, rebasing from a database that has incremented its own timeline by doing a recovery achieves at least this.

Is it possible to interpret the requested file and ignore the timeline digits and provide a file from some other timeline?  Or does the timeline mean more than just the file name?

Derrick

Re: Warm Standby and resetting the primary as a standby

From: Bruce Momjian
Date:
Derrick Rice wrote:
> On Mon, Aug 23, 2010 at 5:45 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> >
> >
> > Sorry, I don't know.  I think the timelines are only there for safety if
> > you have to fall back to the previous timeline, and to prevent timeline
> > mixing.
>
>
> Thanks for the helpful answers.
>
> Two follow up questions which, if they can be answered, will save some time
> before I go testing random theories.
>
> Is there a way to bump a database up a timeline version without specifying
> the exact timeline version of interest?  Apparently doing a rebase from a
> database which has incremented its own timeline from doing a recovery does
> at least this.
>
> Is it possible to interpret the requested file and ignore the timeline
> digits and provide a file from some other timeline?  Or does the timeline
> mean more than just the file name?

No idea. Sorry.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Warm Standby and resetting the primary as a standby

From: Tom Lane
Date:
Derrick Rice <derrick.rice@gmail.com> writes:
> Is it possible to interpret the requested file and ignore the timeline
> digits and provide a file from some other timeline?  Or does the timeline
> mean more than just the file name?

Sounds dangerous as can be to me.  The point of the timeline stuff is
that a particular range of WAL LSNs might contain different stuff in
different timelines.

            regards, tom lane
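
The danger described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the archive is a plain dictionary, the file contents are made-up strings, and the function names are hypothetical. It shows that two files at the same WAL position can hold different changes once timelines have diverged, so a lookup that ignores the timeline prefix may hand back the wrong history.

```python
# Toy archive: same WAL position (last 16 hex digits), different timelines.
# The contents are invented placeholders, not real WAL data.
archive = {
    "000000010000000000000005": "changes written by the old primary",
    "000000020000000000000005": "changes written by the promoted standby",
}


def lookup_exact(archive, requested):
    """Safe lookup: serve only the exact file name that was requested."""
    return archive.get(requested)


def lookup_ignoring_timeline(archive, requested):
    """Unsafe lookup: match on WAL position only, ignoring the timeline."""
    position = requested[8:]
    matches = sorted(name for name in archive if name[8:] == position)
    return archive[matches[0]] if matches else None


wanted = "000000020000000000000005"
print(lookup_exact(archive, wanted))
print(lookup_ignoring_timeline(archive, wanted))
```

The exact lookup returns the promoted standby's changes, while the position-only lookup returns the old primary's changes for the same request, which is the kind of timeline mixing the prefix is there to prevent.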