Thread: Segmentation fault occurs when the standby becomes primary, in SR

Segmentation fault occurs when the standby becomes primary, in SR

From
Fujii Masao
Date:
Hi,

When I created the trigger file to activate the standby server,
I got a segmentation fault:

  sby [11342]: LOG:  trigger file found: ../trigger
  sby [11343]: FATAL:  terminating walreceiver process due to administrator command
  sby [11342]: LOG:  redo done at 0/10000E0
  sby [11342]: LOG:  last completed transaction was at log time 2000-01-01 09:21:04.685861+09
  sby [11341]: LOG:  startup process (PID 11342) was terminated by signal 11: Segmentation fault
  sby [11341]: LOG:  terminating any other active server processes

This happens in the following scenario:

0. The trigger file is found.
1. The variable StandbyMode is reset to FALSE before re-fetching
   the last applied record.
2. An attempt is made to read that record from the archive.
3. RestoreArchivedFile() passes the following check without
   jumping to not_available, because StandbyMode is off.

     if (StandbyMode && recoveryRestoreCommand == NULL)
         goto not_available;

4. RestoreArchivedFile() wrongly constructs the command to be
   executed even though restore_command has not been supplied
   (omitting it is possible in standby mode).
   ---> Segmentation fault!

The attached patch would fix the bug.
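
The patch itself is not reproduced in this thread. As a rough illustration of the scenario above, the following is a minimal, self-contained model and not the real xlog.c code: RecoveryConfig, RestoreOutcome, build_restore_command() and both guard functions are hypothetical names, the "assumed fix" is only a guess at the shape of the change, and the command is built by simple concatenation rather than by real restore_command placeholder expansion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the recovery settings; not the real server globals. */
typedef struct {
    bool standby_mode;            /* models StandbyMode */
    const char *restore_command;  /* models recoveryRestoreCommand; NULL if unset */
} RecoveryConfig;

typedef enum {
    NOT_AVAILABLE,      /* took the "goto not_available" path */
    COMMAND_BUILT,      /* a command string was written to the buffer */
    NULL_COMMAND_USED   /* the real code would dereference NULL here */
} RestoreOutcome;

/* Guard as quoted in the report: it only fires while standby mode is on. */
static bool skip_restore_reported(const RecoveryConfig *cfg)
{
    return cfg->standby_mode && cfg->restore_command == NULL;
}

/* Assumed shape of a fix: an unset command always means "not available". */
static bool skip_restore_assumed_fix(const RecoveryConfig *cfg)
{
    return cfg->restore_command == NULL;
}

/*
 * Simplified stand-in for the command-building step. Instead of crashing on
 * a NULL command, it reports NULL_COMMAND_USED so the two guards can be
 * compared safely.
 */
static RestoreOutcome
build_restore_command(const RecoveryConfig *cfg,
                      bool (*skip)(const RecoveryConfig *),
                      const char *walfile, char *buf, size_t buflen)
{
    assert(cfg != NULL && skip != NULL && buf != NULL && buflen > 0);

    if (skip(cfg))
        return NOT_AVAILABLE;

    if (cfg->restore_command == NULL)
        return NULL_COMMAND_USED;

    /* Simplification: append the file name instead of expanding placeholders. */
    snprintf(buf, buflen, "%s %s", cfg->restore_command, walfile);
    return COMMAND_BUILT;
}
```

With standby_mode set to false and restore_command left NULL (the state after the trigger file is found), the reported guard lets the call fall through to NULL_COMMAND_USED, while the assumed fix returns NOT_AVAILABLE.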

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: Segmentation fault occurs when the standby becomes primary, in SR

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> When I created the trigger file to activate the standby server,
> I got a segmentation fault:
> 
> ...
> The attached patch would fix the bug.

Thanks, committed. (I kept the old comment, though; I liked it better.)

Now, whether we should even allow setting up a standby without
restore_command is another question. It's *possible*, but you need to
enable archiving in the master anyway to take an on-line backup, and you
need the archive to catch up if the standby ever falls behind too much.

Then again, if the database is small, maybe you don't mind taking a new
base backup if the standby falls behind. And you *can* take a base
backup with a dummy archive_command (ie. archive_command='/bin/true'),
if you trust that the WAL files stay in pg_xlog long enough for standby
to stream them from there.

Perhaps we should require a restore_command. If you know what you're
doing, you can always use '/bin/false' as restore_command to hack around it.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Re: Segmentation fault occurs when the standby becomes primary, in SR

From
Robert Haas
Date:
On Thu, Jan 28, 2010 at 2:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Perhaps we should require a restore_command. If you know what you're
> doing, you can always use '/bin/false' as restore_command to hack around it.

That seems kind of needlessly hacky (and it won't work on Windows).
Seems like it doesn't cost anything to let it be omitted altogether.

...Robert


Re: Segmentation fault occurs when the standby becomes primary, in SR

From
Fujii Masao
Date:
On Fri, Jan 29, 2010 at 4:23 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Thanks, committed. (I kept the old comment, though, I liked it better)

Thanks!

> Then again, if the database is small, maybe you don't mind taking a new
> base backup if the standby falls behind. And you *can* take a base
> backup with a dummy archive_command (ie. archive_command='/bin/true'),
> if you trust that the WAL files stay in pg_xlog long enough for standby
> to stream them from there.

Yeah, this is one of the cases where restore_command is not required
for SR.

> Perhaps we should require a restore_command. If you know what you're
> doing, you can always use '/bin/false' as restore_command to hack around it.

One of the main aims of SR is easy setup, so I don't want to
impose such a hacky restore_command setting on users.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center