Re: Serious problem: media recovery fails after system or PostgreSQL crash

From: MauMau
Subject: Re: Serious problem: media recovery fails after system or PostgreSQL crash
Msg-id: A78791C96DD04D6CBC81A1F44C9E15CE@maumau
In reply to: Re: Serious problem: media recovery fails after system or PostgreSQL crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Tomas wrote:
> So why don't you use an archive command that does not create such
> incomplete files? I mean something like this:
>
> archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'
>
> Until the file is renamed, it's considered 'incomplete'.


That is a good idea.  This would surely solve my problem.  Don't you think
this should be described in the PG manual?

Two solutions have now been proposed.  I think it would be better to
implement both of them.

1. Tomas's idea
Document a method like the following in the PG manual.

archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'

[Merits]
The incomplete file does not remain in the end.  The next successful
archival overwrites the incomplete .tmp file left behind by a previously
interrupted archival.

[Demerits]
Administrators and developers of existing systems or products that use
PostgreSQL are unlikely to notice this change in the PG manual.  So they
might still face the media recovery failure.
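
As a rough illustration of idea 1, the copy-then-rename step can be wrapped
in a small shell function.  This is only a sketch: the function name
archive_wal and its argument order are made up for this example, and it
assumes /arch-style destination directories where the .tmp file and the
final file live on the same file system, so that mv is a plain rename.  It
also refuses to overwrite an already archived file, which the manual advises
for archive commands in general.  It does not fsync anything.

```shell
# Hypothetical helper (not part of PostgreSQL): archive_wal SRC DESTDIR FNAME
# Copies SRC to DESTDIR/FNAME.tmp, then renames it to DESTDIR/FNAME.
archive_wal() {
    src=$1
    destdir=$2
    fname=$3

    # Never overwrite a segment that has already been archived.
    if [ -e "$destdir/$fname" ]; then
        echo "archive_wal: $destdir/$fname already exists" >&2
        return 1
    fi

    # Copy under a temporary name.  If we are interrupted here, only the
    # .tmp file is left behind, and the next attempt overwrites it.
    cp "$src" "$destdir/$fname.tmp" || return 1

    # Within one file system the rename is atomic, so the final name
    # appears only after the copy has completed.
    mv "$destdir/$fname.tmp" "$destdir/$fname"
}
```

If this were saved as a script ending with a call such as archive_wal "$@",
it could be used as archive_command = '/path/to/script %p /arch %f' (the
script path is a placeholder).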


2. My idea
PG continues media recovery when it encounters a partially filled WAL
segment file in the archive.  It ereport(LOG)'s the unexpected size and
tries to fetch the WAL file from pg_xlog/ instead.  This is logically
natural.  If the archived WAL file is smaller than the expected size, it
(almost always) means that archive processing was interrupted and the
unarchived WAL file is still in pg_xlog/.  So PG should instead fetch the
same WAL file from pg_xlog/ and continue recovery.  PG will then try to
fetch the subsequent WAL files from the archive again, but it is unlikely to
find them there, so it fetches them from pg_xlog/ as well.
This behavior complies with the description in the PG manual.

I mean just changing FATAL to LOG in the code in RestoreArchivedFile():

--------------------------------------------------
   if (StandbyMode && stat_buf.st_size < expectedSize)
    elevel = DEBUG1;
   else
    elevel = FATAL;
   ereport(elevel,
     (errmsg("archive file \"%s\" has wrong size: %lu instead of %lu",
      xlogfname,
      (unsigned long) stat_buf.st_size,
      (unsigned long) expectedSize)));
   return false;
--------------------------------------------------

[Merits]
The administrator or developer of existing systems or products using 
PostgreSQL will benefit from this fix.

[Demerits]
The incomplete archive file(s) might be left forever unless the DBA deletes
them.  That happens if pg_xlog/ is lost, because the incomplete archive
files are then never overwritten.
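
For comparison, the fallback in idea 2 can probably be approximated on the
restore side without touching the server code, because a restore_command
that returns nonzero makes recovery look for the segment in pg_xlog/.  The
following is only a sketch under assumptions: the function name restore_wal
and its arguments are invented here, and the default of 16777216 bytes
assumes the standard 16 MB WAL segment size.  The size check is applied only
to names made of 24 hex digits, since .history and .backup files are
legitimately smaller.

```shell
# Hypothetical helper (not part of PostgreSQL):
#   restore_wal ARCHDIR FNAME DEST [SEGSIZE]
# Returns nonzero when the archived file is missing or is a short segment.
restore_wal() {
    archdir=$1
    fname=$2
    dest=$3
    segsize=${4:-16777216}   # assumed default WAL segment size (16 MB)

    # Report "not in archive" so recovery can fall back to pg_xlog/.
    [ -f "$archdir/$fname" ] || return 1

    case $fname in
        *[!0-9A-F]*)
            # Not a plain segment name (e.g. .history): no size check.
            ;;
        ????????????????????????)
            # Plain WAL segment: treat a wrong-sized copy as missing.
            size=$(($(wc -c < "$archdir/$fname")))
            if [ "$size" -ne "$segsize" ]; then
                echo "restore_wal: $fname has size $size, expected $segsize" >&2
                return 1
            fi
            ;;
    esac

    cp "$archdir/$fname" "$dest"
}
```

Saved as a script ending with restore_wal "$@", it could be tried as
restore_command = '/path/to/script /arch %f %p' (placeholder path).  Like
idea 2 itself, it leaves the incomplete archive file in place.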


Could you give me your opinions on what to do?  I'm willing to submit these
fixes.


Regards
MauMau





