Re: BUG #14999: pg_rewind corrupts control file global/pg_control

Поиск
Список
Период
Сортировка
От TipTop Labs
Тема Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Дата
Msg-id E1979209-A39E-4229-BBE0-59BC43FB4691@tiptop-labs.com
обсуждение исходный текст
Ответ на Re: BUG #14999: pg_rewind corrupts control file global/pg_control  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: BUG #14999: pg_rewind corrupts control file global/pg_control  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-bugs
> On Apr 5, 2018, at 3:38 AM, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Apr 04, 2018 at 02:50:12PM -0400, Tom Lane wrote:
>> When I first started looking at this thread, I wondered if maybe somebody
>> had had in mind to create an active defense against starting a postmaster
>> in an inconsistent target cluster, by dint of intentionally truncating
>> pg_control before the transfer starts and not making it valid again till
>> the very end.  It's now clear from looking at the code that that's not
>> what's going on :-(.  But I wonder how hard it would be to make it so,
>> and whether that'd be worth doing if it's not too hard.
>
> One safeguard I can see is to add an extra loop just before processing
> the file map to check if a file planned for copy can be sanely opened or
> not.  That's simple to do with open_target_file(), followed by
> close_target_file().  If there is an EACCES error, then we complain at
> this early stage, and the data folder can be reused after cleaning up
> the data folder from those files.  That's not actually hard to do at
> quick glance, but I may be missing something so let's be careful.
>
> Another approach that we could consider, HEAD-only though, is to extend
> pg_stat_file so as an entry's st_mode is available and we could use that
> to not copy the file's data from the start.  That would be actually
> quite clean, particularly for large read-only files.
>
>> Actually, probably a safer way to attack that would be to remove or
>> rename the topmost PG_VERSION file, and then put it back afterwards.
>> That'd be far easier to recover from manually, if need be, than
>> clobbering pg_control.
>>
>> In any case, that seems separate from the question of what to do with
>> read-only files in the data directory.  Should we push forward with
>> committing Michael's previous patch, and leave that issue for later?
>
> That one is not good either as long as we don't make the difference in
> the data folder between three types of files:
> 1) Relation files.
> 2) System files which are critical at diverse degrees for the system.
> 3) Custom configuration files.
>
> pg_rewind is able to consider 1) as a separate category as it usually
> needs to fetch a range set of blocks, and puts 2) and 3) in the same
> bucket.  It could be possible to split 2) and 3) but the maintenance
> cost is high as for each new system file added in Postgres we would need
> to maintain an extra filtering logic in pg_rewind, something which is
> most likely going to rot, leading to potential corruption problems.
>
> My point is that I have seen on some VMware test beds a kernel going
> rogue because of ESX servers, causing a PostgreSQL-related partition to
> go in read-only mode.  So if you use my previous patch, and that the
> partition where the target server's data folder is located turns
> all-of-a-sudden to read-only, there is a risk of silencing real
> failures.  Even if the fetched data is ordered, paths like pg_xact and
> pg_twophase are processed after all relation files, so failing to write
> them correctly would silently break a set of transactions :(
>
> So please let me suggest a set of two patches:
> 1) 0001 is a documentation patch which should be back-patched.  This
> tells two things:
> 1-1) If pg_rewind fails in the middle of processing, then the best thing
> to do is to re-create the data folder using a new fresh backup.
> 1-2) Be careful of files that the process running pg_rewind is not able
> to write to.  If those are located on both the target and the source,
> then pg_rewind will copy it from the source to the target, leading to an
> immediate failure.  After removing those unwritable files, be sure to
> clean up anything which has been copied from the source and should be
> read-only, then rebuild the links.
> 2) 0002 is a patch for HEAD to order the set of chunks fetched, because
> this makes the data writes sequentials which is good for performance and
> limits the number of open()/close() on the files of the target's data
> folder.

Thanks for crediting me in patch 0002.

One final word about my original patch, since the commit message for 0001 refers to its limitations: I acknowledge that
itcannot cover situations where the order of processing files is such that the second loop detects non-writable files
onlyafter it had already done work on some others. It was meant to fix the specific problem I ran into without breaking
theregression tests that ship with the source code ("make check"). The main reason that it is not more elaborate is
thatI did not want to submit anything affecting more than a few lines of code without prior input from the community. 

So from my perspective I welcome both 0001 and 0002.

> I think that this is the best answer we can give now as there are a
> couple of ways to improving the handling of any checks, however there
> are side effects to any of them.  This also feels a bit like a new
> feature, while the documentation in pg_rewind sucks now for such
> details.  I have also spent some time detailing the commit message of
> the first patch about all that, that would be nice to keep in the git
> history.
>
> If the extra checks I am mentioning at the top of this email are worth
> adding, then it is possible to drop 1-2) partially, rewriting it to
> mention that pg_rewind tells the user at early stages about the failure
> and that a data folder can be again reused.  This does not change the
> fact that a pg_rewind breaking mid-flight should result in a new base
> backup, so documentation is mandatory anyway.  And there are a couple of
> approaches that we could consider here.  At least I saw two of them.
> Other folks may see better approaches than I did, who knows..
> Thanks,
> --
> Michael
>
<0001-Add-note-in-pg_rewind-documentation-about-read-only-.patch><0002-Enforce-ordering-of-data-chunks-fetched-in-pg_rewind.patch>

--
Christian



В списке pgsql-bugs по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: BUG #15143: Window Functions – Paranthese not allowed before OVER term
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: BUG #14999: pg_rewind corrupts control file global/pg_control