Re: PATCH: track last known XLOG segment in control file

From: Tomas Vondra
Subject: Re: PATCH: track last known XLOG segment in control file
Date:
Msg-id: 566CA5A4.8000709@2ndquadrant.com
In response to: Re: PATCH: track last known XLOG segment in control file  (Andres Freund <andres@anarazel.de>)
Responses: Re: PATCH: track last known XLOG segment in control file  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers

On 12/12/2015 11:39 PM, Andres Freund wrote:
> On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
>> On 12/12/2015 11:20 PM, Andres Freund wrote:
>>> On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
>>>> this is the second improvement proposed in the thread [1] about ext4 data
>>>> loss issue. It adds another field to control file, tracking the last known
>>>> WAL segment. This does not eliminate the data loss, just the silent part of
>>>> it when the last segment gets lost (due to forgetting the rename, deleting
>>>> it by mistake or whatever). The patch makes sure the cluster refuses to
>>>> start if that happens.
>>>
>>> Uh, that's fairly expensive. In many cases it'll significantly
>>> increase the number of fsyncs.
>>
>> It should do exactly 1 additional fsync per WAL segment. Or do you think
>> otherwise?
>
> Which is nearly doubling the number of fsyncs, for a good number of
> workloads. And it does so to a separate file, i.e. it's not like
> these writes and the flushes can be combined. In workloads where
> pg_xlog is on a separate partition it'll add the only source of
> fsyncs besides checkpoint to the main data directory.

I doubt it will make any difference in practice, at least on reasonable 
hardware (which you should have, if fsync performance matters to you).

But some performance testing will be necessary; I don't expect this to
go in without it. It'd be helpful if you could describe the workload.
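
To make the disagreement above concrete, here is a rough sketch of the kind of micro-benchmark that could serve as a starting point for that testing. It is a toy model in Python, not PostgreSQL code: the function name `run`, the file names, and the 8 kB write size are all assumptions for illustration, and real WAL flushing behaves differently (16 MB segments, group commit, and so on). It writes fake WAL segments with a configurable number of fsyncs each and, optionally, updates and fsyncs a separate small file once per segment, standing in for the control file.

```python
import os
import tempfile
import time


def run(segments, flushes_per_segment, track_last_segment, chunk=8192):
    """Write fake WAL segments; return (fsync_count, elapsed_seconds)."""
    fsyncs = 0
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for the control file: a separate small file,
        # so its flush cannot be combined with the WAL flushes.
        ctl_fd = os.open(os.path.join(d, "control"),
                         os.O_WRONLY | os.O_CREAT, 0o600)
        start = time.perf_counter()
        try:
            for seg in range(segments):
                wal_fd = os.open(os.path.join(d, f"wal.{seg:08d}"),
                                 os.O_WRONLY | os.O_CREAT, 0o600)
                try:
                    for _ in range(flushes_per_segment):
                        os.write(wal_fd, b"\0" * chunk)
                        os.fsync(wal_fd)
                        fsyncs += 1
                finally:
                    os.close(wal_fd)
                if track_last_segment:
                    # One extra write + fsync per segment: record the
                    # last known segment number at offset 0.
                    os.lseek(ctl_fd, 0, os.SEEK_SET)
                    os.write(ctl_fd, seg.to_bytes(8, "little"))
                    os.fsync(ctl_fd)
                    fsyncs += 1
        finally:
            os.close(ctl_fd)
        elapsed = time.perf_counter() - start
    return fsyncs, elapsed
```

With `flushes_per_segment=1` the tracked run issues twice as many fsyncs as the untracked one, which is the case Andres describes; with 16 flushes per segment the extra fsync is about 6% of the total. The elapsed times, not the counts, are what would need measuring on real hardware.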

>>> I've a bit of a hard time believing this'll be worthwhile.
>>
>> The trouble is protections like this only seem worthwhile after the fact,
>> when something happens. I think it's reasonable protection against issues
>> similar to the one I reported ~2 weeks ago. YMMV.
>
> Meh. That argument can be used to justify about everything.
>
> Obviously we should be more careful about fsyncing files, including
> the directories. I do plan to come back to your recent patch.

My argument is that this is a reasonable protection against failures in
that area, whether the fault is ours (in misunderstanding the durability
guarantees of a particular file system) or the file system developers'.

Maybe it's not, because the chance of running into exactly the same
issue in this part of the code is negligible.

>
>>> Additionally this doesn't seem to take WAL replay into account?
>>
>> I think the comparison in StartupXLOG needs to be less strict, to allow
>> cases when we actually replay more WAL segments. Is that what you mean?
>
> What I mean is that the value isn't updated during recovery, afaics.
> You could argue that minRecoveryPoint is that, in a way.

Oh, right. Will fix if we conclude that the general idea makes sense.
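
A minimal sketch of the two points above, namely the relaxed comparison at startup and keeping the value current during recovery. This is a toy model in Python, not the patch's C code; the names `advance_last_known`, `check_end_of_wal`, and `WalSegmentMissing` are made up for illustration, and segment numbers are modelled as plain integers.

```python
class WalSegmentMissing(Exception):
    """Raised when replay ends before the segment recorded in the control file."""


def advance_last_known(recorded_segno, current_segno):
    # Would need to run both when normal operation switches to a new
    # segment and when recovery replays into one; never moves backwards.
    return max(recorded_segno, current_segno)


def check_end_of_wal(recorded_segno, replay_end_segno):
    # Strict equality would reject a legitimate case: replay may continue
    # past the recorded segment. Only ending short of it is an error.
    if replay_end_segno < recorded_segno:
        raise WalSegmentMissing(
            f"replay ended in segment {replay_end_segno}, "
            f"but control file records segment {recorded_segno}"
        )
    return replay_end_segno
```

Under this model, a cluster whose control file records segment 10 starts normally if replay ends in segment 10 or 12, and refuses to start if replay ends in segment 9.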

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


