Re: PATCH: track last known XLOG segment in control file
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: track last known XLOG segment in control file |
Date | |
Msg-id | 566CA5A4.8000709@2ndquadrant.com |
In reply to | Re: PATCH: track last known XLOG segment in control file (Andres Freund <andres@anarazel.de>) |
Responses |
Re: PATCH: track last known XLOG segment in control file
(Amit Kapila <amit.kapila16@gmail.com>)
|
List | pgsql-hackers |
On 12/12/2015 11:39 PM, Andres Freund wrote:
> On 2015-12-12 23:28:33 +0100, Tomas Vondra wrote:
>> On 12/12/2015 11:20 PM, Andres Freund wrote:
>>> On 2015-12-12 22:14:13 +0100, Tomas Vondra wrote:
>>>> this is the second improvement proposed in the thread [1] about ext4 data
>>>> loss issue. It adds another field to control file, tracking the last known
>>>> WAL segment. This does not eliminate the data loss, just the silent part of
>>>> it when the last segment gets lost (due to forgetting the rename, deleting
>>>> it by mistake or whatever). The patch makes sure the cluster refuses to
>>>> start if that happens.
>>>
>>> Uh, that's fairly expensive. In many cases it'll significantly
>>> increase the number of fsyncs.
>>
>> It should do exactly 1 additional fsync per WAL segment. Or do you think
>> otherwise?
>
> Which is nearly doubling the number of fsyncs, for a good number of
> workloads. And it does so to a separate file, i.e. it's not like
> these writes and the flushes can be combined. In workloads where
> pg_xlog is on a separate partition it'll add the only source of
> fsyncs besides checkpoint to the main data directory.

I doubt it will make any difference in practice, at least on reasonable
hardware (which you should have, if fsync performance matters to you).

But some performance testing will be necessary, I don't expect this to
go in without that. It'd be helpful if you could describe the workload.

>>> I've a bit of a hard time believing this'll be worthwhile.
>>
>> The trouble is protections like this only seem worthwhile after the fact,
>> when something happens. I think it's reasonable protection against issues
>> similar to the one I reported ~2 weeks ago. YMMV.
>
> Meh. That argument can be used to justify about everything.
>
> Obviously we should be more careful about fsyncing files, including
> the directories. I do plan come back to your recent patch.

My argument is that this is a reasonable protection against failures in
that area - both our faults (in understanding the durability guarantees
on a particular file system), or file system developer.

Maybe it's not, because the chance of running into exactly the same
issue in this part of code is negligible.

>
>>> Additionally this doesn't seem to take WAL replay into account?
>>
>> I think the comparison in StartupXLOG needs to be less strict, to allow
>> cases when we actually replay more WAL segments. Is that what you mean?
>
> What I mean is that the value isn't updated during recovery, afaics.
> You could argue that minRecoveryPoint is that, in a way.

Oh, right. Will fix if we conclude that the general idea makes sense.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
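
For readers following the thread, the idea under discussion can be illustrated with a rough sketch. This is not the patch itself and not PostgreSQL code: it is a toy Python model, and every name in it (`CONTROL_FILE`, `WAL_DIR`, `switch_segment`, `check_startup`, the file naming scheme) is a hypothetical stand-in. It assumes a control file that stores only one 8-byte segment number. It shows the two points debated above: one extra write plus fsync on a separate file per segment switch, and a startup check that refuses to proceed when the last recorded segment is missing, while tolerating newer segments being present.

```python
import os
import struct

CONTROL_FILE = "control"  # hypothetical stand-in for the control file
WAL_DIR = "wal"           # hypothetical stand-in for pg_xlog


def segment_path(data_dir, segno):
    # Made-up naming scheme; real WAL segment names look different.
    return os.path.join(data_dir, WAL_DIR, f"{segno:016X}")


def record_last_segment(data_dir, segno):
    """Persist the newest segment number: one write and one fsync."""
    path = os.path.join(data_dir, CONTROL_FILE)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, struct.pack("<Q", segno))
        os.fsync(fd)  # the additional fsync per segment, on a separate file
    finally:
        os.close(fd)


def read_last_segment(data_dir):
    with open(os.path.join(data_dir, CONTROL_FILE), "rb") as f:
        return struct.unpack("<Q", f.read(8))[0]


def switch_segment(data_dir, segno):
    """Create an (empty) segment file, then record it as the last known one."""
    os.makedirs(os.path.join(data_dir, WAL_DIR), exist_ok=True)
    with open(segment_path(data_dir, segno), "wb") as f:
        f.flush()
        os.fsync(f.fileno())
    record_last_segment(data_dir, segno)


def check_startup(data_dir):
    """Refuse to start if the last recorded segment has gone missing.

    Only the recorded segment must exist; newer segments are tolerated.
    """
    segno = read_last_segment(data_dir)
    if not os.path.exists(segment_path(data_dir, segno)):
        raise RuntimeError(f"last known WAL segment {segno} is missing")
    return segno
```

The sketch deliberately leaves out directory fsyncs and any update of the recorded value during recovery, both of which are raised as open points in the exchange above.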