Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
| From | Anthony Iliopoulos |
|---|---|
| Subject | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
| Date | |
| Msg-id | 20180409123126.GB4233@ai-wks |
| In reply to | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Geoff Winkless <pgsqladmin@geoff.dj>) |
| Responses | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
| List | pgsql-hackers |
On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote:
> On 9 April 2018 at 11:50, Anthony Iliopoulos <ailiop@altatus.com> wrote:
>
> > What you seem to be asking for is the capability of dropping
> > buffers over the (kernel) fence and indemnifying the application
> > from any further responsibility, i.e. a hard assurance
> > that either the kernel will persist the pages or it will
> > keep them around till the application recovers them
> > asynchronously, the filesystem is unmounted, or the system
> > is rebooted.
> >
>
> That seems like a perfectly reasonable position to take, frankly.

Indeed, as long as you are willing to ignore the consequences of
this design decision: mainly, how you would recover memory when no
application is interested in clearing the error. At that point, other
applications with different priorities will find this position
rather unreasonable, since there can be no way out of it for them.
Good luck convincing any OS kernel upstream to go with this design.

> The whole _point_ of an Operating System should be that you can do exactly
> that. As a developer I should be able to call write() and fsync() and know
> that if both calls have succeeded then the result is on disk, no matter
> what another application has done in the meantime. If that's a "difficult"
> problem then that's the OS's problem, not mine. If the OS doesn't do that,
> it's _not_doing_its_job_.

No OS kernel that I know of provides any promises for atomicity of
a write()+fsync() sequence, unless one is using O_SYNC. It doesn't
provide you with isolation either, as this is delegated to userspace,
where processes that share a file should coordinate accordingly.

It's not a difficult problem; rather, kernels provide a common
denominator of possible interfaces and designs that can accommodate
a wide range of application scenarios whose requirements the kernel
cannot possibly anticipate.
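
To make the write()+fsync() discussion above concrete, here is a minimal Python sketch, assuming a POSIX system where `os.O_SYNC` is available. The helper names `write_all`, `durable_write`, and `sync_write` are hypothetical and do not come from PostgreSQL or any kernel API. The sketch only shows where an I/O error surfaces in each variant; it does not make the sequence atomic, and it does not isolate it from other processes writing the same file.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Loop until every byte is handed to the kernel (write() may be partial)."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def durable_write(path: str, data: bytes) -> None:
    """Buffered write followed by fsync(); hypothetical helper.

    A writeback error is reported by fsync(), not by write(), so an
    OSError from os.fsync() is allowed to propagate to the caller.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, data)
        # Assumption for this sketch: a failure here is final. We do not
        # retry fsync() and trust a later success.
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_write(path: str, data: bytes) -> None:
    """O_SYNC variant: each write() returns only after the data is synced.

    With O_SYNC an I/O error is reported by write() itself.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        write_all(fd, data)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        durable_write(target, b"buffered, then fsync\n")
        sync_write(target, b"written with O_SYNC\n")
```

Treating the first fsync() failure as final, rather than retrying, is a design choice of this sketch: it reflects the concern raised in this thread that a later fsync() may report success even though the pages that failed earlier never reached disk.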
There has been plenty of experimental work on providing a
transactional (ACID) filesystem interface to applications. At the
opposite end, quite a few commercial databases bypass the kernel
storage stack completely. But I would assume it is reasonable to
figure out something between those two extremes that can work in
a "portable" fashion.

Best regards,
Anthony