Re: fsync reliability

From Greg Smith
Subject Re: fsync reliability
Date
Msg-id 4DB175FD.6070007@2ndquadrant.com
In reply to Re: fsync reliability  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: fsync reliability  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> We do issue fsync and then close, but only when we switch log files.
> We don't do that as part of the normal commit path.
>   

Since all these files are zero-filled before use, their space is already
allocated, and the remaining important filesystem layout metadata gets
flushed during the close.  The only metadata that changes after
that--things like the last access time--isn't important to WAL
functioning.  So the metadata doesn't need to be updated after a normal
commit; it's already there.  That leaves two main risks when crashing
while fsync is in the middle of pushing data out to physical storage:
torn pages due to partial data writes, and other out-of-order writes.
The only filesystems where this isn't true are the copy-on-write ones,
where the blocks move around on disk too.  But those all have their own
more careful guarantees about metadata.
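
As a rough illustration of the preallocation argument, the sketch below
zero-fills a file, fsyncs and closes it, and then overwrites one block in
place.  The 64 kB segment size, the 8 kB block size, and the function names
are assumptions made for this example, not PostgreSQL's actual WAL segment
handling.  It only shows that the later in-place write leaves the file size
unchanged, so there is no size or allocation metadata left to flush.

```python
import os
import tempfile

SEGMENT_SIZE = 64 * 1024   # hypothetical; real WAL segments are much larger
BLOCK_SIZE = 8 * 1024      # hypothetical block size for this sketch


def preallocate(path):
    """Zero-fill the file, then fsync and close so its size and layout are on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        zeros = bytes(BLOCK_SIZE)
        for _ in range(SEGMENT_SIZE // BLOCK_SIZE):
            os.write(fd, zeros)
        os.fsync(fd)
    finally:
        os.close(fd)


def overwrite_block(path, block_no, payload):
    """Overwrite one preallocated block in place and flush it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
        os.write(fd, payload.ljust(BLOCK_SIZE, b"\0"))
        os.fsync(fd)
    finally:
        os.close(fd)


def demo_preallocation():
    """Return the file size before and after an in-place block write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment")
        preallocate(path)
        size_before = os.stat(path).st_size
        overwrite_block(path, 3, b"commit record")
        size_after = os.stat(path).st_size
        return size_before, size_after
```

Calling demo_preallocation() returns the same size twice, which is the
property the paragraph above relies on.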

> The issue you raise above where "fsync is not safe for Write Ahead
> Logging" doesn't sound good. I don't think what you've said has fully
> addressed that yet. We could replace the commit path with O_DIRECT and
> physically order the data blocks, but I would guess the code path to
> durable storage has way too many bits of code tweaking it for me to
> feel happy that was worth it.
>   

As far as I can tell the CRC is sufficient protection against that.  
This is all data that hasn't really been committed being torn up here.  
Once you trust that the metadata problem isn't real, reordered writes 
are only going to destroy things that are in the middle of being 
flushed to disk.  A synchronous commit mangled this way will be rolled 
back regardless because it never really finished (fsync didn't return); 
an asynchronous one was never guaranteed to be on disk.
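
A minimal sketch of why a per-record CRC is enough to catch a torn flush,
under stated assumptions: the record layout (length, CRC, payload), the
function names, and the use of zlib.crc32 are all hypothetical stand-ins
and not PostgreSQL's actual WAL record format.  The reader stops at the
first record whose checksum does not match, so a record that was only
partly written before the crash is simply never replayed.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # hypothetical layout: payload length, CRC-32


def encode_record(payload):
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(log):
    """Return payloads up to the first record that fails validation."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if length == 0 or len(payload) != length or zlib.crc32(payload) != crc:
            break  # zero-filled tail or torn record: replay stops here
        records.append(payload)
        pos = start + length
    return records


def demo_torn_record():
    """Simulate a crash that left the last record only partly on disk."""
    first = encode_record(b"commit txn 1")
    second = encode_record(b"commit txn 2")
    # The last 4 bytes of the second record never reached the disk, so the
    # preallocated zeros are still there.
    torn = second[:-4] + bytes(4)
    return read_valid_records(first + torn + bytes(32))
```

Here demo_torn_record() yields only the first record; the torn one is
treated as if it had never been written, which matches the rollback
behavior described above.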

On many older Linux systems O_DIRECT is a less reliable code path than 
write/fsync is, so you're right that it isn't necessarily a useful 
step forward.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



