Re: Two fsync related performance issues?

From Thomas Munro
Subject Re: Two fsync related performance issues?
Date
Msg-id CA+hUKGKT6XiPiEJrqeOFGi7RYCGzbBysF9pyWwv0-jm-oNajxg@mail.gmail.com
In response to Re: Two fsync related performance issues?  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Two fsync related performance issues?  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Wed, Sep 9, 2020 at 3:49 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Sep 3, 2020 at 11:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Wed, May 27, 2020 at 12:31 AM Craig Ringer <craig@2ndquadrant.com> wrote:
> > > On Tue, 12 May 2020, 08:42 Paul Guo, <pguo@pivotal.io> wrote:
> > >> 1. StartupXLOG() does fsync on the whole data directory early in the
> > >> crash recovery. I'm wondering if we could skip some directories (at
> > >> least the pg_log/, table directories) since wal, etc could ensure
> > >> consistency. Here is the related code.
> > >>
> > >>       if (ControlFile->state != DB_SHUTDOWNED &&
> > >>           ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> > >>       {
> > >>           RemoveTempXlogFiles();
> > >>           SyncDataDirectory();
> > >>       }
>
> > 4.  In datadir_fsync_fname(), if ParseRelationPath() is able to
> > recognise a file as being a relation file, build a FileTag and call
> > RegisterSyncRequest() to tell the checkpointer to sync this file as
> > part of the end checkpoint (currently the end-of-recovery checkpoint,
> > but that could also be relaxed).
>
> For the record, Andres Freund mentioned a few problems with this
> off-list and suggested we consider calling Linux syncfs() for each top
> level directory that could potentially be on a different filesystem.
> That seems like a nice idea to look into.

Here's an experimental patch to try that.  One problem is that before
Linux 5.8, syncfs() doesn't report failures[1].  I'm not sure what to
think about that; in the current coding we just log them and carry on
anyway, but any theoretical problems that causes for BLK_DONE should
be moot anyway because of FPIs which result in more writes and syncs.
Another is that it may affect other files that aren't under pgdata as
collateral damage, but that seems acceptable.  It also makes me a bit
sad that this wouldn't help other OSes.
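
To make the idea concrete, here is a minimal standalone sketch of
"call syncfs() once per directory that might live on its own
filesystem". It is not the attached patch: the function names
sync_filesystem_of() and sync_filesystems() are made up for
illustration, and the logging is plain stderr rather than ereport().
It only shows the open/syncfs/close pattern and the "log and carry on"
handling of failures described above.

```c
#define _GNU_SOURCE             /* glibc declares syncfs() only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the whole filesystem that contains "path".
 * Returns 0 on success, -1 on failure.  Failures are logged, not fatal.
 * Note that before Linux 5.8 syncfs() may return 0 even if writeback
 * failed, so a 0 here is not proof that the data reached disk.
 */
int
sync_filesystem_of(const char *path)
{
    int         fd;
    int         rc;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        fprintf(stderr, "could not open \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }

    rc = syncfs(fd);
    if (rc < 0)
        fprintf(stderr, "could not sync filesystem for \"%s\": %s\n",
                path, strerror(errno));

    close(fd);
    return rc;
}

/*
 * Hypothetical driver: sync the filesystem of every listed directory
 * and return how many of them failed.  Which directories to pass is
 * up to the caller.
 */
int
sync_filesystems(const char *const *paths, int npaths)
{
    int         failures = 0;

    for (int i = 0; i < npaths; i++)
    {
        if (sync_filesystem_of(paths[i]) != 0)
            failures++;
    }
    return failures;
}
```

A caller would pass the data directory plus every top-level directory
that could be a symlink or mount point onto another filesystem. Two
paths on the same filesystem simply cause a redundant second flush.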

(Archeological note:  The improved syncfs() error reporting is linked
to 2018 PostgreSQL/Linux hacker discussions[2], because it was thought
that syncfs() might be useful for checkpointing, though I believe
since then things have moved on and the new thinking is that we'd use
a new proposed interface to read per-filesystem I/O error counters
while checkpointing.)

[1] https://man7.org/linux/man-pages/man2/sync.2.html
[2] https://lwn.net/Articles/752063/
