Re: fsync-pgdata-on-recovery tries to write to more files than previously

Поиск
Список
Период
Сортировка
От Abhijit Menon-Sen
Тема Re: fsync-pgdata-on-recovery tries to write to more files than previously
Дата
Msg-id 20150528142614.GA21390@toroid.org
обсуждение исходный текст
Ответ на Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Ответы Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Robert Haas <robertmhaas@gmail.com>)
Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi.

Here's an updated patch for the fsync problem(s).

A few points that may be worth considering:

1. I've made the ReadDir → ReadDirExtended change, but in retrospect I
   don't think it's really worth it. Using ReadDir and letting it die
   is probably fine. (Aside: I left ReadDirExtended static for now.)

2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c?
   I wasn't sure if I should move that to fd.c as well. I think it's
   borderline OK for now.

3. There's a comment in fsync_fname that we need to open directories
   with O_RDONLY and non-directories with O_RDWR. Does this also apply
   to pre_sync_fname (i.e. sync_file_range and posix_fadvise) on any
   platforms we care about? It doesn't seem so, but I'm not sure.

4. As a partial aside, why does fsync_fname use OpenTransientFile? It
   looks like it should use BasicOpenFile as pre_sync_fname does. We
   close the fd immediately after calling fsync anyway. Or is there
   some reason I'm missing that we should use OpenTransientFile in
   pre_sync_fname too?

   (Aside²: it's pointless to specify S_IRUSR|S_IWUSR in fsync_fname
   since we're not setting O_CREAT. Minor, so will submit separate
   patch.)

5. I made walktblspc_entries use stat() instead of lstat(), so that we
   can handle symlinks and directories the same way. Note that we won't
   continue to follow links inside the sub-directories because we use
   walkdir instead of recursing.

   But this depends on S_ISDIR() returning true after stat() on Windows
   with a junction. Does that work? And are the entries in pb_tblspc on
   Windows actually junctions?

I've tested this with unreadable files in PGDATA and junk inside
pb_tblspc. Feedback welcome.

I have a separate patch to initdb with the corresponding changes, which
I will post after dinner and a bit more testing.

-- Abhijit

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: pg_upgrade resets timeline to 1
Следующее
От: Noah Misch
Дата:
Сообщение: Re: pg_upgrade resets timeline to 1