Re: fsync-pgdata-on-recovery tries to write to more files than previously

From Andres Freund
Subject Re: fsync-pgdata-on-recovery tries to write to more files than previously
Date
Msg-id 20150526204403.GG5310@alap3.anarazel.de
In reply to Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Andres Freund <andres@anarazel.de>)
Responses Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
List pgsql-hackers
On 2015-05-26 19:07:20 +0200, Andres Freund wrote:
> It is somewhat interesting that similar code has been used in
> pg_upgrade, via initdb -S, for a while now, without, to my knowledge, it
> causing reported problems. I think the relevant difference is that that
> code doesn't follow symlinks.  It's obviously also less exercised and
> people might just have fixed up permissions when encountering troubles.
> 
> Abhijit, do you recall why the code was changed to follow all symlinks
> in contrast to explicitly going through the tablespaces as initdb -S
> does? I'm pretty sure early versions of the patch pretty much had a
> verbatim copy of the initdb logic?  That logic is missing pg_xlog btw,
> which is bad for pg_upgrade.

So, this was discussed in the following thread, starting at:
http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org

"Actually, since surely we must follow symlinks everywhere, why do we
have to do this separately for pg_tblspc?  Shouldn't that link-following
occur automatically when walking PGDATA in the first place?"

I don't think it's true that we must follow symlinks everywhere. I
think, as argued upthread, that it's sufficient to recurse through
PGDATA, follow the symlinks in pg_tblspc, and, if pg_xlog is a symlink,
also go through it separately.  There are no other places where it's
"allowed" to introduce symlinks, and we have refuted bug reports from
people having problems after doing that.
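
To make the rule concrete, here is a minimal sketch of the decision I have
in mind. The function name should_follow_symlink() and the convention of
passing a path relative to PGDATA are made up for illustration; this is not
the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): given the path of a symlink
 * relative to the data directory, decide whether the directory walk should
 * follow it.  Only pg_xlog itself and the direct entries of pg_tblspc
 * qualify; every other symlink is left alone.
 */
bool
should_follow_symlink(const char *relpath)
{
    static const char tblspc_prefix[] = "pg_tblspc/";
    const size_t prefix_len = sizeof(tblspc_prefix) - 1;

    assert(relpath != NULL);

    /* pg_xlog may legitimately be a symlink to another location. */
    if (strcmp(relpath, "pg_xlog") == 0)
        return true;

    /* Direct children of pg_tblspc are the tablespace links. */
    if (strncmp(relpath, tblspc_prefix, prefix_len) == 0)
    {
        const char *name = relpath + prefix_len;

        /* Require a non-empty name with no further path component. */
        return name[0] != '\0' && strchr(name, '/') == NULL;
    }

    return false;
}
```

With this, "pg_tblspc/16384" and "pg_xlog" would be followed, while a stray
symlink such as "base/1/link" would not.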

So what I propose is:
1) Remove the automatic symlink following.
2) Follow pg_tblspc/*, and pg_xlog if it's a symlink; fix the latter in
   initdb -S.
3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
   and continue after stat() failures in the readdir() loop.
4) Add an elevel argument to pre_sync_fname and fsync_fname, and return
   after errors.
5) Accept EACCES and ETXTBSY (if defined) when open()ing the files. By
   virtue of not following symlinks we should not need to worry about
   EROFS.
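
Points 4) and 5) could look roughly like the following standalone sketch.
It is only an illustration under assumptions: fsync_fname_sketch(),
open_error_is_ignorable() and the sketch_level enum are invented names,
the enum stands in for a real elevel, and fprintf() stands in for
ereport().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for an elevel argument; hypothetical, not PostgreSQL's levels. */
typedef enum { SKETCH_QUIET, SKETCH_LOG } sketch_level;

/* True if an open() failure with this errno should simply be skipped. */
bool
open_error_is_ignorable(int err)
{
    if (err == EACCES)
        return true;
#ifdef ETXTBSY
    if (err == ETXTBSY)
        return true;
#endif
    return false;
}

/*
 * Try to fsync one file.  Returns 0 on success or when the file was
 * skipped for an ignorable reason, and -1 after reporting any other
 * failure.  It never aborts, so a caller walking a directory can go on
 * to the next entry.
 */
int
fsync_fname_sketch(const char *path, sketch_level level)
{
    int fd;

    assert(path != NULL);

    /* Opened read-write, which is where a permission error would show up. */
    fd = open(path, O_RDWR);
    if (fd < 0)
    {
        int saved_errno = errno;

        if (open_error_is_ignorable(saved_errno))
            return 0;
        if (level == SKETCH_LOG)
            fprintf(stderr, "could not open \"%s\": %s\n",
                    path, strerror(saved_errno));
        return -1;
    }

    if (fsync(fd) != 0)
    {
        int saved_errno = errno;

        if (level == SKETCH_LOG)
            fprintf(stderr, "could not fsync \"%s\": %s\n",
                    path, strerror(saved_errno));
        close(fd);
        return -1;
    }

    close(fd);
    return 0;
}
```

The point of returning instead of erroring out is that one unreadable or
busy file does not stop the rest of the data directory from being synced.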
 

I'm inclined to think that 4) is a big enough compat break that a
fsync_fname_ext with the new argument is a good idea.

Arguments for/against?


