Re: Checking for missing heap/index files

From Stephen Frost
Subject Re: Checking for missing heap/index files
Date
Msg-id Y07ycEcWop2AMMok@tamriel.snowman.net
In reply to Re: Checking for missing heap/index files  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Checking for missing heap/index files
List pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Oct 18, 2022 at 12:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > There is no text suggesting that it's okay to miss, or to double-return,
> > an entry that is present throughout the scan.  So I'd interpret the case
> > you're worried about as "forbidden by POSIX".  Of course, it's known that
> > NFS fails to provide POSIX semantics in all cases --- but I don't know
> > if this is one of them.
>
> Yeah, me neither. One problem I see is that, even if the behavior is
> forbidden by POSIX, if it happens in practice on systems people
> actually use, then it's an issue. We even have documentation saying
> that it's OK to use NFS, and a lot of people do -- which IMHO is
> unfortunate, but it's also not clear what the realistic alternatives
> are. It's pretty hard to tell people in 2022 that they are only
> allowed to use PostgreSQL with local storage.
>
> But to put my cards on the table, it's not so much that I am worried
> about this problem myself as that I want to know whether we're going
> to do anything about it as a project, and if so, what, because it
> intersects a patch that I'm working on. So if we want to readdir() in
> one fell swoop and cache the results, I'm going to go write a patch
> for that. If we don't, then I'd like to know whether (a) we think that
> would be theoretically acceptable but not justified by the evidence
> presently available or (b) would be unacceptable due to (b1) the
> potential for increased memory usage or (b2) some other reason.

While I don't think it's really something that should be happening, it's
definitely something that's been seen with some networked filesystems,
as reported.  I also strongly suspect that on local filesystems there's
something that prevents this from happening but, as mentioned, that
doesn't cover all PG use cases.

In pgbackrest, we moved to doing a scan and caching all of the results
in memory to reduce the risk when reading from the PG data dir.  We also
reworked our expire code (which removes an older backup from the backup
repository) to do a complete scan before removing any files.
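
A minimal sketch of the scan-and-cache idea, written in Python for
brevity rather than taken from pgbackrest's actual code.  The name
snapshot_dir is hypothetical.  It drains the directory handle in one
pass and closes it before the caller looks at any entry, so no other
work is interleaved with the underlying readdir() calls:

```python
import os

def snapshot_dir(path):
    """Read every entry of `path` in one pass and return the names.

    Hypothetical helper for illustration only.  The directory handle is
    opened, fully drained, and closed before any entry is returned.
    """
    with os.scandir(path) as it:
        # Materialize the whole listing in memory; sort for stable output.
        return sorted(entry.name for entry in it)
```

This only shortens the window in which a concurrent change can disturb
the scan; it does not make the listing atomic.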

I don't see it as likely to be acceptable, but arranging to not add or
remove files while the scan is happening would presumably eliminate the
risk entirely.  We've not seen this issue recur in the expire command
since the change to first completely scan the directory and then go and
remove the files from it.  Perhaps just not removing files during the
scan would be sufficient, which might be more reasonable to do.
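
The "scan completely, then remove" ordering could look roughly like the
following.  This is an illustrative Python sketch, not the pgbackrest
expire implementation; expire_files and the should_remove predicate are
assumptions made up for the example:

```python
import os

def expire_files(path, should_remove):
    """Two-phase removal: list everything first, then delete.

    `should_remove` is a caller-supplied predicate on the file name
    (an assumption of this sketch).  No unlink happens until the
    directory scan has finished and its handle is closed.
    """
    # Phase 1: complete scan, results held in memory.
    with os.scandir(path) as it:
        names = [e.name for e in it if e.is_file(follow_symlinks=False)]

    # Phase 2: delete from the cached list; the directory is not being read.
    removed = []
    for name in names:
        if should_remove(name):
            os.unlink(os.path.join(path, name))
            removed.append(name)
    return sorted(removed)
```

The point of the split is that this process never unlinks an entry
while its own scan of the same directory is still in progress.  Files
added or removed by other processes during phase 1 are outside its
control.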

Thanks,

Stephen
