Re: adding support for posix_fadvise()

From: Tom Lane
Subject: Re: adding support for posix_fadvise()
Msg-id: 13972.1067870303@sss.pgh.pa.us
In reply to: adding support for posix_fadvise()  (Neil Conway <neilc@samurai.com>)
Responses: Re: adding support for posix_fadvise()
List: pgsql-hackers

Neil Conway <neilc@samurai.com> writes:
> So what API is desirable for uses 2-4? I'm thinking of adding a new
> function to the smgr API, smgradvise().

It's a little premature to be inventing APIs when you have no evidence
that this will make any useful performance difference.  I'd recommend a
quick hack to get proof of concept before you bother with nice APIs.
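
As a rough illustration of what such a quick hack might look like, the
sketch below reads one file end to end after passing a single advice value
to posix_fadvise(), so the same read loop can be timed under different
advice settings. The helper name read_with_advice() is hypothetical and is
not PostgreSQL code; it assumes a platform (such as Linux with glibc) that
provides posix_fadvise().

```c
#define _XOPEN_SOURCE 600       /* exposes posix_fadvise() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: read a whole file after
 * giving the kernel one piece of access-pattern advice.
 * Returns the number of bytes read, or -1 on error.
 */
static long
read_with_advice(const char *path, int advice)
{
    char        buf[8192];
    long        total = 0;
    ssize_t     n;
    int         rc;
    int         fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0 means "from the start to the end of the file" */
    rc = posix_fadvise(fd, 0, 0, advice);
    if (rc != 0)
    {
        /* posix_fadvise() returns the error number instead of setting errno */
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
        /* the advice is only a hint, so carry on with the read */
    }

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    close(fd);
    return (n < 0) ? -1 : total;
}
```

Calling read_with_advice(path, POSIX_FADV_SEQUENTIAL) and
read_with_advice(path, POSIX_FADV_NORMAL) on a file larger than RAM, with
timing around each call, would give a first indication of whether the
advice matters on a given kernel.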

> Given a Relation and an advice, this would:
> (a) propagate the advice for this relation to all the open FDs for the
> relation

"All"?  You cannot affect the FDs being used by other backends.  It's
fairly unclear to me what the posix_fadvise function is really going
to do for files that are being accessed by multiple processes.  For
instance, is there any value in setting POSIX_FADV_DONTNEED on a WAL
file, given that every other backend is going to have that same file
open?  I would expect that rational kernel behavior would be to ignore
this advice unless it's set by the last backend to have the file open
--- but I'm not sure we can synchronize the closing of old WAL segments
well enough to know which backend is the last to close the file.

A related problem is that the smgr uses the same FD to access the same
relation no matter how many scans are in progress.  Think about a
complex query that is doing both a seqscan and an indexscan on the same
relation (a self-join could easily do this).  You'd really need to
change this if you want POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM to
get set usefully.

In short I think you need to do some more thinking about what the scope
of the advice flags is going to be ...

> (b) store the new advice somewhere so that new FDs for the relation can
> have this advice set for them: clients should just be able to call
> smgradvise() without needing to worry if someone else has already called
> smgropen() for the relation in the past. One problem is how to store
> this: I don't think it can be a field of RelationData, since that is
> transient. Any suggestions?

Something Vadim had wanted to do for years is to decouple the smgr and
lower levels from the existing Relation cache, and have a low-level
notion of "open relation" that only requires having the "RelFileNode"
value to open it.  This would allow eliminating the concept of blind
write, which would be a Very Good Thing.  It would make sense to
associate the advice setting with such low-level relations.  One
possible way to handle the multiple-scan issue is to make the desired
advice part of the low-level open() call, so that you actually have
different low-level relations for seq and random access to a relation.
Not sure if this works cleanly when you take into account issues like
smgrunlink, but it's something to think about.
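
One way to picture "advice as part of the low-level open" is the sketch
below: a small table of open handles keyed by (file node, advice), so that
a sequential scan and a random-access scan of the same file each get their
own descriptor. Everything here is hypothetical: SketchFileNode is only a
stand-in for RelFileNode, and LowRel, lowrel_open() and lowrel_close_all()
are invented names, not existing smgr functions.

```c
#define _XOPEN_SOURCE 600       /* exposes posix_fadvise() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_LOWRELS 16

/* Hypothetical stand-in for RelFileNode; here it simply names a file. */
typedef struct
{
    unsigned int relNode;
} SketchFileNode;

/* One low-level "open relation" per (node, advice) pair. */
typedef struct
{
    int             in_use;
    SketchFileNode  node;
    int             advice;     /* POSIX_FADV_* value used at open time */
    int             fd;         /* descriptor private to this access pattern */
} LowRel;

static LowRel lowrels[MAX_LOWRELS];

/* Return the handle for (node, advice), opening a new descriptor if needed. */
static LowRel *
lowrel_open(const char *dir, SketchFileNode node, int advice)
{
    LowRel     *free_slot = NULL;
    char        path[1024];
    int         i;

    assert(dir != NULL);

    for (i = 0; i < MAX_LOWRELS; i++)
    {
        LowRel     *lr = &lowrels[i];

        if (lr->in_use && lr->node.relNode == node.relNode &&
            lr->advice == advice)
            return lr;          /* same file, same access pattern: reuse */
        if (!lr->in_use && free_slot == NULL)
            free_slot = lr;
    }
    if (free_slot == NULL)
        return NULL;            /* table full */

    snprintf(path, sizeof(path), "%s/%u", dir, node.relNode);
    free_slot->fd = open(path, O_RDONLY);
    if (free_slot->fd < 0)
        return NULL;

    /* The advice is only a hint, so a failure here is ignored. */
    (void) posix_fadvise(free_slot->fd, 0, 0, advice);

    free_slot->in_use = 1;
    free_slot->node = node;
    free_slot->advice = advice;
    return free_slot;
}

/* Close every handle for a node, e.g. before the file is unlinked. */
static void
lowrel_close_all(SketchFileNode node)
{
    int         i;

    for (i = 0; i < MAX_LOWRELS; i++)
    {
        if (lowrels[i].in_use && lowrels[i].node.relNode == node.relNode)
        {
            close(lowrels[i].fd);
            lowrels[i].in_use = 0;
        }
    }
}
```

With this shape, a seqscan would call lowrel_open(dir, node,
POSIX_FADV_SEQUENTIAL) and an indexscan lowrel_open(dir, node,
POSIX_FADV_RANDOM); an unlink path would have to find and close all
handles for the node, which is where the smgrunlink concern shows up.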
        regards, tom lane

