Re: Prereading using posix_fadvise (was Re: Commitfest patches)

From Zeugswetter Andreas OSB SD
Subject Re: Prereading using posix_fadvise (was Re: Commitfest patches)
Date
Msg-id E1539E0ED7043848906A8FF995BDA57902EB1583@m0143.s-mxs.net
In response to Prereading using posix_fadvise (was Re: Commitfest patches)  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Prereading using posix_fadvise (was Re: Commitfest patches)  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers

Heikki wrote:
> It seems that the worst case for this patch is a scan on a table that
> doesn't fit in shared_buffers, but is fully cached in the OS cache. In
> that case, the posix_fadvise calls would be a certain waste of time.

I think this is a misunderstanding. The fadvise is not issued to read
the whole table, and it is not issued for table scans at all (and if it
were, it would only advise for the next N pages).

So it has nothing to do with table size. The fadvise calls need to be
(and are) limited to what can be used in the near future, not issued
for the whole statement.
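
To make the "next N pages" limit concrete, here is a minimal C sketch of
a bounded advise window. The function name advise_next_blocks, the 8 kB
BLOCK_SIZE and the block arithmetic are assumptions for illustration and
are not taken from the patch; only posix_fadvise and POSIX_FADV_WILLNEED
are real POSIX interfaces.

```c
#define _XOPEN_SOURCE 600       /* expose posix_fadvise */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192         /* assumed page size */

/*
 * Advise the kernel that up to n_ahead blocks following cur_block will be
 * needed soon, never going past the last block of the file.
 * Returns the number of blocks advised, or -1 if posix_fadvise failed.
 */
int
advise_next_blocks(int fd, long cur_block, long total_blocks, int n_ahead)
{
    long first = cur_block + 1;
    long last = cur_block + n_ahead;    /* inclusive upper bound */

    assert(n_ahead >= 0);
    if (last >= total_blocks)
        last = total_blocks - 1;        /* clamp to end of file */
    if (first > last)
        return 0;                       /* nothing left to advise */

    if (posix_fadvise(fd, (off_t) first * BLOCK_SIZE,
                      (off_t) (last - first + 1) * BLOCK_SIZE,
                      POSIX_FADV_WILLNEED) != 0)
        return -1;
    return (int) (last - first + 1);
}
```

The cost per call depends only on n_ahead, not on the size of the table,
which is the point made above.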

E.g. the N next-level index pages that are relevant, or the N relevant
heap pages that one index leaf page points at. Maybe in the index case
N does not need to be limited, since we have a natural limit on how
many pointers fit on one page.
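
For the heap-page case, a rough C sketch could look like the following.
It assumes the heap block numbers referenced by one leaf page have
already been collected into an array; unique_blocks, advise_heap_blocks
and BLOCK_SIZE are hypothetical names, and sorting by block number is my
own choice here, not something the patch is known to do.

```c
#define _XOPEN_SOURCE 600       /* expose posix_fadvise */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192         /* assumed page size */

static int
cmp_block(const void *a, const void *b)
{
    unsigned x = *(const unsigned *) a;
    unsigned y = *(const unsigned *) b;

    return (x > y) - (x < y);
}

/* Sort the block numbers and drop duplicates in place; returns the
 * number of distinct blocks left at the front of the array. */
int
unique_blocks(unsigned *blocks, int n)
{
    int out = 0;

    assert(n >= 0);
    if (n == 0)
        return 0;
    qsort(blocks, (size_t) n, sizeof(unsigned), cmp_block);
    for (int i = 1; i < n; i++)
        if (blocks[i] != blocks[out])
            blocks[++out] = blocks[i];
    return out + 1;
}

/* Issue one WILLNEED advice per distinct heap block; returns how many
 * advices the kernel accepted. */
int
advise_heap_blocks(int fd, unsigned *blocks, int n)
{
    int distinct = unique_blocks(blocks, n);
    int issued = 0;

    for (int i = 0; i < distinct; i++)
        if (posix_fadvise(fd, (off_t) blocks[i] * BLOCK_SIZE,
                          BLOCK_SIZE, POSIX_FADV_WILLNEED) == 0)
            issued++;
    return issued;
}
```

The number of advices is bounded by the number of pointers on the leaf
page, so no separate limit N is needed in this sketch.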

In general I think separate reader processes (or threads :-) that read
directly into the bufferpool would be a more portable and efficient
implementation. E.g. they could use scatter/gather IO. So I think we
should make sure that the fadvise solution does not obstruct that path,
but I think it does not.
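
As an illustration of what scatter/gather IO would buy a reader process,
here is a small C sketch that fills several non-adjacent buffers from
contiguous file blocks with one readv() call. read_pages_scattered,
BLOCK_SIZE and MAX_PAGES are made-up names, and a real reader would of
course target bufferpool slots rather than plain arrays.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed page size */
#define MAX_PAGES  16           /* assumed cap on pages per call */

/*
 * Read n_pages contiguous blocks starting at first_block into separate
 * buffers that need not be adjacent in memory. Returns the number of
 * bytes read, or -1 on error or bad arguments.
 */
ssize_t
read_pages_scattered(int fd, long first_block, char *bufs[], int n_pages)
{
    struct iovec iov[MAX_PAGES];

    assert(bufs != NULL);
    if (n_pages < 1 || n_pages > MAX_PAGES)
        return -1;

    for (int i = 0; i < n_pages; i++)
    {
        iov[i].iov_base = bufs[i];      /* one destination per page */
        iov[i].iov_len = BLOCK_SIZE;
    }

    if (lseek(fd, (off_t) first_block * BLOCK_SIZE, SEEK_SET) == (off_t) -1)
        return -1;
    return readv(fd, iov, n_pages);     /* single system call */
}
```

A caller has to check for short reads, since readv() may return fewer
bytes than n_pages * BLOCK_SIZE.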

Gregory wrote:
>> A more invasive form of this patch would be to assign and pin a
>> buffer when the preread is done. That would mean subsequently we
>> would have a pinned buffer ready to go and not need to go back to the
>> buffer manager a second time. We would instead just "complete" the
>> i/o by issuing a normal read call.

I guess you would rather need to mark the buffer for use for this page,
but let whichever backend needs it first pin it and issue the read.
I think the fadviser should not pin it in advance, since it cannot
guarantee that it will actually read the page [soon]. Rather, remember
the buffer, and later check and pin it for the read. Otherwise you might
be blocking the buffer. But I think doing something like this might be
good, since it avoids issuing duplicate fadvises.
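
The "remember, then check and pin" idea could be sketched in C roughly
as below. BufferSlot, PrefetchNote and both functions are entirely
hypothetical and much simpler than the real buffer manager; in
particular the sketch has no locking, which a real implementation would
need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified buffer slot. */
typedef struct
{
    long block;         /* block this slot is earmarked for, -1 if none */
    int  pin_count;     /* number of backends holding a pin */
} BufferSlot;

/* What the fadviser remembers after issuing the advice. */
typedef struct
{
    int  slot;
    long block;
} PrefetchNote;

/* Earmark a slot for a block without pinning it. */
PrefetchNote
note_prefetch(BufferSlot *slots, int slot, long block)
{
    PrefetchNote note = {slot, block};

    assert(slot >= 0);
    slots[slot].block = block;          /* pin_count is left untouched */
    return note;
}

/* Later, pin the slot only if it is still earmarked for the same block. */
bool
pin_if_still_valid(BufferSlot *slots, PrefetchNote note)
{
    if (slots[note.slot].block != note.block)
        return false;   /* slot was reused; fall back to a normal lookup */
    slots[note.slot].pin_count++;
    return true;
}
```

Because the slot is not pinned in between, it can be reclaimed if the
read never happens, and a second backend that finds the earmark can skip
its own fadvise.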

Andreas

