[Linus Torvalds ] Re: statfs() / statvfs() syscall ballsup...

From Greg Stark
Subject [Linus Torvalds ] Re: statfs() / statvfs() syscall ballsup...
Date
Msg-id 87zng62jgz.fsf@stark.dyndns.tv
Responses Re: Database Kernels and O_DIRECT  (James Rogers <jamesr@best.com>)
List pgsql-hackers
There's an interesting thread on linux-kernel right now about O_DIRECT and the
kernel i/o APIs databases need. I noticed a connection between what they were
discussing and the earlier discussions here and the pining for an interface to
avoid having vacuum preempt other disk i/o.


Someone from Oracle is on there explaining what Oracle's needs are. Perhaps
someone more knowledgeable than myself could explain what would most help
postgres in this area.


There was another thread I commented on that touched on another postgres
wishlist item: a way to sync IDE disks reliably without disabling write
caching entirely. There was some inkling that newer drives might provide for
such a possibility. Perhaps that too could be worth advocating for on
postgres's behalf.




On 12 Oct 2003, Greg Stark wrote:
>
> There are other reasons databases want to control their own cache. The
> application knows more about the usage and the future usage of the data than
> the kernel does.

But this again is not an argument for not using the page cache - it's only
an argument for _telling_ the kernel about its use.

> However on busy servers whenever it's run it causes lots of pain because the
> kernel flushes all the cached data in favour of the data this job touches.

Yes. But this is actually pretty easy to avoid in-kernel, since all of the
LRU logic is pretty localized.

It could be done on a per-process thing ("this process should not pollute
the active list") or on a per-fd thing ("accesses through this particular
open are not to pollute the active list").
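
As an illustration of the per-fd idea (not part of the original thread): the closest existing standard call is posix_fadvise(), which lets a process tell the kernel how it will use a file range. The sketch below assumes a Unix system where Python's os.posix_fadvise is available. The function name scan_without_polluting_cache and the 1 MiB chunk size are hypothetical choices. The advice values are only hints, so the kernel may ignore them, and this is not the in-kernel LRU change Linus describes.

```python
import os

CHUNK = 1 << 20  # hypothetical chunk size: read 1 MiB at a time


def scan_without_polluting_cache(path):
    """Read a file sequentially, hinting that each chunk will not be reused.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint: the whole file (offset 0, length 0) will be read sequentially.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = os.read(fd, CHUNK)
            if not buf:
                break
            # Hint: the range just read is not needed again, so the kernel
            # is free to drop those pages instead of evicting hotter data.
            os.posix_fadvise(fd, total, len(buf), os.POSIX_FADV_DONTNEED)
            total += len(buf)
    finally:
        os.close(fd)
    return total
```

A batch job such as a large sequential scan would call this instead of a plain read loop, and the hints apply only to accesses through that one descriptor.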

>                                     And
> worse, there's no way to indicate that the i/o it's doing is lower priority,
> so i/o bound servers get hit dramatically.

IO priorities are pretty much worthless. It doesn't _matter_ if other
processes get preferred treatment - what is costly is the latency cost of
seeking. What you want is not priorities, but batching.
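
A toy model of the batching argument (an illustration, not from the original thread): if the cost of a request is taken to be the distance the disk head travels to reach it, then servicing interleaved requests in arrival order costs far more than servicing the same requests grouped by position. The function head_travel and the block numbers are hypothetical, and real disks and I/O schedulers are considerably more complicated.

```python
def head_travel(offsets, start=0):
    """Total distance the head moves when servicing offsets in the given order."""
    travel = 0
    pos = start
    for off in offsets:
        travel += abs(off - pos)
        pos = off
    return travel


# Two processes interleave requests: one near block 100, one near block 9000.
arrival_order = [100, 9000, 110, 9010, 120, 9020]
# Batching here simply means servicing the same requests sorted by position.
batched_order = sorted(arrival_order)

print(head_travel(arrival_order))  # 44580
print(head_travel(batched_order))  # 9020
```

In this model, a priority scheme that only reorders which process goes first still pays for every long seek, whereas grouping nearby requests removes most of them.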

            Linus




--
greg
