Thread: O_DIRECT use


O_DIRECT use

From
Bruce Momjian
Date:
I have added this item to TODO:
    * Consider use of open/fcntl(O_DIRECT) to minimize OS caching

A web search suggests it minimizes file system caching, perhaps for
sequential scans:
http://archives2.us.postgresql.org/pgsql-hackers/2001-09/msg00713.php

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: O_DIRECT use

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have added this item to TODO:
> >     * Consider use of open/fcntl(O_DIRECT) to minimize OS caching
> 
> Why exactly would we wish to minimize OS caching?
> 
> In my mind, Postgres has always relied heavily on the existence of a
> layer of kernel caching.  Disabling that will hurt far more than help.

Not sure. Someone on IRC brought it up.  If we are sequential scanning a
large table, caching may be bad because we are pushing out stuff already
in the cache that may be useful.  It is related to this TODO item:
* Add free-behind capability for large sequential scans (Bruce)



Re: O_DIRECT use

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have added this item to TODO:
>     * Consider use of open/fcntl(O_DIRECT) to minimize OS caching

Why exactly would we wish to minimize OS caching?

In my mind, Postgres has always relied heavily on the existence of a
layer of kernel caching.  Disabling that will hurt far more than help.
        regards, tom lane


Re: O_DIRECT use

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Why exactly would we wish to minimize OS caching?
> 
> > Not sure. Someone on IRC brought it up.  If we are sequential scanning a
> > large table, caching may be bad because we are pushing out stuff already
> > in the cache that may be useful.
> 
> Yeah, but people normally try to set things up to avoid doing large
> sequential scans, at least in all the contexts where they need high
> performance.  For index searches you definitely want all the caching
> you can get.
> 
> For that matter, I would expect that O_DIRECT also defeats readahead,
> so I'd fully expect it to be a loser for seqscans too.

I am told on FreeBSD it does not disable read-ahead, just caching;
something that needs more research.



Re: O_DIRECT use

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> For that matter, I would expect that O_DIRECT also defeats readahead,
>> so I'd fully expect it to be a loser for seqscans too.

> I am told on FreeBSD it does not disable read-ahead, just caching;
> something that needs more research.

Hmm.  I always thought of read-ahead as preloading buffer cache entries.

It'd be interesting to get a description of *exactly* what this flag
does, rather than handwavy approximations.  Time to start reading the
kernel code, I suppose.
        regards, tom lane


Re: O_DIRECT use

From
Brent Verner
Date:
[2002-01-04 16:31] Bruce Momjian said:

| Not sure. Someone on IRC brought it up.  

Is there a pg IRC channel?  What is the server?

cheers. brent

-- 
"Develop your talent, man, and leave the world something. Records are 
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman


Re: O_DIRECT use

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Why exactly would we wish to minimize OS caching?

> Not sure. Someone on IRC brought it up.  If we are sequential scanning a
> large table, caching may be bad because we are pushing out stuff already
> in the cache that may be useful.

Yeah, but people normally try to set things up to avoid doing large
sequential scans, at least in all the contexts where they need high
performance.  For index searches you definitely want all the caching
you can get.

For that matter, I would expect that O_DIRECT also defeats readahead,
so I'd fully expect it to be a loser for seqscans too.
        regards, tom lane


Re: O_DIRECT use

From
Bruce Momjian
Date:
Brent Verner wrote:
> [2002-01-04 16:31] Bruce Momjian said:
> 
> | Not sure. Someone on IRC brought it up.  
> 
> Is there a pg IRC channel?  What is the server?

FAQ item text is:
   <P>There is also an IRC channel on EFNet, channel <I>#PostgreSQL.</I>
   I use the unix command <CODE>irc -c '#PostgreSQL' "$USER" irc.phoenix.net.</CODE></P>
 



Re: O_DIRECT use

From
Bruce Momjian
Date:
Brent Verner wrote:
> [2002-01-04 16:31] Bruce Momjian said:
> 
> | Not sure. Someone on IRC brought it up.  
> 
> Is there a pg IRC channel?  What is the server?

See FAQ item 1.6.



Re: O_DIRECT use

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> For that matter, I would expect that O_DIRECT also defeats readahead,
> >> so I'd fully expect it to be a loser for seqscans too.
> 
> > I am told on FreeBSD it does not disable read-ahead, just caching;
> > something that needs more research.
> 
> Hmm.  I always thought of read-ahead as preloading buffer cache entries.
> 
> It'd be interesting to get a description of *exactly* what this flag
> does, rather than handwavy approximations.  Time to start reading the
> kernel code, I suppose.

I found this before adding the item:
http://www.pairlist.net/pipermail/flow-tools/2001-October/000058.html

And this for FreeBSD 4.4:

2.1 Kernel Changes
  The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this
  flag for open files will attempt to minimize the cache effects of reading
  and writing.
 


I also found:
http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html

These latter ones seem to indicate there isn't read-ahead, meaning we
would have to do our own prefetches.  Eck.  I am unclear if that is true
on all OSes.
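
For reference, the release note quoted above says the flag is accepted by both open(2) and fcntl(2). A minimal sketch of the fcntl(2) route follows, assuming a platform whose <fcntl.h> defines O_DIRECT (on Linux/glibc that needs _GNU_SOURCE). The helper names set_direct_io and is_direct_io are hypothetical, chosen for illustration only; nothing here is taken from the PostgreSQL source.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to switch O_DIRECT on or off for an
 * already-open descriptor.  Returns 0 on success, -1 with errno set on
 * failure (for example EINVAL when the file system does not support
 * direct I/O). */
int set_direct_io(int fd, int enable)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    if (enable)
        flags |= O_DIRECT;
    else
        flags &= ~O_DIRECT;

    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if O_DIRECT is currently set on fd,
 * 0 if it is not, and -1 on error. */
int is_direct_io(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_DIRECT) != 0;
}
```

Whether the request succeeds, and what it does to read-ahead once it does, depends on the OS and file system, so a caller should treat a failure as "keep using buffered I/O" rather than as a fatal error.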



Re: O_DIRECT use

From
Matthew Kirkwood
Date:
On Fri, 4 Jan 2002, Bruce Momjian wrote:

> > >> For that matter, I would expect that O_DIRECT also defeats readahead,
> > >> so I'd fully expect it to be a loser for seqscans too.

> And this for FreeBSD 4.4:

>    The O_DIRECT flag has been added to open(2) and fcntl(2). Specifying this
>    flag for open files will attempt to minimize the cache effects of reading
>    and writing.

This seems rather vague.  Can any FreeBSD person here say
whether the semantics are any stronger?

>     http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-o_direct.html
>
> These later ones seem to indicate there isn't read-ahead, meaning we
> would have to do our own prefetches.  Eck.  I am unclear if that is
> true on all OS's.

The Linux O_DIRECT semantics are intended to be harder.
In essence, the kernel _will not cache_ data read from
or written to such a file or device.

The point of this, incidentally, was to be able to run
things like Oracle Parallel Server and other shared-
disk setups.  Its use as an "I don't need this cached"
mechanism is secondary, and rather sub-optimal, as seen
here; you disable software read-ahead and introduce
coherence issues with non-O_DIRECT openers of the file.
(I'm not sure of the precise Linux semantics of this,
but it's probably fair to say that you may as well
consider them undefined.)
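
To make the cost concrete, here is a rough sketch of what an uncached read looks like from user space. It assumes Linux-style behaviour, where direct I/O generally wants the buffer and the transfer size aligned; the 4096-byte figure is an assumption, not a documented constant, and the function name read_head_uncached is hypothetical. The sketch falls back to an ordinary buffered read when the direct attempt is rejected with EINVAL.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096   /* assumed alignment; the real value varies */

/* Hypothetical helper: read up to `len` bytes from the start of `path`,
 * bypassing the kernel cache when the platform allows it.  On success,
 * returns the byte count, stores a buffer in *out (caller frees it) and
 * sets *used_direct to 1 or 0.  Returns -1 with errno set on failure. */
ssize_t read_head_uncached(const char *path, size_t len,
                           void **out, int *used_direct)
{
    /* Round the request up to a whole number of aligned blocks. */
    size_t want = (len + DIRECT_ALIGN - 1) / DIRECT_ALIGN * DIRECT_ALIGN;
    if (want == 0)
        want = DIRECT_ALIGN;

    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, want) != 0) {
        errno = ENOMEM;
        return -1;
    }

    int saved = 0;
    /* First pass tries O_DIRECT, second pass is the buffered fallback. */
    for (int direct = 1; direct >= 0; direct--) {
        int fd = open(path, O_RDONLY | (direct ? O_DIRECT : 0));
        if (fd == -1) {
            saved = errno;
            if (direct && saved == EINVAL)
                continue;           /* direct I/O refused: retry buffered */
            break;
        }

        ssize_t n = read(fd, buf, want);
        saved = errno;
        close(fd);

        if (n >= 0) {
            *used_direct = direct;
            *out = buf;
            return (size_t) n > len ? (ssize_t) len : n;
        }
        if (!(direct && saved == EINVAL))
            break;                  /* a real error: give up */
    }

    free(buf);
    errno = saved;
    return -1;
}
```

Note that nothing in this sketch prefetches: with the cache bypassed, a sequential reader would have to issue its own read-ahead requests.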

Linux 2.4 has "madvise", but unfortunately no matching
"fadvise".  A quick Google implied that FreeBSD is in
the same boat.
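
For comparison, this is roughly how the madvise route looks: the hint applies to a memory mapping, not to a file descriptor, so the file has to be mmap'ed first. This is only a sketch under that assumption; the function name sum_file_with_madvise is hypothetical, and the hint is advisory, so the kernel is free to ignore it.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum the bytes of a file through a read-only
 * mapping, telling the kernel the access pattern is sequential.
 * Returns 0 on success (result in *sum_out), -1 on failure. */
int sum_file_with_madvise(const char *path, unsigned long *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }

    *sum_out = 0;
    if (st.st_size == 0) {          /* an empty file cannot be mapped */
        close(fd);
        return 0;
    }

    size_t len = (size_t) st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: a failure here is not an error for the scan. */
    (void) madvise(p, len, MADV_SEQUENTIAL);

    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    *sum_out = sum;
    return 0;
}
```

The obvious drawback for a database is that it reads through file descriptors, not mappings, which is why the lack of a descriptor-based equivalent matters here.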

Matthew.