Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?

From Bruce Momjian
Subject Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?
Date
Msg-id 199803122026.PAA05918@candle.pha.pa.us
In reply to Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?  (ocie@paracel.com)
List pgsql-hackers
>
> Bruce Momjian wrote:
> >
> > > I have had the pleasure to work on the guts of one of the major databases
> > > raw partition storage managers over the last ten years (hint, not my
> > > current domain), and guess what? It implements a file system. And, not a
> > > particularly good filesystem at that. Think about something like "FAT",
> > > but not quite that nice. It was also a major source of pain in that it
> > > was complex, heavily concurrent, and any errors show up as massive data
> > > loss or corruption. Be careful what you wish for.
> >
> > Interesting.
>
> Perhaps we could:
>
> a) Incorporate an existing filesystem into the code (ext2?).  By
> Incorporate, I mean that we would just take the latest version of the
> code and link it into the executable, or into a library and make calls
> to some of the lower level access and allocation routines.
>
> b) suggest that for higher performance, the user should format the
> disk partition with ext2 (or whatever) and turn off caching and set the
> block size to the maximum possible.
>
> I know for a fact that ext2 lets the user select the block size, and
> it looks like Linux at least supports a sync mount option which makes
> all I/O to this FS synchronous (which I assume would turn off write
> caching at least).  If caching could be disabled, then option b would
> seem to provide performance equivalent to a.

I checked, and under BSD/OS, the readahead call for ufs looks like:

            error = breadn(vp, lbn, size,
                (daddr_t *)&nextlbn, &nextsize, 1, NOCRED, &bp);

The '1' is requesting a read-ahead of one block past the requested
block.  Clearly this is not tunable, though a read-ahead of one is not a
significant performance problem.  In most cases, the block was already
read as part of the disk scan, but this gives us the next block in cases
where we are reading sequentially.
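
To make the effect of a one-block read-ahead concrete, here is a toy user-space model. It is only a sketch, not the BSD/OS kernel code: the names toy_cache, toy_read and toy_sequential_misses are made up for illustration, the cache is assumed to hold just two blocks, and every request is assumed to trigger a read-ahead of the following block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer cache (hypothetical, not kernel code): it holds the block
 * that was just requested plus the one block read ahead of it. */
struct toy_cache {
    long cached[2];     /* block numbers currently held, -1 if empty */
    size_t disk_reads;  /* requests that had to wait for the disk */
};

void toy_cache_init(struct toy_cache *c)
{
    assert(c != NULL);
    c->cached[0] = -1;
    c->cached[1] = -1;
    c->disk_reads = 0;
}

/* Request logical block lbn. Returns true if it was already cached.
 * Either way, the model then holds lbn and lbn + 1, which mimics a
 * read-ahead count of 1. */
bool toy_read(struct toy_cache *c, long lbn)
{
    assert(c != NULL);
    bool hit = (c->cached[0] == lbn || c->cached[1] == lbn);
    if (!hit)
        c->disk_reads++;
    c->cached[0] = lbn;      /* the requested block */
    c->cached[1] = lbn + 1;  /* the block read ahead */
    return hit;
}

/* Scan blocks 0 .. nblocks-1 in order and return how many requests
 * had to wait for the disk. */
size_t toy_sequential_misses(long nblocks)
{
    struct toy_cache c;
    toy_cache_init(&c);
    for (long lbn = 0; lbn < nblocks; lbn++)
        toy_read(&c, lbn);
    return c.disk_reads;
}
```

In this model a sequential scan only waits for the disk on its first block, while a jump to an unrelated block always misses, so the extra block costs little when access is random and helps when it is sequential.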

The sync option is not really desired because we do our own syncs on
transaction completion.  Don't want a sync on every write.  Don't think
you can disable caching.
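
As a rough illustration of "our own syncs on transaction completion", the sketch below writes several records through the OS cache and forces them to disk with a single fsync() at the end, rather than paying for a synchronous write each time as a sync mount would. It uses only POSIX calls; commit_records and write_all are hypothetical names for this example and are not the backend's actual routines.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical "transaction": append nrec NUL-terminated records to
 * path using ordinary buffered writes, then sync once at the end.
 * Returns 0 on success, -1 on error. */
int commit_records(const char *path, const char *const recs[], size_t nrec)
{
    assert(path != NULL);

    /* No O_SYNC here: individual writes may stay in the OS cache. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    for (size_t i = 0; i < nrec; i++) {
        if (write_all(fd, recs[i], strlen(recs[i])) != 0) {
            close(fd);
            return -1;
        }
    }

    /* One sync at "transaction completion" covers all the writes above. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Opening the file with O_SYNC, or mounting the filesystem sync, would instead make each write() wait for the disk, which is the per-write cost we want to avoid.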


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
