Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?
Date
Msg-id 9803120600.AA26541@hawk.illustra.com
In reply to Re: [QUESTIONS] Does Storage Manager support >2GB tables?  (The Hermit Hacker <scrappy@hub.org>)
Replies Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
> Redirected to 'the proper list' - pgsql-hackers@postgresql.org
>
> On Wed, 11 Mar 1998, Chris Albertson wrote:
>
> > Also, is anyone working on storage managers?  I was thinking that
> > a raw partition manager would be good to have.  Could be faster
> > than one that uses the file system.  Give it two partitions and
> > it could do striping and gain some real speed.
>
>     striping can be done from the operating system level to give you
> that 'boost'...and Oracle, in fact, moved away from the raw partition
> level to just using the Unix file system...I believe it would
> overcomplicate the backend, and give a negligible boost in performance, if
> we had to build a 'low level drive interface'...
>
> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

I have had the pleasure of working on the guts of one of the major databases'
raw partition storage managers over the last ten years (hint: not my
current domain), and guess what? It implements a file system. And not a
particularly good file system at that. Think of something like "FAT",
but not quite that nice. It was also a major source of pain, in that it
was complex and heavily concurrent, and any errors showed up as massive data
loss or corruption. Be careful what you wish for.

Most of the supposed benefit comes from integrating the buffer cache
management and the write-ahead log so that you can defer or avoid I/O (as
long as the log records get to disk, there is no reason ever to write the
data page unless you need the buffer for something else). You can also
convert random I/O to semi-sequential I/O if most writes are done by a page
cleaner or by a checkpoint, as this gives you lots of I/O to sort.
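
To make the deferral and sorting idea concrete, here is a toy in-memory
model. It is only a sketch under stated assumptions: the class BufferPool
and its update and checkpoint methods are invented for this illustration
and are not taken from Postgres, Illustra, or any other system. Updates
append a log record and dirty a cached page; data pages reach "disk" only
at a checkpoint, and then in ascending page order.

```python
class BufferPool:
    """Toy model (hypothetical): log at update time, write data pages lazily."""

    def __init__(self):
        self.log = []          # stands in for the on-disk write-ahead log
        self.dirty = {}        # page number -> latest contents, not yet on disk
        self.disk = {}         # stands in for the data file
        self.write_order = []  # order in which data pages reached "disk"

    def update(self, page_no, value):
        # Log first, then change the cached page. No data page I/O happens here.
        self.log.append((page_no, value))
        self.dirty[page_no] = value

    def checkpoint(self):
        # Write every dirty page once, in ascending page order, so the I/O is
        # closer to sequential than the order in which the updates arrived.
        for page_no in sorted(self.dirty):
            self.disk[page_no] = self.dirty[page_no]
            self.write_order.append(page_no)
        self.dirty.clear()


pool = BufferPool()
for page_no in [907, 12, 455, 12, 13, 907]:
    pool.update(page_no, "v")
pool.checkpoint()
print(pool.write_order)  # [12, 13, 455, 907]
```

Six updates turn into four page writes, issued in page order. Repeated
updates to the same page between checkpoints cost no extra data I/O in this
model, which is the "defer or avoid" part.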

I don't know the current state of Postgres so I cannot comment on it, but at
least with Illustra, the lack of a traditional write-ahead-log-style
transaction system was a major performance hit, as it forced an fsync at
every commit. A good WAL system gets many commits per log I/O, but
Illustra was stuck with many writes per transaction. If Postgres still does
this (and the recent elimination of time travel suggests that it might not),
it would be well worth fixing.
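
The "many commits per log I/O" point can be sketched by counting log
flushes. Again this is a toy under assumptions: GroupCommitLog, run, and
the batch parameter are hypothetical names made up here, and a real system
would group commits by timing and concurrency rather than a fixed count.

```python
class GroupCommitLog:
    """Toy model (hypothetical): commits queue up and share one log flush."""

    def __init__(self):
        self.pending = []  # commits whose log records are not yet forced
        self.durable = []  # commits covered by a completed flush
        self.flushes = 0   # number of simulated log fsyncs

    def commit(self, txn_id):
        self.pending.append(txn_id)

    def flush(self):
        # One simulated fsync makes every pending commit durable at once.
        if self.pending:
            self.flushes += 1
            self.durable.extend(self.pending)
            self.pending = []


def run(num_commits, batch):
    """Return the number of log flushes when up to `batch` commits share one."""
    log = GroupCommitLog()
    for txn_id in range(num_commits):
        log.commit(txn_id)
        if len(log.pending) >= batch:
            log.flush()
    log.flush()  # force any partially filled final group
    return log.flushes


print(run(100, 1))   # fsync at every commit: 100
print(run(100, 10))  # ten commits per log I/O: 10
```

With batch set to 1 the model behaves like the fsync-per-commit case
described above; any larger batch divides the number of log forces
accordingly.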

A last point: the "raw disk, implement our own filesystem" architecture used
by some systems is much more compelling if the filesystems are slow and
inflexible and the filesystem caching is ineffective. These things were
more true back in the early '80s, when these systems were being designed.
Things are not as bad now; in particular, ext2 has quite good performance.

Sorry for the ramble...

-dg


David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - I realize now that irony has no place in business communications.

