Discussion: Raw device I/O for large objects


Raw device I/O for large objects

From: Georgi Chulkov
Date:

Hello,

I am a graduate student of computer science and I have been looking at 
PostgreSQL for my master's thesis work.

I am looking into implementing raw device I/O for large objects into 
PostgreSQL (maybe for all storage, I'm not sure which would be 
easier/better). I am extremely new to the codebase, however.

Could someone please point me to the right places to look at, and how/where to 
get started? Would such a development be useful at all? Is anyone working on 
anything related?

Any feedback / information would be highly appreciated!

Thanks,
Georgi


Re: Raw device I/O for large objects

From: "Sibte Abbas"
Date:

On 9/17/07, Georgi Chulkov <godji@metapenguin.org> wrote:
>
> Could someone please point me to the right places to look at, and how/where to
> get started? Would such a development be useful at all? Is anyone working on
> anything related?
>
> Any feedback / information would be highly appreciated!
>

http://www.postgresql.org/docs/techdocs
http://www.postgresql.org/docs/faq/

The PostgreSQL documentation:
http://www.postgresql.org/docs/8.2/interactive/index.html

Also, if you have the source, the src/tools/backend directory has some
useful material for starters.

regards,
--
Sibte Abbas


Re: Raw device I/O for large objects

From: Tom Lane
Date:

Georgi Chulkov <godji@metapenguin.org> writes:
> I am looking into implementing raw device I/O for large objects into 
> PostgreSQL (maybe for all storage, I'm not sure which would be 
> easier/better).

We've heard this idea proposed before, and it's been shot down as a poor
use of development effort every time.  Check the archives for previous
threads, but the basic argument goes like this: when Oracle et al did
that twenty years ago, it was a good idea because (1) operating systems
tended to have sucky filesystems, (2) performance and reliability
properties of same were not very consistent across platforms, and (3)
being large commercial software vendors they could afford to throw lots
of warm bodies at anything that seemed like a bottleneck.  None of those
arguments holds up well for us today however.  If you think you want to
reimplement a filesystem you need to have some pretty concrete reasons
why you can outsmart all the smart folks who have worked on
your-favorite-OS's filesystems for lo these many years.  There's also
the fact that on any reasonably modern disk hardware, "raw I/O" is
anything but.

My opinion is that there is lots of lower-hanging fruit elsewhere.
You can find some ideas on our TODO list, or troll the pghackers
list archives for other discussions.
        regards, tom lane


Re: Raw device I/O for large objects

From: Georgi Chulkov
Date:

Hi,

> We've heard this idea proposed before, and it's been shot down as a poor
> use of development effort every time.  Check the archives for previous
> threads, but the basic argument goes like this: when Oracle et al did
> that twenty years ago, it was a good idea because (1) operating systems
> tended to have sucky filesystems, (2) performance and reliability
> properties of same were not very consistent across platforms, and (3)
> being large commercial software vendors they could afford to throw lots
> of warm bodies at anything that seemed like a bottleneck.  None of those
> arguments holds up well for us today however.  If you think you want to
> reimplement a filesystem you need to have some pretty concrete reasons
> why you can outsmart all the smart folks who have worked on
> your-favorite-OS's filesystems for lo these many years.  There's also
> the fact that on any reasonably modern disk hardware, "raw I/O" is
> anything but.

Thanks, I agree with all your arguments.

Here's the reason why I'm looking at raw device storage for large objects only 
(as opposed to all tables): with raw device I/O I can control, to an extent, 
spatial locality. So, if I have an application that wants to store N large 
objects (totaling several gigabytes) and read them back in some order that is 
well-known in advance, I could store my large objects in that order on the 
raw device.* Sequentially reading them back would then be very efficient. 
With a file system underneath, I don't have that freedom. (Such a scenario 
occurs with raster databases, for example.)

* assuming I have a way to communicate these requirements; that's a whole new 
problem

Please allow me to ask then:
1. In your opinion, would the above scenario indeed benefit from a raw-device 
interface for large objects?
2. How feasible is it to decouple general table storage from large object 
storage?

Thank you for your time,

Georgi


Re: Raw device I/O for large objects

From: "Luke Lonergan"
Date:

Index organized tables would do this and it would be a generic capability.

 - Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From:    Georgi Chulkov [mailto:godji@metapenguin.org]
Sent:    Monday, September 17, 2007 11:50 PM Eastern Standard Time
To:      Tom Lane
Cc:      pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Raw device I/O for large objects

Re: Raw device I/O for large objects

From: Markus Schiltknecht
Date:

Hi,

Georgi Chulkov wrote:
> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a raw-device 
> interface for large objects?

No, because file systems also try to do what you outline above. They 
certainly don't split sequential data up into blocks and distribute them 
randomly over the device, at least not without having a pretty good 
reason to do so (with which you'd also have to fight).

The possible gain achievable is pretty minimal, especially in 
conjunction with a (hopefully battery backed) write cache.

> 2. How feasible is it to decouple general table storage from large object 
> storage?

I think that would be the easiest part. I would go for a pluggable 
storage implementation, selectable per tablespace. But then again, I 
wouldn't do it at all. After all, this is what MySQL is doing. And we 
certainly don't want to repeat their mistakes! Or do you know anybody 
who goes like: "Yippee, multiple storage engines to choose from for my 
(un)valuable data, let's put some here and others there..."?

Let's optimize the *one* storage engine we have and try to make that 
work well together with the various filesystems it uses, because 
filesystems are already very good at what they are used for. (And we are 
glad we can use a filesystem and don't need to implement one ourselves.)

Regards

Markus



Re: Raw device I/O for large objects

From: Tom Lane
Date:

Georgi Chulkov <godji@metapenguin.org> writes:
> Here's the reason why I'm looking at raw device storage for large objects only 
> (as opposed to all tables): with raw device I/O I can control, to an extent, 
> spatial locality. So, if I have an application that wants to store N large 
> objects (totaling several gigabytes) and read them back in some order that is 
> well-known in advance, I could store my large objects in that order on the 
> raw device.* Sequentially reading them back would then be very efficient. 
> With a file system underneath, I don't have that freedom. (Such a scenario 
> occurs with raster databases, for example.)

Not sure I buy that argument.  If you have loaded these large objects in
the desired order, then the data will be consecutively located in
pg_largeobject, and if the underlying filesystem is at all sane about
where it extends a growing file, the data will be pretty much
consecutive on disk too.  You could probably get marginal improvements
by cutting out the middleman but I'm not sure there's reason to think
there'd be spectacular improvements.
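To make the "consecutively located in pg_largeobject" point concrete: pg_largeobject stores each large object as rows keyed by (loid, pageno), each row holding up to LOBLKSIZE bytes of data, and PostgreSQL defines LOBLKSIZE as BLCKSZ / 4 (2048 bytes with the default 8 kB block size). The sketch below assumes that default build and only computes which pageno rows a byte range of one object falls in; the SKETCH_ macros, the ChunkRange type and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default build with BLCKSZ = 8192; LOBLKSIZE is BLCKSZ / 4. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_LOBLKSIZE (SKETCH_BLCKSZ / 4)

/* Inclusive range of pg_largeobject pageno values within one large object. */
typedef struct {
    int32_t first_pageno;
    int32_t last_pageno;
} ChunkRange;

/* Hypothetical helper: which pageno rows hold bytes [offset, offset + len)? */
static ChunkRange chunks_for_range(int64_t offset, int64_t len)
{
    assert(offset >= 0 && len > 0);
    ChunkRange r;
    r.first_pageno = (int32_t) (offset / SKETCH_LOBLKSIZE);
    r.last_pageno = (int32_t) ((offset + len - 1) / SKETCH_LOBLKSIZE);
    return r;
}
```

Reading the first megabyte of an object touches its rows 0 through 511. If the objects were loaded in the desired order, those rows, and then the next object's rows, were appended to the table one after another, which is why a sane filesystem tends to keep them close together on disk as well.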

> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a raw-device 
> interface for large objects?

I don't say it wouldn't benefit.  What I'm questioning is the size of
the benefit compared to the amount of work required to get it.
"Supporting raw I/O" is not some trivial bit of work --- you essentially
have to reimplement your own filesystem, because like it or not you
*do* have to think about space management.  If we went in this direction
we'd be buying into a lot of work, not to mention a lot of ongoing
portability headaches.  So far no one's been able to make a case that
it's worth that level of effort.
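The space-management problem Tom mentions appears as soon as objects can be deleted. Below is a minimal sketch of the simplest piece of it: a first-fit map of free extents, measured in device blocks, that merges neighboring extents on release. All names are hypothetical and the fixed-size array is an assumption for brevity; it does not detect double frees, and it leaves out crash-safe persistence of the map entirely, which a real raw-device store would also need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 64

typedef struct { uint64_t start, length; } Extent;

/* Free extents, kept sorted by start and non-overlapping. */
typedef struct {
    Extent free[MAX_EXTENTS];
    size_t nfree;
} FreeMap;

static void freemap_init(FreeMap *m, uint64_t device_blocks)
{
    m->free[0] = (Extent){0, device_blocks};
    m->nfree = 1;
}

/* First fit: carve nblocks off the first free extent that is large enough. */
static bool freemap_alloc(FreeMap *m, uint64_t nblocks, uint64_t *start)
{
    assert(nblocks > 0);
    for (size_t i = 0; i < m->nfree; i++) {
        if (m->free[i].length >= nblocks) {
            *start = m->free[i].start;
            m->free[i].start += nblocks;
            m->free[i].length -= nblocks;
            if (m->free[i].length == 0) {  /* drop the emptied extent */
                for (size_t j = i; j + 1 < m->nfree; j++)
                    m->free[j] = m->free[j + 1];
                m->nfree--;
            }
            return true;
        }
    }
    return false;  /* no single extent fits, even if total free space would */
}

/* Return blocks to the map, merging with adjacent free extents. */
static bool freemap_release(FreeMap *m, uint64_t start, uint64_t nblocks)
{
    size_t i = 0;
    while (i < m->nfree && m->free[i].start < start)
        i++;  /* free[i - 1] precedes the range, free[i] follows it */
    bool merge_prev = i > 0 && m->free[i - 1].start + m->free[i - 1].length == start;
    bool merge_next = i < m->nfree && start + nblocks == m->free[i].start;
    if (merge_prev && merge_next) {
        m->free[i - 1].length += nblocks + m->free[i].length;
        for (size_t j = i; j + 1 < m->nfree; j++)
            m->free[j] = m->free[j + 1];
        m->nfree--;
    } else if (merge_prev) {
        m->free[i - 1].length += nblocks;
    } else if (merge_next) {
        m->free[i].start = start;
        m->free[i].length += nblocks;
    } else {
        if (m->nfree == MAX_EXTENTS)
            return false;  /* map full: too fragmented to track */
        for (size_t j = m->nfree; j > i; j--)
            m->free[j] = m->free[j - 1];
        m->free[i] = (Extent){start, nblocks};
        m->nfree++;
    }
    return true;
}
```

On a 100-block device, allocating three 30-block objects and then freeing the middle one leaves 40 free blocks, yet a 40-block request fails because no single free extent is large enough. It succeeds only after the neighboring first object is also freed and the two extents merge. This is the kind of fragmentation problem a filesystem already has to handle.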

> 2. How feasible is it to decouple general table storage from large object 
> storage?

You might try digging into the original POSTGRES sources --- at one time
there were several different large-object APIs.  I'm not sure if they
exposed them just as different sets of access functions or if there was
something more elegant.  My own feeling though is that you probably
don't want to go that way, because with outside-the-database storage you
lose transactional behavior (unless you're up for reinventing that
wheel too).  I'd try replacing md.c, or maybe resurrecting smgr.c as
something that can really switch between more than one underlying
storage manager.
        regards, tom lane
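A toy illustration of a storage-manager layer that "can really switch between more than one underlying storage manager", in the spirit of Markus's per-tablespace suggestion: each backend is a row of function pointers and callers pick one by index. All names are hypothetical and both backends are simulated with in-memory arrays; PostgreSQL's real smgr.c interface has many more operations (creating, extending, truncating and syncing relations, among others). The appeal Tom points to is that the data stays under the database's own storage stack instead of moving outside it, so transactional behavior is not given up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 16   /* tiny block size, for illustration only */
#define TOY_NBLOCKS 4

/* Hypothetical, heavily simplified storage-manager interface (not PostgreSQL's f_smgr). */
typedef struct {
    const char *name;
    void (*read)(uint32_t blkno, char *buf);
    void (*write)(uint32_t blkno, const char *buf);
} ToyStorageManager;

/* Backend 0: stands in for a file-based manager such as md.c; simulated in memory. */
static char file_store[TOY_NBLOCKS][TOY_BLCKSZ];
static void file_read(uint32_t blkno, char *buf) { memcpy(buf, file_store[blkno], TOY_BLCKSZ); }
static void file_write(uint32_t blkno, const char *buf) { memcpy(file_store[blkno], buf, TOY_BLCKSZ); }

/* Backend 1: stands in for a raw-device manager; also simulated in memory. */
static char raw_store[TOY_NBLOCKS][TOY_BLCKSZ];
static void raw_read(uint32_t blkno, char *buf) { memcpy(buf, raw_store[blkno], TOY_BLCKSZ); }
static void raw_write(uint32_t blkno, const char *buf) { memcpy(raw_store[blkno], buf, TOY_BLCKSZ); }

/* The switch table: callers select a backend by index, e.g. one per tablespace. */
static const ToyStorageManager toy_smgrsw[] = {
    {"file", file_read, file_write},
    {"raw", raw_read, raw_write},
};
#define TOY_NSMGR ((int) (sizeof toy_smgrsw / sizeof toy_smgrsw[0]))

static void toy_smgr_read(int which, uint32_t blkno, char *buf)
{
    assert(which >= 0 && which < TOY_NSMGR && blkno < TOY_NBLOCKS);
    toy_smgrsw[which].read(blkno, buf);
}

static void toy_smgr_write(int which, uint32_t blkno, const char *buf)
{
    assert(which >= 0 && which < TOY_NSMGR && blkno < TOY_NBLOCKS);
    toy_smgrsw[which].write(blkno, buf);
}
```

Writing a block through backend 1 and reading it back returns the same bytes, while the same block number in backend 0 is untouched. Adding another storage manager then means adding one row to the table rather than changing every caller, which, as Tom's wording suggests, is the shape smgr.c already has with md.c as its only real entry.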


Re: Raw device I/O for large objects

From: Georgi Chulkov
Date:

Thank you everyone for your valuable input! I will have a look at some other 
part of PostgreSQL, and maybe find something else to do instead.

Best,
Georgi