Re: use BLOBS or use pointers to files on the filesystem?

Поиск
Список
Период
Сортировка
От Eric Ridge
Тема Re: use BLOBS or use pointers to files on the filesystem?
Дата
Msg-id D3ADE25911614840BC69C72E3171E4ED24865B@tcdiexch.tcdi.com
обсуждение исходный текст
Ответ на use BLOBS or use pointers to files on the filesystem?  ("Travis Bear" <ec9098@no.spam.go.com>)
Список pgsql-general
> > I'm kind of a database newbie in general,  so be nice to me if this
> > is a bonehead question.  I am building a java servlet based web
> > application to manage photos and images.  My first thougth has been
> > to store the images as BLOBs in the database, but I read
> one user say
> > that you're better off not using BLOBs.  Instead, he says,
> store your
> > large data objects as files on the filesystem, and just use the
> > database to store the path to the file and any file-related meta
> > data.

I remember that thread!  That was me that basically said, "store your
files on the filesystem, and your data in a database."

In my world, we manage in the upwards of 18.5 million images.  We have a
small database table with about 5 fields (about 4.5gig of data) that
tell us where the images are (over 2.3 terrabytes).

So you see, we can keep our database server storage "small", and our
image farm storage "big".  We can upgrade our database server w/o
bringing our images offline.  We can upgrade our image farm w/o bringing
the database offline.

Another advantage is that storing pointers allows your images to live on
a server that isn't yours!

id   | path
------------
2312 | file:///c:/images/you.gif
2313 | http://www.somesite.com/images/me.gif
2314 | ftp://www.somesite.com/images/me2.gif

As long as your db application understands the protocols you can point
to images anywhere in the known universe.

> Compared to retrieving stuff directly from the database, this would be

absolutely it's going to be faster!  Postgres, and most other db's for
that matter, aren't designed to store files.  They're not filesystems,
they're databases!

> significantly faster. You could use a caching mechanism though to make
> that disadvantage go away (have a look at www.phpbuilder.com for
> examples on how to do that in PHP - which you might draw upon for your
> Java solution).

caching is good.  esp. if the originals are stored in a far away place.
saves lots of round-trips.

>
> The downside of having information in the filesystem (your
> images), and meta-information about it in the database (the path to
the
> images etc.) is that you have to work much harder to make your
application
> detect and fix concurrency issues: If any other application besides
your Servlet

The upside is that you have to work harder to make your application
detect and fix concurrency issues.

> can access the filesystem (which is pretty normal), those others might
> modify it without modifying the database.

This is a design decision to make up front:  "Can (read: should) the
filesystem be modified independently of the database?".  I say YES:

It's 4:45pm and Travis just received 10 CD's of images from a client
that need to be loaded into his database by 6:00pm.  Each CD contains
10,000 images.  (Fourtunality, the CD's contain an index file that
closely match his table schema).  So while Travis is writing a little
perl script to massage the index files into a bunch of INSERT
statements, his assistant can begin copying the CD's over to the image
server (or directory).

And because Travis' storage and data systems are decoupled, he gets the
job done on time and the client is happy.

> Then you have a filesystem out of sync with the database. Once you
have that problem, it
> would be hard to recover from. (Could anyone explain that better? I
can't seem to
> find the right words to make it really clear.)

Or, it could be part of the plan.

Sure, storing the images as blobs is going to be easier from a
development standpoint.  But the easy solution isn't always the best
solution.

eric

В списке pgsql-general по дате отправления:

Предыдущее
От: "Robert B. Easter"
Дата:
Сообщение: Re: What is the practical limitation of no multi-threading?
Следующее
От: "Nick Fankhauser"
Дата:
Сообщение: Re: Remote Access to pgsql DB ???