Discussion: What I do with PostgreSQL


What I do with PostgreSQL

From: alex avriette
This might not be the correct list to send this to, but none of the other
lists seemed appropriate. A friend of mine who uses postgres extensively at
his job suggested I might send y'all a note outlining what we do with it
here. 

In general, I am discouraged from providing specific data to non-employees
about what we do. But Dan (the aforementioned friend) said that you guys
would be interested in knowing what I am currently doing with postgres, so
that you know it's up to the kind of challenges we don't often get to put
hardware and software to.

I am working in the publications division of the American Chemical Society.
We are in the process of taking all of our 30+ journals from the last 150 or
so years and digitizing them. This process entails scanning over 2.5 million
pages (though this is only a rough estimate; it could be much higher). Our
output is in several formats. First, we have the input TIFFs (from the
scans); we have PDFs, which we render using Adobe Capture; XML (which we pay
a vendor for); and a proprietary format called DjVu, which is kind of...
well, it's like metadata. Initially, we were using Perl scripts and shell
scripts to traverse the entire filesystem looking for files.

This got rather difficult and was time-consuming. My suggestion was to just
use a database for keeping track of stuff. We have something like 27
different instances of oracle running here on 4 or 5 different machines. I
don't know much about our oracle stuff. My solution was to just go download
and install postgres.

Our hardware is a cluster of three Ultra 10s, a pair of 700-DVD jukeboxes
(with burners), a 2.5 TB SAN, 10 DAT tape readers, a pair of DVD-ROM drives,
and two 200 GB disk packs (one for each of our two tape-reading Suns; the
third Sun manages the DVD jukeboxes). We also run Adobe Capture on four Dell
PowerEdge servers running NT, and the DjVu software on three more PowerEdge
servers, also under NT. The SAN is run on a cluster of four Sun E3500s.

I am pumping about 200 GB a week through the pg database, and we estimate the
database will reach something like 4 TB by the end of the year.

We populate the database with Perl scripts. The Sun that runs the DVD
jukeboxes is also our database server. We have shell scripts that look over
our data on the disks, and we use Sun's NFS to share disks among the Suns,
plus some funky Sun SMB-esque software to keep disks mounted on the NT boxes.
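The approach above (scripts that walk the disks once and record what they find, so later questions become a query instead of another filesystem traversal) might look roughly like this. It is only a minimal sketch, not the setup described in this message: it is written in Python rather than Perl, it uses the standard library's in-memory sqlite3 database as a stand-in for PostgreSQL so it runs anywhere, and the table name `page_files`, its columns, and the extension-to-format mapping are all hypothetical.

```python
import os
import sqlite3

# Hypothetical mapping from file extension to the output formats named in the post.
FORMATS = {
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".pdf": "PDF",
    ".xml": "XML",
    ".djvu": "DjVu",
}


def scan_tree(root):
    """Walk `root` and yield (path, format, size_in_bytes) for tracked formats."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            fmt = FORMATS.get(ext)
            if fmt is None:
                continue  # skip files that are not one of the tracked formats
            path = os.path.join(dirpath, name)
            yield path, fmt, os.path.getsize(path)


def build_catalog(root, conn):
    """Record every tracked file in a hypothetical `page_files` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_files ("
        " path TEXT PRIMARY KEY, format TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    # INSERT OR REPLACE is SQLite syntax; re-scanning updates rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO page_files (path, format, size_bytes) VALUES (?, ?, ?)",
        scan_tree(root),
    )
    conn.commit()


def count_by_format(conn):
    """Answer 'how many files of each format?' with a query instead of a re-scan."""
    rows = conn.execute(
        "SELECT format, COUNT(*) FROM page_files GROUP BY format ORDER BY format"
    )
    return dict(rows.fetchall())
```

Usage would be along the lines of `conn = sqlite3.connect(":memory:")`, then `build_catalog("/path/to/mount", conn)` and `count_by_format(conn)`. Against PostgreSQL itself, a driver such as psycopg2 uses `%s` placeholders, and `INSERT OR REPLACE` would become `INSERT ... ON CONFLICT (path) DO UPDATE`, which PostgreSQL only gained in 9.5, long after this thread.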

And that's just the "large" database. I have an additional database that I
am using to store the textual data we receive in the form of
"crystallography information files" (http://www.iucr.org/), each roughly
6,000 lines long. I have 10,000 of them stored in the database at the moment,
going back to about 1996. As you can tell, this database is going to get much
bigger. At the moment it's living on an Ultra 2 in a 2 GB partition.

In some ways, I am amazed that postgres has stood up to the challenge. In
others, however, I am not in the least surprised. It's a fantastic piece of
software that requires almost no intervention on my part. I talked to one of
our Oracle DBAs about it. He actually (I'm not kidding here) did not believe
it could be a database if it did not require maintenance.

I am very happy with postgres and I am glad to provide information about our
setup if you'd like to know anything else.

If you're interested in putting something in a FAQ (e.g., "Can postgres
scale to more than a terabyte?"), you're welcome to quote me on the
environment, but I would like to make sure that it doesn't point to ACS and
is not too specific.

Anyhow, thanks for your hard work, guys/gals.

alex



Re: What I do with PostgreSQL

From: Lamar Owen
On Monday 16 July 2001 14:48, alex avriette wrote:
> Our hardware is a cluster of 3 ultra 10's, a pair of 700-dvd jukeboxes
> (with burners), a 2.5tb SAN, 10 DAT tape readers, a pair of dvd-roms, and 2
> 200gb disk packs (one for each of our tape-reading suns -- the other one
> manages the DVD jukes). We also run capture on four dell poweredge servers
> running NT. We run the DjVu software on an additional 3 poweredge servers.
> That stuff is NT. The SAN is run on a cluster of 4 sun e 3500's.

> I am pumping about 200gb a week through the pg database, and our estimated
> database size is something like 4tb by the end of the year.

> In some ways, I am amazed that postgres has stood up to the challenge. In
> others, however, I am not in the least surprised. Its a fantastic piece of
> software that requires almost no intervention on my part. I talked to one
> of our oracle dba's about it. He actually (im not kidding here) did not
> believe it could be a database if it did not require maintenance.

Can anyone say 'Woof!'?

This is awesome.  Thank you, Alex, for sharing this testimonial -- your 
database sounds like a serious test of 'scalability' no matter which way you 
slice it.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


RE: What I do with PostgreSQL

From: "Darren King"
>> I am pumping about 200gb a week through the pg database,
>> and our estimated database size is something like 4tb by
>> the end of the year.
>
> Can anyone say 'Woof!'?

Amen, Lamar.  I was trying to think of something myself besides
'Wow!'...

As a side note, there's a blurb in the July 16, 2001 issue of Interactive Week
about the MySQL AB vs. NuSphere spat, and the last paragraph of the article
gives PostgreSQL a very favorable nod.

I quote (any typos are mine) ...

"Analysts said MySQL must find a way to generate a development community
and support if it wants to compete with another open source database,
PostgreSQL, distributed by Red Hat and Great Bridge."

The article doesn't say who the "analysts" are, but the implication that
MySQL isn't up to competing with PostgreSQL was interesting to my eyes!
:)

Darren