On Thu, 2005-05-12 at 00:34, Nikola Milutinovic wrote:
> Hi all.
>
> This might be OT, especially since I do not have the actual data for
> volume and throughput, and can only give a "bystander's" impression.
>
> I might come on board a project that deals with a high-volume,
> high-throughput data-crunching task. It will involve data pattern
> recognition and similar stuff. The project is meant to run in Java and
> to use Berkeley DB and probably Apache Lucene.
>
> Now, this is the juicy part. Due to high volume and high throughput,
> data is actually stored in ordinary files, while Berkeley DB is used
> only for indexes to that data!
>
> Like I've said, I don't have the figures, but I was told that this was
> the only way to make it work; everything else failed to perform. My
> question: in your opinion, can PgSQL perform in such a scenario? Using
> JDBC, of course.
>
> I do realize that PgSQL gives you a lot of good stuff, but here speed
> is of the essence. The previous project stripped its Java code to the
> bare bones, as far as data structures go, just to make it faster.

This sounds like a batch processing job, and those are often handled
much more quickly by hand-written Perl / Java / PHP / yourfavlanghere.
Is there a need for concurrent updates / selects on these data? If not,
then batch processing and tossing the results into a database may well
be the best way to do it.
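
For what it's worth, here's a minimal sketch of the kind of batch load
I mean, using plain JDBC batching. The table, columns and connection
details (doc_index, doc_path / term / score, the "crunch" database) are
all made up for illustration, not taken from your project:

  import java.sql.*;

  public class BatchLoad {
      public static void main(String[] args) throws Exception {
          Class.forName("org.postgresql.Driver");
          Connection conn = DriverManager.getConnection(
                  "jdbc:postgresql://localhost/crunch", "user", "secret");
          conn.setAutoCommit(false);   // commit per batch, not per row

          PreparedStatement ps = conn.prepareStatement(
                  "INSERT INTO doc_index (doc_path, term, score) " +
                  "VALUES (?, ?, ?)");

          // Pretend these rows came out of the pattern-recognition pass.
          String[][] results = {
              { "/data/chunk-0001.dat", "foo", "0.87" },
              { "/data/chunk-0002.dat", "bar", "0.42" },
          };

          for (int i = 0; i < results.length; i++) {
              ps.setString(1, results[i][0]);
              ps.setString(2, results[i][1]);
              ps.setFloat(3, Float.parseFloat(results[i][2]));
              ps.addBatch();
              if (i % 1000 == 999) {   // flush in chunks of 1000 rows
                  ps.executeBatch();
                  conn.commit();
              }
          }
          ps.executeBatch();           // flush the remainder
          conn.commit();
          ps.close();
          conn.close();
      }
  }

One commit per chunk instead of one per row is where most of the
speedup comes from. If that's still too slow, PostgreSQL's COPY is
faster still for bulk loads.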