Boris Köster wrote:
> Hello friends,
>
> I have a question. I am currently planning a new project that will
> collect a really large amount of data. My question is:
>
> What should I do if the disk space is not enough? Is there something
> to distribute data over several machines and to collect the data with
> a select statement if required? The stored information needs to be
> analyzed.
>
> It is an enterprise-computing project, currently partly in development
> and partly still in planning. I want to use PostgreSQL, but how should
> I handle truly massive amounts of data?
>
> Don't tell me to add disk space; whatever we use, it's not enough.
> We need more than one machine, and we need to analyze the data across
> several machines, if possible with one select statement... or is there
> a better idea for handling really large amounts of data? It's important
> to us to have real-time analysis, so we cannot make the user wait.
>
> Sorry for the question, but I am stuck on this problem.
Hmm, interesting. I have similar needs. Let's hear what the gurus
have to say. But, setting PostgreSQL aside, what exactly do you want
the RDBMS to do? You probably want virtual shared-disk storage,
such as a RAID system to which you can connect multiple hosts. VMS
clusters have that feature: the disks are independent of the hosts.
But then, of course, it's non-trivial to run multiple server hosts on
the same database storage. Oracle can do something like that (but
you pay big $$$).
So, what is it you want the system to do? Parallelize a single query
over multiple hosts? I wouldn't count on that being available with
PostgreSQL any time soon.
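For contrast with the shared-disk setup above, what you describe
(spreading the rows over several machines and combining the results of
one query) is usually called shared-nothing partitioning with a
scatter-gather query. Below is a minimal in-process sketch of the idea,
assuming simulated "nodes" that are just Python lists; the function
names and data are hypothetical and this is not a PostgreSQL feature.
Each node computes a partial SUM/COUNT per key, and a coordinator merges
them into an AVG.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setup: each "node" is simulated as an in-memory list of
# (key, value) rows. In a real system each would be a separate DB server.
NUM_NODES = 3


def partition(rows, num_nodes):
    """Hash-partition rows across num_nodes shards (stable crc32 hash)."""
    shards = [[] for _ in range(num_nodes)]
    for key, value in rows:
        shards[zlib.crc32(key.encode()) % num_nodes].append((key, value))
    return shards


def local_aggregate(shard):
    """Per-node partial result: key -> [SUM(value), COUNT(*)]."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in shard:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def scatter_gather_avg(shards):
    """Run partial aggregates on all shards in parallel, then merge to AVG."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(local_aggregate, shards))
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            totals[key][0] += s
            totals[key][1] += c
    return {key: s / c for key, (s, c) in totals.items()}


rows = [("a", 1), ("b", 4), ("a", 3), ("c", 10), ("b", 6)]
shards = partition(rows, NUM_NODES)
print(sorted(scatter_gather_avg(shards).items()))
```

Note that the nodes ship (sum, count) pairs rather than averages,
because averages of partitions cannot simply be averaged again. Any
real-time requirement then depends on the slowest node.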
-Gunther
--
Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org
Medical Information Scientist Regenstrief Institute for Health Care
Adjunct Assistant Professor Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org