Обсуждение: Clustering, parallelised operating system, super-computing

Поиск
Список
Период
Сортировка

Clustering, parallelised operating system, super-computing

От
Brian Modra
Дата:
Hi,
I've been told that PostgreSQL and other similar databases don't work
well on a parallelised operating system because they make good use of
shared memory which does not cross the boundary between nodes in a
cluster.

So I am wondering if any work is being done to make it possible to
have a single database schema that spans a number of hosts?

For example, a table on one host/node that has a reference to a table
on another host/node with deletes cascading back.
e.g.

on host A:

create table person (
    username text not null unique primary key,
    ...
);


on host B:

create table race_entry (
   person text
       references person(username)
       match full
       on delete cascade
       on update cascade,
   ...
);

--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

Re: Clustering, parallelised operating system, super-computing

От
Bruce Momjian
Дата:
Brian Modra wrote:
> Hi,
> I've been told that PostgreSQL and other similar databases don't work
> well on a parallelised operating system because they make good use of
> shared memory which does not cross the boundary between nodes in a
> cluster.
>
> So I am wondering if any work is being done to make it possible to
> have a single database schema that spans a number of hosts?
>
> For example, a table on one host/node that has a reference to a table
> on another host/node with deletes cascading back.
> e.g.

Not currently.  There are some prototypes in development, but those
usually have the same database on all the machines and they share the
load.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

Re: Clustering, parallelised operating system, super-computing

От
Brian Modra
Дата:
On 14/05/2010, Bruce Momjian <bruce@momjian.us> wrote:
> Brian Modra wrote:
>> Hi,
>> I've been told that PostgreSQL and other similar databases don't work
>> well on a parallelised operating system because they make good use of
>> shared memory which does not cross the boundary between nodes in a
>> cluster.
>>
>> So I am wondering if any work is being done to make it possible to
>> have a single database schema that spans a number of hosts?
>>
>> For example, a table on one host/node that has a reference to a table
>> on another host/node with deletes cascading back.
>> e.g.
>
> Not currently.  There are some prototypes in development, but those
> usually have the same database on all the machines and they share the
> load.

I'm trying to solve the problem of firstly distributing the volume of
data, and secondarily the load.

So far, I'm putting some bulky data onto different hosts, where there
is no need to ever do a join. I put a "reference" table onto a host
with the data that needs to be joined, then I can select the actual
data from the other host by unique IDs after the join has been
performed locally.

To create a reference with "on delete cascade" across hosts, I create
a trigger (after) delete, and in the plpgsql I call dblink to do the
remote delete.

Similarly, I can do joins in plpgsql with the help of dblink.
But, doing joins across hosts certainly does defeat the purpose of
"distributing the load".

I think that the schema design must be done carefully when distributing data.
So it really will be difficult to get this "supercomputer database" right.

Maybe the best way to solve this is not to do automatic distribution
of the data, but rather to provide tools for implementing distributed
references and joins.

I'm thinking of working on this as part of "The Karoo Project" Open
Source Project I'm working on, and would appreciate
comments/support/criticism.
Thanks

--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

Re: Clustering, parallelised operating system, super-computing

От
Benjamin Smith
Дата:
On Thursday, May 13, 2010 11:51:08 pm Brian Modra wrote:
> Maybe the best way to solve this is not to do automatic distribution
> of the data, but rather to provide tools for implementing distributed
> references and joins.

Here's my vote! I'd *LOVE* it if I could do a simple cross-database join
(without the ugliness of dblink), it would be just awesome. Two beers for
cross-database foreign keys...

We already do use dblink extensively with a wrapper. for various reporting
functions. Since the cross-database queries are in the minority, it does
function as load balancing, even if the cross-joined queries aren't so
balanced.

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.