Обсуждение: Clustering, parallelised operating system, super-computing
Hi, I've been told that PostgreSQL and other similar databases don't work well on a parallelised operating system because they make good use of shared memory which does not cross the boundary between nodes in a cluster. So I am wondering if any work is being done to make it possible to have a single database schema that spans a number of hosts? For example, a table on one host/node that has a reference to a table on another host/node with deletes cascading back. e.g. on host A: create table person ( username text not null unique primary key, ... ); on host B: create table race_entry ( person text references person(username) match full on delete cascade on update cascade, ... ); -- Brian Modra Land line: +27 23 5411 462 Mobile: +27 79 69 77 082 5 Jan Louw Str, Prince Albert, 6930 Postal: P.O. Box 2, Prince Albert 6930 South Africa http://www.zwartberg.com/
Brian Modra wrote: > Hi, > I've been told that PostgreSQL and other similar databases don't work > well on a parallelised operating system because they make good use of > shared memory which does not cross the boundary between nodes in a > cluster. > > So I am wondering if any work is being done to make it possible to > have a single database schema that spans a number of hosts? > > For example, a table on one host/node that has a reference to a table > on another host/node with deletes cascading back. > e.g. Not currently. There are some prototypes in development, but those usually have the same database on all the machines and they share the load. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On 14/05/2010, Bruce Momjian <bruce@momjian.us> wrote: > Brian Modra wrote: >> Hi, >> I've been told that PostgreSQL and other similar databases don't work >> well on a parallelised operating system because they make good use of >> shared memory which does not cross the boundary between nodes in a >> cluster. >> >> So I am wondering if any work is being done to make it possible to >> have a single database schema that spans a number of hosts? >> >> For example, a table on one host/node that has a reference to a table >> on another host/node with deletes cascading back. >> e.g. > > Not currently. There are some prototypes in development, but those > usually have the same database on all the machines and they share the > load. I'm trying to solve the problem of firstly distributing the volume of data, and secondarily the load. So far, I'm putting some bulky data onto different hosts, where there is no need to ever do a join. I put a "reference" table onto a host with the data that needs to be joined, then I can select the actual data from the other host by unique IDs after the join has been performed locally. To create a reference with "on delete cascade" across hosts, I create a trigger (after) delete, and in the plpgsql I call dblink to do the remote delete. Similarly, I can do joins in plpgsql with the help of dblink. But, doing joins across hosts certainly does defeat the purpose of "distributing the load". I think that the schema design must be done carefully when distributing data. So it really will be difficult to get this "supercomputer database" right. Maybe the best way to solve this is not to do automatic distribution of the data, but rather to provide tools for implementing distributed references and joins. I'm thinking of working on this as part of "The Karoo Project" Open Source Project I'm working on, and would appreciate comments/support/criticism. Thanks -- Brian Modra Land line: +27 23 5411 462 Mobile: +27 79 69 77 082 5 Jan Louw Str, Prince Albert, 6930 Postal: P.O. Box 2, Prince Albert 6930 South Africa http://www.zwartberg.com/
On Thursday, May 13, 2010 11:51:08 pm Brian Modra wrote: > Maybe the best way to solve this is not to do automatic distribution > of the data, but rather to provide tools for implementing distributed > references and joins. Here's my vote! I'd *LOVE* it if I could do a simple cross-database join (without the ugliness of dblink), it would be just awesome. Two beers for cross-database foreign keys... We already do use dblink extensively with a wrapper. for various reporting functions. Since the cross-database queries are in the minority, it does function as load balancing, even if the cross-joined queries aren't so balanced. -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.