Обсуждение: Nodes and trees...
Dear fellow Postgres users, :-) please consider this table: CREATE TABLE nodes ( id int PRIMARY KEY, parent int REFERENCES nodes(id) ); In this table, each node *can* have a parent node. You can picture the whole set of rows of this table as one or more trees with nodes and the root of the tree is the node which has no parent node (that is, parent is NULL). Now here's my objective: I want to *quickly* find all nodes that have the same root node. I could iterate through the complete tree in question starting from the root node and going all the way through to the terminal/leaf nodes (those without a child), but that could take quite long depending on how large the tree is. So, what I could do is this: CREATE TABLE nodes ( id int PRIMARY KEY, parent int REFERENCES nodes(id), root int NOT NULL REFERENCES nodes(id) ); and fill out the root column every time I insert a node. But then there is the problem of anomalies, where the root column could have an incorrect value (that is, the id of a node which is actually NOT the root of the same tree). I guess I could check for correctness using triggers. I also thought that using views might make adding a root column to the table completely unnecessary and at the same time allow for quickly finding nodes which have the same root. So, what's the best method to do this? It must be fast, it must prevent anomalies, and must also be either SQL standard or if it is not, at least it must be easily portable across the most popular SQL databases. I also explicitly don't want to create an extra tree ID or something like that, because it only mitigates the problem of anomalies, but does not solve it. Thanks in advance, Jason.
On Tue, Aug 3, 2010 at 8:01 AM, Jason Schauberger <crossroads0000@googlemail.com> wrote: > Dear fellow Postgres users, :-) > > please consider this table: > > CREATE TABLE nodes ( > > id int PRIMARY KEY, > > parent int REFERENCES nodes(id) > > ); > > In this table, each node *can* have a parent node. You can picture the > whole set of rows of this table as one or more trees with nodes and > the root of the tree is the node which has no parent node (that is, > parent is NULL). > > Now here's my objective: I want to *quickly* find all nodes that have > the same root node. > > I could iterate through the complete tree in question starting from > the root node and going all the way through to the terminal/leaf nodes > (those without a child), but that could take quite long depending on > how large the tree is. > > So, what I could do is this: > > CREATE TABLE nodes ( > > id int PRIMARY KEY, > > parent int REFERENCES nodes(id), > > root int NOT NULL REFERENCES nodes(id) > > ); > > and fill out the root column every time I insert a node. But then > there is the problem of anomalies, where the root column could have an > incorrect value (that is, the id of a node which is actually NOT the > root of the same tree). I guess I could check for correctness using > triggers. yes, you should absolutely do that -- the role of the database is to (as much as possible) make anomalies in your data impossible. If you go this route, set the root_id node in the trigger. Postgres triggers aren't strictly portable, but should be able to be migrated with minimal effort. Looking it up in the application is possible, but undesirable IMO. If you are only specifically interested in the root node, you can materialize it in the application and index it. This isn't a general solution for investigating familial relationships of nodes, but it doesn't sound like you need something general. The general solution that is going to be most portable is going probably going to be a materialized path approach. You encode the parents of a node in a string (in postgres you can use an array) in such away that you can pull out specific parents_ids. Materialized approaches have at least one big downside: updates to the tree are extremely expensive. Here is a pretty neat way of doing nested data with farey fractions with SQL examples designed for Oracle -- they should be able to be ported to postgres without too much effort: http://arxiv.org/html/cs/0401014. I've been looking for an excuse to play with it! merlin
> -----Original Message----- > From: Jason Schauberger [mailto:crossroads0000@googlemail.com] > Sent: Tuesday, August 03, 2010 8:02 AM > To: pgsql-general@postgresql.org > Subject: Nodes and trees... > > Dear fellow Postgres users, :-) > > please consider this table: > > CREATE TABLE nodes ( > > id int PRIMARY KEY, > > parent int REFERENCES nodes(id) > > ); > > In this table, each node *can* have a parent node. You can > picture the whole set of rows of this table as one or more > trees with nodes and the root of the tree is the node which > has no parent node (that is, parent is NULL). > > Now here's my objective: I want to *quickly* find all nodes > that have the same root node. > > I could iterate through the complete tree in question > starting from the root node and going all the way through to > the terminal/leaf nodes (those without a child), but that > could take quite long depending on how large the tree is. > > So, what I could do is this: > > CREATE TABLE nodes ( > > id int PRIMARY KEY, > > parent int REFERENCES nodes(id), > > root int NOT NULL REFERENCES nodes(id) > > ); > > and fill out the root column every time I insert a node. But > then there is the problem of anomalies, where the root column > could have an incorrect value (that is, the id of a node > which is actually NOT the root of the same tree). I guess I > could check for correctness using triggers. > > I also thought that using views might make adding a root > column to the table completely unnecessary and at the same > time allow for quickly finding nodes which have the same root. > > So, what's the best method to do this? It must be fast, it > must prevent anomalies, and must also be either SQL standard > or if it is not, at least it must be easily portable across > the most popular SQL databases. I also explicitly don't want > to create an extra tree ID or something like that, because it > only mitigates the problem of anomalies, but does not solve it. > > Thanks in advance, > > Jason. > Look up connectby() in tablefuncs contrib module. Regards, Igor Neyman
On Tue, Aug 03, 2010 at 02:01:58PM +0200, Jason Schauberger wrote: > Dear fellow Postgres users, :-) > > please consider this table: > > CREATE TABLE nodes ( > > id int PRIMARY KEY, > > parent int REFERENCES nodes(id) > > ); Generally, you'll want to separate the nodes table from the edges table, as in: CREATE TABLE nodes (id INTEGER PRIMARY KEY); CREATE TABLE edges ( tail INTEGER NOT NULL REFERENCES nodes(id), head INTEGER NOT NULL REFERENCES nodes(id), PRIMARY KEY(tail, head), CHECK (tail <> head) ); Then you might want to prevent other kinds of issues (more uniqueness, "must be forest," etc.) with other constraints, but let's not go there for now. > In this table, each node *can* have a parent node. You can picture > the whole set of rows of this table as one or more trees with nodes > and the root of the tree is the node which has no parent node (that > is, parent is NULL). > > Now here's my objective: I want to *quickly* find all nodes that > have the same root node. Given a "root" node, i.e. one which appears only as a tail in the edges table, you'd do something like this: WITH descendants AS ( SELECT head FROM edges WHERE tail=1 /* the root node */ UNION SELECT e.head FROM edges e JOIN descendants d ON (e.tail = d.head) ) SELECT * FROM descendants; You might want to index edges.tail and edges.head. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate