Обсуждение: Nodes and trees...

Поиск
Список
Период
Сортировка

Nodes and trees...

От
Jason Schauberger
Дата:
Dear fellow Postgres users, :-)

please consider this table:

CREATE TABLE nodes (

id      int     PRIMARY KEY,

parent     int     REFERENCES nodes(id)

);

In this table, each node *can* have a parent node. You can picture the
whole set of rows of this table as one or more trees with nodes and
the root of the tree is the node which has no parent node (that is,
parent is NULL).

Now here's my objective: I want to *quickly* find all nodes that have
the same root node.

I could iterate through the complete tree in question starting from
the root node and going all the way through to the terminal/leaf nodes
(those without a child), but that could take quite long depending on
how large the tree is.

So, what I could do is this:

CREATE TABLE nodes (

id      int     PRIMARY KEY,

parent     int     REFERENCES nodes(id),

root    int     NOT NULL        REFERENCES nodes(id)

);

and fill out the root column every time I insert a node. But then
there is the problem of anomalies, where the root column could have an
incorrect value (that is, the id of a node which is actually NOT the
root of the same tree). I guess I could check for correctness using
triggers.

I also thought that using views might make adding a root column to the
table completely unnecessary and at the same time allow for quickly
finding nodes which have the same root.

So, what's the best method to do this? It must be fast, it must
prevent anomalies, and must also be either SQL standard or if it is
not, at least it must be easily portable across the most popular SQL
databases.  I also explicitly don't want to create an extra tree ID or
something like that, because it only mitigates the problem of
anomalies, but does not solve it.

Thanks in advance,

Jason.

Re: Nodes and trees...

От
Merlin Moncure
Дата:
On Tue, Aug 3, 2010 at 8:01 AM, Jason Schauberger
<crossroads0000@googlemail.com> wrote:
> Dear fellow Postgres users, :-)
>
> please consider this table:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id)
>
> );
>
> In this table, each node *can* have a parent node. You can picture the
> whole set of rows of this table as one or more trees with nodes and
> the root of the tree is the node which has no parent node (that is,
> parent is NULL).
>
> Now here's my objective: I want to *quickly* find all nodes that have
> the same root node.
>
> I could iterate through the complete tree in question starting from
> the root node and going all the way through to the terminal/leaf nodes
> (those without a child), but that could take quite long depending on
> how large the tree is.
>
> So, what I could do is this:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id),
>
> root    int     NOT NULL        REFERENCES nodes(id)
>
> );
>
> and fill out the root column every time I insert a node. But then
> there is the problem of anomalies, where the root column could have an
> incorrect value (that is, the id of a node which is actually NOT the
> root of the same tree). I guess I could check for correctness using
> triggers.

yes, you should absolutely do that -- the role of the database is to
(as much as possible) make anomalies in your data impossible.  If you
go this route, set the root_id node in the trigger.  Postgres triggers
aren't strictly portable, but should be able to be migrated with
minimal effort. Looking it up in the application is possible, but
undesirable IMO.

If you are only specifically interested in the root node, you can
materialize it in the application and index it.  This isn't a general
solution for investigating familial relationships of nodes, but it
doesn't sound like you need something general.

The general solution that is going to be most portable is going
probably going to be a materialized path approach.  You encode the
parents of a node in a string (in postgres you can use an array) in
such away that you can pull out specific parents_ids.  Materialized
approaches have at least one big downside: updates to the tree are
extremely expensive.

Here is a pretty neat way of doing nested data with farey fractions
with SQL examples designed for Oracle -- they should be able to be
ported to postgres without too much effort:
http://arxiv.org/html/cs/0401014.  I've been looking for an excuse to
play with it!

merlin

Re: Nodes and trees...

От
"Igor Neyman"
Дата:

> -----Original Message-----
> From: Jason Schauberger [mailto:crossroads0000@googlemail.com]
> Sent: Tuesday, August 03, 2010 8:02 AM
> To: pgsql-general@postgresql.org
> Subject: Nodes and trees...
>
> Dear fellow Postgres users, :-)
>
> please consider this table:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id)
>
> );
>
> In this table, each node *can* have a parent node. You can
> picture the whole set of rows of this table as one or more
> trees with nodes and the root of the tree is the node which
> has no parent node (that is, parent is NULL).
>
> Now here's my objective: I want to *quickly* find all nodes
> that have the same root node.
>
> I could iterate through the complete tree in question
> starting from the root node and going all the way through to
> the terminal/leaf nodes (those without a child), but that
> could take quite long depending on how large the tree is.
>
> So, what I could do is this:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id),
>
> root    int     NOT NULL        REFERENCES nodes(id)
>
> );
>
> and fill out the root column every time I insert a node. But
> then there is the problem of anomalies, where the root column
> could have an incorrect value (that is, the id of a node
> which is actually NOT the root of the same tree). I guess I
> could check for correctness using triggers.
>
> I also thought that using views might make adding a root
> column to the table completely unnecessary and at the same
> time allow for quickly finding nodes which have the same root.
>
> So, what's the best method to do this? It must be fast, it
> must prevent anomalies, and must also be either SQL standard
> or if it is not, at least it must be easily portable across
> the most popular SQL databases.  I also explicitly don't want
> to create an extra tree ID or something like that, because it
> only mitigates the problem of anomalies, but does not solve it.
>
> Thanks in advance,
>
> Jason.
>

Look up connectby() in tablefuncs contrib module.

Regards,
Igor Neyman

Re: Nodes and trees...

От
David Fetter
Дата:
On Tue, Aug 03, 2010 at 02:01:58PM +0200, Jason Schauberger wrote:
> Dear fellow Postgres users, :-)
>
> please consider this table:
>
> CREATE TABLE nodes (
>
> id      int     PRIMARY KEY,
>
> parent     int     REFERENCES nodes(id)
>
> );

Generally, you'll want to separate the nodes table from the edges
table, as in:

CREATE TABLE nodes (id INTEGER PRIMARY KEY);

CREATE TABLE edges (
    tail INTEGER NOT NULL REFERENCES nodes(id),
    head INTEGER NOT NULL REFERENCES nodes(id),
    PRIMARY KEY(tail, head),
    CHECK (tail <> head)
);

Then you might want to prevent other kinds of issues (more uniqueness,
"must be forest," etc.) with other constraints, but let's not go there
for now.

> In this table, each node *can* have a parent node. You can picture
> the whole set of rows of this table as one or more trees with nodes
> and the root of the tree is the node which has no parent node (that
> is, parent is NULL).
>
> Now here's my objective: I want to *quickly* find all nodes that
> have the same root node.

Given a "root" node, i.e. one which appears only as a tail in the
edges table, you'd do something like this:

WITH descendants AS (
    SELECT head FROM edges WHERE tail=1 /* the root node */
UNION
    SELECT e.head FROM edges e JOIN descendants d ON (e.tail = d.head)
)
SELECT * FROM descendants;

You might want to index edges.tail and edges.head.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate