Обсуждение: Tree type, how best to impliment?

Поиск
Список
Период
Сортировка

Tree type, how best to impliment?

От
Terry Mackintosh
Дата:
Hi all

I'm working on writing a search engine, and need a branching linked list,
or tree.  When I did the message board there was a similar need which was
over come at the application level.  But I don't really want to do it that
way this time, and at least one other person has asked about how to
implment a tree structure of items.

So, what I decided on was a free-form IP-address-like field, instead of a
rigid structure 4 levels deep, 255 wide to a level, as is an IP address, I
thought to have any number of levels each as wide as need be.

How to impliment?  I started off with char(255) and figured on some sort
of trigger to both generate the next number on insert, and update the
parent record of the record just inserted to increment a 'next
node-number' field.  Whould probably hack the .../contrib/spi/* stuff
agian to make the function to do all this.

As a tree like structure is occasionally needed by others as well, I am
woundering if maybe a better way to impliment it might be as a new data
type?  Except that I'm not sure of how to do that, or if that would be the
best way?

Table example (for those who care):
CREATE TABLE categories (  category       char(30)    NOT NULL,  pcatid         char(255)   NOT NULL,  cat_id
char(255)  PRIMARY KEY,  nidsufix       int4        DEFAULT 1 NOT NULL,  UNIQUE ( category, pcatid ));
 

pcatid stands for 'parent category id'.
nidsufix stands for 'next id sufix'.

So, the very first record will have pcatid = 0, cat_id = 1, nidsufix = 1.
If a child record is then inserted, it's pcatid = 1, cat_id = 1.1 (the
first '1' is the cat_id of the parent, the second '1' is the nidsufix of
the parent), nidsufix = 1 *AND* the parent record (cat_id = 1) has to have
it's nidsufix incremented to 2, thus the next child of '1' would have a
cat_id of '1.2', a child of that child: cat_id = '1.2.1' and so on.
The only limit on both depth and width is the amount of numbers and dots
that will fit into a char(255) field.

Thanks for any advice
Terry Mackintosh <terry@terrym.com>          http://www.terrym.com
sysadmin/owner  Please! No MIME encoded or HTML mail, unless needed.

Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4
-------------------------------------------------------------------
Success Is A Choice ... book by Rick Patino, get it, read it!



Re: [HACKERS] Tree type, how best to impliment?

От
Terry Mackintosh
Дата:
Hi again

I got to thinking that doing a tree address like this, the parent address
is all but the last dot-number of the current address, so is redundant,
as it can be derived from the current address.

So, the new idea looks something like this:
CREATE TABLE categories (  category       char(30)    NOT NULL,  cat_id         char(255)   PRIMARY KEY,  nidsufix
int4        DEFAULT 1 NOT NULL,  UNIQUE ( category, substr(cat_id, 1,     lenght(cat_id) - (lenght(cat_id) -
strpos(cat_id,'.')))));
 

Two problems
1.  functions can not be called from inside of the UNIQUE constraint?
2.  strpos() returns the FIRST '.', and I need the LAST '.'.  Is there a
similar function that will return the last position of the substring?

If both of these can not be resolved, then it will be neccesary to use a
parent id field, even though that infomation is contained with in the
cat_id field.

Thanks
Terry Mackintosh <terry@terrym.com>          http://www.terrym.com
sysadmin/owner  Please! No MIME encoded or HTML mail, unless needed.

Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4
-------------------------------------------------------------------
Success Is A Choice ... book by Rick Patino, get it, read it!



Re: [HACKERS] Tree type, how best to impliment?

От
Tom Lane
Дата:
Terry Mackintosh <terry@terrym.com> writes:
> CREATE TABLE categories (
>    category       char(30)    NOT NULL,
>    pcatid         char(255)   NOT NULL,
>    cat_id         char(255)   PRIMARY KEY,
>    nidsufix       int4        DEFAULT 1 NOT NULL,
>    UNIQUE ( category, pcatid ));

OK, let me get this straight ...

1. cat_id is the unique object identifier for the current table row.  You provide an index on it (via PRIMARY KEY) so
itcan be used for  fast lookup.
 
2. pcatid is a child node's back-link to its parent node.
3. nidsufix exists to allow easy generation of the next child ID for  a given node.
4. category is what?  Payload data?  It sure doesn't seem related to  the tree structure per se.

Why is "category, pcatid" unique?  This seems to constrain a parent
to have only one child per category value --- is that what you want?
If so, why not use the category code as the ID suffix, and not have to
bother with maintaining a next-ID counter?

In theory pcatid is redundant, since you could form it by stripping the
last ".xxx" section from cat_id.  It might be worth storing anyway to
speed up relational queries --- eg you'd doSELECT ... WHERE pcatid = 'something'
to find the children of a given node.  But without an index for pcatid
it's not clear that's a win.  If you make a SQL function parent_ID() to
strip the textual suffix, then a functional index on parent_ID(cat_id)
should be as fast as an indexed pcatid field for searches, and it'd save
storage.

> The only limit on both depth and width is the amount of numbers and dots
> that will fit into a char(255) field.

If you use type text instead of a fixed-width char() field, there's no
limit to the depth ... and for normal not-too-deep trees it'd save
much storage compared to a fixed-width char(255) field...

A purely stylistic suggestion: IDs of the form "1.2.3.4" might be
mistaken for IP addresses, which of course they ain't.  It might save
confusion down the road to use a different delimiter.  Not slash either
unless you want the things to look like filenames ... maybe comma or
colon?
        regards, tom lane


Re: [HACKERS] Tree type, how best to impliment?

От
Terry Mackintosh
Дата:
On Tue, 17 Nov 1998, Tom Lane wrote:

> Terry Mackintosh <terry@terrym.com> writes:
> > CREATE TABLE categories (
> >    category       char(30)    NOT NULL,
> >    pcatid         char(255)   NOT NULL,
> >    cat_id         char(255)   PRIMARY KEY,
> >    nidsufix       int4        DEFAULT 1 NOT NULL,
> >    UNIQUE ( category, pcatid ));
> 
> OK, let me get this straight ...
> 
> 1. cat_id is the unique object identifier for the current table row.
>    You provide an index on it (via PRIMARY KEY) so it can be used for
>    fast lookup.
> 2. pcatid is a child node's back-link to its parent node.
> 3. nidsufix exists to allow easy generation of the next child ID for
>    a given node.

Yes to all.

> 4. category is what?  Payload data?  It sure doesn't seem related to
>    the tree structure per se.

Yes, this will be a tree of categories for a search engine, could also be
a message id in a threaded message database.

Example:             Things (1)                  /       \               Big (1.1)   Small (1.2)            /
\      Cars (1.1.1)   Boats (1.1.2)
 


> Why is "category, pcatid" unique?  This seems to constrain a parent
> to have only one child per category value --- is that what you want?

Yes.
It is very much like a directory structure, one would not want two
directories of the same name both off the same point in the file system.

> If so, why not use the category code as the ID suffix, and not have to
> bother with maintaining a next-ID counter?

category is human readable, for display, the id is not, and when deciding
what the next child's name should be, if not for the next-ID one would
have to go count all the other records that have the same parent.

> In theory pcatid is redundant, since you could form it by stripping the
> last ".xxx" section from cat_id.  It might be worth storing anyway to
> speed up relational queries --- eg you'd do
>     SELECT ... WHERE pcatid = 'something'

Yes, I soon realized this :-) but as per my other post, could not figure
out how to do this for the UNIQUE constraint.

> to find the children of a given node.  But without an index for pcatid

I had planed to index it.

> ...
> > The only limit on both depth and width is the amount of numbers and dots
> > that will fit into a char(255) field.
> 
> If you use type text instead of a fixed-width char() field, there's no
> limit to the depth ... and for normal not-too-deep trees it'd save
> much storage compared to a fixed-width char(255) field...

Yes, I just was not sure how well indexes work with text fields?

> A purely stylistic suggestion: IDs of the form "1.2.3.4" might be
> mistaken for IP addresses, which of course they ain't.  It might save
> confusion down the road to use a different delimiter.  Not slash either
> unless you want the things to look like filenames ... maybe comma or
> colon?

Actually a directory structure is probably the closest analogy, and for
that reason I had thought about using slashes.

I had also though about one field only (text), where the categories would
be all chained together delimited with slashes and have a PRIMARY KEY
index.  That would automate by design all of the above problems.  But it
would creat new ones:-)  Like for many deep records, the table would be
much bigger.

Also, I wanted to use this same concept in other projects, and a one field
approch would only be good (maybe) for this project.  And if worked out,
this could be usefull to others as well.
Thanks, and have a great day
Terry Mackintosh <terry@terrym.com>          http://www.terrym.com
sysadmin/owner  Please! No MIME encoded or HTML mail, unless needed.

Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4
-------------------------------------------------------------------
Success Is A Choice ... book by Rick Patino, get it, read it!



Re: [HACKERS] Tree type, how best to impliment?

От
Tom Lane
Дата:
Terry Mackintosh <terry@terrym.com> writes:
> On Tue, 17 Nov 1998, Tom Lane wrote:
>> Why is "category, pcatid" unique?  This seems to constrain a parent
>> to have only one child per category value --- is that what you want?

> Yes.
> It is very much like a directory structure, one would not want two
> directories of the same name both off the same point in the file system.

Precisely...

>> If so, why not use the category code as the ID suffix, and not have to
>> bother with maintaining a next-ID counter?

> category is human readable, for display, the id is not, and when deciding
> what the next child's name should be, if not for the next-ID one would
> have to go count all the other records that have the same parent.

Why do you need a next ID at all?  Think directory structure.  Using
your example, this is perfectly valid:
                   Things                  /       \               Big        Small            /       \          Cars
  Boats
 

The index will work just as well (probably better) with key strings
like "Things/Big/Small" as with key strings like "1.1.2".  Moreover,
you don't need a separate index to enforce uniqueness, and you don't
need to update the parent row when adding a child.

You do need invented IDs if the category items are not necessarily
unique, but it seems your problem does not need that, so why complicate
the concept?

> Yes, I just was not sure how well indexes work with text fields?

I use 'em all the time...

> I had also though about one field only (text), where the categories would
> be all chained together delimited with slashes and have a PRIMARY KEY
> index.  That would automate by design all of the above problems.  But it
> would creat new ones:-)  Like for many deep records, the table would be
> much bigger.

If your higher-level nodes have long category names, then the space
savings from using an ID instead of a category name might become
interesting.  But if you were willing to use a fixed-size char(255)
(in fact two of 'em!) in every tuple for IDs, I don't think you can
complain about the average space cost of this approach...

Another way to approach this is to give each node a unique serial
number, and use the serial number as the child's back-link:
CREATE TABLE tab (    nodeID        int4    serial primary key,    parentID    int4    not null,  -- nodeID of parent
node   category    text    not null,    unique (parentID, category));
 

(I may not have the "serial" syntax quite right, but you get the idea.)
This approach is very compact, but a row's position in the hierarchy
is represented only indirectly --- you have to chase the parentID links
if you need to build the full pathname given just the nodeID.  You can't
lookup a row directly by its "category/subcategory/subsubcategory" path
either; you need a query for each level in order to fetch the parent
IDs.  But you needed that in your original design too, since there's no
other way to get the numeric IDs for subcategories.

A further space saving is to use the Postgres OIDs as the row
identifiers, but that makes it difficult to dump and reload the table,
so I don't recommend it.
        regards, tom lane


Re: [HACKERS] Tree type, how best to impliment?

От
"Mark Hollomon"
Дата:
Tom Lane wrote:
> A purely stylistic suggestion: IDs of the form "1.2.3.4" might be
> mistaken for IP addresses, which of course they ain't.  It might save
> confusion down the road to use a different delimiter.  Not slash either
> unless you want the things to look like filenames ... maybe comma or
> colon?
> 
>                         regards, tom lane

But the 'dot' notation also looks like an entry in an SNMP MIB.
I've been thinking about a similar structure for psql storage
of a MIB.

-- 

--------------
Mark Hollomon
mhh@nortelnetworks.com


Re: [HACKERS] Tree type, how best to impliment?

От
Terry Mackintosh
Дата:
Hi

On Mon, 23 Nov 1998, Mark Hollomon wrote:

> Tom Lane wrote:
> > A purely stylistic suggestion: IDs of the form "1.2.3.4" might be
> > mistaken for IP addresses, which of course they ain't.  It might save
> > confusion down the road to use a different delimiter.  Not slash either
> > unless you want the things to look like filenames ... maybe comma or
> > colon?
> > 
> >                         regards, tom lane
> 
> But the 'dot' notation also looks like an entry in an SNMP MIB.
> I've been thinking about a similar structure for psql storage
> of a MIB.

Well, as it logically mimics a directory structure, I finally went with a
'/' and full names instead of numbers.
And as there were no takers to help improve my understanding of how to
work with the SPI stuff, and as the docs on it are not very extensive, I
finally did the referincial integrity stuff for cross linking categories 
at the application lever, where it was realatively simple to write.
'cross links' == symbolic links in a file system.  In fact, logically
speaking, it is a file system.

I don't know what an 'MIB' is, but if this sort of thing is what you need,
then I can work with you, as I already have the details worked out, and I
would love to impliment some of the now-app-level-details on the database
side, where they belong.

Have a great day
Terry Mackintosh <terry@terrym.com>          http://www.terrym.com
sysadmin/owner  Please! No MIME encoded or HTML mail, unless needed.

Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4
-------------------------------------------------------------------
Success Is A Choice ... book by Rick Patino, get it, read it!



Re: [HACKERS] Tree type, how best to impliment?

От
"Mark Hollomon"
Дата:
Terry Mackintosh wrote:
> 
> Hi
> 
> Well, as it logically mimics a directory structure, I finally went with a
> '/' and full names instead of numbers.

Can't say I blame you.

> And as there were no takers to help improve my understanding of how to
> work with the SPI stuff, and as the docs on it are not very extensive, I
> finally did the referincial integrity stuff for cross linking categories
> at the application lever, where it was realatively simple to write.
> 'cross links' == symbolic links in a file system.  In fact, logically
> speaking, it is a file system.
> 
> I don't know what an 'MIB' is, but if this sort of thing is what you need,
> then I can work with you, as I already have the details worked out, and I
> would love to impliment some of the now-app-level-details on the database
> side, where they belong.

MIB stands for Managemnet Information Base.
The Simple Network Management Protocol defines a 'database' of
information
about the devices that are to managed called a MIB. Each device as a
'OID'
that represents a traversal of a tree structure.
http://www.dordt.edu:457/NetAdminG/snmpC.smi.html
is a very short intro to the tree structure.
Each node gets a numerical label and an optional alpha label. The
'official'
OID for the node is the numeric labels for all ancestor nodes starting
at
the root, strung together with dots between them. (sound familiar?).
You can string together the alpha labels to create a symbolic OID. If
the
alpha label is unique, it is even permitted to use just the alpha label
of
the node of interest. This is actually useful. The top layers of the
tree
are set by the standards commitees. The local MIB is almost always a
proper
subtree of the 'internet' node of the standard tree. So, you can start
your
naming at 'internet' instead of having to alwas specify [ iso org dod
internet ... ]

We have a PostgreSQL database that keeps our inventory of routers,
desktops,
etc. The MIB however, is editted by hand. My ultimate goal is to be able
to generate the MIB from data stored in the database.

I'm afraid I am somewhat time constrained at the moment. Haven't even
upgraded
to 6.4 yet. My boss seems to think I ought to do what _he_ wants me to
do.
Oh, well.


--------------
Mark Hollomon
mhh@nortelnetworks.com