Обсуждение: Tree type, how best to impliment?
Hi all I'm working on writing a search engine, and need a branching linked list, or tree. When I did the message board there was a similar need which was over come at the application level. But I don't really want to do it that way this time, and at least one other person has asked about how to implment a tree structure of items. So, what I decided on was a free-form IP-address-like field, instead of a rigid structure 4 levels deep, 255 wide to a level, as is an IP address, I thought to have any number of levels each as wide as need be. How to impliment? I started off with char(255) and figured on some sort of trigger to both generate the next number on insert, and update the parent record of the record just inserted to increment a 'next node-number' field. Whould probably hack the .../contrib/spi/* stuff agian to make the function to do all this. As a tree like structure is occasionally needed by others as well, I am woundering if maybe a better way to impliment it might be as a new data type? Except that I'm not sure of how to do that, or if that would be the best way? Table example (for those who care): CREATE TABLE categories ( category char(30) NOT NULL, pcatid char(255) NOT NULL, cat_id char(255) PRIMARY KEY, nidsufix int4 DEFAULT 1 NOT NULL, UNIQUE ( category, pcatid )); pcatid stands for 'parent category id'. nidsufix stands for 'next id sufix'. So, the very first record will have pcatid = 0, cat_id = 1, nidsufix = 1. If a child record is then inserted, it's pcatid = 1, cat_id = 1.1 (the first '1' is the cat_id of the parent, the second '1' is the nidsufix of the parent), nidsufix = 1 *AND* the parent record (cat_id = 1) has to have it's nidsufix incremented to 2, thus the next child of '1' would have a cat_id of '1.2', a child of that child: cat_id = '1.2.1' and so on. The only limit on both depth and width is the amount of numbers and dots that will fit into a char(255) field. Thanks for any advice Terry Mackintosh <terry@terrym.com> http://www.terrym.com sysadmin/owner Please! No MIME encoded or HTML mail, unless needed. Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4 ------------------------------------------------------------------- Success Is A Choice ... book by Rick Patino, get it, read it!
Hi again I got to thinking that doing a tree address like this, the parent address is all but the last dot-number of the current address, so is redundant, as it can be derived from the current address. So, the new idea looks something like this: CREATE TABLE categories ( category char(30) NOT NULL, cat_id char(255) PRIMARY KEY, nidsufix int4 DEFAULT 1 NOT NULL, UNIQUE ( category, substr(cat_id, 1, lenght(cat_id) - (lenght(cat_id) - strpos(cat_id,'.'))))); Two problems 1. functions can not be called from inside of the UNIQUE constraint? 2. strpos() returns the FIRST '.', and I need the LAST '.'. Is there a similar function that will return the last position of the substring? If both of these can not be resolved, then it will be neccesary to use a parent id field, even though that infomation is contained with in the cat_id field. Thanks Terry Mackintosh <terry@terrym.com> http://www.terrym.com sysadmin/owner Please! No MIME encoded or HTML mail, unless needed. Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4 ------------------------------------------------------------------- Success Is A Choice ... book by Rick Patino, get it, read it!
Terry Mackintosh <terry@terrym.com> writes: > CREATE TABLE categories ( > category char(30) NOT NULL, > pcatid char(255) NOT NULL, > cat_id char(255) PRIMARY KEY, > nidsufix int4 DEFAULT 1 NOT NULL, > UNIQUE ( category, pcatid )); OK, let me get this straight ... 1. cat_id is the unique object identifier for the current table row. You provide an index on it (via PRIMARY KEY) so itcan be used for fast lookup. 2. pcatid is a child node's back-link to its parent node. 3. nidsufix exists to allow easy generation of the next child ID for a given node. 4. category is what? Payload data? It sure doesn't seem related to the tree structure per se. Why is "category, pcatid" unique? This seems to constrain a parent to have only one child per category value --- is that what you want? If so, why not use the category code as the ID suffix, and not have to bother with maintaining a next-ID counter? In theory pcatid is redundant, since you could form it by stripping the last ".xxx" section from cat_id. It might be worth storing anyway to speed up relational queries --- eg you'd doSELECT ... WHERE pcatid = 'something' to find the children of a given node. But without an index for pcatid it's not clear that's a win. If you make a SQL function parent_ID() to strip the textual suffix, then a functional index on parent_ID(cat_id) should be as fast as an indexed pcatid field for searches, and it'd save storage. > The only limit on both depth and width is the amount of numbers and dots > that will fit into a char(255) field. If you use type text instead of a fixed-width char() field, there's no limit to the depth ... and for normal not-too-deep trees it'd save much storage compared to a fixed-width char(255) field... A purely stylistic suggestion: IDs of the form "1.2.3.4" might be mistaken for IP addresses, which of course they ain't. It might save confusion down the road to use a different delimiter. Not slash either unless you want the things to look like filenames ... maybe comma or colon? regards, tom lane
On Tue, 17 Nov 1998, Tom Lane wrote: > Terry Mackintosh <terry@terrym.com> writes: > > CREATE TABLE categories ( > > category char(30) NOT NULL, > > pcatid char(255) NOT NULL, > > cat_id char(255) PRIMARY KEY, > > nidsufix int4 DEFAULT 1 NOT NULL, > > UNIQUE ( category, pcatid )); > > OK, let me get this straight ... > > 1. cat_id is the unique object identifier for the current table row. > You provide an index on it (via PRIMARY KEY) so it can be used for > fast lookup. > 2. pcatid is a child node's back-link to its parent node. > 3. nidsufix exists to allow easy generation of the next child ID for > a given node. Yes to all. > 4. category is what? Payload data? It sure doesn't seem related to > the tree structure per se. Yes, this will be a tree of categories for a search engine, could also be a message id in a threaded message database. Example: Things (1) / \ Big (1.1) Small (1.2) / \ Cars (1.1.1) Boats (1.1.2) > Why is "category, pcatid" unique? This seems to constrain a parent > to have only one child per category value --- is that what you want? Yes. It is very much like a directory structure, one would not want two directories of the same name both off the same point in the file system. > If so, why not use the category code as the ID suffix, and not have to > bother with maintaining a next-ID counter? category is human readable, for display, the id is not, and when deciding what the next child's name should be, if not for the next-ID one would have to go count all the other records that have the same parent. > In theory pcatid is redundant, since you could form it by stripping the > last ".xxx" section from cat_id. It might be worth storing anyway to > speed up relational queries --- eg you'd do > SELECT ... WHERE pcatid = 'something' Yes, I soon realized this :-) but as per my other post, could not figure out how to do this for the UNIQUE constraint. > to find the children of a given node. But without an index for pcatid I had planed to index it. > ... > > The only limit on both depth and width is the amount of numbers and dots > > that will fit into a char(255) field. > > If you use type text instead of a fixed-width char() field, there's no > limit to the depth ... and for normal not-too-deep trees it'd save > much storage compared to a fixed-width char(255) field... Yes, I just was not sure how well indexes work with text fields? > A purely stylistic suggestion: IDs of the form "1.2.3.4" might be > mistaken for IP addresses, which of course they ain't. It might save > confusion down the road to use a different delimiter. Not slash either > unless you want the things to look like filenames ... maybe comma or > colon? Actually a directory structure is probably the closest analogy, and for that reason I had thought about using slashes. I had also though about one field only (text), where the categories would be all chained together delimited with slashes and have a PRIMARY KEY index. That would automate by design all of the above problems. But it would creat new ones:-) Like for many deep records, the table would be much bigger. Also, I wanted to use this same concept in other projects, and a one field approch would only be good (maybe) for this project. And if worked out, this could be usefull to others as well. Thanks, and have a great day Terry Mackintosh <terry@terrym.com> http://www.terrym.com sysadmin/owner Please! No MIME encoded or HTML mail, unless needed. Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4 ------------------------------------------------------------------- Success Is A Choice ... book by Rick Patino, get it, read it!
Terry Mackintosh <terry@terrym.com> writes: > On Tue, 17 Nov 1998, Tom Lane wrote: >> Why is "category, pcatid" unique? This seems to constrain a parent >> to have only one child per category value --- is that what you want? > Yes. > It is very much like a directory structure, one would not want two > directories of the same name both off the same point in the file system. Precisely... >> If so, why not use the category code as the ID suffix, and not have to >> bother with maintaining a next-ID counter? > category is human readable, for display, the id is not, and when deciding > what the next child's name should be, if not for the next-ID one would > have to go count all the other records that have the same parent. Why do you need a next ID at all? Think directory structure. Using your example, this is perfectly valid: Things / \ Big Small / \ Cars Boats The index will work just as well (probably better) with key strings like "Things/Big/Small" as with key strings like "1.1.2". Moreover, you don't need a separate index to enforce uniqueness, and you don't need to update the parent row when adding a child. You do need invented IDs if the category items are not necessarily unique, but it seems your problem does not need that, so why complicate the concept? > Yes, I just was not sure how well indexes work with text fields? I use 'em all the time... > I had also though about one field only (text), where the categories would > be all chained together delimited with slashes and have a PRIMARY KEY > index. That would automate by design all of the above problems. But it > would creat new ones:-) Like for many deep records, the table would be > much bigger. If your higher-level nodes have long category names, then the space savings from using an ID instead of a category name might become interesting. But if you were willing to use a fixed-size char(255) (in fact two of 'em!) in every tuple for IDs, I don't think you can complain about the average space cost of this approach... Another way to approach this is to give each node a unique serial number, and use the serial number as the child's back-link: CREATE TABLE tab ( nodeID int4 serial primary key, parentID int4 not null, -- nodeID of parent node category text not null, unique (parentID, category)); (I may not have the "serial" syntax quite right, but you get the idea.) This approach is very compact, but a row's position in the hierarchy is represented only indirectly --- you have to chase the parentID links if you need to build the full pathname given just the nodeID. You can't lookup a row directly by its "category/subcategory/subsubcategory" path either; you need a query for each level in order to fetch the parent IDs. But you needed that in your original design too, since there's no other way to get the numeric IDs for subcategories. A further space saving is to use the Postgres OIDs as the row identifiers, but that makes it difficult to dump and reload the table, so I don't recommend it. regards, tom lane
Tom Lane wrote: > A purely stylistic suggestion: IDs of the form "1.2.3.4" might be > mistaken for IP addresses, which of course they ain't. It might save > confusion down the road to use a different delimiter. Not slash either > unless you want the things to look like filenames ... maybe comma or > colon? > > regards, tom lane But the 'dot' notation also looks like an entry in an SNMP MIB. I've been thinking about a similar structure for psql storage of a MIB. -- -------------- Mark Hollomon mhh@nortelnetworks.com
Hi On Mon, 23 Nov 1998, Mark Hollomon wrote: > Tom Lane wrote: > > A purely stylistic suggestion: IDs of the form "1.2.3.4" might be > > mistaken for IP addresses, which of course they ain't. It might save > > confusion down the road to use a different delimiter. Not slash either > > unless you want the things to look like filenames ... maybe comma or > > colon? > > > > regards, tom lane > > But the 'dot' notation also looks like an entry in an SNMP MIB. > I've been thinking about a similar structure for psql storage > of a MIB. Well, as it logically mimics a directory structure, I finally went with a '/' and full names instead of numbers. And as there were no takers to help improve my understanding of how to work with the SPI stuff, and as the docs on it are not very extensive, I finally did the referincial integrity stuff for cross linking categories at the application lever, where it was realatively simple to write. 'cross links' == symbolic links in a file system. In fact, logically speaking, it is a file system. I don't know what an 'MIB' is, but if this sort of thing is what you need, then I can work with you, as I already have the details worked out, and I would love to impliment some of the now-app-level-details on the database side, where they belong. Have a great day Terry Mackintosh <terry@terrym.com> http://www.terrym.com sysadmin/owner Please! No MIME encoded or HTML mail, unless needed. Proudly powered by R H Linux 4.2, Apache 1.3, PHP 3, PostgreSQL 6.4 ------------------------------------------------------------------- Success Is A Choice ... book by Rick Patino, get it, read it!
Terry Mackintosh wrote: > > Hi > > Well, as it logically mimics a directory structure, I finally went with a > '/' and full names instead of numbers. Can't say I blame you. > And as there were no takers to help improve my understanding of how to > work with the SPI stuff, and as the docs on it are not very extensive, I > finally did the referincial integrity stuff for cross linking categories > at the application lever, where it was realatively simple to write. > 'cross links' == symbolic links in a file system. In fact, logically > speaking, it is a file system. > > I don't know what an 'MIB' is, but if this sort of thing is what you need, > then I can work with you, as I already have the details worked out, and I > would love to impliment some of the now-app-level-details on the database > side, where they belong. MIB stands for Managemnet Information Base. The Simple Network Management Protocol defines a 'database' of information about the devices that are to managed called a MIB. Each device as a 'OID' that represents a traversal of a tree structure. http://www.dordt.edu:457/NetAdminG/snmpC.smi.html is a very short intro to the tree structure. Each node gets a numerical label and an optional alpha label. The 'official' OID for the node is the numeric labels for all ancestor nodes starting at the root, strung together with dots between them. (sound familiar?). You can string together the alpha labels to create a symbolic OID. If the alpha label is unique, it is even permitted to use just the alpha label of the node of interest. This is actually useful. The top layers of the tree are set by the standards commitees. The local MIB is almost always a proper subtree of the 'internet' node of the standard tree. So, you can start your naming at 'internet' instead of having to alwas specify [ iso org dod internet ... ] We have a PostgreSQL database that keeps our inventory of routers, desktops, etc. The MIB however, is editted by hand. My ultimate goal is to be able to generate the MIB from data stored in the database. I'm afraid I am somewhat time constrained at the moment. Haven't even upgraded to 6.4 yet. My boss seems to think I ought to do what _he_ wants me to do. Oh, well. -------------- Mark Hollomon mhh@nortelnetworks.com