Обсуждение: Can SQL return a threaded-comment-view result set?
[I also posted this to comp.databases but since I'm actually using PostgreSQL I would be content with a pgsql specific answer.] I think I already know that the answer is that this can't be done, but I'll ask anyways. Suppose you want to use an RDBMS to store messages for a threaded message forum like usenet and then display the messages. A toy table definition (that I've tried to make standards compliant) might look like: create table messages ( message_id integer, in_reply_to integer, created date, author varchar(20), title varchar(30), message varchar(256), primary key (message_id) ); The in_reply_to field, if not null, means that the message is a reply to the message with the message_id it has stored. Suppose now that we populate the database with a 5 message discussion. insert into messages values (1, null, '2003-09-01', 'John', 'Favorite DB?', 'What is your favorite database?'); insert into messages values (2, null, '2003-09-02', 'Mike', 'New DB2 benchmarks', 'I just posted some new DB2 benchmarks.'); insert into messages values (3, 1, '2003-09-03', 'Mike', 'Re: Favorite DB?', 'I\'d say DB2.'); insert into messages values (4, 1, '2003-09-05', 'Dave', 'Re: Favorite DB?', 'I\'m an Oracle man myself.'); insert into messages values (5, 3, '2003-09-07', 'John', 'Re: Favorite DB?', 'DB2? I thought you liked free databases?'); If we rendered an oldest-first threaded view of the discussion it would look like: Author: John Title: Favorite DB? Date: 2003-09-01 What is your favorite database? Author: Mike Title: Re: Favorite DB? Date: 2003-09-03 I'd say DB2. Author: John Title: Re: Favorite DB? Date: 2003-09-07 DB2? I thought you liked free databases?. Author: Dave Title: Re: Favorite DB? Date: 2003-09-05 I'm an Oracle man myself. Author: Mike Title: New DB2 benchmarks Date: 2003-09-02 I just posted some new DB2 benchmarks. My question is: is it possible to use pure SQL to return a result set that would make rendering a threaded view like the above really easy? That is, is there an SQL query that would return something like: i | r | created | auth | title | message | nesting ---+---+------------+------+--------------------+-----------+--------- 1 | | 2003-09-01 | John | Favorite DB? | What is y | 0 3 | 1 | 2003-09-03 | Mike | Re: Favorite DB? | I'd say D | 1 5 | 3 | 2003-09-07 | John | Re: Favorite DB? | DB2? I th | 2 4 | 1 | 2003-09-05 | Dave | Re: Favorite DB? | I'm an Or | 1 2 | | 2003-09-02 | Mike | New DB2 benchmarks | I just po | 0 If I had an SQL query that could return that then it would be very easy to have a computer program print threaded views like the one above. If this can't be done, then do any of you have recommendations about the best way to accomplish this with the least amount of inefficient back-and-forth between the database and, say, Java or some other language? Thank you very much in advance for any answers! This has been a frustrating matter for me. Chris
mvppetlab@yahoo.com (Chris) writes: > My question is: is it possible to use pure SQL to return a result set > that would make rendering a threaded view like the above really easy? What you're really after is a tree structure. This is relatively easy to do with Oracle's CONNECT BY or SQL99's WITH, neither of which we have in Postgres (yet). I think in bog-standard SQL92 you can't readily do it without adding some additional fields to the table. I highly recommend you buy a copy of Joe Celko's "SQL For Smarties". He has two whole chapters on different ways to handle trees in SQL. regards, tom lane
Tom Lane wrote: > mvppetlab@yahoo.com (Chris) writes: >>My question is: is it possible to use pure SQL to return a result set >>that would make rendering a threaded view like the above really easy? > > What you're really after is a tree structure. This is relatively easy > to do with Oracle's CONNECT BY or SQL99's WITH, neither of which we have > in Postgres (yet). I think in bog-standard SQL92 you can't readily do > it without adding some additional fields to the table. > > I highly recommend you buy a copy of Joe Celko's "SQL For Smarties". > He has two whole chapters on different ways to handle trees in SQL. In the short term, if you don't mind a Postgres specific answer, you could also take a look at contrib/tablefunc for a function called connectby(). HTH, Joe
Chris, > Suppose you want to use an RDBMS to store messages for a threaded > message forum like usenet and then display the messages. A toy table > definition (that I've tried to make standards compliant) might look > like: This is not at all a new problem. Tree sturctures in SQL are one of those "classic" problems with many solutions. Joe Celko's "SQL for Smarties, 2nd Edition" has 2 chapters on tree structures. The first thing you should do is read these chapters; otherwise, you won't be able to make an informed decision about what tree structure to use. The main ones are: Relational-table (a table for each level of the tree) Adjacency List (what you described) String-Append (tree in a Text field, as USA:California:SanFrancisco) Nested Set (hard to explain) In a few months, Joe will be publishing a whole book about them, and Joe and I will have an article in the first issue of DotDot covering implementation of a nested set tree using PostgreSQL "data-push" functions. (I would not recommend nested sets for your issue, as the tree does not update quickly, and works poorly for fragmented trees) Right now, you can also check out two PostgreSQL-proprietary tree structure solutions in the /contrib directory of your Postgres source (assuming that you have 7.3 or up): Joe Conway's connectby() in /dblink, which works similar to Oracles' CONNECT BY /ltree , which uses the "string-building" tree stucture technique. -- Josh Berkus Aglio Database Solutions San Francisco
Hello all, I've been meaning to get back to you all but I just haven't had time. Okay, I've got a little bit of time now so here goes.... I received many useful answers from many of you including Tom Lane, Joe Conway, and Josh Berkus. Max Nachlinger in particular on October 5th (which was my birthday) sent me a large amount of threaded discussion forum code of his own. (Nice birthday present, Max. Thank you.) I will be investigating his solution when I have more time since his is almost certainly more efficient than my own. My own solution is a 20-line PL/pgSQL function I put together after reading the 7.3 docs at postgresql.org. It requires no modifications to my original example table definitions other than that I decided to use a 0 value instead of a NULL value for the in_reply_to column when a message isn't a reply, because that way my plpgsql function doesn't have to treat NULL as a special case. In particular, my solution doesn't require a message to keep pointers to its children. If a message is a reply, it simply points to its parent's id via in_reply_to. You can add as many messages as you want with just single simple INSERT statements; you don't have to do any tree-refactoring or updating to the parent. The downside is that while insert speed couldn't be any better and inserting couldn't be any easier, building the threaded view seems rather algorithmically inefficient, and in almost all applications optimising for obtaining the threaded view rather than insert speed is more important. One probably couldn't base even a moderate-load application on this solution, but if one wanted to anyways I suppose an in-memory tree representation could be maintained which allows new messages to be linked into the in-memory tree efficiently as they're inserted into the database, and then whenever the application is shutdown and reloaded it could rebuild that in-memory representation on startup. Or something. And until you run out of memory.... (Also, simply caching the results of queries could be effective if you have many identical queries producing identical results [which my application does] so this solution might not work too bad for me.) For the sake of googlers and like novices reading this, I've adapted my PL/pgSQL function so that it works with the original example I posted. (My real code uses more fields, different types, and has some other subtle differences because there's more than one type of table to consider and there are foreign key constraints.) After loading the below code, evaluating select * from threadview(0, 0); builds a table like the one I wanted in my original posting. ---Chris ================================================= -- This code originally due to Chris Barry, http://www.goodfig.org/feedback -- It's hereby placed in the public domain. These public domain licenses -- usually have some sort of warning about no guarantee of fitness for a particular -- purpose, etc. Well, the below code is DEFINITELY not fit for any purpose! So, -- use it at your own peril. Caveat emptor. -- drop database discussion; create database discussion; \c discussion -- The path to plpgsql.so may need to be edited for your system. create function plpgsql_call_handler() returns opaque as '/usr/local/pgsql/lib/plpgsql.so' language 'c'; create language 'plpgsql' handler plpgsql_call_handler lancompiler 'PL/pgSQL'; create table messages ( message_id integer, in_reply_to integer, created date, author varchar(20), title varchar(30), message varchar(256), primary key (message_id) ); -- A threadrow is the same thing as a row from the messages table -- except a nesting integer has been added so the client knows how -- much to indent the thread message. I'm not sure if there's a -- syntax that makes it unnecessary to duplicate the redundant -- information from the messages table (e.g inheritance). create type threadrow as ( message_id integer, in_reply_to integer, created date, author varchar(20), title varchar(30), message varchar(256), nesting integer ); create or replace function threadview(int, int) returns setof threadrow as ' declare p alias for $1; -- p is the parent i alias for $2; -- i is the indentation (nesting) c threadrow%rowtype; c2 threadrow%rowtype; begin for c in select *, 0 as nesting from messages where in_reply_to = p order by created asc loop c.nesting = i; return next c; for c2 in select * from threadview(c.message_id, i+1) loop return next c2; end loop; end loop; return; end; ' language 'plpgsql'; -- Load the table with some example data: insert into messages values (1, 0, '2003-09-01', 'John', 'Favorite DB?', 'What is your favorite database?'); insert into messages values (2, 0, '2003-09-02', 'Mike', 'New DB2 benchmarks', 'I just posted some new DB2 benchmarks.'); insert into messages values (3, 1, '2003-09-03', 'Mike', 'Re: Favorite DB?', 'I\'d say DB2.'); insert into messages values (4, 1, '2003-09-05', 'Dave', 'Re: Favorite DB?', 'I\'m an Oracle man myself.'); insert into messages values (5, 3, '2003-09-07', 'John', 'Re: Favorite DB?', 'DB2? I thought you liked free databases?');