Обсуждение: In need of help with message fetching query
Hello everyone.
I'm working on a small project of mine, which basically revolves
around messages.
These messages are to be ordered in a very standard fashion of single-depth
threads. That is, a message can be a reply, and a so-called "head".
A head is simply the head of a chain. To put it more eloquently, the table is:
create table messages (
id serial not null primary key,
author_id integer null references account (id) deferrable
initially deferred,
text varchar(200) not null,
timestamp timestamp with time zone not null,
reply_to_id integer null,
reply_to_account_id integer not null references account (id)
deferrable initially deferred,
unique (account_id, reply_to_id)
);
That is, a message is made by an account, and *can* be made in reply
to another message. If it is, reply_to_account_id will equal the
author_id of the corresponding row where reply_to_id = id.
So, for example, a simple case:
#1 by:A1 "Hello world!" reply_to:null
#2 by:A2 "Hey there." reply_to:#1
#3 by:A3 "'Sup?" reply_to:#1
Would be:
Hello world!
+-- Hey there.
+-- 'Sup?
This is a simple case -- the head in this particular instance is #1.
Now, given the above, imagine we were to have:
#1 by:A1 "Hello world!" reply_to:null
#2 by:A2 "Hey there." reply_to:#1
#3 by:A3 "'Sup?" reply_to:#1
#4 by:A1 "Not much!" reply_to:#3
It immediately becomes more complicated -- #4 is a reply to #3, which
is a reply to #1. The desired resultset would be:
#3 by:A3 "'Sup?" reply_to:#1
#4 by:A1 "Not much!" reply_to:#3
#1 by:A1 "Hello world!" reply_to:null
#2 by:A2 "Hey there." reply_to:#1
#3 by:A3 "'Sup?" reply_to:#1
Now, #3 has become a head in its first appearance. In the second, at
the end, it is a mere reply. Thus:
'Sup?
+-- Not much!
Hello world!
+-- Hey there.
+-- 'Sup?
So the head is really "has one or more replies, or reply_to is null."
The query could thus be summed up into, "Get all rows which have
reply_to_id null, or at least one other row refers to in its
reply_to_id. Then, for every such row, include all messages which have
reply_to_id = current row's id"
It is entirely possible that this results in two queries -- one for
fetching the heads, one for fetching all replies.
Further, there are other constraints which would appear trivial: the
heads fetching part will have a limit, and a where clause telling it
"get things older than this", as well as "get rows with author_id in
(x, y, ...)".
There is also a very quirky issue -- the sorting. A head's placement
in the resultset is dependant on its latest reply. That is, if a new
row is added to our table:
#5 by:A4 "O hi!" reply_to:#1
With its timestamp set to the current time, this will result in the #1
"chain" getting bubbled up to the top!
I implemented this as: order by (select b.timestamp from messages b
where reply_to_id = o.id order by timestamp desc limit 1) desc
Obviously, this isn't a very cheap query at all! Even with me being
humble, my hardware simply won't cut it after some amount of messages.
So, I think that sums my problem up fairly well. I've been trying to
get this to work the way I want for days, but I'm simply not skilled
enough in SQL to make this work. I did read manuals, and I think I
really exhausted most of my options.
(As an aside: while I deploy using PostgreSQL (what else?), I also use
SQLite on my MacBook, but I could give that up.)
On Tue, Feb 10, 2009 at 11:19:13PM +0100, Max Strrrmberg wrote:
> [...] messages are to be ordered in a very standard fashion of single-depth
> threads. That is, a message can be a reply, and a so-called "head".
>
> create table messages (
> id serial not null primary key,
> author_id integer null references account,
> text varchar(200) not null,
> timestamp timestamp not null,
> reply_to_id integer null,
> reply_to_account_id integer not null references account,
> unique (account_id, reply_to_id)
> );
I'd almost advocate having a having another table, and thus breaking
various normalization rules, to make your life easier:
CREATE TABLE heads (
head_message_id INTEGER PRIMARY KEY REFERENCES messages,
last_reply TIMESTAMP NOT NULL
);
You can have a trigger to maintain this table, and maybe even one
to insert things as well. Not sure about the other rules you want,
but if a "thread" can only involve two people (obviously for email
conversions this isn't true) then you could move the author_id and
reply_to_account_id into this table as well. If you did this you could
move other attributes around and get things normalized again.
Does that help with ideas?
--
Sam http://samason.me.uk/
Hey, thanks for your quick response On Wed, Feb 11, 2009 at 12:00 AM, Sam Mason <sam@samason.me.uk> wrote: > On Tue, Feb 10, 2009 at 11:19:13PM +0100, Max Strrrmberg wrote: >> [...] messages are to be ordered in a very standard fashion of single-depth >> threads. That is, a message can be a reply, and a so-called "head". >> >> create table messages ( >> id serial not null primary key, >> author_id integer null references account, >> text varchar(200) not null, >> timestamp timestamp not null, >> reply_to_id integer null, >> reply_to_account_id integer not null references account, >> unique (account_id, reply_to_id) >> ); > > I'd almost advocate having a having another table, and thus breaking > various normalization rules, to make your life easier: > > CREATE TABLE heads ( > head_message_id INTEGER PRIMARY KEY REFERENCES messages, > last_reply TIMESTAMP NOT NULL > ); > > You can have a trigger to maintain this table, and maybe even one > to insert things as well. Not sure about the other rules you want, > but if a "thread" can only involve two people (obviously for email > conversions this isn't true) then you could move the author_id and > reply_to_account_id into this table as well. If you did this you could > move other attributes around and get things normalized again. > > Does that help with ideas? You make an interesting point. So, whenever a row is inserted into messages, I would want to do something like `update or insert into heads for id = coalesce(replied_to, own_id), set last_reply = own_timestamp.` Correct? Though I might've been crap at explaining it, I think there's an essential part missing here: a segment of the tree could be up for fetch because it has a reply from a certain account. That's why I chose to denormalize reply_to_account_id. So... to get an equivalent non-NF table, I would want one row in heads per reply? :/ And update each row with last_reply. Seems to me that is getting unwieldy pretty fast. Hrm... > > -- > Sam http://samason.me.uk/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >