Re: Optimize query for listing un-read messages

Поиск
Список
Период
Сортировка
От Vitalii Tymchyshyn
Тема Re: Optimize query for listing un-read messages
Дата
Msg-id CABWW-d1z1qEJOPAZTGfQ0w+yG4SBH2T_ikHC7THeMUUsVCfgow@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimize query for listing un-read messages  (Andreas Joseph Krogh <andreas@visena.com>)
Список pgsql-performance
What statistics do you have on the data? I suppose most messages are read by low number of users, mostly 0 or one.
I can see two options to consider:
1) Use arrays to store information on which users have already read the message. You may need GIN/GIST index to search fast.
2) Introduce some kind of special column(s) for the cases when the message is unread by everybody or was read by at most one user. E.g. read_by columns with null value for unread, special value for read by many and real user if read by only one.
in this case your condition would be (read_by is null or read_by not in (current_user or special_value) or (read_by = special_value and not exists()). Note that optimizer may have problems with such a complex expression nd you may need to use "union all" instead on "or". Partial index(es) for null/special value may help.

Best regards, Vitalii Tymchyshyn


2014-05-02 10:20 GMT+03:00 Andreas Joseph Krogh <andreas@visena.com>:
På fredag 02. mai 2014 kl. 02:17:58, skrev Craig James <cjames@emolecules.com>:
On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
I have a schema where I have lots of messages and some users who might have read some of them. When a message is read by a user I create an entry i a table message_property holding the property (is_read) for that user.
 
The schema is as follows:
[...]
 
create table person(
    id serial primary key,
    username varchar not null unique
);
 
create table message(
    id serial primary key,
    subject varchar
);
 
create table message_property(
    message_id integer not null references message(id),
    person_id integer not null references person(id),
    is_read boolean not null default false,
    unique(message_id, person_id)
);
 
[...]
 So, for person 1 there are 10 unread messages, out of a total 1mill. 5 of those unread does not have an entry in message_property and 5 have an entry and is_read set to FALSE.
 
Here's a possible enhancement: add two columns, an indexed timestamp to the message table, and a "timestamp of the oldest message this user has NOT read" on the person table. If most users read messages in a timely fashion, this would (in most cases) narrow down the portion of the messages table to a tiny fraction of the total -- just those messages newer than the oldest message this user has not read.
 
When you sign up a new user, you can set his timestamp to the time the account was created, since presumably messages before that time don't apply.
 
Whether this will help depends a lot on actual use patterns, i.e. do users typically read all messages or do they leave a bunch of unread messages sitting around forever?
 
Thanks fort the suggestion. A user must be able to read arbitrary old messages, and messages don't expire.
 
--
Andreas Jospeh Krogh
CTO / Partner - Visena AS
 

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Andreas Joseph Krogh
Дата:
Сообщение: Re: Optimize query for listing un-read messages
Следующее
От: David G Johnston
Дата:
Сообщение: Re: PostgreSQL's query planner is using the wrong index, what can I do to improve this situation?