Re: Indexing queries with bit masks

Поиск
Список
Период
Сортировка
От Filip Rembiałkowski
Тема Re: Indexing queries with bit masks
Дата
Msg-id r2x92869e661005011325z99373ab8v11dc66182fa7a99@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Indexing queries with bit masks  (Mike Christensen <mike@kitchenpc.com>)
Ответы Re: Indexing queries with bit masks  (Mike Christensen <mike@kitchenpc.com>)
Список pgsql-general
2010/4/30 Mike Christensen <mike@kitchenpc.com>:
> Ok I've been blatantly lying, err, purposely simplifying the problem for the
> sake of the original email :)
>
> I've read over the responses, and am actually now considering just not using
> any index at all.  Here's why:
>
> First, this actually isn't the only thing on the WHERE clause.  It will only
> query for users who are "friends" with you so it can notify them of your
> activities.  That's done via a weird JOIN on a table that holds all the
> friend relationships.  So in reality, it will only load maybe a hundred
> rows, or maybe a thousand every once in a while if you're way popular.  If
> I'm not mistaken, it should use the index to narrow it down to the list of
> friends, and then use a sequential scan to weed out the ones who subscribe
> to that type of notification.
>
> Second, the only thing /ever/ that will do this query is the queue service
> whose job it is to process notifications (which are files dropped on the
> file system) and email people all day long.  This service handles one job at
> a time, and could potentially run on its own machine with its own read-only
> copy of the database.  Thus, even if it was a fairly slow query, it's not
> gonna bring down the rest of the site.
>
> Regarding the idea of putting an index on each bit, I thought about this
> earlier as well as just kinda cringed.  The users table gets updated quite a
> bit (last logon, session id, any time they change their profile info,
> etc)..  Too many indexes is bad.  I could just put the data in another table
> of course, which lead me to another idea.  Have a table called Subscriptions
> and have each row hold a user id and a notification type.  I could index
> both, and join on (Subscriptions.UserId = Users.UserId AND
> Subscriptions.Type = 8).  This would be pretty dang fast, however updates
> are kinda a royal pain.  When the user changes which types of subscriptions
> they want (via a list of checkboxes), I'd have to figure out which rows to
> delete and which new ones to insert.  However, I think I have an idea in
> mind for a PgSQL function you pass in the bitmask to and then it
> "translates" it to conditional deletes and inserts.
>
> A third idea I'm tossing around is just not worry about it.  Put the bitmask
> in the DB, but not filter on it.  Every "friend" would be loaded into the
> dataset, but the queue processor would just "skip" rows if they didn't
> subscribe to that event.  In other words, move the problem down to the
> business layer.  The drawback is potentially large number of rows are
> loaded, serialized, etc into memory that will just be ignored.  But of
> course the DB is probably a read-only copy and it's not even close to the
> bottle neck of the email queue under heavy load, so it's probably a
> non-issue.  If mailing is slow, I just add more queue services..
>
> I'm exploring all these ideas.  I predict using the bitwise AND on the where
> clause isn't gonna be the worst design ever, and it's sure easier to
> implement than a table of subscriptions.  What do you guys think?

I would say "normalize". Which means that I like your "separate table"
idea best.
It's clear, obvious, and 3NF - conforming solution.
Changing the set of subscriptions with delete-update-insert combo is
not so bad as you would think.
Encapsulating it in some kind of functional API looks nice too.

Filip

В списке pgsql-general по дате отправления:

Предыдущее
От: John R Pierce
Дата:
Сообщение: Re: PostgreSQL vs. Microsoft SQL server
Следующее
От: Greg Smith
Дата:
Сообщение: Re: Native DB replication for PG