Re: Indexing queries with bit masks

Поиск
Список
Период
Сортировка
От Mike Christensen
Тема Re: Indexing queries with bit masks
Дата
Msg-id r2o7aa638e01005020459w375fc567p9b7b3f80bc6cb595@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Indexing queries with bit masks  (Filip Rembiałkowski <plk.zuber@gmail.com>)
Список pgsql-general
Hey thanks..  I thought I'd share the method I came up with for updating subscriptions.  Basically, as far as my code is concerned the DB uses a bitmask (at least for updates) but I abstract it through a function.  First off, I have a little helper function so I don't repeat the same code a bunch of times:

CREATE OR REPLACE FUNCTION KPC_UpdateEmailPreferenceHelper(_enable boolean, _userid uuid, _type smallint)
   RETURNS void AS
   $BODY$
      BEGIN
         IF _enable THEN
            INSERT INTO EmailPreferences (UserId, NotificationType) SELECT _userid, _type
            WHERE NOT EXISTS (SELECT 1 FROM EmailPreferences WHERE UserId = _userid AND NotificationType = _type);
         ELSE
            DELETE FROM EmailPreferences WHERE UserId = _userid AND NotificationType = _type;
         END IF;
      END;
   $BODY$
   LANGUAGE 'plpgsql' VOLATILE
   COST 100;

Then through my code I call this one with a bitmask of which notifications the user wants to subscribe to:

CREATE OR REPLACE FUNCTION KPC_UpdateEmailPreference(_userid uuid, _prefs smallint)
  RETURNS void AS
$BODY$
      BEGIN
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 1 > 0, _userid, 1::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 2 > 0, _userid, 2::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 4 > 0, _userid, 3::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 8 > 0, _userid, 4::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 16 > 0, _userid, 5::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 32 > 0, _userid, 6::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 64 > 0, _userid, 7::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 128 > 0, _userid, 8::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 256 > 0, _userid, 9::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 512 > 0, _userid, 10::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 1024 > 0, _userid, 11::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 2048 > 0, _userid, 12::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 4096 > 0, _userid, 13::smallint);
         PERFORM KPC_UpdateEmailPreferenceHelper (_prefs & 8192 > 0, _userid, 14::smallint);
      END;
   $BODY$
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;

Seems to work pretty well, anyone have any feedback?

Mike

2010/5/1 Filip Rembiałkowski <plk.zuber@gmail.com>
2010/4/30 Mike Christensen <mike@kitchenpc.com>:
> Ok I've been blatantly lying, err, purposely simplifying the problem for the
> sake of the original email :)
>
> I've read over the responses, and am actually now considering just not using
> any index at all.  Here's why:
>
> First, this actually isn't the only thing on the WHERE clause.  It will only
> query for users who are "friends" with you so it can notify them of your
> activities.  That's done via a weird JOIN on a table that holds all the
> friend relationships.  So in reality, it will only load maybe a hundred
> rows, or maybe a thousand every once in a while if you're way popular.  If
> I'm not mistaken, it should use the index to narrow it down to the list of
> friends, and then use a sequential scan to weed out the ones who subscribe
> to that type of notification.
>
> Second, the only thing /ever/ that will do this query is the queue service
> whose job it is to process notifications (which are files dropped on the
> file system) and email people all day long.  This service handles one job at
> a time, and could potentially run on its own machine with its own read-only
> copy of the database.  Thus, even if it was a fairly slow query, it's not
> gonna bring down the rest of the site.
>
> Regarding the idea of putting an index on each bit, I thought about this
> earlier as well as just kinda cringed.  The users table gets updated quite a
> bit (last logon, session id, any time they change their profile info,
> etc)..  Too many indexes is bad.  I could just put the data in another table
> of course, which lead me to another idea.  Have a table called Subscriptions
> and have each row hold a user id and a notification type.  I could index
> both, and join on (Subscriptions.UserId = Users.UserId AND
> Subscriptions.Type = 8).  This would be pretty dang fast, however updates
> are kinda a royal pain.  When the user changes which types of subscriptions
> they want (via a list of checkboxes), I'd have to figure out which rows to
> delete and which new ones to insert.  However, I think I have an idea in
> mind for a PgSQL function you pass in the bitmask to and then it
> "translates" it to conditional deletes and inserts.
>
> A third idea I'm tossing around is just not worry about it.  Put the bitmask
> in the DB, but not filter on it.  Every "friend" would be loaded into the
> dataset, but the queue processor would just "skip" rows if they didn't
> subscribe to that event.  In other words, move the problem down to the
> business layer.  The drawback is potentially large number of rows are
> loaded, serialized, etc into memory that will just be ignored.  But of
> course the DB is probably a read-only copy and it's not even close to the
> bottle neck of the email queue under heavy load, so it's probably a
> non-issue.  If mailing is slow, I just add more queue services..
>
> I'm exploring all these ideas.  I predict using the bitwise AND on the where
> clause isn't gonna be the worst design ever, and it's sure easier to
> implement than a table of subscriptions.  What do you guys think?

I would say "normalize". Which means that I like your "separate table"
idea best.
It's clear, obvious, and 3NF - conforming solution.
Changing the set of subscriptions with delete-update-insert combo is
not so bad as you would think.
Encapsulating it in some kind of functional API looks nice too.

Filip

В списке pgsql-general по дате отправления:

Предыдущее
От: John Gage
Дата:
Сообщение: Re: Order of execution in shell echo to psql
Следующее
От: Sean
Дата:
Сообщение: About the limit of storage