Обсуждение: Advice about how to delete

Поиск
Список
Период
Сортировка

Advice about how to delete

От
Arnau
Дата:
Hi all,

   I have the following scenario, I have users and groups where a user
can belong to n groups, and a group can have n users. A user must belogn
at least to a group. So when I delete a group I must check that there
isn't any orphan. To do this I have something like that:

   CREATE TABLE users
   (
     user_id    SERIAL8 PRIMARY KEY
     user_name  VARCHAR(50)
   )

   CREATE TABLE groups
   (
     group_id    SERIAL8 PRIMARY KEY,
     group_name  VARCHAR(50)
   )

   CREATE TABLE user_groups
   (
     user_id   INT8 REFERENCES users(user_id),
     group_id  INT8 REFERENCE groups(group_id),
     CONSTRAINT pk PRIMARY_KEY ( user_id, group_id)
   )

   CREATE INDEX idx_user_id ON user_groups( user_id );
   CREATE INDEX idx_group_id ON user_groups( group_id );

   FUNCTION delete_group( INT8 )
   DECLARE
     p_groupid ALIAS FOR $1;
     v_deleted INTEGER;
     v_count   INTEGER;
     result    RECORD;

   BEGIN
     v_deleted = 0;

     FOR result IN SELECT user_id FROM user_groups WHERE group_id =
p_groupid
     LOOP

       SELECT INTO v_count COUNT(user_id) FROM user_groups WHERE user_id
= result.user_id LIMIT 2;

       IF v_count = 1 THEN
         DELETE FROM users WHERE user_id = result.user_id;
         v_deleted = v_deleted + 1;
       END IF;

     END LOOP;

     DELETE FROM groups WHERE group_id = p_groupid;

     RETURN v_deleted;
   END;


   This works quite fast with small groups but when the group has an
important number of users, it takes too much time. The delete_group
action is fired from the user interface of the application.

   Do you have any idea about how I could improve the performance of this?

Thanks all
--
Arnau

Re: Advice about how to delete

От
Michael Glaesemann
Дата:
On Jul 6, 2007, at 9:42 , Arnau wrote:

>   I have the following scenario, I have users and groups where a
> user can belong to n groups, and a group can have n users. A user
> must belogn at least to a group. So when I delete a group I must
> check that there isn't any orphan. To do this I have something like
> that:


>       IF v_count = 1 THEN
>         DELETE FROM users WHERE user_id = result.user_id;
>         v_deleted = v_deleted + 1;
>       END IF;

Am I right in reading that you're deleting any users that would be
orphans? If so, you can just delete the orphans after rather than
delete them beforehand (untested):

-- delete user_group
DELETE FROM user_groups
WHERE user_group_id = p_group_id;

-- delete users that don't belong to any group
DELETE FROM users
WHERE user_id IN (
     SELECT user_id
     LEFT JOIN user_groups
     WHERE group_id IS NULL);

This should execute pretty quickly. You don't need to loop over any
results. Remember, SQL is a set-based language, so if you can pose
your question in a set-based way, you can probably find a pretty
good, efficient solution.

Michael Glaesemann
grzm seespotcode net



Re: Advice about how to delete

От
Arnau
Дата:
Hi Michael,

Michael Glaesemann wrote:
>
> On Jul 6, 2007, at 9:42 , Arnau wrote:
>
>>   I have the following scenario, I have users and groups where a user
>> can belong to n groups, and a group can have n users. A user must
>> belogn at least to a group. So when I delete a group I must check that
>> there isn't any orphan. To do this I have something like that:
>
>
>>       IF v_count = 1 THEN
>>         DELETE FROM users WHERE user_id = result.user_id;
>>         v_deleted = v_deleted + 1;
>>       END IF;
>
> Am I right in reading that you're deleting any users that would be
> orphans? If so, you can just delete the orphans after rather than delete
> them beforehand (untested):
>
> -- delete user_groupDELETE FROM user_groups
> WHERE user_group_id = p_group_id;
>
> -- delete users that don't belong to any group
> DELETE FROM users
> WHERE user_id IN (
>     SELECT user_id
>     LEFT JOIN user_groups
>     WHERE group_id IS NULL);
>
> This should execute pretty quickly. You don't need to loop over any
> results. Remember, SQL is a set-based language, so if you can pose your
> question in a set-based way, you can probably find a pretty good,
> efficient solution.

   I have tested your solution and it's much worse than mine.

   My test database has about 254000 users and about 30 groups. The test
I have done is remove a group with 258 users, my solution has taken
about 3 seconds and your solution after 20seconds didn't finished. Of
course the test machine is an old celeron with few MB of RAM, but as
test machine does the job.

Thank you very much
--
Arnau

Re: Advice about how to delete

От
Heikki Linnakangas
Дата:
Arnau wrote:
>   CREATE TABLE user_groups
>   (
>     user_id   INT8 REFERENCES users(user_id),
>     group_id  INT8 REFERENCE groups(group_id),
>     CONSTRAINT pk PRIMARY_KEY ( user_id, group_id)
>   )
>
>   CREATE INDEX idx_user_id ON user_groups( user_id );

The primary key implicitly creates an index on (user_id, group_id), so
you probably don't need this additional index.

>   This works quite fast with small groups but when the group has an
> important number of users, it takes too much time. The delete_group
> action is fired from the user interface of the application.

It looks like you're not deleting rows from user_groups when a group is
deleted. Perhaps the table definition you posted misses ON DELETE
CASCADE on the foreign key declarations?

I would implement this with triggers. Use the ON DELETE CASCADE to take
care of deleting rows from user_groups and create an ON DELETE trigger
on user_groups to delete orphan rows. Like this:

CREATE OR REPLACE FUNCTION delete_orphan_users () RETURNS trigger AS $$
   DECLARE
   BEGIN
     PERFORM  * FROM user_groups ug WHERE ug.user_id = OLD.user_id;
     IF NOT FOUND THEN
         DELETE FROM users WHERE users.user_id = OLD.user_id;
     END IF;

     RETURN NULL;
   END;
$$ LANGUAGE 'plpgsql';

DROP TRIGGER IF EXISTS d_usergroup ON user_groups;
CREATE TRIGGER d_usergroup AFTER DELETE ON user_groups FOR EACH ROW
EXECUTE PROCEDURE delete_orphan_users();

This might not be significantly faster, but it's easier to work with.

>   Do you have any idea about how I could improve the performance of this?

Michael Glaesemann's idea of using a single statement to delete all
orphan users with one statement is a good one, though you want to refine
it a bit so that you don't need to do a full table scan every time.
Perhaps like this, before deleting rows from user_groups:

DELETE FROM users WHERE user_id IN (
   SELECT u.user_id FROM users u
   LEFT OUTER JOIN user_groups ug ON (u.user_id = ug.user_id AND
ug.group_id <> 10)
   WHERE group_id IS NULL
   AND u.user_id IN (SELECT user_id FROM user_groups where group_id = 10)
);

Or maybe you could just leave the orphans in the table, and delete them
later in batch?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com