Discussion: massive INSERT
Hi everyone!

I'm trying to maintain a 'membership' table in my postgres database. Membership is determined by a potentially costly algorithm that basically returns a series of member ids, and I insert those ids into my membership table (called groups). Up to now, for speed purposes, I've been deleting all users in a group, then re-building the group from scratch. The table looks roughly like this:

 id            | integer | not null default nextval('"xrefmgrp_id_seq"'::text)
 membergroupid | integer | not null default 0
 memberid      | integer | not null default 0

There's a constraint on the table saying that (membergroupid, memberid) needs to be UNIQUE.

...so before re-building a group, I do a:

 delete from xrefmembergroup where membergroupid = 4;

...then blast the ids into the table:

 insert into xrefmembergroup (membergroupid, memberid)
   select 4 as membergroupid, member.id as memberid
   from member
   where <* huge complex select *>

I've found this to be faster than running the query, figuring out who needs to be removed from the group, who needs to be added and whatnot. The thing that I'm worried about is that this table is going to be pretty big (potentially millions of rows), and every time I rebuild this table I lose (or at least invalidate) all my indexes. Is that the case? Is constantly deleting then adding large numbers of rows from a table really bad for performance? I'm worried this isn't going to scale well. Anyone know of a better way to do this?

Thoughts and comments would be appreciated. Thanks!

/kurt
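[For concreteness, here is a minimal, self-contained sketch of the schema and the delete-then-rebuild pattern described above. The table, sequence, and column names are taken from the post; the member table definition and the WHERE clause are stand-ins for the real ones, and wrapping the two statements in one transaction is a suggestion rather than something from the original message.]

 -- Schema as described in the post; the member table here is a placeholder.
 CREATE SEQUENCE xrefmgrp_id_seq;

 CREATE TABLE member (
     id integer PRIMARY KEY
 );

 CREATE TABLE xrefmembergroup (
     id            integer NOT NULL DEFAULT nextval('xrefmgrp_id_seq'),
     membergroupid integer NOT NULL DEFAULT 0,
     memberid      integer NOT NULL DEFAULT 0,
     UNIQUE (membergroupid, memberid)
 );

 -- Rebuild group 4 from scratch. Doing both statements in one transaction
 -- keeps the group consistent for concurrent readers and rolls back cleanly
 -- on failure.
 BEGIN;
 DELETE FROM xrefmembergroup WHERE membergroupid = 4;
 INSERT INTO xrefmembergroup (membergroupid, memberid)
     SELECT 4, member.id
     FROM member
     WHERE member.id % 2 = 0;   -- placeholder for the "huge complex select"
 COMMIT;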
Kurt Overberg <kurt@hotdogrecords.com> writes:

> ...I've found this to be faster than running the query, figuring out who needs
> to be removed from the group, who needs to be added and whatnot. The thing that
> I'm worried about is that this table is going to be pretty big (potentially
> millions of rows), and every time I rebuild this table I lose (or at least
> invalidate) all my indexes. Is that the case? Is constantly deleting then
> adding large numbers of rows from a table really bad for performance? I'm
> worried this isn't going to scale well. Anyone know of a better way to do
> this? Thoughts and comments would be appreciated.

From a programming point of view it's usually cleaner to do it as you're doing than trying to find the differences and do the minimal changes, so I usually go with that.

There are performance implications, though. Every time you update or delete a record in postgres it generates old garbage tuples that need to get cleaned up. If you're doing a big batch job it can overflow the space set aside to recover them, which means that instead of a plain VACUUM you might have to do a VACUUM FULL, or else play with the fsm (free space map) parameters for your database.

It's not clear from your description whether you're regenerating the entire table or just a specific group. If you're looping through all the groups and regenerating all of them, you should perhaps consider using TRUNCATE instead of DELETE. TRUNCATE is a bigger hammer and avoids generating any free space to clean up. Alternatively, you should consider running VACUUM ANALYZE after every group update you do, and possibly a VACUUM FULL at an hour when you can withstand some downtime.

I'm not sure what you mean by losing your indexes. They'll be just as valid afterwards as they were before, though they might become less efficient, for basically the same reasons as the table above. You may find that running a periodic REINDEX during downtime helps.

If you're updating the majority of the table and can do it during downtime, you might consider dropping the index while performing the update and then recreating it at the end. But that's optional.

-- 
greg
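[For readers who want the concrete commands, below is a rough sketch of the maintenance operations Greg mentions, applied to the table from this thread. The index name is invented for illustration, and it assumes the UNIQUE constraint is implemented as an explicit unique index that can be dropped and recreated; adjust to whatever the real schema uses.]

 -- If the entire table is regenerated each time, TRUNCATE discards every row
 -- without leaving dead tuples behind the way a big DELETE would:
 TRUNCATE TABLE xrefmembergroup;

 -- After each per-group DELETE + INSERT, reclaim dead tuples and refresh
 -- the planner statistics:
 VACUUM ANALYZE xrefmembergroup;

 -- During a maintenance window, compact the table and rebuild its indexes:
 VACUUM FULL xrefmembergroup;
 REINDEX TABLE xrefmembergroup;

 -- For a rebuild touching most of the table, drop the index first and
 -- recreate it afterwards so it is built once, in bulk (hypothetical name):
 DROP INDEX xrefmembergroup_grp_member_idx;
 -- ... bulk DELETE / INSERT here ...
 CREATE UNIQUE INDEX xrefmembergroup_grp_member_idx
     ON xrefmembergroup (membergroupid, memberid);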