Re: massive INSERT

From Greg Stark
Subject Re: massive INSERT
Date
Msg-id 873clqer3f.fsf@stark.dyndns.tv
In response to massive INSERT  (Kurt Overberg <kurt@hotdogrecords.com>)
List pgsql-sql
Kurt Overberg <kurt@hotdogrecords.com> writes:

> ...I've found this to be faster than running the query, figuring out who needs
> to be removed from the group, who needs to be added and whatnot. The thing that
> I'm worried about is that this table is going to be pretty big (potentially
> millions of rows), and every time I rebuild this table I lose (or at least
> invalidate) all my indexes.  Is that the case? Is constantly deleting then
> adding large numbers of rows from a table really bad for performance?  I'm
> worried this isn't going to scale well.  Anyone know of a better way to do
> this?   Thoughts and comments would be appreciated.

From a programming point of view it's usually cleaner to do it the way you're
doing it than to try to find the differences and apply only the minimal
changes, so I usually go with that approach.
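
For illustration, the pattern looks something like this (the table and column
names here are made up):

    -- wipe and regenerate the membership of one group
    DELETE FROM group_members WHERE group_id = 42;
    INSERT INTO group_members (group_id, user_id)
        SELECT 42, u.id
          FROM users u
         WHERE ...;   -- whatever defines membership in group 42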

There are performance implications though. Every time you update or delete a
record in Postgres it leaves behind dead tuples that need to get cleaned up.
If you're doing a big batch job you can overflow the free space map (FSM) that
tracks the reclaimable space, which means that instead of a plain VACUUM you
might have to do a VACUUM FULL, or else raise the FSM parameters for your
database.
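
If you go the FSM route, the relevant knobs live in postgresql.conf; the
numbers below are purely illustrative, not recommendations (and these settings
were removed in 8.4, where the free space map became automatic):

    # postgresql.conf
    max_fsm_pages = 100000      # pages with free space the FSM will remember
    max_fsm_relations = 1000    # relations the FSM will track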

It's not clear from your description whether you're regenerating the entire
table or just a specific group. If you're looping through all the groups and
regenerating all of them, you should perhaps consider using TRUNCATE instead
of DELETE. TRUNCATE is a bigger hammer: it throws out the whole table at once
and leaves no dead tuples behind to clean up.
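
The truncate-and-reload version would look roughly like this, with the same
made-up table; note that on older releases TRUNCATE can't run inside a
transaction block, so there's a window where the table is empty:

    TRUNCATE TABLE group_members;
    INSERT INTO group_members (group_id, user_id)
        SELECT g.id, u.id
          FROM groups g, users u
         WHERE ...;   -- regenerate every group's membership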

Alternatively, consider running VACUUM ANALYZE after every group update you
do, and possibly a VACUUM FULL at an hour when you can withstand some
downtime.
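
Concretely, with the same made-up table name:

    VACUUM ANALYZE group_members;   -- after each group's delete/insert pass
    VACUUM FULL group_members;      -- off-hours only; takes an exclusive lock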

I'm not sure what you mean by losing your indexes. They'll be just as valid
afterwards as they were before, though they might become less efficient, for
basically the same reasons as the table itself. You may find that running a
periodic REINDEX during downtime helps.
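
For example (REINDEX also takes an exclusive lock, hence the downtime):

    REINDEX TABLE group_members;                -- rebuild all of its indexes
    REINDEX INDEX group_members_group_id_idx;   -- or just one (made-up name)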

If you're updating the majority of the table and can do it during downtime you
might consider dropping the index while performing the update and then
recreating it at the end. But that's optional.
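
Sketched out, with a made-up index name:

    DROP INDEX group_members_group_id_idx;
    -- ... do the bulk DELETE/INSERT here ...
    CREATE INDEX group_members_group_id_idx ON group_members (group_id);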

--
greg


