Deleting Rows From Large Tables

Поиск
Список
Период
Сортировка
От Rob Emery
Тема Deleting Rows From Large Tables
Дата
Msg-id CAPCETps_ur3KM=8z0fQovKG-eKDVfsp9Xjt-9PEQd5BVMusr4Q@mail.gmail.com
обсуждение исходный текст
Ответы Re: Deleting Rows From Large Tables  (Jeff Janes <jeff.janes@gmail.com>)
Re: Deleting Rows From Large Tables  (Greg Spiegelberg <gspiegelberg@gmail.com>)
Re: Deleting Rows From Large Tables  (Greg Smith <greg@2ndQuadrant.com>)
Список pgsql-performance
Hi All,

We've got 3 quite large tables that due to an unexpected surge in
usage (!) have grown to about 10GB each, with 72, 32 and 31 million
rows in. I've been tasked with cleaning out about half of them, the
problem I've got is that even deleting the first 1,000,000 rows seems
to take an unreasonable amount of time. Unfortunately this is on quite
an old server (Dell 2950 with a RAID-10 over 6 disks) running Postgres
8.4; which serves other things like our logging systems.

If I run a sustained (more than about 5 minutes) delete it'll have a
detrimental effect on the other services. I'm trying to batch up the
deletes into small chunks of approximately 1 month of data ; even this
seems to take too long, I originally reduced this down to a single
day's data and had the same problem. I can keep decreasing the size of
the window I'm deleting but I feel I must be doing something either
fundamentally wrong or over-complicating this enormously. I've
switched over to retrieving a list of IDs to delete, storing them in
temporary tables and deleting based on the primary keys on each of the
tables with something similar to this:

BEGIN TRANSACTION;

CREATE TEMPORARY TABLE table_a_ids_to_delete (id INT);
CREATE TEMPORARY TABLE table_b_ids_to_delete (id INT);

INSERT INTO table_a_ids_to_delete
    SELECT id FROM table_a WHERE purchased ='-infinity' AND created_at
< '2007-01-01T00:00:00';

INSERT INTO table_b_ids_to_delete
    SELECT table_b_id FROM table_a_table_b_xref
    INNER JOIN table_a_ids_to_delete ON (table_a_ids_to_delete.id =
table_a_table_b.quote_id);

DELETE FROM table_a_table_b_xref USING table_a_ids_to_delete
    WHERE table_a_table_b_xref.table_a_id = table_a_ids_to_delete.id;

DELETE FROM table_b USING table_b_ids_to_delete
    WHERE table_b.id = table_b_ids_to_delete.id;

DELETE FROM table_a USING table_a_ids_to_delete
    WHERE table_a.id =  table_a_ids_to_delete.id;

COMMIT;

There're indices on table_a on the queried columns, table_b's primary
key is it's id, and table_a_table_b_xref has an index on (table_a_id,
table_b_id). There're FK defined on the xref table, hence why I'm
deleting from it first.

Does anyone have any ideas as to what I can do to make the deletes any
faster? I'm running out of ideas!

Thanks in advance,

--
Rob Emery


В списке pgsql-performance по дате отправления:

Предыдущее
От: David Rees
Дата:
Сообщение: Re: Reliability with RAID 10 SSD and Streaming Replication
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Reliability with RAID 10 SSD and Streaming Replication