how to delete many rows from a huge table?

Поиск
Список
Период
Сортировка
От mARK bLOORE
Тема how to delete many rows from a huge table?
Дата
Msg-id d9425f7b0905251229g289edc15u4bf9dded7592846@mail.gmail.com
обсуждение исходный текст
Ответы Re: how to delete many rows from a huge table?  (Scott Marlowe <scott.marlowe@gmail.com>)
Список pgsql-general
I have a table of about 1 G rows, and I want to delete about 5 M rows,
listed in another table.
The big table is

             Table "public.backlinks"
     Column      | Type  | Modifiers | Description
-----------------+-------+-----------+-------------
 key             | bytea | not null  |
 backlink        | text  | not null  |
 backlink_hash   | bytea | not null  |
 url             | text  | not null  |
 time_downloaded | date  | not null  |
Triggers:
    insert_backlinks_trigger BEFORE INSERT ON backlinks FOR EACH ROW
EXECUTE PROCEDURE backlinks_insert_trigger()
Has OIDs: no

It is divided into 64 partitions, like

            Table "public.backlinks_0"
     Column      | Type  | Modifiers | Description
-----------------+-------+-----------+-------------
 key             | bytea | not null  |
 backlink        | text  | not null  |
 backlink_hash   | bytea | not null  |
 url             | text  | not null  |
 time_downloaded | date  | not null  |
Indexes:
    "backlinks_0_pkey" PRIMARY KEY, btree (key, backlink_hash),
tablespace "fastdisk"
Check constraints:
    "backlinks_0_key_check" CHECK (partition(key) = 0)
Inherits: backlinks
Has OIDs: no

The partitions are very evenly filled.  The partition function just
takes the first byte modulo 64.

The table listing what I want to delete has just the key values.

If I just do

DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;

then it grinds for an hour or so and runs out of memory.

If I do

DELETE FROM backlinks b WHERE b.key IN (SELECT bl.key FROM bad_links
bl LIMIT 40 OFFSET 0);

it finishes in milliseconds, but if I set the LIMIT to 50, it runs for
minutes til I kill it.  EXPLAIN says that it is doing a sequential
scan on several of the partitions in the 50 case, but not the 40.
Auto-vacuuming is running on the DB.

If I put that DELETE into a procedure and loop on the offset, it acts
like the 50 case, even if i set the LIMIT to 1.

I am currently running a shell loop doing the delete in chunks of 40,
but it has been running all weekend and is only half way through.

What am I doing wrong?

--
mARK bLOORE <mbloore@gmail.com>

В списке pgsql-general по дате отправления:

Предыдущее
От: iSTRONG
Дата:
Сообщение: Re: PGS Tuning Wizard destroys my login
Следующее
От: Intengu Technologies
Дата:
Сообщение: Re: Assistance in importing a csv file into Postgresql