Re: Best way to delete big amount of records from big table

Поиск
Список
Период
Сортировка
От Laurenz Albe
Тема Re: Best way to delete big amount of records from big table
Дата
Msg-id 5e44d4e8ae73193efa82fb2744dbd95cc5130669.camel@cybertec.at
обсуждение исходный текст
Ответ на Best way to delete big amount of records from big table  (Ekaterina Amez <ekaterina.amez@zunibal.com>)
Ответы Re: Best way to delete big amount of records from big table  (Ekaterina Amez <ekaterina.amez@zunibal.com>)
Список pgsql-performance
On Fri, 2020-03-27 at 15:13 +0100, Ekaterina Amez wrote:
> I'm trying to clean up a database with millions of records of 
> useless-but-don't-remove-just-in-case data. [...]
> 
> But also I'm cleaning tables with 150million records where I'm going to 
> remove 60% of existing data and after a few tests I'm not sure what's 
> the best approach as all seem to take similar time to run. These tables 
> are grouped in 4 tables group with master, detail, master_history, 
> detail_history structure. None of the tables have primary key nor 
> foreign key or any constraint but the sequence used for what should be 
> the PK column, though this column is not defined as PK.

You should define primary and foreign keys if you can, but I guess
I don't have to tell you that.

> I've decided to delete from the last one in chunks (10 days of data per 
> chunk but it coud be any other quantity) so I've created a function.  
> I've tested it with indexes (in master_hist for filtering data and in 
> detail_hist for the fk and pk), without indexes, after analyzing table, 
> and no matter what I always end up with more or less the same execution 
> time. I can afford the time it's getting to run but I'd like to know if 
> it's there a better way to do this.

There is no need to delete in batches unless you have a need to keep
transactions short (danger of deadlock because the data are still
modified, or you cannot afford to block autovacuum that long).

If you can drop the indexes while you do it (downtime), go for it.
Perhaps there is a way to use partial indexes that exclude all the
data that you have to delete, then work could go on as normal.

> I'm testing on version 9.2 BUT 
> production server is 8.4 (legacy application, supposed to be in at least 
> 9.2 but recently discovered it was 8.4, planning upgrade but not now). 
> Config parameters are default ones.

Now that is a seriously bad idea.  You should test on the same version
as you have running in production.  And you should insist in an upgrade.
People who insist in running ancient software often insist in ancient
hardware as well, and both is a good way to get data corruption.
If the system blows up, they are going to blame you.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




В списке pgsql-performance по дате отправления:

Предыдущее
От: Michael Lewis
Дата:
Сообщение: Re: Best way to delete big amount of records from big table
Следующее
От: Ekaterina Amez
Дата:
Сообщение: Re: Best way to delete big amount of records from big table