Thread: Purging few months old data and vacuuming in production


Purging few months old data and vacuuming in production

From:
Ranjith Paliyath
Date:
Hi,

We have a PostgreSQL (slightly old version, something like - PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by gcc
(GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit) production, where one particular table and its related 5 tables need to
be purged of 3 months prior data. Each of these tables' daily record increment is on average 2 to 3 million.

The approach needed is a daily purge of 90 days prior data. The probable purge processing window is 2hrs. Observed
test timing for deletion exceeds 2-3hrs, and the vacuuming we do after the deletes again takes another 2hrs or more.
There is a suggestion to re-create the tables with partitions; the purge approach could then be a deletion/dropping
of these partitions, which would not require vacuuming later on.

When we go for a daily purge approach it should not put a strain on other processes that could be affecting this
same set of tables; i.e., these tables should not get locked because of the purge.

Questions are -
(a) Should we recommend a PostgreSQL upgrade, if possible, to v15.1? Could this bring in some benefits related to
vacuuming?
(b) Would partitioning be an optimal approach?

Thank you,
Regards





Re: Purging few months old data and vacuuming in production

From:
Inzamam Shafiq
Date:
Hi,

I think partitioning will be a good option; you can easily detach partitions with minimal time.
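
For instance, a minimal sketch of date-range partitioning (table and column names here are hypothetical, not the
poster's actual schema):

-- Hypothetical sketch of date-range partitioning; on a partitioned table
-- the partition key must be part of the primary key.
CREATE TABLE main_events (
    id         bigint NOT NULL,
    event_date date   NOT NULL,
    payload    text,
    PRIMARY KEY (id, event_date)
) PARTITION BY RANGE (event_date);

CREATE TABLE main_events_2023_01 PARTITION OF main_events
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

-- Purging is then a metadata operation; no vacuuming of dead rows needed:
ALTER TABLE main_events DETACH PARTITION main_events_2023_01;
DROP TABLE main_events_2023_01;

Note that plain DETACH PARTITION briefly takes an ACCESS EXCLUSIVE lock on the parent; v14+ adds
DETACH PARTITION CONCURRENTLY, which avoids that.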

Regards,

Inzamam Shafiq
Sr. DBA



Re: Purging few months old data and vacuuming in production

From:
Amitabh Kant
Date:


On Fri, Dec 30, 2022 at 12:09 PM Ranjith Paliyath <ranjithp@suntecgroup.com> wrote:



Partitioning would definitely help if you partition by date. Also, if my memory serves me right, v15 would definitely help if you have a large number of partitions, which in your case is most likely to be the case.

Deleting from a single table would put a strain on your existing table.

Amitabh 

Re: Purging few months old data and vacuuming in production

From:
Ron
Date:
On 12/30/22 00:39, Ranjith Paliyath wrote:
> Hi,
>
> We have a PostgreSQL (slightly old version, something like - PostgreSQL 11.2

You know, of course, that you should update to the latest version. It's 
quick and painless.

> on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit) production, where one
> particular table and its related 5 tables need to be purged of 3 months prior data. Each of these tables' daily record
> increment is on average 2 to 3 million.

270M rows isn't that much.  Deleting 3M rows should not take 3 hours, even 
when there are six tables.

Are the tables tied together by FK?

How big are the rows?

> The approach needed is a daily purge of 90 days prior data. The probable purge processing window is 2hrs. Observed
> test timing for deletion exceeds 2-3hrs, and the vacuuming we do after the deletes again takes another 2hrs or more.

Is there an index on the date field?

Can you drop unneeded indices during the window, and then rebuild them 
afterward?

How beefy is your hardware?
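
For the drop-and-rebuild idea above, a minimal sketch (index, table and column names are hypothetical):

-- Drop secondary indexes that each DELETE would otherwise have to maintain:
DROP INDEX idx_main_events_customer;

DELETE FROM main_events WHERE event_date < now() - interval '90 days';

-- Rebuild afterwards; CONCURRENTLY avoids blocking writes during the build:
CREATE INDEX CONCURRENTLY idx_main_events_customer
    ON main_events (customer_id);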

> There is a suggestion to re-create the tables with partitions; the purge approach could then be a deletion/dropping
> of these partitions, which would not require vacuuming later on.
>
> When we go for a daily purge approach it should not put a strain on other processes that could be affecting this
> same set of tables; i.e., these tables should not get locked because of the purge.
>
> Questions are -
> (a) Should we recommend a PostgreSQL upgrade, if possible, to v15.1? Could this bring in some benefits related to
> vacuuming?
> (b) Would partitioning be an optimal approach?

We tried this with Postgresql 12.x declarative partitioning on tables with 
synthetic keys.  Query performance suffered, since the date field needs to 
be added to the PK, and Pg tends to scan all the partitions, even when the 
date field is part of the WHERE clause. Thus, we departitioned all but the 
two with large bytea columns.
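
If you do try partitioning, it is worth verifying that the planner actually prunes partitions for your query shapes;
a quick check (hypothetical names):

-- The plan should only touch the partitions matching the date range;
-- if every partition appears, pruning is not happening for this query.
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM main_events
WHERE event_date >= DATE '2023-01-01'
  AND event_date <  DATE '2023-01-08';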

-- 
Born in Arizona, moved to Babylonia.



Re: Purging few months old data and vacuuming in production

From:
Ranjith Paliyath
Date:
Thank you for the details, experience shared and the suggestions.
Apologies for the delay in collecting the responses to the queries.

(1) Are the tables tied together by FK?
  - Overall there are 9 tables (sorry, not 6 as mentioned originally) that are being purged. Only 4 tables have an FK
relationship.

(2) How big are the rows?
  - The 9 tables now occupy almost 2TB of space. Row size (in bytes) and record counts below -

    row size (bytes)    record count
    236                  188,055,675
    297                  296,941,261
    371                   58,673,649
     95                   57,477,553
    904                  296,743,680
    234                  188,161,891
    414                  430,411,653
    707                  735,895,015
    128                  155,104,922

(3) Is there an index on the date field?
  - Yes, but only in one table, the main table (records to purge in the rest of the tables are selected based on this
table).

(4) Can you drop unneeded indices during the window, and then rebuild them afterward?
  - Not sure if the time window within which the purge process must complete would be sufficient to do a drop and
rebuild of the indices.

(5) How beefy is your hardware?
  - No. of cores - 64
    Memory - 128GB
    Disk - SSD, Total capacity - 8.5TB

Thank you...



Re: Purging few months old data and vacuuming in production

From:
Ron
Date:
On 1/6/23 02:44, Ranjith Paliyath wrote:
> Thank you for the details, experience shared and the suggestions.
> Apologies for the delay in collecting the responses to the queries.
>
> (1) Are the tables tied together by FK?
>    - Overall there are 9 tables (sorry, not 6 as mentioned originally) that are being purged. Only 4 tables have an FK
> relationship.
>
> (2) How big are the rows?
>    - The 9 tables now occupy almost 2TB of space. Row size (in bytes) and record counts below -
>
>      row size (bytes)    record count
>      236                  188,055,675
>      297                  296,941,261
>      371                   58,673,649
>       95                   57,477,553
>      904                  296,743,680
>      234                  188,161,891
>      414                  430,411,653
>      707                  735,895,015
>      128                  155,104,922
>
> (3) Is there an index on the date field?
>    - Yes, but only in one table, the main table (records to purge in the rest of the tables are selected based on this
> table).

Can you do online purging?

For example, get a list of the main table's primary keys to be deleted, and 
then nibble away at them all day: in one transaction delete all the records 
for one logically related set of records.  Do that N million times, and 
you've purged the data without impacting production.
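
A sketch of that nibbling pattern, with hypothetical table and column names (the child tables would be deleted from
in the same transaction, before the main table):

-- 1. Snapshot the primary keys that are due for deletion:
CREATE TABLE doomed_ids AS
SELECT id FROM main_events
WHERE event_date < now() - interval '90 days';

-- 2. Repeat this small transaction until it deletes 0 rows
--    (e.g. driven from a shell or cron loop):
BEGIN;
WITH batch AS (
    DELETE FROM doomed_ids
    WHERE id IN (SELECT id FROM doomed_ids LIMIT 1000)
    RETURNING id
)
DELETE FROM main_events m USING batch b WHERE m.id = b.id;
COMMIT;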


-- 
Born in Arizona, moved to Babylonia.



Re: Purging few months old data and vacuuming in production

From:
Ranjith Paliyath
Date:
Thank you very much for the response.

    > Can you do online purging?

    > For example, get a list of the main table's primary keys to be deleted, and
    > then nibble away at them all day: in one transaction delete all the records
    > for one logically related set of records.  Do that N million times, and
    > you've purged the data without impacting production.

So, with this approach, is the advantage like, manual vacuuming worry may be set aside, because auto-vacuuming would
deal with the dead rows? This is because the deletion step is executed record by record in main table, with its
connected record(s) delete executions in rest of tables? Due to the infra capability that is there in this instance, the
impact could be almost none!!??

Thank you...




Re: Purging few months old data and vacuuming in production

From:
Ron
Date:
On 1/6/23 08:27, Ranjith Paliyath wrote:
> Thank you very much for the response.
>
>     > Can you do online purging?
>
>     > For example, get a list of the main table's primary keys to be deleted, and
>     > then nibble away at them all day: in one transaction delete all the records
>     > for one logically related set of records.  Do that N million times, and
>     > you've purged the data without impacting production.

> So, with this approach, is the advantage like, manual vacuuming worry may be set aside, because auto-vacuuming would deal with the dead rows?

Theoretically, manual vacuuming is never necessary.  I'd occasionally do manual vacuums (after purging a couple of weeks of data, for example).

Disable autovacuum on a table, vacuum it, then reenable autovacuum.

ALTER TABLE table_name SET (autovacuum_enabled = false);
VACUUM table_name;
ALTER TABLE table_name SET (autovacuum_enabled = true);


> This is because the deletion step is executed record by record in main table, with its connected record(s) delete executions in rest of tables?

I don't know if you have ON DELETE CASCADE.  Even if you do, you'll have to manually delete the tables not linked by FK.  I'd write a PL/pgSQL procedure: pass in a PK and then delete records from the 9 tables in the proper order so as to not throw FK constraint errors.
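
A minimal sketch of such a procedure, with hypothetical table and column names (children first, main table last):

-- Delete one logical record set across all tables, children first so no
-- FK constraint errors are thrown. PG11+ supports CREATE PROCEDURE.
CREATE OR REPLACE PROCEDURE purge_one(p_id bigint)
LANGUAGE plpgsql
AS $$
BEGIN
    DELETE FROM child_a WHERE main_id = p_id;
    DELETE FROM child_b WHERE main_id = p_id;
    -- ... the remaining child tables, FK-linked or not ...
    DELETE FROM main_events WHERE id = p_id;
END;
$$;

CALL purge_one(12345);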

> Due to the infra capability that is there in this instance,

What is "infra capability"?

> the impact could be almost none!!??

It'll use some resources, because it's a thread deleting records, but most of the records and index nodes won't be where new records are being inserted.

Note, though, that this will generate a lot of WAL records.

--
Born in Arizona, moved to Babylonia.

RE: Purging few months old data and vacuuming in production

From:
Ranjith Paliyath
Date:
Thank you very much again.


        > So, with this approach, is the advantage like, manual vacuuming worry may be set aside, because
        > auto-vacuuming would deal with the dead rows?

    > Theoretically, manual vacuuming is never necessary.  I'd occasionally do manual vacuums (after purging a couple
    > of weeks of data, for example).

    > Disable autovacuum on a table, vacuum it, then reenable autovacuum.
    >
    > ALTER TABLE table_name SET (autovacuum_enabled = false);
    > VACUUM table_name;
    > ALTER TABLE table_name SET (autovacuum_enabled = true);

Ok. For the record-by-record delete approach, autovacuuming could be the natural option.

        > This is because the deletion step is executed record by record in main table, with its connected record(s)
        > delete executions in rest of tables?

    > I don't know if you have ON DELETE CASCADE.  Even if you do, you'll have to manually delete the tables not linked
    > by FK.  I'd write a PL/pgSQL procedure: pass in a PK and then delete records from the 9 tables in the proper order
    > so as to not throw FK constraint errors.

Ok, in the case of our specific 9 tables it would be finding and deleting linked records in 8 tables based on the
record chosen in the main table. That is, going and deleting records one by one.


        > Due to the infra capability that is there in this instance,

    > What is "infra capability"?

You had a query about how beefy the hardware is - I was trying to refer to the hardware capability.

        > the impact could be almost none!!??

    > It'll use some resources, because it's a thread deleting records, but most of the records and index nodes won't
    > be where new records are being inserted.

    > Note, though, that this will generate a lot of WAL records.

Ok, thanks.
We were weighing the pros and cons of the table partitioning approach, but the input on the experience you had with
the partitioned approach is something we'll very much need to consider. We'll try the per-record delete approach once
and see how it affects the DB load, with its present WAL setting.

Thank you...



Re: Purging few months old data and vacuuming in production

From:
"Peter J. Holzer"
Date:
On 2023-01-07 05:33:33 +0000, Ranjith Paliyath wrote:
>
>         > This is because the deletion step is executed record
>         > by record in main table, with its connected record(s)
>         > delete executions in rest of tables?
>
>     > I don't know if you have ON DELETE CASCADE.  Even if you do,
>     > you'll have to manually delete the tables not linked by FK. 
>     > I'd write a PL/pgSQL procedure: pass in a PK and then delete
>     > records from the 9 tables in the proper order so as to not
>     > throw FK constraint errors.
>
> Ok, in the case of our specific 9 tables it would be finding and deleting
> linked records in 8 tables based on the record chosen in the main
> table. That is, going and deleting records one by one.

If I understood correctly, you have to delete about 3 million records
(worst case) from the main table each day. Including the other 8 tables
those are 27 million DELETE queries each of which deletes only a few
records. That's about 300 queries per second. I'd be worried about
impacting performance on other queries at this rate.

I'd go for a middle ground: Instead of expiring once per day, use a
shorter interval, maybe once per hour or once per minute. That will
(probably) make each expire job really quick but still create much less
load overall.
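
One way to drive that, assuming the pg_cron extension is available (job name, schedule and table names are
hypothetical):

-- Run the expiry every hour instead of one big daily batch:
SELECT cron.schedule(
    'hourly-purge',
    '5 * * * *',        -- at minute 5 of every hour
    $$DELETE FROM main_events
      WHERE event_date < now() - interval '90 days'$$
);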

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Purging few months old data and vacuuming in production

From:
Ron
Date:
On 1/7/23 05:29, Peter J. Holzer wrote:
[snip]
> If I understood correctly, you have to delete about 3 million records
> (worst case) from the main table each day. Including the other 8 tables
> those are 27 million DELETE queries each of which deletes only a few
> records. That's about 300 queries per second. I'd be worried about
> impacting performance on other queries at this rate.

300 records/second.  Fewer DELETE statements if there are one-many relationships with the child tables.

> I'd go for a middle ground: Instead of expiring once per day, use a
> shorter interval, maybe once per hour or once per minute. That will
> (probably) make each expire job really quick but still create much less
> load overall.

--
Born in Arizona, moved to Babylonia.

Re: Purging few months old data and vacuuming in production

From:
"Peter J. Holzer"
Date:
On 2023-01-07 07:40:01 -0600, Ron wrote:
> On 1/7/23 05:29, Peter J. Holzer wrote:
>     If I understood correctly, you have to delete about 3 million records
>     (worst case) from the main table each day. Including the other 8 tables
>     those are 27 million DELETE queries each of which deletes only a few
>     records. That's about 300 queries per second. I'd be worried about
>     impacting performance on other queries at this rate.
>
>
> 300 records/second.  Fewer DELETE statements if there are one-many
> relationships with the child tables.

Nope:

| Each of these tables' daily record increment is on average 2 to 3
| million

I am assuming that the main table is typical, so there will be 2 to 3
million DELETEs from the main table and also from each of the other 8
tables (which may delete 0, 1, or more records). Also, it was mentioned
that only some of these tables have a direct FK relationship, so the
DELETE queries against the other tables may be (much) more expensive
than a simple `delete from my_table where main_id = :1`.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: Purging few months old data and vacuuming in production

From:
Николай Кобзарев
Date:



Saturday, January 7, 2023, 20:23 +03:00 from Peter J. Holzer <hjp-pgsql@hjp.at>:

So one may consider deleting from the child tables first, and only after that deleting from the main table, avoiding foreign key enforcement work during the delete. Also consider doing the deletes in relatively small chunks, in a loop; see the sketch below.
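
A sketch of that chunked, looping delete (hypothetical names; the batch size is arbitrary):

-- PG11+ procedures may COMMIT inside the loop, keeping each transaction
-- short so locks are released and autovacuum can keep up.
CREATE OR REPLACE PROCEDURE purge_child_chunked(p_batch int DEFAULT 5000)
LANGUAGE plpgsql
AS $$
DECLARE
    deleted bigint;
BEGIN
    LOOP
        DELETE FROM child_a
        WHERE ctid IN (
            SELECT c.ctid
            FROM child_a c
            JOIN main_events m ON m.id = c.main_id
            WHERE m.event_date < now() - interval '90 days'
            LIMIT p_batch);
        GET DIAGNOSTICS deleted = ROW_COUNT;
        EXIT WHEN deleted = 0;
        COMMIT;
    END LOOP;
END;
$$;

-- Run the same pattern for each child table, then for the main table:
CALL purge_child_chunked();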