Thread: time taking deletion on large tables


time taking deletion on large tables

From: Atul Kumar
Date:

Hi,

The feed_posts table has over 50 million rows.

I am deleting all rows of a certain type that are over 60 days old.

When I try to run a delete like the one below, it hangs for an entire
day, so I need to kill it with pg_terminate_backend(pid).

DELETE FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
AND created_at > '2020-05-11 00:00:00'
AND created_at < '2020-05-12 00:00:00';

So I need help figuring out how to do large deletes on a
production database during normal hours.

The explain plan is given below:



"Delete on feed_posts (cost=1156.57..195748.88 rows=15534 width=6)"
" -> Bitmap Heap Scan on feed_posts (cost=1156.57..195748.88
rows=15534 width=6)"
" Recheck Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp
without time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp
without time zone))"
" Filter: (feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'::uuid)"
" -> Bitmap Index Scan on feed_posts_created_at (cost=0.00..1152.68
rows=54812 width=0)"
" Index Cond: ((created_at >= '2020-05-11 00:00:00'::timestamp without
time zone) AND (created_at <= '2020-05-12 00:00:00'::timestamp without
time zone))"


Please help me with deleting these rows. Do I need to change anything
in the Postgres configuration, or in the table structure?

Regards,
Atul



Re: time taking deletion on large tables

From: Ravikumar Reddy
Date:

Hi Atul,

Please try the code below. Execute all the statements in one transaction.

select * into new_table from old_table where type = 'abcz';
truncate table old_table;
insert into old_table select * from new_table;




On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar <akumar14871@gmail.com> wrote:
> The feed_posts table has over 50 million rows.
>
> When I try to run a delete like the one below, it hangs for an entire
> day, so I need to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help figuring out how to do large deletes on a
> production database during normal hours.
> [...]




--
Regards,
Ravikumar S,
Ph: 8106741263

Re: time taking deletion on large tables

From: Justin Pryzby
Date:

On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
> Please try the code below. Execute all the statements in one transaction.
> 
> select * into new_table from old_table where type = 'abcz';
> truncate table old_table;
> insert into old_table select * from new_table;

This looks like advice for when most of the rows are being deleted, but I don't
think that's true here.  It'd need to LOCK old_table, first, right?  Also,
truncate isn't MVCC safe.
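
For completeness, a locked variant of that approach would look roughly
like this (table names as in the example above; the TRUNCATE visibility
caveat still applies):

BEGIN;
-- Block concurrent writers for the duration of the swap.
LOCK TABLE old_table IN ACCESS EXCLUSIVE MODE;
-- Copy out only the rows to keep.
CREATE TABLE new_table AS
  SELECT * FROM old_table WHERE type = 'abcz';
-- Not MVCC-safe: snapshots taken before this point will see an
-- empty table afterwards.
TRUNCATE TABLE old_table;
INSERT INTO old_table SELECT * FROM new_table;
DROP TABLE new_table;
COMMIT;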

Atul: What server version? Do you have an index on feed_definition_id?
https://wiki.postgresql.org/wiki/Slow_Query_Questions
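
If there is no such index, one matching the whole WHERE clause of the
DELETE could help the scan itself; a sketch, with a made-up index name:

-- build without holding a long write-blocking lock
CREATE INDEX CONCURRENTLY feed_posts_def_id_created_at_idx
ON feed_posts (feed_definition_id, created_at);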

If explain (analyze,buffers) SELECT runs in a reasonable time for that query,
include its output.
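
That is, something along these lines, with the same predicates as the
DELETE:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM feed_posts
WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
AND created_at > '2020-05-11 00:00:00'
AND created_at < '2020-05-12 00:00:00';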

On Thu, Dec 3, 2020 at 8:16 PM Atul Kumar <akumar14871@gmail.com> wrote:
> The feed_posts table has over 50 million rows.
>
> I am deleting all rows of a certain type that are over 60 days old.
>
> When I try to run a delete like the one below, it hangs for an entire
> day, so I need to kill it with pg_terminate_backend(pid).
>
> DELETE FROM feed_posts
> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
> AND created_at > '2020-05-11 00:00:00'
> AND created_at < '2020-05-12 00:00:00';
>
> So I need help figuring out how to do large deletes on a
> production database during normal hours.
>
> Please help me with deleting these rows. Do I need to change anything
> in the Postgres configuration, or in the table structure?



Re: time taking deletion on large tables

From: Tom Lane
Date:

Justin Pryzby <pryzby@telsasoft.com> writes:
> On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
>> When I try to run a delete like the one below, it hangs for an
>> entire day, so I need to kill it with pg_terminate_backend(pid).
>> 
>> DELETE FROM feed_posts
>> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
>> AND created_at > '2020-05-11 00:00:00'
>> AND created_at < '2020-05-12 00:00:00';

90% of the "delete takes forever" complaints that we hear trace down to
having a foreign key reference to the deletion-target table that's not
backed by an index on the referencing column.  Then you end up getting
a seqscan on the referencing table to look for rows referencing a
row-to-be-deleted.  And then another one for the next row.  Etc.
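
If that turns out to be the cause, the usual fix is an index on the
referencing column. The referencing table isn't shown in this thread,
so the names below are hypothetical:

-- hypothetical referencing table and column
CREATE INDEX CONCURRENTLY feed_post_comments_feed_post_id_idx
ON feed_post_comments (feed_post_id);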

You could try "explain analyze" on a query deleting just a single
one of these rows and see if an RI enforcement trigger is what's
eating the time.
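
For instance, assuming feed_posts has a primary-key column (called "id"
here; the real schema isn't shown in the thread):

BEGIN;
EXPLAIN ANALYZE
DELETE FROM feed_posts
WHERE id = (SELECT id FROM feed_posts
            WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
            LIMIT 1);
ROLLBACK;  -- keep the test non-destructive

If an RI trigger is at fault, the output will include a line like
"Trigger for constraint ...: time=... calls=..." accounting for most of
the runtime.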

            regards, tom lane



Re: time taking deletion on large tables

From: Andrew Dunstan
Date:

On 12/3/20 11:16 AM, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> On Thu, Dec 03, 2020 at 08:43:57PM +0530, Ravikumar Reddy wrote:
>>> When I try to run a delete like the one below, it hangs for an
>>> entire day, so I need to kill it with pg_terminate_backend(pid).
>>>
>>> DELETE FROM feed_posts
>>> WHERE feed_definition_id = 'bf33573d-936e-4e55-8607-72b685d2cbae'
>>> AND created_at > '2020-05-11 00:00:00'
>>> AND created_at < '2020-05-12 00:00:00';
> 90% of the "delete takes forever" complaints that we hear trace down to
> having a foreign key reference to the deletion-target table that's not
> backed by an index on the referencing column.  Then you end up getting
> a seqscan on the referencing table to look for rows referencing a
> row-to-be-deleted.  And then another one for the next row.  Etc.
>
> You could try "explain analyze" on a query deleting just a single
> one of these rows and see if an RI enforcement trigger is what's
> eating the time.
>



Yeah. IIRC some other RDBMS systems actually create such an index if it
doesn't already exist. Maybe we should have a warning when setting up an
FK constraint if the referencing fields aren't usefully indexed.
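
In the meantime such cases can be hunted down by hand. A rough catalog
query (it treats any index whose leading columns cover the FK columns
as sufficient, so take it as a starting point, not an authoritative
check):

SELECT c.conrelid::regclass AS referencing_table,
       c.conname            AS fk_constraint
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          -- all FK columns must appear among the leading index columns
          AND (i.indkey::int2[])[0:array_length(c.conkey, 1) - 1] @> c.conkey
      );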


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com