Discussion: delete taking long time
Table "public.connection"
Column | Type | Modifiers
-------------------+--------------------------------+-----------
uid | character varying(18) | not null
ts | timestamp(6) without time zone | not null
host_origin | inet | not null
port_origin | integer | not null
host_destination | inet | not null
port_destination | integer | not null
protocol | character varying(12) |
service | character varying(12) |
duration | interval second(6) |
origin_bytes | bigint |
response_bytes | bigint |
connection_state | character varying(8) |
local_origin | boolean |
local_response | boolean |
missed_bytes | bigint |
history | text |
origin_packets | bigint |
origin_ip_bytes | bigint |
response_packets | bigint |
response_ip_bytes | bigint |
Indexes:
"connection_pkey" PRIMARY KEY, btree (uid)
Foreign-key constraints:
"connection_protocol_fkey" FOREIGN KEY (protocol) REFERENCES protocol(name)
"connection_service_fkey" FOREIGN KEY (service) REFERENCES service(name)
Referenced by:
TABLE "dns" CONSTRAINT "dns_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
TABLE "files" CONSTRAINT "files_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
TABLE "http" CONSTRAINT "http_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
TABLE "notice" CONSTRAINT "notice_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
TABLE "snmp" CONSTRAINT "snmp_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
TABLE "ssl" CONSTRAINT "ssl_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
TABLE "weird" CONSTRAINT "weird_uid_fkey" FOREIGN KEY (uid) REFERENCES connection(uid)
I am trying to delete the connections with date 2016-03-10 by using the following:
delete from connection where uid in (select uid from connection where ts > '2016-03-10 00:30:00');
There are around 800.000 records matching this rule, and it seems to be taking an awful lot of time - 4 hours and counting. What could be the reason for such a performance hit, and how could I optimise this for future cases?
Regards.
>
> I am trying to delete the connections with date 2016-03-10 by using the
> following:
>
>
> delete from connection where uid in (select uid from connection where ts >
> '2016-03-10 00:30:00');

try to rewrite that to :

delete from connection where ts > '2016-03-10 00:30:00';

It's simpler - and (maybe) faster.

> There are around 800.000 records matching this rule, and seems to be taking
> an awful lot of time - 4 hours and counting. What could be the reason for
> such a performance hit and how could I optimise this for future cases?
>
> Regards.

the db has to touch so many rows, and has to write the transaction log. And
update every index. And it has to check the referencing tables for the
constraints. Do you have proper indexes?

How large is shared_buffers set?
Can you show us the explain (analyse)?

Regards, Andreas
--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tuesday, March 15, 2016, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, March 15, 2016, Andreas Kretschmer <andreas@a-kretschmer.de> wrote:
>
> I am trying to delete the connections with date 2016-03-10 by using the
> following:
>
>
> delete from connection where uid in (select uid from connection where ts >
> '2016-03-10 00:30:00');
> try to rewrite that to :
>
> delete from connection where ts > '2016-03-10 00:30:00';
>
> It's simpler - and (maybe) faster.

It also gives a different answer...

David J.
>
> Given the lack of indexes on the one table that is shown I suspect this is
> the most likely cause (FK + indexes)
>

right, there should be an index on ts. If it is a 9.5 AND the rows are ordered
on ts a BRIN-index would be fine.
If not - a normal index. How many rows contains the table?

--
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
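For reference, the two options described above sketched as DDL (the index names here are illustrative, not from the thread):

```sql
-- Plain btree index on ts (works on any PostgreSQL version):
CREATE INDEX connection_ts_idx ON connection (ts);

-- Or, on PostgreSQL 9.5+ where rows are physically ordered by ts,
-- a much smaller BRIN index:
CREATE INDEX connection_ts_brin ON connection USING brin (ts);
```

A BRIN index only helps if the physical row order correlates with ts (typical for append-only logging tables like this one); otherwise the btree is the safe choice.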
>>From: David G. Johnston
>>Sent: Tuesday, March 15, 2016 6:25 PM
> There are around 800.000 records matching this rule, and seems to be taking
> an awful lot of time - 4 hours and counting. What could be the reason for
> such a performance hit and how could I optimise this for future cases?
the db has to touch such many rows, and has to write the transaction log. And
update every index. And it has to check the referenced tables for the
constraints. Do you have proper indexes?
Given the lack of indexes on the one table that is shown I suspect this is the most likely cause (FK + indexes)
David J.
<<
There are SEVEN FKs against that table…I would bet that’s 50% of the duration. The lack of an index is perhaps an issue, but with that many FK references plus that many rows…the transaction log could easily blow out and start paging to disk.

When deleting more than perhaps 20k rows, I will normally write a delete loop, grabbing roughly 20-50k rows at a time (server capacity dependent), deleting that set, grabbing another set, etc. That allows each set to commit, releasing pressure on the tran log.

You can easily experiment and see how long 10k rows take to delete. If still long, the FKs are the issue…you may need to script them out, drop them, run the deletes, then rebuild them.
Mike S.
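The delete loop described above can be sketched in plain SQL. PostgreSQL's DELETE has no LIMIT clause, so the chunk is selected in a subquery; the statement is re-run from the client until it reports DELETE 0, letting each chunk commit on its own (the batch size of 20000 is an assumption, tune it to the server):

```sql
-- One chunk of the batched delete; each execution is its own
-- transaction, so WAL and lock pressure stay bounded.
DELETE FROM connection
WHERE uid IN (
    SELECT uid
    FROM connection
    WHERE ts > '2016-03-10 00:30:00'
    LIMIT 20000
);
-- re-run until the command tag reads: DELETE 0
```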
select count(uid) from connection where uid in (select uid from connection where ts > '2016-03-10 01:00:00' and ts < '2016-03-10 01:10:00');
count
-------
2156
(1 row)
explain (analyze) delete from connection where uid in (select uid from connection where ts > '2016-03-10 01:00:00' and ts < '2016-03-10 01:10:00');
-------------------------------------------------------------------------------------------------------------------------------------------------
Delete on connection (cost=0.43..174184.31 rows=7756 width=12) (actual time=529.739..529.739 rows=0 loops=1)
-> Nested Loop (cost=0.43..174184.31 rows=7756 width=12) (actual time=0.036..526.295 rows=2156 loops=1)
-> Seq Scan on connection connection_1 (cost=0.00..115684.55 rows=7756 width=24) (actual time=0.020..505.012 rows=2156 loops=1)
Filter: ((ts > '2016-03-10 01:00:00'::timestamp without time zone) AND (ts < '2016-03-10 01:10:00'::timestamp without time zone))
Rows Removed by Filter: 3108811
-> Index Scan using connection_pkey on connection (cost=0.43..7.53 rows=1 width=24) (actual time=0.009..0.010 rows=1 loops=2156)
Index Cond: ((uid)::text = (connection_1.uid)::text)
Planning time: 0.220 ms
Trigger for constraint dns_uid_fkey: time=133.046 calls=2156
Trigger for constraint files_uid_fkey: time=39780.799 calls=2156
Trigger for constraint http_uid_fkey: time=99300.851 calls=2156
Trigger for constraint notice_uid_fkey: time=128.653 calls=2156
Trigger for constraint snmp_uid_fkey: time=59.491 calls=2156
Trigger for constraint ssl_uid_fkey: time=74.052 calls=2156
Trigger for constraint weird_uid_fkey: time=25868.651 calls=2156
Execution time: 165880.419 ms
(16 rows)
>
> Given the lack of indexes on the one table that is shown I suspect this is
> the most likely cause (FK + indexes)
>
> right, there should be an index on ts. If it is a 9.5 AND the rows are ordered
> on ts a BRIN-index would be fine.
> If not - a normal index. How many rows contains the table?
count
---------
3108811
(1 row)
> There are SEVEN FKs against that table…I would bet that’s 50% of the duration. The lack of an index,
> perhaps an issue, but
> With that many FK references plus that many rows…the transaction log could easily blow out and start
> paging to disk.
I think you are right, FKs seem to take the biggest chunk of time of the whole delete operation. I made a test with 10.000 rows; it took 12 minutes. 20.000 rows took about 25 minutes to delete.
Regards to all.
ivo liondov <ivo.liondov@gmail.com> wrote:

> explain (analyze) delete from connection where uid in (select uid from
> connection where ts > '2016-03-10 01:00:00' and ts < '2016-03-10 01:10:00');
>
> Delete on connection  (cost=0.43..174184.31 rows=7756 width=12) (actual time=529.739..529.739 rows=0 loops=1)
>   ->  Nested Loop  (cost=0.43..174184.31 rows=7756 width=12) (actual time=0.036..526.295 rows=2156 loops=1)
>         ->  Seq Scan on connection connection_1  (cost=0.00..115684.55 rows=7756 width=24) (actual time=0.020..505.012 rows=2156 loops=1)
>               Filter: ((ts > '2016-03-10 01:00:00'::timestamp without time zone) AND (ts < '2016-03-10 01:10:00'::timestamp without time zone))

there is no index on the ts-column.

>               Rows Removed by Filter: 3108811
>         ->  Index Scan using connection_pkey on connection  (cost=0.43..7.53 rows=1 width=24) (actual time=0.009..0.010 rows=1 loops=2156)
>               Index Cond: ((uid)::text = (connection_1.uid)::text)
> Planning time: 0.220 ms
> Trigger for constraint dns_uid_fkey: time=133.046 calls=2156
> Trigger for constraint files_uid_fkey: time=39780.799 calls=2156
> Trigger for constraint http_uid_fkey: time=99300.851 calls=2156
> Trigger for constraint notice_uid_fkey: time=128.653 calls=2156
> Trigger for constraint snmp_uid_fkey: time=59.491 calls=2156
> Trigger for constraint ssl_uid_fkey: time=74.052 calls=2156
> Trigger for constraint weird_uid_fkey: time=25868.651 calls=2156
> Execution time: 165880.419 ms

i guess there are no indexes for these tables and the relevant columns

> I think you are right, fk seem to take the biggest chunk of time from the
> whole delete operation. I made a test with 10.000 rows, it took 12 minutes.
> 20.000 rows took about 25 minutes to delete.

create the missing indexes now and come back with the new duration.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
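The trigger timings in the quoted plan point at the referencing tables: files, http and weird spend minutes checking the constraint for only 2156 rows, which suggests their uid columns are unindexed. The missing indexes would look roughly like this (index names are illustrative, not from the thread):

```sql
-- PostgreSQL requires an index on a FK's referenced column
-- (connection.uid, the PK), but does NOT create one automatically
-- on the referencing columns of the child tables:
CREATE INDEX files_uid_idx ON files (uid);
CREATE INDEX http_uid_idx  ON http (uid);
CREATE INDEX weird_uid_idx ON weird (uid);
-- and likewise for dns, notice, snmp and ssl as they grow
```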
bro=# select count(ts) from connection where ts >'2016-03-10 00:00:00';
count
--------
546997
(1 row)
bro=# select now(); delete from connection where uid in (select uid from connection where ts > '2016-03-10 00:00:00' ); select now();
now
-------------------------------
2016-03-16 14:02:43.172617+00
(1 row)
DELETE 546997
now
-------------------------------
2016-03-16 14:04:25.608695+00
(1 row)
Took less than 2 minutes. Thanks for the help.
I agree with Andreas (indexes) - 10 minutes to delete 10k rows is about 9.5 minutes too long. Either the "select" part of the query can't find the rows quickly or the FK burden is crushing the life out of it. If every involved table has an index on their Primary Key then the 10k row delete should take maybe 30-60 seconds. Highly dependent on how many FK rows are involved.

And...from a db design perspective, a table referenced by 7 FKs shouldn't be having this type of delete run against it...it's just too expensive if it must happen routinely. This is where de-normalization might be called for, to collapse some of those references, or a shift to stored functions that maintain integrity versus the declared foreign keys maintaining it.

Mike S.
-----Original Message-----
From: Andreas Kretschmer
Sent: Wednesday, March 16, 2016 4:58 AM
ivo liondov <ivo.liondov@gmail.com> wrote:
> explain (analyze) delete from connection where uid in (select uid from
> connection where ts > '2016-03-10 01:00:00' and ts < '2016-03-10 01:10:00');
>
> Delete on connection  (cost=0.43..174184.31 rows=7756 width=12) (actual time=529.739..529.739 rows=0 loops=1)
>   ->  Nested Loop  (cost=0.43..174184.31 rows=7756 width=12) (actual time=0.036..526.295 rows=2156 loops=1)
>         ->  Seq Scan on connection connection_1  (cost=0.00..115684.55 rows=7756 width=24) (actual time=0.020..505.012 rows=2156 loops=1)
>               Filter: ((ts > '2016-03-10 01:00:00'::timestamp without time zone) AND (ts < '2016-03-10 01:10:00'::timestamp without time zone))

there is no index on the ts-column.

>               Rows Removed by Filter: 3108811
>         ->  Index Scan using connection_pkey on connection  (cost=0.43..7.53 rows=1 width=24) (actual time=0.009..0.010 rows=1 loops=2156)
>               Index Cond: ((uid)::text = (connection_1.uid)::text)
> Planning time: 0.220 ms
> Trigger for constraint dns_uid_fkey: time=133.046 calls=2156
> Trigger for constraint files_uid_fkey: time=39780.799 calls=2156
> Trigger for constraint http_uid_fkey: time=99300.851 calls=2156
> Trigger for constraint notice_uid_fkey: time=128.653 calls=2156
> Trigger for constraint snmp_uid_fkey: time=59.491 calls=2156
> Trigger for constraint ssl_uid_fkey: time=74.052 calls=2156
> Trigger for constraint weird_uid_fkey: time=25868.651 calls=2156
> Execution time: 165880.419 ms

i guess there are no indexes for these tables and the relevant columns

> I think you are right, fk seem to take the biggest chunk of time from the
> whole delete operation. I made a test with 10.000 rows, it took 12 minutes.
> 20.000 rows took about 25 minutes to delete.

create the missing indexes now and come back with the new duration.
Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
ivo liondov <ivo.liondov@gmail.com> wrote:

> Took less than 2 minutes. Thanks for the help.

you are welcome.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°