Discussion: Slow deletes
Can anyone explain why these deletes are extremely slow?

====================================
=> select version();
                              version
---------------------------------------------------------------------
 PostgreSQL 7.2.1 on i386-unknown-openbsd3.0, compiled by GCC 2.95.3
(1 row)

====================================
=> \d syslog_event
                                   Table "syslog_event"
     Column      |           Type           |                       Modifiers
-----------------+--------------------------+-------------------------------------------------------
 event_id        | bigint                   | not null default nextval('syslog_event_id_seq'::text)
 signature_name  | text                     |
 facility        | character(10)            |
 priority        | character(10)            |
 timestamp       | timestamp with time zone |
 host_id         | bigint                   | not null
 raw_message     | text                     |
 clean_message   | text                     |
 marked_message  | text                     |
 remote_event_id | bigint                   | not null
Indexes: syslog_event_event_id_idx,
         syslog_event_idx_tmp,
         syslog_event_timestamp_idx
Primary key: syslog_event_pkey
Unique keys: syslog_event_host_id_key
Triggers: RI_ConstraintTrigger_13220921,
          RI_ConstraintTrigger_13220965,
          RI_ConstraintTrigger_13220967,
          syslog_event_after_insert_trg

====================================
=> \d syslog_event_pkey
Index "syslog_event_pkey"
  Column  |  Type
----------+--------
 event_id | bigint
unique btree (primary key)

====================================
=> explain delete from syslog_event where event_id = 1001;
NOTICE:  QUERY PLAN:

Seq Scan on syslog_event  (cost=0.00..342277.67 rows=1 width=6)

EXPLAIN
====================================

There are over 5,000,000 rows in the table. The triggers are only for
inserts or for existence constraints. There are other tables, but none
have existence constraints to this table.

I don't quite understand why, when there exists a (unique) index, this
would use a sequential scan. (Note: I am actually trying to delete many
rows, but this cost is for a single row.)

Regards,
Ed
Edmund Dengler <edmundd@eSentire.com> writes:
> Can anyone explain why these deletes are extremely slow?

> => explain delete from syslog_event where event_id = 1001;
> NOTICE:  QUERY PLAN:

> Seq Scan on syslog_event  (cost=0.00..342277.67 rows=1 width=6)

> There are over 5,000,000 rows in the table.

Seqscan on a 5M-row table will take a little while...

Your problem is that it's not using an indexscan, and the reason
for that is that '1001' is taken as an integer, not a bigint.  The
system is not smart about optimizing cross-datatype comparisons
into indexscans.  You could write

	delete from syslog_event where event_id = 1001::int8;

(or use CAST if you want to be pedantically standards-compliant).
Alternatively, consider whether event_id really needs to be bigint.
There's a clear notational advantage in plain integer.

Yes, it'd be nice if "bigintcol = 1001" acted more reasonably,
and someday we'll make it happen ... but doing so without breaking
the system's type-extensibility features is not trivial.

			regards, tom lane
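The rewrite Tom suggests can be written in either the PostgreSQL-style cast notation or the standard-SQL CAST form; both give the literal the same type as the indexed bigint column, which is what lets the 7.2 planner consider the index (a sketch, using the table and column names from the thread):

```sql
-- PostgreSQL-style cast: the literal is typed int8 (bigint), so the
-- comparison is bigint = bigint and the index on event_id is usable.
DELETE FROM syslog_event WHERE event_id = 1001::int8;

-- Standard-SQL equivalent of the same thing:
DELETE FROM syslog_event WHERE event_id = CAST(1001 AS bigint);
```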
Thanks! That seems to have been the issue!

=> explain delete from syslog_event where event_id = 1000::int8;
NOTICE:  QUERY PLAN:

Index Scan using syslog_event_event_id_idx on syslog_event
  (cost=0.00..3.02 rows=1 width=6)

EXPLAIN

Deleting a single row now takes only about 5 seconds.

Regards,
Ed

On Mon, 12 Aug 2002, Tom Lane wrote:
> Your problem is that it's not using an indexscan, and the reason
> for that is that '1001' is taken as an integer, not a bigint.  The
> system is not smart about optimizing cross-datatype comparisons
> into indexscans.  You could write
>
>	delete from syslog_event where event_id = 1001::int8;
Edmund Dengler <edmundd@eSentire.com> writes:
> Thanks! That seems to have been the issue!

> => explain delete from syslog_event where event_id = 1000::int8;
> NOTICE:  QUERY PLAN:

> Index Scan using syslog_event_event_id_idx on syslog_event
>   (cost=0.00..3.02 rows=1 width=6)

That's better.

> Deleting a single row now takes only about 5 seconds.

Hm, that still seems like a huge amount of time to delete a single row.
I suspect query-optimization problems inside your triggers.
Can't say much without more details though.

			regards, tom lane
Any way to conveniently dump the triggers (3 are constraint checks; my own
personal one is supposed to be on inserts only)?

Regards,
Ed

P.S. Trying to delete a couple of thousand rows, and while the explain is
indeed faster (and I believe it is getting to the actual delete part
faster), it is still taking quite a long time.

On Mon, 12 Aug 2002, Tom Lane wrote:
> Hm, that still seems like a huge amount of time to delete a single row.
> I suspect query-optimization problems inside your triggers.
> Can't say much without more details though.
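One way to inspect the triggers in a release this old is to query the pg_trigger catalog directly; a sketch (the catalog column names below are standard in 7.x, but the exact query is not from the thread):

```sql
-- List every trigger on syslog_event and, for RI constraint triggers,
-- the other table involved in the constraint (tgconstrrelid).
SELECT t.tgname,
       t.tgisconstraint,
       t.tgconstrname,
       c.relname AS other_table
FROM pg_trigger t
LEFT JOIN pg_class c ON c.oid = t.tgconstrrelid
WHERE t.tgrelid = (SELECT oid FROM pg_class
                   WHERE relname = 'syslog_event');
```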
Edmund Dengler <edmundd@eSentire.com> writes:
> Any way to conveniently dump the triggers (3 are constraint checks; my own
> personal one is supposed to be on inserts only)?

The RI on-delete trigger must be the issue then.  Lack of indexes, or
linking to a foreign key of a different column type, are the trouble
causes I've seen there ...

			regards, tom lane
Aha, I did have a dependency I missed. I did a pull from a dump:

CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "syslog_event"
    FROM "syslog_event_message_event" NOT DEFERRABLE INITIALLY IMMEDIATE
    FOR EACH ROW EXECUTE PROCEDURE
    "RI_FKey_cascade_del" ('<unnamed>', 'syslog_event_message_event',
    'syslog_event', 'UNSPECIFIED', 'event_id', 'event_id');

Both columns are the same type (int8).

=> \d syslog_event_message_event
Table "syslog_event_message_event"
   Column   |  Type  | Modifiers
------------+--------+-----------
 message_id | bigint | not null
 event_id   | bigint | not null
Primary key: syslog_event_message_event_pkey
Triggers: RI_ConstraintTrigger_13220957

The table contains no rows (previously deleted, vacuumed, and analyzed).

By the by, it seems from the various timings of this (and knowing how long
it takes to dump/rebuild the database in general), I could have solved my
specific problem by dumping the database, deleting the required rows using
sed/grep/vi, and rebuilding the database. While this may be a bit of an
exaggeration, it is not far off! Somehow, this just seems wrong. I must be
doing something strange to have such slow deletes, and I can't seem to
find it.

Regards,
Ed

On Mon, 12 Aug 2002, Tom Lane wrote:
> The RI on-delete trigger must be the issue then.  Lack of indexes, or
> linking to a foreign key of a different column type, are the trouble
> causes I've seen there ...
On Mon, Aug 12, 2002 at 10:34:23PM -0400, Tom Lane wrote:
> Your problem is that it's not using an indexscan, and the reason
> for that is that '1001' is taken as an integer, not a bigint.  The
> system is not smart about optimizing cross-datatype comparisons
> into indexscans.  You could write
>
>	delete from syslog_event where event_id = 1001::int8;
>
> (or use CAST if you want to be pedantically standards-compliant).

Actually, an easier way to do it is by saying:

	delete from syslog_event where event_id = '1001'

which works fine, since the quoted value is typed as 'unknown' rather
than 'int4'.

I considered a patch to make the parser treat all numbers as unknown, but
that would break any code that relies on numbers-are-int4 (think function
and operator type resolution). Sometimes you can't win.

-- 
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
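Martijn's trick works because a quoted literal enters the parser with no assigned type, so it is resolved against the column's type instead of defaulting to int4; a minimal sketch:

```sql
-- '1001' starts life as type 'unknown'; the parser coerces it to
-- bigint to match event_id, so the btree index can be used, exactly
-- as with an explicit ::int8 cast.
DELETE FROM syslog_event WHERE event_id = '1001';
```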
Edmund Dengler <edmundd@eSentire.com> writes:
> Aha, I did have a dependency I missed. I did a pull from a dump:

> CREATE CONSTRAINT TRIGGER "<unnamed>" AFTER DELETE ON "syslog_event"
>     FROM "syslog_event_message_event" NOT DEFERRABLE INITIALLY IMMEDIATE
>     FOR EACH ROW EXECUTE PROCEDURE
>     "RI_FKey_cascade_del" ('<unnamed>', 'syslog_event_message_event',
>     'syslog_event', 'UNSPECIFIED', 'event_id', 'event_id');

> Both columns are the same type (int8).

> => \d syslog_event_message_event
> Table "syslog_event_message_event"
>    Column   |  Type  | Modifiers
> ------------+--------+-----------
>  message_id | bigint | not null
>  event_id   | bigint | not null
> Primary key: syslog_event_message_event_pkey
> Triggers: RI_ConstraintTrigger_13220957

> The table contains no rows (previously deleted, vacuumed, and analyzed).

I can't tell from this which column is the primary key?  (Fortunately
someone improved the \d output so that that will be apparent in 7.3...)

Generally you want to be sure that indexes are available on both the
referenced and referencing columns of a foreign-key relationship...

			regards, tom lane
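Tom's advice applied to this schema would mean indexing the referencing side of the ON DELETE CASCADE; without it, every deleted syslog_event row makes the RI trigger scan syslog_event_message_event sequentially. A sketch (the index name is illustrative, not from the thread):

```sql
-- Index the referencing column so RI_FKey_cascade_del can find the
-- child rows for each deleted syslog_event row via an indexscan
-- instead of a sequential scan of syslog_event_message_event.
CREATE INDEX syslog_event_message_event_event_id_idx
    ON syslog_event_message_event (event_id);
```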
Martijn van Oosterhout <kleptog@svana.org> writes:
> I considered a patch to make the parser treat all numbers as unknown, but
> that would break any code that relies on numbers-are-int4 (think function
> and operator type resolution).

Right ... if there were a simple solution we'd have done it already.

The line of thought that's been discussed in the past is to add an
UNKNOWNNUMERIC placeholder type, with similar behavior to UNKNOWN except
that it's used for something that was a numeric-looking literal rather
than a string-looking literal.  I don't recall that anyone worked out a
complete proposal --- check the archives if you want to work on this.

			regards, tom lane