Discussion: Performance problems deleting data
Hello

---------------------------
Postgresql version: 8.1.10
4GB RAM
2x HP 72GB 10K SAS RAID1/smartarray
---------------------------

I have a colleague that is having some performance problems from time to time when deleting rows from a table. We found out that the database having this problem had a severe bloat problem in many tables and indexes (they were running only autovacuum) and some misconfiguration in postgresql.conf.

What we did to fix the situation was:

1) Stop the application accessing the database.
2) Change these parameters in postgresql.conf:
---------------------------------
shared_buffers = 108157
work_mem = 16384
maintenance_work_mem = 262144
max_fsm_pages = 800000
wal_buffers = 64
checkpoint_segments = 128
random_page_cost = 2.0
effective_cache_size = 255479
default_statistics_target = 400
---------------------------------
3) Update /etc/sysctl.conf with new values for kernel.shmmax and kernel.shmall
4) Run 'VACUUM FULL VERBOSE'
5) Run 'REINDEX DATABASE <dbname>'
6) Run 'ANALYZE VERBOSE'
7) Define a 'VACUUM VERBOSE ANALYZE' in crontab
8) Start the application.

These changes helped a lot: the size of the database went down from 7GB to 1GB, and most of the deletes work as they are supposed to. But from time to time a single deletion takes a very long time to finish. The output from EXPLAIN ANALYZE doesn't show anything wrong, as far as I can see.
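The one-off VACUUM/REINDEX/ANALYZE cleanup described above can be sketched as a single psql session (a sketch, not the exact commands run; 'manage' is the database name visible in the session below, and the operations must run while the application is stopped because VACUUM FULL and REINDEX take exclusive locks):

```sql
-- One-off cleanup, run with the application stopped:
VACUUM FULL VERBOSE;        -- rewrite bloated tables to reclaim dead space
REINDEX DATABASE manage;    -- rebuild indexes bloated by the old row versions
ANALYZE VERBOSE;            -- refresh planner statistics afterwards

-- Recurring maintenance (the statement scheduled from crontab):
VACUUM VERBOSE ANALYZE;
```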
The definition of the table 'module' is:

-------------------------------------------------------------------------
manage=# \d module
                                      Table "public.module"
  Column   |            Type             |                         Modifiers
-----------+-----------------------------+-----------------------------------------------------------
 moduleid  | integer                     | not null default nextval('module_moduleid_seq'::regclass)
 deviceid  | integer                     | not null
 netboxid  | integer                     | not null
 module    | integer                     | not null
 model     | character varying           |
 descr     | character varying           |
 up        | character(1)                | not null default 'y'::bpchar
 downsince | timestamp without time zone |
Indexes:
    "module_pkey" PRIMARY KEY, btree (moduleid)
    "module_deviceid_key" UNIQUE, btree (deviceid)
    "module_netboxid_key" UNIQUE, btree (netboxid, module)
Check constraints:
    "module_up" CHECK (up = 'y'::bpchar OR up = 'n'::bpchar)
Foreign-key constraints:
    "$1" FOREIGN KEY (deviceid) REFERENCES device(deviceid) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (netboxid) REFERENCES netbox(netboxid) ON UPDATE CASCADE ON DELETE CASCADE
Rules:
    close_alerthist_modules AS
    ON DELETE TO module DO UPDATE alerthist SET end_time = now()
    WHERE (alerthist.eventtypeid::text = 'moduleState'::text
           OR alerthist.eventtypeid::text = 'linkState'::text)
      AND alerthist.end_time = 'infinity'::timestamp without time zone
      AND alerthist.deviceid = old.deviceid
-------------------------------------------------------------------------

manage=# EXPLAIN ANALYZE DELETE FROM module WHERE deviceid='7298';
                               QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14.63 rows=1 width=67) (actual time=2.365..2.365 rows=0 loops=1)
   ->  Index Scan using alerthist_end_time_btree on alerthist  (cost=0.00..10.65 rows=1 width=67) (actual time=2.363..2.363 rows=0 loops=1)
         Index Cond: (end_time = 'infinity'::timestamp without time zone)
         Filter: ((((eventtypeid)::text = 'moduleState'::text) OR ((eventtypeid)::text = 'linkState'::text)) AND (7298 = deviceid))
   ->  Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=4) (never executed)
         Index Cond: (deviceid = 7298)
 Total runtime: 2.546 ms

 Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=6) (actual time=0.060..0.061 rows=1 loops=1)
   Index Cond: (deviceid = 7298)
 Trigger for constraint $1: time=3.422 calls=1
 Trigger for constraint $1: time=0.603 calls=1
 Total runtime: 2462558.813 ms
(13 rows)
-------------------------------------------------------------------------

Any ideas why it is taking 2462558.813 ms to finish when the total time for the deletion is 2.546 ms + 3.422 ms + 0.603 ms?

The deletion of a row in the 'module' table involves several deletions/updates in many other tables in the database, related by foreign keys (with ON DELETE CASCADE) and triggers. I suppose that an open transaction in one of these tables not directly related to 'module' could block the deletion without showing up in EXPLAIN ANALYZE? The two 'Trigger for constraint' lines in the EXPLAIN ANALYZE output only show two tables having an attribute as a foreign key in 'module', but if these two tables have to wait for other tables, that would not show anywhere? (only in pg_locks)

Thanks in advance
regards
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
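Since such blocking would show up only in pg_locks, one way to check is a self-join on that view that pairs ungranted lock requests with the sessions holding conflicting granted locks. This is a sketch, not from the original thread; it covers only relation and transactionid locks, and the column names are those of the 8.1-era pg_locks view:

```sql
-- Sketch: who is waiting, and which backend holds a granted lock
-- on the same object or transaction ID.
SELECT w.pid                  AS waiting_pid,
       h.pid                  AS holding_pid,
       w.locktype,
       w.relation::regclass   AS relation,   -- NULL for transactionid locks
       w.mode                 AS wanted_mode,
       h.mode                 AS held_mode
FROM pg_locks w
JOIN pg_locks h
  ON  w.locktype = h.locktype
  AND w.pid <> h.pid
  AND (w.relation = h.relation OR w.transactionid = h.transactionid)
WHERE NOT w.granted
  AND h.granted;
```

Run while a DELETE is stuck, this should name the pid of the blocking backend, which can then be matched against pg_stat_activity.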
Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
> manage=# EXPLAIN ANALYZE DELETE FROM module WHERE deviceid='7298';
>                               QUERY PLAN
> -------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..14.63 rows=1 width=67) (actual time=2.365..2.365 rows=0 loops=1)
>    ->  Index Scan using alerthist_end_time_btree on alerthist  (cost=0.00..10.65 rows=1 width=67) (actual time=2.363..2.363 rows=0 loops=1)
>          Index Cond: (end_time = 'infinity'::timestamp without time zone)
>          Filter: ((((eventtypeid)::text = 'moduleState'::text) OR ((eventtypeid)::text = 'linkState'::text)) AND (7298 = deviceid))
>    ->  Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=4) (never executed)
>          Index Cond: (deviceid = 7298)
>  Total runtime: 2.546 ms
>
>  Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=6) (actual time=0.060..0.061 rows=1 loops=1)
>    Index Cond: (deviceid = 7298)
>  Trigger for constraint $1: time=3.422 calls=1
>  Trigger for constraint $1: time=0.603 calls=1
>  Total runtime: 2462558.813 ms
> (13 rows)
> -------------------------------------------------------------------------
>
> Any ideas why it is taking 2462558.813 ms to finish when the total time
> for the deletion is 2.546 ms + 3.422 ms + 0.603 ms?

That's just bizarre. So far as I can see from the 8.1 EXPLAIN code, the only place the extra time could be spent is in ExecutorStart, ExecutorEnd, or the top level of ExecutorRun, none of which should take any noticeable amount of runtime in a trivial query like this.

The only thing I can think of is that ExecutorStart would have been where we'd acquire RowExclusiveLock on "module", while the previous rule-generated query would only take AccessShareLock. So if, for instance, some other transaction had ShareLock (perhaps from CREATE INDEX) and just sat a long time before committing, perhaps this would be explainable.
I'm not too sure about that explanation, though, because I think the parser should have already taken RowExclusiveLock when it was doing parse analysis.

Is the problem repeatable? Is the delay consistent? What do you see in pg_locks while it's delaying? Also watch "vmstat 1" output --- is it consuming CPU and/or I/O?

			regards, tom lane
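The lock-conflict scenario described above can be reproduced with two concurrent sessions (a sketch, not from the thread; the index name is hypothetical). ShareLock, taken by CREATE INDEX, conflicts with the RowExclusiveLock a DELETE needs:

```sql
-- Session 1: acquires ShareLock on module, then sits idle in transaction.
BEGIN;
CREATE INDEX module_model_idx ON module (model);  -- hypothetical index name
-- ... no COMMIT yet ...

-- Session 2: needs RowExclusiveLock on module, which conflicts with
-- ShareLock, so this DELETE blocks until session 1 commits or aborts.
DELETE FROM module WHERE deviceid = '7298';
```

While session 2 is blocked, its row in pg_locks shows granted = f, which is the signature to look for.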
Tom Lane wrote:
> Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
>> Any ideas why it is taking 2462558.813 ms to finish when the total time
>> for the deletion is 2.546 ms + 3.422 ms + 0.603 ms?

Hei Tom, I got this information from my colleague:

> Is the problem repeatable?

Repeatable in the sense that, about 30+ times every day, the deletion of a row takes more than 100 seconds. I have not found a way to provoke it, though.

> Is the delay consistent?

No. I frequently see everything from just below the 8-second log_min_duration_statement threshold to about 4900 seconds. As for the distribution, about half of the 30+ take more than 500 seconds to complete; the rest (obviously) take between 100 and 500 seconds.

> What do you see in pg_locks while it's delaying?

   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |       mode       | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+------------------+---------
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    17176 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17174 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 transactionid |          |          |      |       |      80430155 |         |       |          |    80430155 | 29569 | ExclusiveLock    | t
 relation      |    16393 |    17164 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16816 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16816 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16812 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16812 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17174 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17164 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    10342 |      |       |               |         |       |          |    82180134 | 31646 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16835 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16835 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17176 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16821 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16821 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17174 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16730 |      |       |               |         |       |          |    80430155 | 29569 | AccessShareLock  | t
 transactionid |          |          |      |       |      82179669 |         |       |          |    82179669 | 19906 | ExclusiveLock    | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 transactionid |          |          |      |       |      82180134 |         |       |          |    82180134 | 31646 | ExclusiveLock    | t
 transactionid |          |          |      |       |      82179843 |         |       |          |    82179843 | 19890 | ExclusiveLock    | t
 relation      |    16393 |    17176 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 transactionid |          |          |      |       |      82180131 |         |       |          |    82180131 | 19907 | ExclusiveLock    | t
 relation      |    16393 |    17164 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
(54 rows)

> Also watch "vmstat 1" output --- is it consuming CPU and/or I/O?

CPU 50% idle, the rest mainly used in "system". Virtually no I/O. No blocked processes. An impressive amount of context switches. No swap.

An strace(1) of the postgres process may give a hint about the "system" part; this is what it does over and over and over again. The filename does change to a different file in the same directory every now and then, but not often.
semop(4227102, 0xbf8ef23a, 1)                = 0
semop(4227102, 0xbf8ef67a, 1)                = 0
open("pg_subtrans/047B", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 139264, [139264], SEEK_SET)      = 0
read(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
close(12)                                    = 0

regards
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez wrote:
> CPU 50% idle, rest mainly used in "system". Virtually no IO. No
> blocked processes. An impressive amount of context switches. No swap.
>
> An strace(1) of the postgres process may give a hint about the "system"
> part; this is what it does over and over and over again. The filename
> does change to a different file in the same directory every now and
> then, but not often.
>
> semop(4227102, 0xbf8ef23a, 1) = 0
> semop(4227102, 0xbf8ef67a, 1) = 0
> open("pg_subtrans/047B", O_RDWR|O_LARGEFILE) = 12
> _llseek(12, 139264, [139264], SEEK_SET) = 0
> read(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 8192) = 8192
> close(12) = 0

Hmm, severe usage of subtransactions? Does it ever write to these pg_subtrans files?

I wonder if it's related to the ON DELETE rule that updates alerthist.

--
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
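For background on the pg_subtrans reads seen in the strace: subtransactions are opened by SAVEPOINT (or, equivalently, by PL/pgSQL BEGIN ... EXCEPTION blocks), and other backends may then have to consult pg_subtrans to resolve tuple visibility. A minimal sketch of the kind of client pattern that creates them, using table and column names from the thread (the statement itself is illustrative, not from the application):

```sql
BEGIN;
SAVEPOINT s1;                       -- opens a subtransaction
UPDATE alerthist SET end_time = now() WHERE deviceid = 7298;
ROLLBACK TO SAVEPOINT s1;           -- undo just the subtransaction
COMMIT;
```

A long-lived transaction issuing many such savepoints would be consistent with the repeated pg_subtrans lookups observed above.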