Thread: Performance problems deleting data

Performance problems deleting data

From:
Rafael Martinez
Date:
Hello

---------------------------
Postgresql version: 8.1.10
4GB RAM
2x HP 72GB 10K SAS RAID1/smartarray
---------------------------

I have a colleague who is having some performance problems from time to
time when deleting some rows from a table.

We found out that the database having this problem had a severe bloat
problem in many tables and indexes (they were running only autovacuum)
and some misconfiguration in postgresql.conf.
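
[As a rough illustration, not from the original thread: on 8.1 a query like
the sketch below against pg_class can point out suspiciously large relations,
keeping in mind that relpages/reltuples are only as fresh as the last
VACUUM/ANALYZE and that it assumes the default 8 kB block size.]
---------------------------------
-- Rough bloat check (sketch): relations whose on-disk size looks large
-- relative to the estimated row count from the last ANALYZE.
SELECT n.nspname,
       c.relname,
       c.relkind,
       c.reltuples::bigint AS est_rows,
       c.relpages,
       round(c.relpages * 8 / 1024.0, 1) AS size_mb
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'i')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.relpages DESC
LIMIT 20;
---------------------------------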

What we did to fix the situation was:

1) Stop the application accessing the database.
2) Change these parameters in postgresql.conf:
---------------------------------
shared_buffers = 108157
work_mem = 16384
maintenance_work_mem = 262144

max_fsm_pages = 800000

wal_buffers = 64
checkpoint_segments = 128

random_page_cost = 2.0
effective_cache_size = 255479

default_statistics_target = 400
---------------------------------

3) Update /etc/sysctl.conf with new values for kernel.shmmax and
kernel.shmall (see the sizing note after this list)

4) Run 'VACUUM FULL VERBOSE'
5) Run 'REINDEX DATABASE <dbname>'
6) Run 'ANALYZE VERBOSE'
7) Define a 'VACUUM VERBOSE ANALYZE' in crontab
8) Start the application.
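
[Sizing note for step 3, my arithmetic rather than the original poster's:
shared_buffers = 108157 pages of 8 kB is roughly 845 MB, so kernel.shmmax
has to allow at least that much shared memory plus PostgreSQL's other shared
structures, and kernel.shmall (counted in pages) has to cover the same total.]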

These changes helped a lot: the size of the database went down from 7GB
to 1GB, and most of the deletes work as they are supposed to. But from
time to time a single deletion takes a very long time to finish. The output
from EXPLAIN ANALYZE doesn't show anything wrong, as far as I can see.

The definition of the table 'module' is:
-------------------------------------------------------------------------
manage=# \d module
                    Table "public.module"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------------------------------------------------------
 moduleid  | integer                     | not null default nextval('module_moduleid_seq'::regclass)
 deviceid  | integer                     | not null
 netboxid  | integer                     | not null
 module    | integer                     | not null
 model     | character varying           |
 descr     | character varying           |
 up        | character(1)                | not null default 'y'::bpchar
 downsince | timestamp without time zone |
Indexes:
    "module_pkey" PRIMARY KEY, btree (moduleid)
    "module_deviceid_key" UNIQUE, btree (deviceid)
    "module_netboxid_key" UNIQUE, btree (netboxid, module)
Check constraints:
    "module_up" CHECK (up = 'y'::bpchar OR up = 'n'::bpchar)
Foreign-key constraints:
    "$1" FOREIGN KEY (deviceid) REFERENCES device(deviceid) ON UPDATE
CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (netboxid) REFERENCES netbox(netboxid) ON UPDATE
CASCADE ON DELETE CASCADE
Rules:
    close_alerthist_modules AS
    ON DELETE TO module DO  UPDATE alerthist SET end_time = now()
      WHERE (alerthist.eventtypeid::text = 'moduleState'::text
             OR alerthist.eventtypeid::text = 'linkState'::text)
        AND alerthist.end_time = 'infinity'::timestamp without time zone
        AND alerthist.deviceid = old.deviceid
-------------------------------------------------------------------------
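
[For context, and not part of the original message: because of the
close_alerthist_modules rule, every DELETE on module is rewritten into two
statements, roughly equivalent to the sketch below (the alias m plays the
role of OLD), which is why the EXPLAIN ANALYZE output that follows contains
two separate plans.]
---------------------------------
-- Sketch only; approximately what the rewriter produces for
-- DELETE FROM module WHERE deviceid = '7298', given the rule above.
UPDATE alerthist
   SET end_time = now()
  FROM module m
 WHERE (alerthist.eventtypeid::text = 'moduleState'::text
        OR alerthist.eventtypeid::text = 'linkState'::text)
   AND alerthist.end_time = 'infinity'::timestamp without time zone
   AND alerthist.deviceid = m.deviceid
   AND m.deviceid = 7298;

DELETE FROM module WHERE deviceid = 7298;
---------------------------------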


manage=# EXPLAIN ANALYZE DELETE FROM module WHERE deviceid='7298';
                        QUERY PLAN
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14.63 rows=1 width=67) (actual time=2.365..2.365 rows=0 loops=1)
   ->  Index Scan using alerthist_end_time_btree on alerthist  (cost=0.00..10.65 rows=1 width=67) (actual time=2.363..2.363 rows=0 loops=1)
         Index Cond: (end_time = 'infinity'::timestamp without time zone)
         Filter: ((((eventtypeid)::text = 'moduleState'::text) OR ((eventtypeid)::text = 'linkState'::text)) AND (7298 = deviceid))
   ->  Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=4) (never executed)
         Index Cond: (deviceid = 7298)
 Total runtime: 2.546 ms

 Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=6) (actual time=0.060..0.061 rows=1 loops=1)
   Index Cond: (deviceid = 7298)
 Trigger for constraint $1: time=3.422 calls=1
 Trigger for constraint $1: time=0.603 calls=1
 Total runtime: 2462558.813 ms
(13 rows)
-------------------------------------------------------------------------

Any ideas why it is taking 2462558.813 ms to finish when the total time
for the deletion is 2.546 ms + 3.422 ms + 0.603 ms?

The deletion of a row in the 'module' table involves several
deletions/updates in many other tables in the database related by
foreign keys (with ON DELETE CASCADE) and triggers.

I suppose that an open transaction in one of these tables not directly
related to 'module' could block the deletion without it showing up in
EXPLAIN ANALYZE? The two 'Trigger for constraint' entries in the EXPLAIN
ANALYZE output only cover the two tables that have an attribute referencing
'module' as a foreign key, but if these two tables have to wait for other
tables, that would not show up anywhere except in pg_locks?
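
[A sketch, not tested against this database, of the kind of pg_locks query
that would show such a wait: pair each ungranted lock request with the
granted locks on the same object.]
---------------------------------
SELECT w.pid                 AS waiting_pid,
       w.mode                AS wanted_mode,
       w.locktype,
       w.relation::regclass  AS relation,
       w.transactionid,
       h.pid                 AS holding_pid,
       h.mode                AS held_mode
  FROM pg_locks w
  JOIN pg_locks h
    ON h.granted
   AND h.pid <> w.pid
   AND (   (w.locktype = 'relation'
            AND h.locktype = 'relation'
            AND h.database = w.database
            AND h.relation = w.relation)
        OR (w.locktype = 'transactionid'
            AND h.locktype = 'transactionid'
            AND h.transactionid = w.transactionid))
 WHERE NOT w.granted;
---------------------------------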

Thanks in advance
regards
--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/

Re: Performance problems deleting data

From:
Tom Lane
Date:
Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
> manage=# EXPLAIN ANALYZE DELETE FROM module WHERE deviceid='7298';
>                         QUERY PLAN
> -------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..14.63 rows=1 width=67) (actual time=2.365..2.365 rows=0 loops=1)
>    ->  Index Scan using alerthist_end_time_btree on alerthist  (cost=0.00..10.65 rows=1 width=67) (actual time=2.363..2.363 rows=0 loops=1)
>          Index Cond: (end_time = 'infinity'::timestamp without time zone)
>          Filter: ((((eventtypeid)::text = 'moduleState'::text) OR ((eventtypeid)::text = 'linkState'::text)) AND (7298 = deviceid))
>    ->  Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=4) (never executed)
>          Index Cond: (deviceid = 7298)
>  Total runtime: 2.546 ms

>  Index Scan using module_deviceid_key on module  (cost=0.00..3.96 rows=1 width=6) (actual time=0.060..0.061 rows=1 loops=1)
>    Index Cond: (deviceid = 7298)
>  Trigger for constraint $1: time=3.422 calls=1
>  Trigger for constraint $1: time=0.603 calls=1
>  Total runtime: 2462558.813 ms
> (13 rows)
> -------------------------------------------------------------------------

> Any ideas why it is taking 2462558.813 ms to finish when the total time
> for the deletion is 2.546 ms + 3.422 ms + 0.603ms?

That's just bizarre.  So far as I can see from the 8.1 EXPLAIN code,
the only place the extra time could be spent is in ExecutorStart,
ExecutorEnd, or the top level of ExecutorRun, none of which should
take any noticeable amount of runtime in a trivial query like this.

The only thing I can think of is that ExecutorStart would have been
where we'd acquire RowExclusiveLock on "module", while the previous
rule-generated query would only take AccessShareLock.  So if for
instance some other transaction had ShareLock (perhaps from CREATE
INDEX) and just sat a long time before committing, perhaps this
would be explainable.  I'm not too sure about that explanation
though because I think the parser should have already taken
RowExclusiveLock when it was doing parse analysis.
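
[A hypothetical illustration of the scenario Tom describes; the index name
below is made up and not from the thread.]
---------------------------------
-- Session 1: a plain CREATE INDEX takes ShareLock on module, and the
-- transaction is then left open for a long time without committing.
BEGIN;
CREATE INDEX module_model_idx ON module (model);
-- ... sits here, idle in transaction ...

-- Session 2: the DELETE needs RowExclusiveLock on module, which conflicts
-- with ShareLock, so it blocks until session 1 commits or rolls back.
DELETE FROM module WHERE deviceid = '7298';
---------------------------------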

Is the problem repeatable?  Is the delay consistent?  What do
you see in pg_locks while it's delaying?  Also watch "vmstat 1"
output --- is it consuming CPU and/or I/O?

            regards, tom lane

Re: Performance problems deleting data

From:
Rafael Martinez
Date:
Tom Lane wrote:
> Rafael Martinez <r.m.guerrero@usit.uio.no> writes:
>
>> Any ideas why it is taking 2462558.813 ms to finish when the total time
>> for the deletion is 2.546 ms + 3.422 ms + 0.603ms?
>

Hi Tom, I got this information from my colleague:


> Is the problem repeatable?

Repeatable in the sense that, about 30+ times every day, the deletion of a
row takes more than 100 seconds.  I have not found a way to provoke it, though.

> Is the delay consistent?

No.  I frequently see everything from below the 8-second
log_min_duration_statement threshold to about 4900 seconds.  As for the
distribution, about half of the 30+ daily cases take more than 500 seconds
to complete, and the rest (obviously) take between 100 and 500 seconds.

> What do you see in pg_locks while it's delaying?

   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction |  pid  |       mode       | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-------+------------------+---------
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    17176 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17174 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 transactionid |          |          |      |       |      80430155 |         |       |          |    80430155 | 29569 | ExclusiveLock    | t
 relation      |    16393 |    17164 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16816 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16816 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16812 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16812 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17174 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16977 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17164 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    10342 |      |       |               |         |       |          |    82180134 | 31646 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16835 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16835 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17176 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82180131 | 19907 | RowExclusiveLock | t
 relation      |    16393 |    16821 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16821 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    17174 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16730 |      |       |               |         |       |          |    80430155 | 29569 | AccessShareLock  | t
 transactionid |          |          |      |       |      82179669 |         |       |          |    82179669 | 19906 | ExclusiveLock    | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179843 | 19890 | AccessShareLock  | t
 relation      |    16393 |    16800 |      |       |               |         |       |          |    82179843 | 19890 | RowExclusiveLock | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16784 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16766 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179669 | 19906 | AccessShareLock  | t
 relation      |    16393 |    16794 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 transactionid |          |          |      |       |      82180134 |         |       |          |    82180134 | 31646 | ExclusiveLock    | t
 transactionid |          |          |      |       |      82179843 |         |       |          |    82179843 | 19890 | ExclusiveLock    | t
 relation      |    16393 |    17176 |      |       |               |         |       |          |    82179669 | 19906 | RowExclusiveLock | t
 transactionid |          |          |      |       |      82180131 |         |       |          |    82180131 | 19907 | ExclusiveLock    | t
 relation      |    16393 |    17164 |      |       |               |         |       |          |    82180131 | 19907 | AccessShareLock  | t
(54 rows)


> Also watch "vmstat 1" output --- is it consuming CPU and/or I/O?
>
>

CPU 50% idle, rest mainly used in "system".  Virtually no IO.  No
blocked processes.  An impressive number of context switches.  No swap.

An strace(1) of the postgres process may give a hint about the "system"
part; this is what it does over and over and over again.  The filename
does change to a different file in the same directory every now and
then, but not often.

semop(4227102, 0xbf8ef23a, 1)           = 0
semop(4227102, 0xbf8ef67a, 1)           = 0
open("pg_subtrans/047B", O_RDWR|O_LARGEFILE) = 12
_llseek(12, 139264, [139264], SEEK_SET) = 0
read(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
close(12)                               = 0


regards
--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/

Re: Performance problems deleting data

From:
Alvaro Herrera
Date:
Rafael Martinez wrote:

> CPU 50% idle, rest mainly used in "system".  Virtually no IO.  No
> blocked processes.  An impressive amount of context switches.  No swap.
>
> An strace(1) of the postgres process may give a hint about the "system"
> part; this is what it does over and over and over again.  The filename
> does change to a different file in the same directory every now and
> then, but not often.
>
> semop(4227102, 0xbf8ef23a, 1)           = 0
> semop(4227102, 0xbf8ef67a, 1)           = 0
> open("pg_subtrans/047B", O_RDWR|O_LARGEFILE) = 12
> _llseek(12, 139264, [139264], SEEK_SET) = 0
> read(12, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> close(12)                               = 0

Hmm, severe usage of subtransactions?  Does it ever write to these
pg_subtrans files?  I wonder if it's related to the ON DELETE rule that
updates alerthist.
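
[For reference, a generic sketch, not from this application, of where
subtransactions come from: every SAVEPOINT, and every entry into a plpgsql
BEGIN ... EXCEPTION block, starts one, and other backends may then have to
consult pg_subtrans to map those XIDs to their parent transaction when
checking tuple visibility.]
---------------------------------
BEGIN;
SAVEPOINT s1;                  -- starts a subtransaction
UPDATE module SET up = 'n' WHERE moduleid = 1;
ROLLBACK TO SAVEPOINT s1;      -- aborts only the subtransaction
COMMIT;                        -- the outer transaction commits normally
---------------------------------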

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.