zabbix on postgresql - very slow delete of events

Поиск
Список
Период
Сортировка
От Kristian Ejvind
Тема zabbix on postgresql - very slow delete of events
Дата
Msg-id 3EC00A37-57CF-4B43-A737-968A87B02641@contoso.com
обсуждение исходный текст
Ответы Re: zabbix on postgresql - very slow delete of events  (Kenneth Marshall <ktm@rice.edu>)
Re: zabbix on postgresql - very slow delete of events  (Maxim Boguk <maxim.boguk@gmail.com>)
Список pgsql-performance

Hi

 

This will be a rather lengthy post, just to give the full (I hope) picture. We're using Zabbix for monitoring and I'm having problems

understanding why the deletion of rows in the events table is so slow.

 

Zabbix: 4.2 (never mind the name of the db - it is 4.2)

new values per second: ~400

hosts: ~600

items: ~45000

 

OS: CentOS Linux release 7.6.1810 (Core)

Postgresql was installed from the yum repo on postgresql.org

 

zabbix_34=> select version();

                                                 version

---------------------------------------------------------------------------------------------------------

PostgreSQL 10.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit

(1 row)

 

The database is analyzed + vacuumed nightly. The server runs Zabbix and the database, has 16 GB memory, 4 vCPUs (modern hardware).

Some parameters:

 

shared_buffers = 3GB

work_mem = 10MB    (I also tested with work_mem = 128MB - no difference)       

effective_cache_size = 6 GB

effective_io_concurrency = 40

checkpoint_timeout = 5 min (default)

max_wal_size = 1 GB (default)

checkpoint_completion_target = 0.8

 

pg_wal is already on a separate device.

 

events table: ~25 million rows / 2.9 GB

event_recovery table: ~12 million rows / 550 MB

alerts table: ~600000 rows / 530 MB

 

Generally the database is quite snappy and shows no indication of problems. But now I've seen that housekeeping of events is

very slow - a single (normally hourly) run can take more than one day to finish, so events keep stacking up in the table. A typical slow

delete statement, from the postgres log:

 

postgresql-10-20190717-031404.log:2019-07-17 03:37:43 CEST [80965]: [4-1] user=zabbix,db=zabbix_34,app=[unknown],client=[local]: LOG: duration: 27298798.930 ms statement: delete from events where (eventid between 5580621 and 5580681 or eventid between 5580689 and 5580762 or eventid between 5580769 and 5580844 or eventid between 5580851 and 5580867 or eventid between 5580869 and 5580926 or eventid between 5580933 and 5580949 or eventid between 5580963 and 5581024

--- 8< --- a lot of similar eventids snipped away -----

or eventid between 5586799 and 5586839 or eventid in (5581385,5581389,5581561,5581563,5581564,5581580,5 581582,5581584,5581585,5581635))

 

I've analyzed the deletion of a single row in events. First, some table information:

 

 

zabbix_34=> \d events

                                 Table "zabbix.events"

    Column    |          Type           | Collation | Nullable |        Default

--------------+-------------------------+-----------+----------+-----------------------

eventid      | numeric                 |           | not null |

source       | bigint                  |           | not null | '0'::bigint

object       | bigint                  |           | not null | '0'::bigint

objectid     | numeric                 |           | not null | '0'::numeric

clock        | bigint                  |           | not null | '0'::bigint

value        | bigint                  |           | not null | '0'::bigint

acknowledged | bigint                  |           | not null | '0'::bigint

ns           | bigint                  |           | not null | '0'::bigint

name         | character varying(2048) |           | not null | ''::character varying

severity     | integer                 |           | not null | 0

Indexes:

    "idx_29337_primary" PRIMARY KEY, btree (eventid)

    "events_1" btree (source, object, objectid, clock)

    "events_2" btree (source, object, clock)

    "events_clk_3" btree (clock)

Referenced by:

    TABLE "acknowledges" CONSTRAINT "c_acknowledges_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE

    TABLE "alerts" CONSTRAINT "c_alerts_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE

    TABLE "alerts" CONSTRAINT "c_alerts_5" FOREIGN KEY (p_eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "event_recovery" CONSTRAINT "c_event_recovery_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "event_recovery" CONSTRAINT "c_event_recovery_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "event_recovery" CONSTRAINT "c_event_recovery_3" FOREIGN KEY (c_eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "event_suppress" CONSTRAINT "c_event_suppress_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "event_tag" CONSTRAINT "c_event_tag_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "problem" CONSTRAINT "c_problem_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

    TABLE "problem" CONSTRAINT "c_problem_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE

 

 

 

zabbix_34=> \d event_recovery

              Table "zabbix.event_recovery"

    Column     |  Type  | Collation | Nullable | Default

---------------+--------+-----------+----------+---------

eventid       | bigint |           | not null |

r_eventid     | bigint |           | not null |

c_eventid     | bigint |           |          |

correlationid | bigint |           |          |

userid        | bigint |           |          |

Indexes:

    "event_recovery_pkey" PRIMARY KEY, btree (eventid)

    "event_recovery_1" btree (r_eventid)

    "event_recovery_2" btree (c_eventid)

Foreign-key constraints:

    "c_event_recovery_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE

    "c_event_recovery_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE

    "c_event_recovery_3" FOREIGN KEY (c_eventid) REFERENCES events(eventid) ON DELETE CASCADE

 

 

 

zabbix_34=> \d alerts

                                 Table "zabbix.alerts"

    Column     |          Type           | Collation | Nullable |        Default

---------------+-------------------------+-----------+----------+-----------------------

alertid       | numeric                 |           | not null |

actionid      | numeric                 |           | not null |

eventid       | numeric                 |           | not null |

userid        | numeric                 |           |          |

clock         | bigint                  |           | not null | '0'::bigint

mediatypeid   | numeric                 |           |          |

sendto        | character varying(1024) |           | not null | ''::character varying

subject       | character varying(255)  |           | not null | ''::character varying

message       | text                    |           | not null | ''::text

status        | bigint                  |           | not null | '0'::bigint

retries       | bigint                  |           | not null | '0'::bigint

error         | character varying(2048) |           | not null | ''::character varying

esc_step      | bigint                  |           | not null | '0'::bigint

alerttype     | bigint                  |           | not null | '0'::bigint

p_eventid     | bigint                  |           |          |

acknowledgeid | bigint                  |           |          |

Indexes:

    "idx_29120_primary" PRIMARY KEY, btree (alertid)

    "alerts_1" btree (actionid)

    "alerts_2" btree (clock)

    "alerts_3" btree (eventid)

    "alerts_4" btree (status)

    "alerts_5" btree (mediatypeid)

    "alerts_6" btree (userid)

    "alerts_7" btree (p_eventid)

Foreign-key constraints:

    "c_alerts_1" FOREIGN KEY (actionid) REFERENCES actions(actionid) ON UPDATE RESTRICT ON DELETE CASCADE

    "c_alerts_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE

    "c_alerts_3" FOREIGN KEY (userid) REFERENCES users(userid) ON UPDATE RESTRICT ON DELETE CASCADE

    "c_alerts_4" FOREIGN KEY (mediatypeid) REFERENCES media_type(mediatypeid) ON UPDATE RESTRICT ON DELETE CASCADE

    "c_alerts_5" FOREIGN KEY (p_eventid) REFERENCES events(eventid) ON DELETE CASCADE

    "c_alerts_6" FOREIGN KEY (acknowledgeid) REFERENCES acknowledges(acknowledgeid) ON DELETE CASCADE

 

 

Let's look at what's in the tables for event 7123123:

 

zabbix_34=> select * from events where eventid=7123123;

eventid | source | object | objectid |   clock    | value | acknowledged |   ns    |                 name                 | severity

---------+--------+--------+----------+------------+-------+--------------+---------+--------------------------------------+----------

7123123 |      3 |      0 |    27562 | 1525264196 |     1 |            0 | 1980875 | Cannot calculate trigger expression. |        0

(1 row)

 

zabbix_34=> select * from event_recovery where eventid=7123123;

eventid | r_eventid | c_eventid | correlationid | userid

---------+-----------+-----------+---------------+--------

7123123 |   7124371 |           |               |

(1 row)

 

zabbix_34=> select * from alerts where eventid=7123123;

alertid | actionid | eventid | userid | clock | mediatypeid | sendto | subject | message | status | retries | error | esc_step | aler

ttype | p_eventid | acknowledgeid

---------+----------+---------+--------+-------+-------------+--------+---------+---------+--------+---------+-------+----------+-----

------+-----------+---------------

(0 rows)

 

 

All these queries execute well below 1 ms, using indexes.

 

Let's delete one row. See explain results here: https://explain.depesz.com/s/aycf . 5 seconds to delete a single row, wow!

This shows that it is the foreign key constraints on event_recovery and alerts that take a lot of time.

But why? I far as I can see, the delete is fully CPU bound during execution.

 

Deleting the corresponding row directly from event_recovery or alerts executes in less than 0.1 ms.  

 

Any ideas?

 

I've observed that alerts and event_recovery tables both have more than one foreign key that references events, if that matters.

 

Regards

Kristian Ejvind

 

 

 

Kristian Ejvind
Linux System Administrator
IT Operations | Technical Operations
 

Resurs Bank
Ekslingan 8
Box 222 09, SE-25467 Helsingborg
 

Mobil: +46 728571483
Växel: +46 42 38 20 00
E-post:Kristian.Ejvind@resurs.se
Webb:www.resursbank.se
 

Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Speeding up query pulling comments from pg_catalog
Следующее
От: Kenneth Marshall
Дата:
Сообщение: Re: zabbix on postgresql - very slow delete of events