Re: Outage caused by query in IPC:MessageQueueReceive wait????
От | Chris Hoover |
---|---|
Тема | Re: Outage caused by query in IPC:MessageQueueReceive wait???? |
Дата | |
Msg-id | 290BB1BE-4F67-4CE5-84B3-67DA8B43DAFA@aweber.com обсуждение исходный текст |
Ответ на | Outage caused by query in IPC:MessageQueueReceive wait???? (Chris Hoover <chrish@aweber.com>) |
Список | pgsql-admin |
Unfortunately I don’t. I do snapshot pg_locks and pg_stat_statements every minute (along with several other catalogs). I was able to back trace to this single query in a transaction causing issue:
```
snapped_at | 2024-06-14 10:00:00.024303-04
pid | 13900
...
backend_start | 2024-06-14 09:59:09.604535-04
xact_start | 2024-06-14 09:59:50.026618-04
query_start | 2024-06-14 09:59:50.881825-04
state_change | 2024-06-14 09:59:50.881826-04
wait_event_type | IPC
wait_event | MessageQueueReceive
state | active
backend_xid | 2334413815
backend_xmin | 2334413815
...
```
Here are the minute by minute log showing the query stuck for 16 minutes:
```
select snapped_at, pid, wait_event_type, wait_event from pg_stat_activity_snapshot_table where snapped_at >= '2024-06-14 09:50' and snapped_at < '2024-06-14 10:20' and pid = 13900 order by snapped_at
;
snapped_at | pid | wait_event_type | wait_event
-------------------------------+-------+-----------------+---------------------
2024-06-14 10:00:00.024303-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:01:00.013122-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:02:00.011187-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:03:00.010872-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:04:00.013126-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:05:00.019508-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:06:00.010661-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:07:00.011247-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:08:00.010288-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:09:00.011029-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:10:00.021545-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:11:00.011503-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:12:00.010562-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:13:00.011171-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:14:00.01087-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:15:00.01752-04 | 13900 | IPC | MessageQueueReceive
2024-06-14 10:16:00.010518-04 | 13900 | IPC | MessageQueueReceive
(17 rows)
```
Thanks,
Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269
Email: chrish@aweber.com
On Jun 14, 2024, at 2:27 PM, Paul Barrett <tinojam@gmail.com> wrote:Hi Chris,Do you have log_lock_waits enabled? If yes, check the Postgres log to see what process was causing the block/lock. I hope this helps.Thanks,Paul BOn Fri, Jun 14, 2024 at 2:19 PM Chris Hoover <chrish@aweber.com> wrote:Yes, it appeared to be in a transaction. The pg_stat_activity.xact_start was before the query_start. Unfortunately, I don’t have the other queries. (I snapshot pg_stat_activity every minute, so it missed any prior queries to the blocker).On Jun 14, 2024, at 2:16 PM, Ozgur Kulu <kuluozgur@gmail.com> wrote:Hi chris,Is this query in transaction block ?14 Haz 2024 Cum 20:03 tarihinde Chris Hoover <chrish@aweber.com> şunu yazdı:Forgot to mention, this is on PG 16.1On Jun 14, 2024, at 12:52 PM, Chris Hoover <chrish@aweber.com> wrote:All,We had an outage today that appears to have been caused by a query in "IPC:MessageQueueReceive” wait state blocking another query from completing a truncate.Is there anyway to figure out what exactly it was waiting on other than an IPC message? I’m trying to get to the true root cause of our outage.
В списке pgsql-admin по дате отправления:
Предыдущее
От: Chris HooverДата:
Сообщение: Re: Outage caused by query in IPC:MessageQueueReceive wait????