Re: Outage caused by query in IPC:MessageQueueReceive wait????

Поиск
Список
Период
Сортировка
От Chris Hoover
Тема Re: Outage caused by query in IPC:MessageQueueReceive wait????
Дата
Msg-id 290BB1BE-4F67-4CE5-84B3-67DA8B43DAFA@aweber.com
обсуждение исходный текст
Ответ на Outage caused by query in IPC:MessageQueueReceive wait????  (Chris Hoover <chrish@aweber.com>)
Список pgsql-admin
Unfortunately I don’t.  I do snapshot pg_locks and pg_stat_statements every minute (along with several other catalogs).  I was able to back trace to this single query in a transaction causing issue:

```
snapped_at       | 2024-06-14 10:00:00.024303-04
pid                     | 13900
...
backend_start    | 2024-06-14 09:59:09.604535-04
xact_start           | 2024-06-14 09:59:50.026618-04
query_start        | 2024-06-14 09:59:50.881825-04
state_change       | 2024-06-14 09:59:50.881826-04
wait_event_type   | IPC
wait_event           | MessageQueueReceive
state                    | active
backend_xid        | 2334413815
backend_xmin     | 2334413815
...
```

Here are the minute by minute log showing the query stuck for 16 minutes:
```
select snapped_at, pid, wait_event_type, wait_event from pg_stat_activity_snapshot_table where snapped_at >= '2024-06-14 09:50' and snapped_at < '2024-06-14 10:20' and pid = 13900 order by snapped_at
;
          snapped_at           |  pid  | wait_event_type |     wait_event
-------------------------------+-------+-----------------+---------------------
 2024-06-14 10:00:00.024303-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:01:00.013122-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:02:00.011187-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:03:00.010872-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:04:00.013126-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:05:00.019508-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:06:00.010661-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:07:00.011247-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:08:00.010288-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:09:00.011029-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:10:00.021545-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:11:00.011503-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:12:00.010562-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:13:00.011171-04 | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:14:00.01087-04  | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:15:00.01752-04  | 13900 | IPC             | MessageQueueReceive
 2024-06-14 10:16:00.010518-04 | 13900 | IPC             | MessageQueueReceive
(17 rows)
```

Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269
Email: chrish@aweber.com



On Jun 14, 2024, at 2:27 PM, Paul Barrett <tinojam@gmail.com> wrote:

Hi Chris,
Do you have log_lock_waits enabled? If yes, check the Postgres log to see what process was causing the block/lock. I hope this helps.

Thanks,
Paul B

On Fri, Jun 14, 2024 at 2:19 PM Chris Hoover <chrish@aweber.com> wrote:
Yes,  it appeared to be in a transaction.  The pg_stat_activity.xact_start was before the query_start.  Unfortunately, I don’t have the other queries.  (I snapshot pg_stat_activity every minute, so it missed any prior queries to the blocker).

Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269



On Jun 14, 2024, at 2:16 PM, Ozgur Kulu <kuluozgur@gmail.com> wrote:

Hi chris, 

Is this query in transaction block ? 

14 Haz 2024 Cum 20:03 tarihinde Chris Hoover <chrish@aweber.com> şunu yazdı:
Forgot to mention, this is on PG 16.1

Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269



On Jun 14, 2024, at 12:52 PM, Chris Hoover <chrish@aweber.com> wrote:

All,

We had an outage today that appears to have been caused by a query in "IPC:MessageQueueReceive” wait state blocking another query from completing a truncate.  

Is there anyway to figure out what exactly it was waiting on other than an IPC message?  I’m trying to get to the true root cause of our outage.

Thanks,


Chris Hoover
Senior DBA
AWeber.com
Cell: (803) 528-2269






В списке pgsql-admin по дате отправления:

Предыдущее
От: Chris Hoover
Дата:
Сообщение: Re: Outage caused by query in IPC:MessageQueueReceive wait????
Следующее
От: Rui DeSousa
Дата:
Сообщение: Re: Keepalive