[PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

From: Michael Zhilin
Subject: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
Msg-id: ab1c0a7d-e789-5ef5-1180-42708ac6fe2d@postgrespro.ru
Replies: Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
Hi,

I would like to submit a patch for the wait_event column of the
pg_stat_activity view. Please find it attached.

The pg_stat_activity view provides a snapshot of the current backends' 
state, including the following columns:
   - wait_event: the name of the event the backend is waiting for;
   - state: the state of the backend, for instance active or idle.

We observed that in high-load applications the value of the wait_event 
column of pg_stat_activity can be incorrect. For instance, it can be 
"ClientRead" for active backends. According to the source code, the wait 
event "ClientRead" is only possible for idle or 'idle in transaction' 
backends, so an active backend can't have wait_event 'ClientRead'.

The key parameter for reproducing the defect is a high value of 
max_connections (more than 1000). The steps are:
   - create a new database (initdb)
   - set max_connections to 10000
   - run pgbench in select-only mode (200 connections, 1 job)
   - fetch data from pg_stat_activity

The following query can be used to detect the problem:

select state, wait_event, backend_type, count(1)
    from pg_stat_activity
   group by 1,2,3;

The pgbench command is as follows:

pgbench -n -c 200 -j 1 -T 60 -P 1 -M prepared -S postgres

Before the patch, the output looks like this:

postgres=# select state, wait_event, backend_type,  count(1) from 
pg_stat_activity group by 1,2,3;
 state  |     wait_event      |         backend_type         | count
--------+---------------------+------------------------------+-------
 idle   |                     | client backend               |     3
 active |                     | client backend               |     1
        | WalWriterMain       | walwriter                    |     1
        | CheckpointerMain    | checkpointer                 |     1
        | LogicalLauncherMain | logical replication launcher |     1
        | AutoVacuumMain      | autovacuum launcher          |     1
        | BgWriterHibernate   | background writer            |     1
 active | ClientRead          | client backend               |     4
 idle   | ClientRead          | client backend               |   193
(9 rows)

Time: 4.406 ms

Please pay attention to the rows with state 'active' and wait_event 
'ClientRead'. According to the output above, there are 4 backends with 
this impossible combination of state and wait_event.

After the patch, the output is correct:

postgres=# select state, wait_event, backend_type,  count(1) from 
pg_stat_activity group by 1,2,3;
 state  |     wait_event      |         backend_type         | count
--------+---------------------+------------------------------+-------
        |                     | walwriter                    |     1
 active |                     | client backend               |     5
        | LogicalLauncherMain | logical replication launcher |     1
        | AutoVacuumMain      | autovacuum launcher          |     1
        |                     | background writer            |     1
 idle   | ClientRead          | client backend               |   196
        |                     | checkpointer                 |     1
(7 rows)

Time: 1.520 ms

The rows with active/ClientRead and idle/no-wait have disappeared, and 
the output looks as expected: 5 active backends with no wait event and 
196 idle connections waiting on ClientRead.

The output is incorrect because the state and wait information are 
gathered at different times. First the view copies the backends' 
information into local structures, and only then does it iterate over 
the backends again to enrich the data with wait events. Reading a wait 
event requires taking a per-backend LWLock, so this second pass takes 
some time (a few milliseconds). As a result, for quick queries a 
backend's wait event may have changed between the two passes.

The idea of the patch is to avoid the second pass over the backends and 
gather all information in a single pass.

The patch also changes the way memory is allocated for the local 
structures. Previously the maximum required size was estimated and 
allocated all at once, which could mean allocating dozens or hundreds 
of megabytes for nothing. Now memory is allocated in chunks, which 
reduces both the total amount of memory allocated and the time spent 
allocating it.

In the example above, the query time is reduced from 4.4 ms to 1.5 ms 
(roughly 3x).

The patch is for PostgreSQL version 15. If the fix is OK and needed for 
previous versions, please let me know.
Yury Sokolov should be mentioned as co-author of the patch.

Please feel free to ask any questions.

Best regards,
-- 
Michael Zhilin
Postgres Professional
+7(925)3366270
https://www.postgrespro.ru