Обсуждение: High insert rate server, unstable insert latency and load peaks withbuffer_content and XidGenLock LWlocks with Postgresql 12 version

Поиск
Список
Период
Сортировка
Hi,

I have performance issues which I never seen before in my 20+ years experience with PostgreSQL.

With database on dedicated server I encountered unusual load profile:
multi thread (200 connections static size pool via pgbouncer) insert only into single table around 15.000 insert/s.

Usually insert took 0.025ms and amount active backends (via pg_stat_activity) usually stay in 1-5-10 range.
But every so while (few times per minute actually) number of active backend go up to all 200 allowed connections.
Which lead to serious latency in latency sensitive load.

No problem with IO latency or CPU usage found during performance analyze.
syncronous_commit = off

To analyze what going with locks I run 
\o ~/tmp/watch_events.log
select wait_event_type,wait_event,count(*) from pg_stat_activity where state='active' and backend_type='client backend' group by 1,2 order by 3 desc
\watch 0.1

Normal output when all goes well:
 wait_event_type | wait_event | count
-----------------+------------+-------
 Client          | ClientRead |     5
                 |            |     4
(few processes running queries and few processes doing network IO)

Bad case (few times per minute, huge latency peak, some inserts took up to 100ms to run):
 wait_event_type |   wait_event   | count
-----------------+----------------+-------
 LWLock          | buffer_content |   178
 LWLock          | XidGenLock     |    21
 IO              | SLRUSync       |     1
                 |                |     1

So there are almost all backends waiting on buffer_content lock and some backends waiting for XidGenLock .
And always one backend in SLRUSync.

If anyone can have any good idea whats going on in that case and how I can fix it - any ideas welcome.
So far I out of ideas.


--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7  985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"

On Fri, Apr 10, 2020 at 12:51:03AM +1000, Maxim Boguk wrote:
> With database on dedicated server I encountered unusual load profile:
> multi thread (200 connections static size pool via pgbouncer) insert only
> into single table around 15.000 insert/s.
> 
> Usually insert took 0.025ms and amount active backends (via
> pg_stat_activity) usually stay in 1-5-10 range.
> But every so while (few times per minute actually) number of active backend
> go up to all 200 allowed connections.
> Which lead to serious latency in latency sensitive load.
> 
> No problem with IO latency or CPU usage found during performance analyze.
> syncronous_commit = off

Can you share other settings ?  shared_buffers, checkpoint_*, bgwriter_* and
max_wal_size ?  And version()

> And always one backend in SLRUSync.
> 
> If anyone can have any good idea whats going on in that case and how I can
> fix it - any ideas welcome.
> So far I out of ideas.

This might be useful: pg_stat_bgwriter view.

I suggest to follow others advice and make a cronjob to do this every ~5 minutes:
| INSERT INTO jrn_pg_stat_bgwriter SELECT now(), * FROM pg_stat_bgwriter;
and write a window function to show values/time, or rrd graphs or whatever.

-- 
Justin





On Fri, Apr 10, 2020 at 1:16 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Apr 10, 2020 at 12:51:03AM +1000, Maxim Boguk wrote:
> With database on dedicated server I encountered unusual load profile:
> multi thread (200 connections static size pool via pgbouncer) insert only
> into single table around 15.000 insert/s.
>
> Usually insert took 0.025ms and amount active backends (via
> pg_stat_activity) usually stay in 1-5-10 range.
> But every so while (few times per minute actually) number of active backend
> go up to all 200 allowed connections.
> Which lead to serious latency in latency sensitive load.
>
> No problem with IO latency or CPU usage found during performance analyze.
> syncronous_commit = off

Can you share other settings ?  shared_buffers, checkpoint_*, bgwriter_* and
max_wal_size ?  And version()


version -  PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
shared_buffers 140GB
checkpoint_timeout  1h
checkpoint_flush_after 0
checkpoint_completion_target 0.9
bgwriter_delay 10ms
bgwriter_flush_after 0
bgwriter_lru_maxpages 10000
bgwriter_lru_multiplier 10
max_wal_size 128GB

Checkpoints happens every 1h and lag spiked doesn't depend on checkpointer activity.
buffers_checkpoint 92% writes, buffers_clean 2% writes, buffers_backend 6% writes (over course of 5 minutes).
Nothing especially suspicious on graphical monitoring of these values as well.


--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7  985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно когда я так делаю ещё раз?"