old_snapshot_threshold bottleneck on replica

Поиск

Список

Период

Сортировка

От	Maxim Orlov
Тема	old_snapshot_threshold bottleneck on replica
Дата	23 января 2023 г. 17:40:15
Msg-id	CACG=ezYV+EvO135fLRdVn-ZusfVsTY6cH1OZqWtezuEYH6ciQA@mail.gmail.com обсуждение исходный текст
Ответы	Re: old_snapshot_threshold bottleneck on replica Re: old_snapshot_threshold bottleneck on replica
Список	pgsql-hackers

Дерево обсуждения

Hi!

One of our customers stumble onto a significant performance degradation while running multiple OLAP-like queries on a replica.

After some investigation, it became clear that the problem is in accessing old_snapshot_threshold parameter.

Accessing old_snapshot_threshold parameter is guarded by mutex_threshold. This is not a problem on primary

server, since we rarely call GetOldSnapshotThresholdTimestamp:

5028 void
5029 TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
5030 {
5031 ····if (RelationAllowsEarlyPruning(relation)
5032 ········&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
5033 ········ereport(ERROR,
5034 ················(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
5035 ················ errmsg("snapshot too old")));

But in case of a replica, we have to call GetOldSnapshotThresholdTimestamp much often. So, this become a

bottleneck. The customer solve this issue by setting old_snapshot_threshold to 0. But, I think, we can

do something about it.

Some more investigation:

-- On primary --

$ ./bin/psql postgres -c "create database benchmark"
CREATE DATABASE
$ ./bin/pgbench -i -Uorlov -s300 benchmark
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
...
creating tables...
generating data (client-side)...
30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s).

-- On secondary --

$ touch 1.sql
$ vim 1.sql
$ cat 1.sql
\set bid random(1, 300)
BEGIN;
SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid;
END;

$ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark
pgbench (16devel)
progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
...
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed

$ perf record -F 99 -a -g --call-graph=dwarf sleep 5

$ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file

$ grep s_lock file | wc -l

My proposal is to use atomic for threshold_timestamp and threshold_xid. PFA 0001 patch.

With patch 0001 we got:

$ grep s_lock file2 | wc -l 
8

Maybe, we shall go farther and remove mutex_threshold here? This will lead to inconsistency of

threshold_timestamp and threshold_xid, but is this really a problem?

Thoughts?

Best regards,

Maxim Orlov.

Вложения

0001-PGPRO-7624-use-atomic-old_snapshot_threshold.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Dmitry Dolgov
Дата: 23 января 2023 г., 17:25:54
Сообщение: Re: Schema variables - new implementation for Postgres 15 (typo)

Следующее

От: Jelte Fennema
Дата: 23 января 2023 г., 17:49:15
Сообщение: Re: run pgindent on a regular basis / scripted manner

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

old_snapshot_threshold bottleneck on replica

Вложения

Предыдущее

Следующее