Re: Optimize LISTEN/NOTIFY

From Joel Jacobson
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id 8c71183a-0d28-4bcf-a806-78446ff95404@app.fastmail.com
In reply to Re: Optimize LISTEN/NOTIFY  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Optimize LISTEN/NOTIFY
List pgsql-hackers
On Sat, Oct 11, 2025, at 09:43, Joel Jacobson wrote:
> On Sat, Oct 11, 2025, at 08:43, Joel Jacobson wrote:
>> In addition to previously suggested optimization, there is another major
...
>> I'm not entirely sure this approach is correct though

Having investigated this, the "direct advancement" approach seems
correct to me.

(The exclusive lock on NotifyQueueLock in PreCommit_Notify is still
needed: other operations that do not acquire the heavyweight lock take
NotifyQueueLock in shared or exclusive mode to read or modify
QUEUE_HEAD, and PreCommit_Notify itself modifies QUEUE_HEAD.)

Given all the experiments since my earlier message, here is a fresh,
self-contained write-up:

This series has two patches:

* 0001-optimize_listen_notify-v16.patch:
Improve test coverage of async.c. Adds isolation specs covering
previously untested paths (subxact LISTEN reparenting/merge/abort,
simple NOTIFY reparenting, notification_match dedup, and an array-growth
case used by the follow-on patch).

* 0002-optimize_listen_notify-v16.patch:
Optimize LISTEN/NOTIFY by maintaining a shared channel map and using
direct advancement to avoid useless wakeups.

Problem
-------

Today SignalBackends wakes all listeners in the same database, with no
knowledge of which backends listen on which channels. When backends
listen on different channels, each NOTIFY therefore causes unnecessary
wakeups and context switches, which can become the bottleneck in
workloads with many listeners.
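
To make the cost concrete, here is a toy model in standalone C. It is not
PostgreSQL code: ToyListener, wakeups_per_database and wakeups_per_channel
are hypothetical names invented for this illustration. It only counts how
many backends would be woken by one NOTIFY when signaling per database
versus per channel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy listener: one backend listening on one channel. */
typedef struct ToyListener
{
    unsigned int dboid;     /* database the backend is connected to */
    const char *channel;    /* channel it listens on */
} ToyListener;

/* Per-database signaling: every listener in the database is woken. */
static size_t
wakeups_per_database(const ToyListener *l, size_t n, unsigned int dboid)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (l[i].dboid == dboid)
            count++;
    return count;
}

/* Per-channel signaling: only listeners on the notified channel are woken. */
static size_t
wakeups_per_channel(const ToyListener *l, size_t n,
                    unsigned int dboid, const char *channel)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (l[i].dboid == dboid && strcmp(l[i].channel, channel) == 0)
            count++;
    return count;
}
```

With N listeners in one database, each on its own channel, the first
function returns N for every NOTIFY while the second returns 1.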

Overview of the solution (patch 0002)
-------------------------------------

* Introduce a lazily-created DSA+dshash map (dboid, channel) ->
  [ProcNumber] (channelHash). AtCommit_Notify maintains it for
  LISTEN/UNLISTEN, and SignalBackends consults it to signal only
  listeners on the channels notified within the transaction.
* Add a per-backend wakeupPending flag to suppress duplicate signals.
* Direct advancement: while queuing, PreCommit_Notify records the queue
  head before and after our writes. Writers are globally serialized, so
  the interval [oldHead, newHead) contains only our entries.
  SignalBackends advances any backend still at oldHead directly to
  newHead, avoiding a pointless wakeup.
* Keep the queue healthy by signaling backends that have fallen too far
  behind (lag >= QUEUE_CLEANUP_DELAY) so the global tail can advance.
* pg_listening_channels and IsListeningOn now read from channelHash.
* Add LWLock tranche NOTIFY_CHANNEL_HASH and wait event
  NotifyChannelHash.
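
The per-backend decision described in the list above can be sketched in
standalone C roughly as follows. This is a simplified model, not code from
the patch: ListenerSketch, signal_listeners_sketch and
CLEANUP_DELAY_SKETCH are made-up names, queue positions are plain integers
rather than QueuePosition values, the "listens" flag stands in for the
channelHash lookup, and the order of the checks is my assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QUEUE_CLEANUP_DELAY; the value is illustrative. */
#define CLEANUP_DELAY_SKETCH 4

/* Toy per-backend state; real queue positions are not plain integers. */
typedef struct ListenerSketch
{
    long pos;               /* next queue entry this backend will read */
    bool listens;           /* listens on a channel notified in this xact? */
    bool wakeup_pending;    /* already signaled and not yet processed? */
} ListenerSketch;

/*
 * Decide what to do with each listener after a committing transaction wrote
 * the entries in [old_head, new_head).  Returns the number of signals sent.
 */
static int
signal_listeners_sketch(ListenerSketch *l, size_t n,
                        long old_head, long new_head)
{
    int signals = 0;

    for (size_t i = 0; i < n; i++)
    {
        bool must_wake;

        if (l[i].wakeup_pending)
            continue;           /* a signal is already pending; skip */

        if (l[i].listens)
            must_wake = true;   /* it has to read our entries */
        else if (l[i].pos == old_head)
        {
            /* Direct advancement: only our entries lie ahead of it. */
            l[i].pos = new_head;
            must_wake = false;
        }
        else
        {
            /* Not interested, but wake it if it lags too far behind. */
            must_wake = (new_head - l[i].pos >= CLEANUP_DELAY_SKETCH);
        }

        if (must_wake)
        {
            l[i].wakeup_pending = true;
            signals++;
        }
    }
    return signals;
}
```

In this model an uninterested backend that is already caught up costs
nothing: its position is moved to new_head and no signal is sent.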

No user-visible semantic changes are intended; this is an internal
performance improvement.

Benchmark
---------

Measured with a patched pgbench (adds --listen-notify-benchmark; attached
as .txt to avoid confusing cfbot). Each iteration performs 10 000 round
trips and then adds 100 more idle listeners.

master (HEAD):

% ./pgbench_patched --listen-notify-benchmark --notify-round-trips=10000 --notify-idle-step=100

idle_listeners  round_trips_per_sec     max_latency_usec
             0              32123.7                  893
           100               1952.5                 1465
           200                991.4                 3438
           300                663.5                 2454
           400                494.6                 2950
           500                398.6                 3394
           600                332.8                 4272
           700                287.1                 4692
           800                252.6                 5208
           900                225.4                 5614
          1000                202.5                 6212

0002-optimize_listen_notify-v16.patch:

% ./pgbench_patched --listen-notify-benchmark --notify-round-trips=10000 --notify-idle-step=100

idle_listeners  round_trips_per_sec     max_latency_usec
             0              31832.6                 1067
           100              32341.0                 1035
           200              31562.5                 1054
           300              30040.1                 1057
           400              29287.1                 1023
           500              28191.9                 1201
           600              28166.5                 1019
           700              26994.3                 1094
           800              26501.0                 1043
           900              25974.2                 1005
          1000              25720.6                 1008

Benchmarked on a MacBook Pro (Apple M3 Max).

Files
-----

* 0001-optimize_listen_notify-v16.patch - tests only.
* 0002-optimize_listen_notify-v16.patch - implementation.
* pgbench-listen-notify-benchmark-patch.txt - adds --listen-notify-benchmark.

Feedback and review much welcomed.

/Joel