Re: Optimize LISTEN/NOTIFY
From: Joel Jacobson
Subject: Re: Optimize LISTEN/NOTIFY
Date:
Msg-id: af75d742-1b74-43aa-8777-e1de7a36fdba@app.fastmail.com
In reply to: Re: Optimize LISTEN/NOTIFY (Rishu Bagga <rishu.postgres@gmail.com>)
List: pgsql-hackers
On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
> Hi Joel,
>
> Thanks for sharing the patch.
> I have a few questions based on a cursory first look.
>
>> If a single listener is found, we signal only that backend.
>> Otherwise, we fall back to the existing broadcast behavior.
>
> The idea of not wanting to wake up all backends makes sense to me,
> but I don't understand why we want this optimization only for the case
> where there is a single backend listening on a channel.
>
> Is there a pattern of usage in LISTEN/NOTIFY where users typically
> have either just one or several backends listening on a channel?
>
> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for the thoughtful question. You've hit on the central design trade-off in this optimization: how to provide targeted signaling for some workloads without degrading performance for others.

While we don't have telemetry on real-world usage patterns of LISTEN/NOTIFY, it seems likely that most applications fall into one of three categories, which I've been thinking of in networking terms:

1. Broadcast-style ("hub mode")
Many backends listening on the *same* channel (e.g., for cache invalidation). The current implementation is already well-optimized for this, behaving like an Ethernet hub that broadcasts to all ports. Waking all listeners is efficient because they all need the message.

2. Targeted notifications ("switch mode")
Each backend listens on its own private channel (e.g., for session events or worker queues). This is where the current implementation scales poorly, as every NOTIFY wakes up all listeners regardless of relevance. My patch is designed to make this behave like an efficient Ethernet switch.

3. Selective multicast-style ("group mode")
A subset of backends shares a channel, but not all. This is the tricky middle ground.

Your question, "why not maintain a list of backends for each channel, and only wake up those channels?", is exactly the right one to ask. A full listener list seems like the obvious path to optimizing for *all* cases. However, the devil is in the details of concurrency and performance. Managing such a list would require heavier locking, which would create a new bottleneck and degrade the scalability of LISTEN/UNLISTEN operations, especially for the "hub mode" case where many backends rapidly subscribe to the same popular channel.

This patch makes a deliberate architectural choice: prioritize a massive, low-risk win for "switch mode" while rigorously protecting the performance of "hub mode". It introduces a targeted fast path for single-listener channels and cleanly falls back to the existing, well-performing broadcast model for everything else (a rough sketch of this logic is included at the end of this mail).

This brings us back to "group mode", which remains an open optimization problem. A possible approach could be to track listeners up to a small threshold *K* (e.g., store up to 4 ProcNumbers in the hash entry). If the count exceeds *K*, we would flip a "broadcast" flag and revert to hub-mode behavior. However, this path has critical drawbacks:

1. Performance penalty for hub mode

With the current patch, after the second listener joins a channel, the has_multiple_listeners flag is set. Every subsequent listener can acquire a shared lock, see the flag is true, and immediately continue. This is a highly concurrent, read-only operation that does not require mutating shared state.
In contrast, the K-listener approach would force every new listener (from the third up to the K-th) to acquire an exclusive lock to mutate the shared listener array. This would serialize LISTEN operations on popular channels, creating the very contention point this patch successfully avoids and directly harming the hub-mode use case that currently works well.

2. Uncertainty

Compounding this, without clear data on typical "group" sizes, choosing a value for *K* is a shot in the dark. A small *K* might not help much, while a large *K* would increase the shared memory footprint and worsen the serialization penalty.

For these reasons, attempting to build a switch that also optimizes for multicast risks undermining the architectural clarity and performance of both the switch and hub models. This patch, therefore, draws a clean line. It provides a precise, low-cost path for switch-mode workloads and preserves the existing, well-performing path for hub-mode workloads. While this leaves "group mode" unoptimized for now, it ensures we make two common use cases better without making any use case worse. The new infrastructure is flexible, leaving the door open should a better approach for "group mode" emerge in the future, one that doesn't compromise the other two.

Benchmarks updated showing master vs 0001-optimize_listen_notify-v3.patch:

https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections.png

I've not included the benchmark CSV data in this mail, since it's quite heavy (160 kB), and I couldn't see any significant performance changes since v2.

/Joel
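PS. For anyone skimming the thread, here is a rough C sketch of the decision logic described above. It is illustrative only, not the code from the patch: the struct, hash table, lock, and helper names (ChannelEntry, channel_hash, NotifyChannelLock, SignalSingleBackend, BroadcastToAll) are hypothetical placeholders, and UNLISTEN/cleanup handling is omitted.

    /*
     * Rough sketch only -- not the code from the patch.  Names below are
     * hypothetical placeholders; the real patch may differ in layout,
     * locking granularity, and naming.
     */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/lwlock.h"
    #include "storage/procnumber.h"
    #include "utils/hsearch.h"

    typedef struct ChannelEntry
    {
        char        channel[NAMEDATALEN];   /* hash key: channel name */
        ProcNumber  listener;               /* the single known listener */
        bool        has_multiple_listeners; /* set once a 2nd backend listens */
    } ChannelEntry;

    static HTAB   *channel_hash;        /* hypothetical shared hash table */
    static LWLock *NotifyChannelLock;   /* hypothetical lock protecting it */

    static void SignalSingleBackend(ProcNumber procno);    /* hypothetical */
    static void BroadcastToAll(void);                      /* hypothetical */

    /* LISTEN path: record ourselves, or note that the channel is shared. */
    static void
    RememberListener(const char *channel)
    {
        ChannelEntry *ent;
        bool        found;

        /* Cheap read-only check first: a shared lock is enough. */
        LWLockAcquire(NotifyChannelLock, LW_SHARED);
        ent = hash_search(channel_hash, channel, HASH_FIND, &found);
        if (found && ent->has_multiple_listeners)
        {
            /* Popular ("hub mode") channel: nothing to update. */
            LWLockRelease(NotifyChannelLock);
            return;
        }
        LWLockRelease(NotifyChannelLock);

        /* First or second listener on this channel: mutate the entry. */
        LWLockAcquire(NotifyChannelLock, LW_EXCLUSIVE);
        ent = hash_search(channel_hash, channel, HASH_ENTER, &found);
        if (!found)
        {
            ent->listener = MyProcNumber;       /* we are the only listener */
            ent->has_multiple_listeners = false;
        }
        else if (ent->listener != MyProcNumber)
            ent->has_multiple_listeners = true; /* second distinct listener */
        LWLockRelease(NotifyChannelLock);
    }

    /* NOTIFY path: signal one backend if possible, else broadcast as today. */
    static void
    SignalListeners(const char *channel)
    {
        ChannelEntry *ent;
        bool        found;
        ProcNumber  target = INVALID_PROC_NUMBER;

        LWLockAcquire(NotifyChannelLock, LW_SHARED);
        ent = hash_search(channel_hash, channel, HASH_FIND, &found);
        if (found && !ent->has_multiple_listeners)
            target = ent->listener;
        LWLockRelease(NotifyChannelLock);

        if (target != INVALID_PROC_NUMBER)
            SignalSingleBackend(target);    /* "switch mode": wake one backend */
        else
            BroadcastToAll();               /* existing broadcast behavior */
    }

The property the sketch is meant to illustrate is that only the first two LISTENs on a channel ever take the exclusive lock; every later listener and every NOTIFY gets by with the shared lock, which is why hub-mode channels see no new contention.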