Discussion: Missing wait events (gap analysis)
Hi hi
Many tools that implement wait event analysis, when visualizing samples with "wait_event is null" use green color and "CPU" (perhaps, it started with RDS Performance Insights and PASH Viewer and, I suppose, originally came from the Oracle world, and now I see it in many more places).
I don't have any concerns with green color, but always had a feeling that "coalesce(wait_event, 'CPU')" is an assumption that can make analysis inaccurate, because there may be a lot of places in the code that are not covered by wait events, but technically should be -- and such places cannot be named "CPU".
I asked Claude Code to analyze Postgres source code and find such places that we could potentially cover with more wait events. Here is the first result: https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md
Before moving forward with proposals of specific patches, I wanted to hear opinions -- does it make sense to work in this direction?
Nik
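
The concern about "coalesce(wait_event, 'CPU')" can be illustrated with a minimal sketch of how a sampling tool might aggregate its data. Everything here is an assumption for illustration: the sample list is made up, the `summarize_samples` helper is hypothetical, and the "CPU*" label is just one possible way to signal that a NULL wait event means "nothing reported" rather than confirmed CPU work.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical label: the asterisk flags "no wait event reported", which may be
# real CPU work or time spent in code that has no wait event instrumentation.
UNINSTRUMENTED_LABEL = "CPU*"


def summarize_samples(samples: Iterable[Optional[str]]) -> Counter:
    """Count samples per wait event, keeping NULL (None) samples under a hedged label."""
    return Counter(UNINSTRUMENTED_LABEL if s is None else s for s in samples)


# Made-up samples for one backend, as if polled from pg_stat_activity.wait_event.
samples = ["DataFileRead", None, None, "WALWrite", None, "DataFileRead"]
summary = summarize_samples(samples)

for label, count in summary.most_common():
    print(f"{label}: {count}")
```

The aggregation is identical to the usual coalesce approach; only the label differs, so the chart stays the same while the legend stops claiming more than the data supports.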
It definitely makes sense to have more insight into waiting for the other side of the COPY pipe and into TOAST compression.
Other spots that may be invisible but would be helpful to track are the serialization/deserialization that happens in IN/OUT functions (so many surprises when EXPLAIN ANALYZE doesn't account for the time to actually serialize the output for large PostGIS geometries, and things like timestamptz input are also surprisingly slow), and passing data between parallel workers.
On Sun, 23 Nov 2025, 11:28 Darafei "Komяpa" Praliaskouski, <me@komzpa.net> wrote:
>
> Other spots that may be invisible but helpful to keep track of are serialization/deserialization that happens on IN/OUT functions (so many surprises when EXPLAIN ANALYZE doesn't account for time to actually serialize the output for large PostGIS geometries! and that stuff like timestamptz in is also surprisingly slow),
Are you aware of the SERIALIZE option to EXPLAIN (...)? It was added in PG 17 to make sure that the overhead of serializing the data for transmission to a client could also be measured and inspected by the user.
To keep on topic to this thread about wait events: I don't think that we should add wait events around in/out functions, because in/out functions may call into detoasting, which calls into buffer IO functions, which would reset the backend's wait event status.
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
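
The reset problem described above can be sketched with a toy model. This is not Postgres code: it assumes a backend has a single wait-event slot and that ending a wait clears the slot instead of restoring an earlier value. The `Backend` class, the helper functions, and the "TypeOutput" event name are all hypothetical; only "DataFileRead" is an existing wait event name.

```python
class Backend:
    """Toy model of a backend with one wait-event slot and no nesting."""

    def __init__(self):
        self.wait_event = None  # None means "no wait event reported"
        self.observed = []      # what an external sampler would have seen

    def wait_start(self, event):
        self.wait_event = event

    def wait_end(self):
        # Assumption: ending a wait clears the slot; nothing is restored.
        self.wait_event = None

    def sample(self):
        self.observed.append(self.wait_event)


def read_buffer(backend):
    """Stand-in for buffer IO reached through detoasting."""
    backend.wait_start("DataFileRead")
    backend.sample()
    backend.wait_end()


def output_function(backend):
    """Stand-in for a type output function wrapped in its own wait event."""
    backend.wait_start("TypeOutput")  # hypothetical event name
    backend.sample()
    read_buffer(backend)              # the nested wait overwrites, then clears, the slot
    backend.sample()                  # still inside the output function
    backend.wait_end()


backend = Backend()
output_function(backend)
print(backend.observed)
```

In this model the last sample is taken while the output function is still running, yet the slot is already empty, so the outer event silently disappears after the first nested wait.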
On Sat, 22 Nov 2025 at 01:43, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> Hi hi
>
> Many tools that implement wait event analysis, when visualizing samples with "wait_event is null" use green color and "CPU" (perhaps, it started with RDS Performance Insights and PASH Viewer and, I suppose, originally came from the Oracle world, and now I see it in many more places).
>
> I don't have any concerns with green color, but always had a feeling that "coalesce(wait_event, 'CPU')" is an assumption that can make analysis inaccurate, because there may be a lot of places in the code that are not covered by wait events, but technically should -- and such places cannot be named "CPU".
Then, isn't that an issue with the reporting tool(s)?
> I asked Claude Code to analyze Postgres source code and find such places, that we could potentially cover with more wait events. Here is the first result: https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md
Did you review this yourself, and include only those places that are actually relevant for wait events? I'm not opposed to using AI systems for analysis, for understanding the code, or for finding issues, but posting tool output without you first understanding all the proposed changes is a recipe for wasting everyone's time.
> Before moving forward with proposals of specific patches, I wanted to hear opinions -- does it make sense to work in this direction?
I don't think it's a bad idea to add wait events in potential wait points in code.
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
On 2025-Nov-24, Matthias van de Meent wrote:
> On Sat, 22 Nov 2025 at 01:43, Nikolay Samokhvalov <nik@postgres.ai> wrote:
> > Before moving forward with proposals of specific patches, I wanted
> > to hear opinions -- does it make sense to work in this direction?
>
> I don't think it's a bad idea to add wait events in potential wait
> points in code.
There are things that I think it makes sense to cover, such as DNS lookups, calls to external libraries for authentication, and so on. I'm not so sure that it is useful to distinguish things like one type of DNS lookup from another.
Low-level operations such as file unlinking also sound like a reasonable thing to report separately, as long as it doesn't break reporting for something else ...
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)
Hi,
On 2025-11-21 18:43:31 -0600, Nikolay Samokhvalov wrote:
> Many tools that implement wait event analysis, when visualizing samples
> with "wait_event is null" use green color and "CPU" (perhaps, it started
> with RDS Performance Insights and PASH Viewer and, I suppose, originally
> came from the Oracle world, and now I see it in many more places).
>
> I don't have any concerns with green color, but always had a feeling that
> "coalesce(wait_event, 'CPU')" is an assumption that can make analysis
> inaccurate, because there may be a lot of places in the code that are not
> covered by wait events, but technically should -- and such places cannot be
> named "CPU".
>
> I asked Claude Code to analyze Postgres source code and find such places,
> that we could potentially cover with more wait events. Here is the first
> result:
> https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md
>
> Before moving forward with proposals of specific patches, I wanted to hear
> opinions -- does it make sense to work in this direction?
Some of this seems sensible. However, I vehemently oppose turning wait events into a poor emulation of a CPU profiler.
I think it would lead us down a bad path to add wait events for CPU activity. It'd just lead us to adding them everywhere, ending up with wait events (CPU activity is not a wait!) having significant costs.
Greetings,
Andres Freund