Thread: Missing wait events (gap analysis)

Missing wait events (gap analysis)

From
Nikolay Samokhvalov
Date:
Hi hi

Many tools that implement wait event analysis, when visualizing samples with "wait_event is null" use green color and "CPU" (perhaps, it started with RDS Performance Insights and PASH Viewer and, I suppose, originally came from the Oracle world, and now I see it in many more places).

I don't have any concerns with green color, but always had a feeling that "coalesce(wait_event, 'CPU')" is an assumption that can make analysis inaccurate, because there may be a lot of places in the code that are not covered by wait events, but technically should -- and such places cannot be named "CPU".
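
A minimal sketch of the labeling question, assuming a sampler that polls pg_stat_activity and keeps (pid, wait_event_type, wait_event) tuples. The sample rows, the helper names, and the "CPU*" label are illustrative assumptions, not taken from any particular tool. It shows that the same NULL samples can be charted either as definite "CPU" or under a label that admits uncertainty:

```python
from collections import Counter

# Hypothetical samples of (pid, wait_event_type, wait_event) for active
# backends. The values are made up for illustration.
samples = [
    (101, "IO", "DataFileRead"),
    (102, None, None),
    (103, "LWLock", "WALWrite"),
    (104, None, None),
    (105, None, None),
]


def bucket(wait_event, null_label):
    """Return the chart bucket for one sample; NULL becomes null_label."""
    return wait_event if wait_event is not None else null_label


def summarize(rows, null_label):
    """Count samples per bucket, like coalesce(wait_event, null_label) with GROUP BY."""
    return Counter(bucket(event, null_label) for _, _, event in rows)


# Same data, two labels for the NULL bucket.
as_cpu = summarize(samples, "CPU")
as_unknown = summarize(samples, "CPU*")  # "CPU*": CPU or an uninstrumented wait
print(as_cpu)
print(as_unknown)
```

The counts are identical in both summaries; only the claim attached to the NULL bucket changes.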

I asked Claude Code to analyze Postgres source code and find such places, that we could potentially cover with more wait events. Here is the first result: https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md

Before moving forward with proposals of specific patches, I wanted to hear opinions -- does it make sense to work in this direction?

Nik

Re: Missing wait events (gap analysis)

From
Darafei "Komяpa" Praliaskouski
Date:
Hello,

On Sat, Nov 22, 2025 at 4:43 AM Nikolay Samokhvalov <nik@postgres.ai> wrote:
> Hi hi
>
> Many tools that implement wait event analysis, when visualizing samples with "wait_event is null" use green color and "CPU" (perhaps, it started with RDS Performance Insights and PASH Viewer and, I suppose, originally came from the Oracle world, and now I see it in many more places).
>
> I don't have any concerns with green color, but always had a feeling that "coalesce(wait_event, 'CPU')" is an assumption that can make analysis inaccurate, because there may be a lot of places in the code that are not covered by wait events, but technically should -- and such places cannot be named "CPU".
>
> I asked Claude Code to analyze Postgres source code and find such places, that we could potentially cover with more wait events. Here is the first result: https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md
>
> Before moving forward with proposals of specific patches, I wanted to hear opinions -- does it make sense to work in this direction?

It would definitely make sense to have more insight into waits on the other side of the COPY pipe and into TOAST compression.

Other spots that may be invisible but would be helpful to track are the serialization/deserialization done in type IN/OUT functions (there are many surprises when EXPLAIN ANALYZE doesn't account for the time needed to actually serialize the output for large PostGIS geometries, and things like timestamptz input are also surprisingly slow), and passing data around between parallel workers.

Re: Missing wait events (gap analysis)

From
Matthias van de Meent
Date:
On Sun, 23 Nov 2025, 11:28 Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:
>
> Other spots that may be invisible but helpful to keep track of are serialization/deserialization that happens on
> IN/OUT functions (so many surprises when EXPLAIN ANALYZE doesn't account for time to actually serialize the output for
> large PostGIS geometries! and that stuff like timestamptz in is also surprisingly slow),

Are you aware of the SERIALIZE option to EXPLAIN (...)? It was added
in PG 17 to make sure that the overhead of serializing the data for
transmission to a client could also be measured and inspected by the
user.

To keep on topic to this thread about wait events: I don't think that
we should add wait events around in/out functions, because in/out
functions may call into detoasting, which calls into buffer IO
functions, which would reset the backend's wait event status.
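
A toy model of that concern, assuming each backend has a single wait event slot that is set when a wait starts and cleared when it ends, with no nesting. The class, the function names, and the "TypeOutput" event are hypothetical and only illustrate the clobbering effect:

```python
class Backend:
    """Toy backend with one wait event slot and no nesting support."""

    def __init__(self):
        self.wait_event = None

    def wait_start(self, name):
        # Overwrites whatever was reported before.
        self.wait_event = name

    def wait_end(self):
        # Clears the slot; it does not restore an outer event.
        self.wait_event = None


def read_buffer(backend, observed):
    """Stand-in for buffer I/O, which reports its own wait event."""
    backend.wait_start("DataFileRead")
    observed.append(backend.wait_event)
    backend.wait_end()


def output_function(backend, observed):
    """Stand-in for a type output function wrapped in a hypothetical wait event."""
    backend.wait_start("TypeOutput")  # hypothetical event name
    observed.append(backend.wait_event)
    read_buffer(backend, observed)  # detoasting triggers buffer I/O
    observed.append(backend.wait_event)  # what a sampler sees for the rest of the call
    backend.wait_end()


observed = []
output_function(Backend(), observed)
print(observed)
```

In this model, everything in the output function after the inner I/O is sampled with no wait event at all, so the outer event would under-report the time it was meant to cover.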


Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)



Re: Missing wait events (gap analysis)

From
Matthias van de Meent
Date:
On Sat, 22 Nov 2025 at 01:43, Nikolay Samokhvalov <nik@postgres.ai> wrote:
>
> Hi hi
>
> Many tools that implement wait event analysis, when visualizing samples with "wait_event is null" use green color and
> "CPU" (perhaps, it started with RDS Performance Insights and PASH Viewer and, I suppose, originally came from the Oracle
> world, and now I see it in many more places).
>
> I don't have any concerns with green color, but always had a feeling that "coalesce(wait_event, 'CPU')" is an
> assumption that can make analysis inaccurate, because there may be a lot of places in the code that are not covered by
> wait events, but technically should -- and such places cannot be named "CPU".

Then, isn't that an issue with the reporting tool(s)?

> I asked Claude Code to analyze Postgres source code and find such places, that we could potentially cover with more
> wait events. Here is the first result:
> https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md

Did you review this yourself, and include only those places that are
actually relevant for wait events? I'm not opposed to using AI systems
for analysis, for understanding the code, or for finding issues, but
posting tool output without you first understanding all the proposed
changes is a recipe for wasting everyone's time.

> Before moving forward with proposals of specific patches, I wanted to hear opinions -- does it make sense to work in
> this direction?

I don't think it's a bad idea to add wait events in potential wait
points in code.


Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)



Re: Missing wait events (gap analysis)

From
Álvaro Herrera
Date:
On 2025-Nov-24, Matthias van de Meent wrote:

> On Sat, 22 Nov 2025 at 01:43, Nikolay Samokhvalov <nik@postgres.ai> wrote:

> > Before moving forward with proposals of specific patches, I wanted
> > to hear opinions -- does it make sense to work in this direction?
> 
> I don't think it's a bad idea to add wait events in potential wait
> points in code.

There are things that I think it makes sense to cover, such as DNS
lookups, calls to external libraries for authentication, and so on.  I'm
not so sure that it is useful to distinguish things like one type of DNS
lookup from another.  Low-level operations such as file unlinking also
sound like a reasonable thing to report separately, as long as it
doesn't break reporting for something else ...

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)



Re: Missing wait events (gap analysis)

From
Andres Freund
Date:
Hi,

On 2025-11-21 18:43:31 -0600, Nikolay Samokhvalov wrote:
> Many tools that implement wait event analysis, when visualizing samples
> with "wait_event is null" use green color and "CPU" (perhaps, it started
> with RDS Performance Insights and PASH Viewer and, I suppose, originally
> came from the Oracle world, and now I see it in many more places).
> 
> I don't have any concerns with green color, but always had a feeling that
> "coalesce(wait_event, 'CPU')" is an assumption that can make analysis
> inaccurate, because there may be a lot of places in the code that are not
> covered by wait events, but technically should -- and such places cannot be
> named "CPU".
> 
> I asked Claude Code to analyze Postgres source code and find such places,
> that we could potentially cover with more wait events. Here is the first
> result:
> https://github.com/NikolayS/postgres/blob/claude/cpu-asterisk-wait-events-01CyiYYMMcFMovuqPqLNcp8T/WAIT_EVENTS_ANALYSIS.md
> 
> Before moving forward with proposals of specific patches, I wanted to hear
> opinions -- does it make sense to work in this direction?

Some of this seems sensible.  However, I vehemently oppose turning wait events
into a poor emulation of a CPU profiler. I think it would lead us down a bad
path to add wait events for CPU activity. It'd just lead us to adding them
everywhere, ending up with wait events (CPU activity is not a wait!) having
significant costs.

Greetings,

Andres Freund