Обсуждение: Refactor query normalization into core query jumbling

Поиск
Список
Период
Сортировка

Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
Hi,

A while back, there was a discussion about moving the query
normalization code out of pg_stat_statements and into core
query jumbling [0]. This conversation also led to hardening
the hooks (currently only post_parse_analyze) that receive
JumbleState [1].

For the first point, `generate_normalized_query` takes the query
string and the jumbleState with constant locations from core
query jumbling. There is nothing uniquely special about
pg_stat_statements in this code, and it can be used by other extensions.
Extensions could even pass their own query string and JumbleState
if they choose. Patch 0001 moves this function and its helpers to
queryjumblefuncs.c.

A Github search also shows that there are quite a few extensions
that may be copying this code [2] [3], which means they will lose
out on potential improvements and fixes.

As part of this patch, the moved code itself did not change,
but I did improve a comment:

```
* If query_loc > 0, then "query" has been advanced by that much compared to
* the original string start, so we need to translate the provided locations
* to compensate.  (This lets us avoid re-scanning statements before the one
* of interest, so it's worth doing.)
```

This comment appeared at the top of generate_normalized_query and
fill_in_constant_lengths. I simplified it and moved a shortened
version inside the functions.

Also, I did not think `fill_in_constant_lengths` should be a global
function, as I cannot think of a good reason it would be used
on its own, though someone may have a different opinion there.

For the second point, since JumbleState can be shared by multiple
extensions, hooks should receive it as a const pointer. This
signals read-only intent and prevents extensions from
accidentally modifying it through the hook. This change is in
0002. This does mean that extensions will need to update their
hooks, but I do not see that as an issue for a major version.

Thoughts?

[0] https://postgr.es/m/aQA9v9nLu5qsX8IE%40paquier.xyz
[1] https://postgr.es/m/202510281023.4u5aszccvsct%40alvherre.pgsql
[2] https://github.com/search?q=fill_in_constant_lengths&type=code
[3] https://github.com/search?q=generate_normalized_query&type=code

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Dec 18, 2025 at 06:44:17PM -0600, Sami Imseih wrote:
> For the second point, since JumbleState can be shared by multiple
> extensions, hooks should receive it as a const pointer. This
> signals read-only intent and prevents extensions from
> accidentally modifying it through the hook. This change is in
> 0002. This does mean that extensions will need to update their
> hooks, but I do not see that as an issue for a major version.

I can see that 0001 is doing:

-static void fill_in_constant_lengths(JumbleState *jstate, const char *query,

and then:

+static void
+fill_in_constant_lengths(const JumbleState *jstate, const char *query,

Should the const addition be in 0002? But...

While looking at fill_in_constant_lengths(), I can see that it is doing:

locs = jstate->clocations;

and then things like:

locs[i].length = -1

or

locs[i].length = strlen(yyextra.scanbuf + loc)

While this is technically correct so the compiler does not complain (because
clocations is a non const pointer in JumbleState and the added const does not
apply to what clocations points to), I think that adding const here is misleading.

Indeed, the function clearly modifies data accessible through the parameter.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> While this is technically correct so the compiler does not complain (because
> clocations is a non const pointer in JumbleState and the added const does not
> apply to what clocations points to), I think that adding const here is misleading.

Yes, I am not happy with this. I initially thought it would be OK to
modify the JumbleState as long as it is done in a core function, but
after much thought, this is neither a good idea nor safe. If a pointer
is marked as a Constant, we should not modify it.

So, I went back to think about this, and the core problem as I see it
is that multiple hooks on the chain can modify the constant lengths. For
example, pg_stat_statements can now modify the constant lengths in
JumbleState via fill_in_constant_lengths, and the same JumbleState can
have its constant locations modified in a different way.

At this time, constant lengths are the only part of the JumbleState that
is not set by core. So, I don't think post_parse_analyze receiving
JumbleState as a constant is the right approach, nor do we need it.

I think we should allow JumbleState to define a normalization callback,
which defaults to a core normalization function rather than an
extension specific one:

```
    jstate->normalize_query = GenerateNormalizedQuery;
```

This way, any extension that wishes to return a normalized string from
the same JumbleState can invoke this callback and get consistent results.
pg_stat_statements and other extensions with a need to normalize a query
string based on the locations of a JumbleState do not need to care about the
internals of normalization, they simply invoke the callback and
receive the final
string.

So v2-0001 implements this callback and moves the normalization logic
into core. Both changes must be done in the same patch.  The comments
are also updated where they are no longer applicable or could be improved.

One additional improvement that this patch did not include is a bool in
JumbleState that tracks whether a normalized string has already been
generated. This way, repeated calls to the callback would not need to
regenerate the string; only the first call would perform the work,
while subsequent calls could simply return the previously computed
normalized string.

I do like the simplicity of this approach and it removes pg_stat_statements
from having to own the normalization code and how well different extensions
sharing the same JumbleState can play together. Not yet sure if this is the
correct direction, and I am open to other ideas.

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Fri, Dec 19, 2025 at 03:36:16PM -0600, Sami Imseih wrote:
> This way, any extension that wishes to return a normalized string from
> the same JumbleState can invoke this callback and get consistent results.
> pg_stat_statements and other extensions with a need to normalize a query
> string based on the locations of a JumbleState do not need to care about the
> internals of normalization, they simply invoke the callback and
> receive the final
> string.

Hmm.  I did not wrap completely my head with your problem, but,
assuming that what you are proposing goes in the right direction, I am
wondering if we should not expose a bit more the jumble query APIs so
as the normal default callback can be reused by out-of-core rather
than hide it entirely.  This would mean exposing
GenerateNormalizedQuery(), which also giving a way for callers of
JumbleQuery() to pass down a custom callback?  This would imply
thinking harder about the initialization state we expect in the
structure, but I think that we should try to design things so as
extensions do not need to copy-paste more code from the core tree at
the end, just less of it.

Of course, this sentence is written with the same line of thoughts as
previously mentioned in the other thread we have discussed: extensions
should not be allowed to update a JumbleState after it's been set by
the backend code, so as once the same JumbleState pointer is passed
down across multiple extensions they don't get confused.  If an
extension wants to use their own policy within the JumbleState, they
had better recreate a new independent one if they are unhappy about
has been generated previously.
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> > This way, any extension that wishes to return a normalized string from
> > the same JumbleState can invoke this callback and get consistent results.
> > pg_stat_statements and other extensions with a need to normalize a query
> > string based on the locations of a JumbleState do not need to care about the
> > internals of normalization, they simply invoke the callback and
> > receive the final
> > string.
>
> Hmm.  I did not wrap completely my head with your problem, but,
> assuming that what you are proposing goes in the right direction,

The first goal is to move all query-normalization-related infrastructure
that pg_stat_statements (and other extensions) rely on into core, so
extensions no longer need to copy or reimplement normalization logic and
can all depend on a single, shared implementation.

In addition, query normalization necessarily modifies JumbleState (to
record constant locations and lengths). This responsibility should not
fall to extensions and should instead be delegated to core. I will argue
that the current design, in which extensions handle this directly, is a
layering violation.

As a first step, we can move generate_normalized_query to core as a global
function, allowing extensions to simply call it.

> I am wondering if we should not expose a bit more the jumble query APIs so
> as the normal default callback can be reused by out-of-core rather
> than hide it entirely.  This would mean exposing
> GenerateNormalizedQuery(), which also giving a way for callers of
> JumbleQuery() to pass down a custom callback?  This would imply
> thinking harder about the initialization state we expect in the
> structure, but I think that we should try to design things so as
> extensions do not need to copy-paste more code from the core tree at
> the end, just less of it.

... and this will be taking the next step which is providing callbacks
and making
more jumbling utilities global. This will require more discussion, but I
would think we would expose InitJumble() and it will do the bare minimum
to initialize a JumbleState, and some fields that can define callbacks after
the fact. There will be a callback for a normalization function and a
callback function that will allow the user to implement jumbling functions
for nodes that are currently not included in queryjumblefuncs.switch.c, or
perhaps they can override the existing logic in this generated file.

> Of course, this sentence is written with the same line of thoughts as
> previously mentioned in the other thread we have discussed: extensions
> should not be allowed to update a JumbleState after it's been set by
> the backend code, so as once the same JumbleState pointer is passed
> down across multiple extensions they don't get confused.  If an
> extension wants to use their own policy within the JumbleState, they
> had better recreate a new independent one if they are unhappy about
> has been generated previously.

Yes, correct. If we provide the interface to create an additional JumbleState,
they can create an independent state.

For this thread, I would like to focus on the first goal.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)



Re: Refactor query normalization into core query jumbling

От
"zengman"
Дата:
Hi,

Though this may be tangential to the current topic, I have long been wanting to revise the two instances of
`Assert(len_to_wrt>= 0);`
 
in the code to the implementation below. Would you kindly advise if this modification is worthwhile?

```
    if (len_to_wrt > 0)
    {
        memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
        n_quer_loc += len_to_wrt;
    }
```

--
Regards,
Man Zeng
www.openhalo.org

Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> Though this may be tangential to the current topic, I have long been wanting to revise the two instances of
`Assert(len_to_wrt>= 0);`
 
> in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
>
> ```
>         if (len_to_wrt > 0)
>         {
>                 memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
>                 n_quer_loc += len_to_wrt;
>         }
> ```

I am not sure I like this idea. The len_to_wrt being less than 0
indicates a bug which will be hidden behind such a loop. I
think it's better to keep the Assert as-is.

> > > This way, any extension that wishes to return a normalized string from
> > > the same JumbleState can invoke this callback and get consistent results.
> > > pg_stat_statements and other extensions with a need to normalize a query
> > > string based on the locations of a JumbleState do not need to care about the
> > > internals of normalization, they simply invoke the callback and
> > > receive the final
> > > string.
> >
> > Hmm.  I did not wrap completely my head with your problem, but,
> > assuming that what you are proposing goes in the right direction,
>
> The first goal is to move all query-normalization-related infrastructure
> that pg_stat_statements (and other extensions) rely on into core, so
> extensions no longer need to copy or reimplement normalization logic and
> can all depend on a single, shared implementation.
>
> In addition, query normalization necessarily modifies JumbleState (to
> record constant locations and lengths). This responsibility should not
> fall to extensions and should instead be delegated to core. I will argue
> that the current design, in which extensions handle this directly, is a
> layering violation.
>
> As a first step, we can move generate_normalized_query to core as a global
> function, allowing extensions to simply call it.

v3 implements this approach without a callback. This establishes a clear
boundary: core owns JumbleState modifications, extensions consume the
results through the API.

I kept the post_parse_analyze hook signature unchanged since
GenerateNormalizedQuery still needs to modify JumbleState
(to populate constant lengths). Making it const would be misleading.

For the second part of making more jumbling utilities global, this can
be taken up in another thread. I am now thinking we would be better
off actually taking all the generic jumbling routines and separating
them into their own C file. So we can have a new file like primjumble.c
which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
can worry about its core business of jumbling a Query tree. Anyhow,
these are just some high level thoughts on this part for now.

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
"zengman"
Дата:
> I am not sure I like this idea. The len_to_wrt being less than 0
> indicates a bug which will be hidden behind such a loop. I
> think it's better to keep the Assert as-is.

Well, my main concern is that assertions are unlikely to be enabled in production environments. 
Also, the new patch looks really good. Besides, it seems you accidentally forgot to cc Michael.

--
Regards,
Man Zeng
www.openhalo.org

Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Wed, Dec 24, 2025 at 10:50:16AM +0800, zengman wrote:
> Besides, it seems you accidentally forgot to cc Michael.

This is not a problem.  I still get the messages by being registered
to this list :D
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote:
> > Though this may be tangential to the current topic, I have long been wanting to revise the two instances of
`Assert(len_to_wrt>= 0);` 
> > in the code to the implementation below. Would you kindly advise if this modification is worthwhile?
> >
> > ```
> >         if (len_to_wrt > 0)
> >         {
> >                 memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
> >                 n_quer_loc += len_to_wrt;
> >         }
> > ```
>
> I am not sure I like this idea. The len_to_wrt being less than 0
> indicates a bug which will be hidden behind such a loop. I
> think it's better to keep the Assert as-is.

This set of asserts are critical to keep.  An incorrect computation is
a synonym of an out-of-bound write in this case, which would classify
any problem as something that could be worth a CVE.

> v3 implements this approach without a callback. This establishes a clear
> boundary: core owns JumbleState modifications, extensions consume the
> results through the API.
>
> I kept the post_parse_analyze hook signature unchanged since
> GenerateNormalizedQuery still needs to modify JumbleState
> (to populate constant lengths). Making it const would be misleading.
>
> For the second part of making more jumbling utilities global, this can
> be taken up in another thread. I am now thinking we would be better
> off actually taking all the generic jumbling routines and separating
> them into their own C file. So we can have a new file like primjumble.c
> which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
> can worry about its core business of jumbling a Query tree. Anyhow,
> these are just some high level thoughts on this part for now.

Hmm.  Moving from pgss to the core backup the code that updates the
clocations of an existing JumbleState does not solve the issue that
we should not modify the internals of the JumbleState computed.  So
this does not move the ball at all.

The updates of clocations depend on GenerateNormalizedQuery(), which
depends on the pre-work done in CleanQuerytext() to get a query string
and its location for a multi-query string case.  If we really want to
make a JumbleState not touchable at all, we could either push more
work of CleanQuerytext() and GenerateNormalizedQuery() into core.

Or  a second thing that we can do, which would be actually much
simpler, is to rework entirely fill_in_constant_lengths() so as we
generate a *copy* of the location data PGSS wants rather than force it
into a JumbleState that would be pushed across more stacks of the
post-parse-analyze hook.  Doing that would allow us to pull the plug
into making JumbleState and the original copies of clocations a set of
consts when given to extensions.
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> > For the second part of making more jumbling utilities global, this can
> > be taken up in another thread. I am now thinking we would be better
> > off actually taking all the generic jumbling routines and separating
> > them into their own C file. So we can have a new file like primjumble.c
> > which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
> > can worry about its core business of jumbling a Query tree. Anyhow,
> > these are just some high level thoughts on this part for now.
>
> Hmm.  Moving from pgss to the core backup the code that updates the
> clocations of an existing JumbleState does not solve the issue that
> we should not modify the internals of the JumbleState computed.  So
> this does not move the ball at all.

This is for the point "I am wondering if we should not expose a bit more
the jumble query APIs" [0], which as I mention is a separate topic and is
not about extensions being able to modify JumbleState.

> The updates of clocations depend on GenerateNormalizedQuery(), which
> depends on the pre-work done in CleanQuerytext() to get a query string
> and its location for a multi-query string case.  If we really want to
> make a JumbleState not touchable at all, we could either push more
> work of CleanQuerytext() and GenerateNormalizedQuery() into core.

When you say “push more work into core,” I understand that to mean this work
should occur during jumbling. If so, there are several complications.

1/ CleanQueryText() requires access to the query text, which we do not have
during core jumbling as of 2ecbb0a4935.

2/ GenerateNormalizedQuery() should be invoked on demand by the extension
that needs the normalized string, for example when pg_stat_statements needs
to store a query string.

Both operations are expensive, especially GenerateNormalizedQuery(). Doing
this work on every JumbleQuery() would introduce unnecessary overhead.

> Or  a second thing that we can do, which would be actually much
> simpler, is to rework entirely fill_in_constant_lengths() so as we
> generate a *copy* of the location data PGSS wants rather than force it
> into a JumbleState

yes, that's an idea that did cross my mind. I have hesitation of copying
clocations, but that may not be such a big deal, since it will only be
occurring on demand when the extension requests a normalized string.

> that would be pushed across more stacks of the
> post-parse-analyze hook.  Doing that would allow us to pull the plug
> into making JumbleState and the original copies of clocations a set of
> consts when given to extensions.

Yes correct. The hook signature will turn into, so all extensions now
just have a constant pointer to the jumblestate. They can of course
work hard to cast away constants, but at that point, they are doing
things the wrong way.

```
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,

        Query *query,

        const JumbleState *jstate);
```

This of course will be a big breaking change to all the extensions out
there using this hook. This is not just about a signature change of
the hook, but an author now has to figure out how to deal with a
constant JumbleState in their code, which may not be trivial.

My proposal in v3 was a bit more softer, and keeps the hook
backwards compatible. Basically, we keep JumbleState a non-constant,
but provide core APIs for any  operation, such as generate_normalized_query,
that needs to modify the state. So, my approach was not about enforcing a
read-only JumbleState, but about providing the API to dissuade an author
from modifying a JumbleState.
If we need a stronger API contract, then we can go with the hook receving
JumbleState as a constant and implement the copy of clocations to return
back to extensions that need to normalize a query. We also have to keep
in mind that if there are future requirements where the state has to be
modified by an extension, we will need to implement more copy functions
for those field.

[0] https://www.postgresql.org/message-id/aUXeTEpSymo6n7lF%40paquier.xyz

--
Sami Imseih
Amazon Web Services (AWS)



Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
On Wed, Dec 24, 2025 at 9:33 AM Sami Imseih <samimseih@gmail.com> wrote:
> that would be pushed across more stacks of the
> post-parse-analyze hook.  Doing that would allow us to pull the plug
> into making JumbleState and the original copies of clocations a set of
> consts when given to extensions.

Yes correct. The hook signature will turn into, so all extensions now
just have a constant pointer to the jumblestate. They can of course
work hard to cast away constants, but at that point, they are doing
things the wrong way.

```
 typedef void (*post_parse_analyze_hook_type) (ParseState *pstate,

        Query *query,

        const JumbleState *jstate);
```

This of course will be a big breaking change to all the extensions out
there using this hook. This is not just about a signature change of
the hook, but an author now has to figure out how to deal with a
constant JumbleState in their code, which may not be trivial.

I wonder if there is a single extension out there that actually utilizes JumbleState in that hook -
it was added in 5fd9dfa5f50e4906c35133a414ebec5b6d518493 presumably because pg_stat_statements
needed it, but even Julien at the time was critical of adding it, mainly considering it for pgss needs if I read
the archive correctly [0]. CCing Julien to correct me if I misinterpret the historic record here.

Reading through the discussion in this thread here I do wonder a bit if we shouldn't just take that out of the
hook again, and instead provide separate functions for getting the normalized query string (generating it the
first time its called).

My proposal in v3 was a bit more softer, and keeps the hook
backwards compatible. Basically, we keep JumbleState a non-constant,
but provide core APIs for any  operation, such as generate_normalized_query,
that needs to modify the state. So, my approach was not about enforcing a
read-only JumbleState, but about providing the API to dissuade an author
from modifying a JumbleState.

Given the lack of public APIs to modify JumbleState today, I don't see how an extension would do
modifications in a meaningful way, short of copying the code. I think we should be a bit bolder here in
enforcing a convention, either clearly making it read-only or dropping the argument again.

Thinking through a couple of theoretical use cases:

1) An extension that wants to display normalized query strings

This seems to be the biggest kind of what I can find with code search. Extensions like pg_stat_monitor [1], that
want to do something like pg_stat_statements, and have to copy a bunch of normalization code today that is 1:1 what
Postgres does. Such extensions don't need the JumbleState argument if they can get the normalized text directly.

2) An extension that wants to capture parameter values

Some extensions may want to remember additional context for normalized queries, like pg_tracing's logic for
optionally passing parameter values (post normalization) in the trace context [2]. If we kept passing a read-only 
JumbleState then such extensions could presumably still get this, but I wonder if it wouldn't be better for core to
have a helper for this?

3) An extension that modifies the JumbleState to cause different normalization to happen

I'm not aware of any extensions doing this, and I couldn't find any either.  And since such theoretical extensions
can directly modify query->queryId in the same hook, why not do that?

4) Extensions using the presence of JumbleState to determine whether query IDs are available (?)

I noticed pg_hint_plan checks the JumbleState argument to determine whether or not to run an early check
of the hint logic. I'm a little confused by this (and if this is about query IDs being available, couldn't it just check the
Query having a non-zero queryID?), but FWIW: [3]


Thanks,
Lukas

--
Lukas Fittl

Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Mon, Dec 29, 2025 at 06:34:18PM -0800, Lukas Fittl wrote:
> I noticed pg_hint_plan checks the JumbleState argument to determine whether
> or not to run an early check of the hint logic. I'm a little
> confused by this (and if this is about query IDs being available,
> couldn't it just check the  Query having a non-zero queryID?), but
> FWIW: [3]

See also the code before 32ced2e13e75.  We wanted a JumbleState in
this case to be able to generate a normalized query for the hint
table.  Since this commit we only rely on the query ID in the hint
table, so it would not matter for pg_hint_plan if the JumbleState is
removed from this portions of the hooks.

Saying that, yes I think that you are right, we should be able to
remove this dependency on JumbleState in
pg_hint_plan_post_parse_analyze() and rely on the query ID.  I'm
writing down a note about cleaning that on the HEAD branch of the
module..
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Bertrand Drouvot
Дата:
Hi,

On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote:
> v3 implements this approach without a callback. This establishes a clear
> boundary: core owns JumbleState modifications, extensions consume the
> results through the API.
> 

Thanks for the new patch version.

Some random comments:

=== 1

+       SetConstantLengths((JumbleState *) jstate, query, query_loc);

This cast seems unnecessary.

=== 2

+CompLocation(const void *a, const void *b)

In the commit message I can see "Functions are renamed to match core naming
conventions" but wasn't comp_location() better? 

=== 3

+                       /*
+                        * generate the normalized query. Note that the normalized
+                        * representation may well vary depending on just which
+                        * "equivalent" query is used to create the hashtable entry. We
+                        * assume this is OK.
+                        */
+                       norm_query = GenerateNormalizedQuery(jstate, query,

Should part of this comment be on top of the GenerateNormalizedQuery()
definition instead?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> === 1
>
> +       SetConstantLengths((JumbleState *) jstate, query, query_loc);
>
> This cast seems unnecessary.

Right. This is unnecessary. I will remove in the next iteration.

> === 2
>
> +CompLocation(const void *a, const void *b)
>
> In the commit message I can see "Functions are renamed to match core naming
> conventions" but wasn't comp_location() better?

Not sure if it's better or worse. I aimed for consistency here and
used pascal case to
match.


> === 3
>
> +                       /*
> +                        * generate the normalized query. Note that the normalized
> +                        * representation may well vary depending on just which
> +                        * "equivalent" query is used to create the hashtable entry. We
> +                        * assume this is OK.
> +                        */
> +                       norm_query = GenerateNormalizedQuery(jstate, query,
>
> Should part of this comment be on top of the GenerateNormalizedQuery()
> definition instead?

No, because this talks about the hashtable entry which is specific to
pg_stat_statements.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
>> backwards compatible. Basically, we keep JumbleState a non-constant,
>> but provide core APIs for any operation, such as
>> generate_normalized_query,
>> that needs to modify the state. So, my approach was not about enforcing a
>> read-only JumbleState, but about providing the API to dissuade an author
>> from modifying a JumbleState.

> Given the lack of public APIs to modify JumbleState today, I don't see how
> an extension would do
> modifications in a meaningful way, short of copying the code. I think we
> should be a bit bolder here in
> enforcing a convention, either clearly making it read-only or dropping the
> argument again.

Based on the discussion so far I am leaning towards making JumbleState
read-only as described here [0]. I don't see how we can drop JumbleState
completely from hooks, since normalization needs to occur on-demand
by the extension.

> 1) An extension that wants to display normalized query strings
>
> This seems to be the biggest kind of what I can find with code search.
> Extensions like pg_stat_monitor [1], that
> want to do something like pg_stat_statements, and have to copy a bunch of
> normalization code today that is 1:1 what
> Postgres does. Such extensions don't need the JumbleState argument if they
> can get the normalized text directly.

Yes, I don't know how that's possible; besides generating the normalized
string during JumbleQuery and making it available to post_parse_analyze
hook ( and other executor hooks ). But this also means we are incurring
the normalization overhead for every execution.

> 2) An extension that wants to capture parameter values
>
> Some extensions may want to remember additional context for normalized
> queries, like pg_tracing's logic for
> optionally passing parameter values (post normalization) in the trace
> context [2]. If we kept passing a read-only
> JumbleState then such extensions could presumably still get this, but I
> wonder if it wouldn't be better for core to
> have a helper for this?

This could be like a core GenerateNormalizedQuery which can optionally
track the constant values. That will be an enhancement to normalization
and a new requirement.

[0] https://www.postgresql.org/message-id/CAA5RZ0sbWmqdUBFo8JXMJe72pnwjxVY58htJ6pKbwnyQuRctQw%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> >> backwards compatible. Basically, we keep JumbleState a non-constant,
> >> but provide core APIs for any operation, such as
> >> generate_normalized_query,
> >> that needs to modify the state. So, my approach was not about enforcing a
> >> read-only JumbleState, but about providing the API to dissuade an author
> >> from modifying a JumbleState.
>
> > Given the lack of public APIs to modify JumbleState today, I don't see how
> > an extension would do
> > modifications in a meaningful way, short of copying the code. I think we
> > should be a bit bolder here in
> > enforcing a convention, either clearly making it read-only or dropping the
> > argument again.
>
> Based on the discussion so far I am leaning towards making JumbleState
> read-only as described here [0]. I don't see how we can drop JumbleState
> completely from hooks, since normalization needs to occur on-demand
> by the extension.

Here is v4.

0001 - addresses the comments made by Bertrand.
0002 - makes JumbleState a constant when passed to post_parse_analyze
and updates all downstream code that consume the JumbleState. This
means we now need to copy the locations from JState into a local/temporary
array when generating the normalized string.

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Tue, Dec 30, 2025 at 01:31:30PM +0900, Michael Paquier wrote:
> Saying that, yes I think that you are right, we should be able to
> remove this dependency on JumbleState in
> pg_hint_plan_post_parse_analyze() and rely on the query ID.  I'm
> writing down a note about cleaning that on the HEAD branch of the
> module..

For the note, I have looked at that again and this improvement is
right, so I have pushed something to simplify the code:
https://github.com/ossc-db/pg_hint_plan/commit/3a63a56740e48e1d205294d5df6e198716718882

Thanks for the suggestion, Lukas.
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> Here is v4.
>
> 0001 - addresses the comments made by Bertrand.
> 0002 - makes JumbleState a constant when passed to post_parse_analyze
> and updates all downstream code that consume the JumbleState. This
> means we now need to copy the locations from JState into a local/temporary
> array when generating the normalized string.

PFA a mandatory rebase.

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
Hi Sami,

On Mon, Mar 16, 2026 at 7:12 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> > Here is v4.
> >
> > 0001 - addresses the comments made by Bertrand.
> > 0002 - makes JumbleState a constant when passed to post_parse_analyze
> > and updates all downstream code that consume the JumbleState. This
> > means we now need to copy the locations from JState into a local/temporary
> > array when generating the normalized string.

In 0001:

> diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
> > index 87db8dc1a32..d4b26202c47 100644
> --- a/src/backend/nodes/queryjumblefuncs.c
> +++ b/src/backend/nodes/queryjumblefuncs.c
> ...
> @@ -773,3 +775,249 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate,
> +
> +    /* we don't want to re-emit any escape string warnings */
> +    yyextra.escape_string_warning = false;
> +

I don't think this is needed anymore, as of
45762084545ec14dbbe66ace1d69d7e89f8978ac.

> +/*
> + * Callback to generate a normalized version of the query string that will be used to
> + * represent all similar queries.
> + *

I don't think the term "Callback" makes sense here - I think you could
just keep the original wording.

In 0002:

> diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
> index d4b26202c47..fe8f0ccd278 100644
> --- a/src/backend/nodes/queryjumblefuncs.c
> +++ b/src/backend/nodes/queryjumblefuncs.c
> ...
> @@ -813,14 +815,20 @@ SetConstantLengths(JumbleState *jstate, const char *query,
>     core_YYSTYPE yylval;
>     YYLTYPE        yylloc;
>
> +    if (jstate->clocations_count == 0)
> +        return NULL;
> +
> +    /* Copy constant locations to avoid modifying jstate */
> +    locs = palloc(jstate->clocations_count * sizeof(LocationLen));
> +    memcpy(locs, jstate->clocations, jstate->clocations_count * sizeof(LocationLen));
> +

You could use palloc_array for locs here.

> @@ -938,12 +948,13 @@ GenerateNormalizedQuery(JumbleState *jstate, const char *query,
>                 last_off = 0,    /* Offset from start for previous tok */
>                 last_tok_len = 0;    /* Length (in bytes) of that tok */
>     int            num_constants_replaced = 0;
> +    LocationLen *locs = NULL;
>
>     /*
>      * Set constants' lengths in JumbleState, as only locations are set during
>      * DoJumble(). Note this also ensures the items are sorted by location.
>      */
> -    SetConstantLengths(jstate, query, query_loc);
> +    locs = SetConstantLengths(jstate, query, query_loc);

I think we should update the comment here to reflect the fact that
we're no longer modifying JumbleState.

Otherwise these patches look good - it'd be nice to still get this
into 19 so we have less code duplication across the different
extensions working with normalized query text.

Thanks,
Lukas

--
Lukas Fittl



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
Thanks for looking!

> I don't think this is needed anymore, as of
> 45762084545ec14dbbe66ace1d69d7e89f8978ac.

Correct. That showed up after my last rebase.

> > +/*
> > + * Callback to generate a normalized version of the query string that will be used to
> > + * represent all similar queries.
> > + *
>
> I don't think the term "Callback" makes sense here - I think you could
> just keep the original wording.

This was a remnant of my earlier experimentation. I removed.

A few notes on the comments:

- * Note that the normalized representation may well vary depending on
- * just which "equivalent" query is used to create the hashtable entry.
- * We assume this is OK.

This was in the original generate_normalize_query and since it mentions
hashtable entry, I moved the comment (in-spirit) to where pg_stat_statements
calls GenerateNormailzeQuery

* If query_loc > 0, then "query" has been advanced by that much compared to
* the original string start, as is the case with multi-statement strings, so
* we need to translate the provided locations to compensate. (This lets us
* avoid re-scanning statements before the one of interest, so it's worth doing.)
*
This comment was originally duplicated in both SetConstantLengths, so I
just kept it as-is in SetConstantLengths and added a shorter reference in
GenerateNormalizeQuery

Also, this comment "It is the caller's job to ensure that the string
is a valid SQL statement..."
made more sense in GenerateNormalizeQuery rather than SetConstantLengths, since
GenerateNormalizeQuery is the public function.


> In 0002:
> You could use palloc_array for locs here.

done.

> I think we should update the comment here to reflect the fact that
> we're no longer modifying JumbleState.

done.

> Otherwise these patches look good - it'd be nice to still get this
> into 19 so we have less code duplication across the different
> extensions working with normalized query text.

I agree!

v6 addresses the comments.

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
Hi Sami,

On Wed, Mar 25, 2026 at 8:31 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> v6 addresses the comments.
>

Great - I've tested that again and changes look good.

Two more thoughts:

1) I think "SetConstantLengths" should be renamed to
"GetConstantLengths", or "CalculateConstantLengths" in 0002, since
we're no longer modifying JumbleState

2) I wonder if we should export this function in 0002 - that would
specifically help pg_tracing, which also wants to extract inline
parameter values in addition to replacing them with a $n parameter
reference marker - I could also see that being useful for any other
extension that's interested in pulling out parameter values

I've also done some testing with two previously mentioned extensions,
pg_stat_monitor [0] and pg_tracing [1]. Both look like they can be
adapted to the new method with their regression tests succeeding.

Thanks,
Lukas

[0]: https://github.com/lfittl/pg_stat_monitor/commit/d3521de9f6736c1bb745bc31dae736540efe1781
[1]: https://github.com/lfittl/pg_tracing/commit/ecff95a87a1c60e8a5fe47662cc6a2148bd3632f

--
Lukas Fittl



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> 1) I think "SetConstantLengths" should be renamed to
> "GetConstantLengths", or "CalculateConstantLengths" in 0002, since
> we're no longer modifying JumbleState

Makes sense. I renamed it to ComputeConstantLengths; as this will reflect
that it's more than a getter function, and it's doing actual work. Compute
also seems more consistent with other function names we have out there.

> 2) I wonder if we should export this function in 0002 - that would
> specifically help pg_tracing, which also wants to extract inline
> parameter values in addition to replacing them with a $n parameter
> reference marker - I could also see that being useful for any other
> extension that's interested in pulling out parameter values

> I've also done some testing with two previously mentioned extensions,
> pg_stat_monitor [0] and pg_tracing [1]. Both look like they can be
> adapted to the new method with their regression tests succeeding.

That did cross my mind. I agree with this.

Both of those changes belong in 0002. See the attached v7.

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
On Thu, Mar 26, 2026 at 10:19 AM Sami Imseih <samimseih@gmail.com> wrote:
> Both of those changes belong in 0002. See the attached v7.

Looks good!

I've also done a quick follow-up test with pg_tracing and that
simplifies its logic as expected [0] to be able to extract inline
parameter values.

Thanks,
Lukas

[0]:
https://github.com/lfittl/pg_tracing/commit/ae937fdee4aa57206b31b5746f36dd8e55cf43d1#diff-41cf816684c420b8808fb4899e38a38aaf4f875613c4785e291010aa2b0ea267

--
Lukas Fittl



Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Thu, Mar 26, 2026 at 06:50:20PM -0700, Lukas Fittl wrote:
> On Thu, Mar 26, 2026 at 10:19 AM Sami Imseih <samimseih@gmail.com> wrote:
> > Both of those changes belong in 0002. See the attached v7.
>
> Looks good!
>
> I've also done a quick follow-up test with pg_tracing and that
> simplifies its logic as expected [0] to be able to extract inline
> parameter values.

I have been looking at what you have here.  As mentioned upthread, I
am on board with aiming for JumbleState to be a const so as we enforce
as policy that no extension is allowed to touch what the core code has
computed.

However, I am not convinced by most of the patch.  The logic to
recompute the lengths of the constants is a concept proper to PGSS.
Perhaps we could reconsider moving that into core at some point, but
I am honestly not sure to see the range of benefits we'd gain from
that.

This line of arguments is stronger for the normalization of the query
string.  Why should the core code decide what a normalized string
should look like when it comes to the detection of the constants, if
any?  Instead of a dollar-quoted number, we could enforce a bunch of
things, like a '?' or a '$woozah$' at these locations.

Saying that, the key point of the patch is to take a copy of the
JumbleState locations, and recompute it for the needs of PGSS for the
sake of the normalized query representation.  Hence, why don't we just
do that at the end?  That should be enough to enforce the const marker
for the JumbleState across all the loaded extensions that want to look
at it.  This leads me to the simpler patch attached.

Comments or tomatoes?  That's simpler, and I'd be OK with just that
for v19.  That would be much better than the current state of affairs,
where PGSS decides to enforce its own ideas in the JumbleState, ideas
fed to anything looping into the post-parse-analyze hook after PGSS.
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
On Thu, Mar 26, 2026 at 10:18 PM Michael Paquier <michael@paquier.xyz> wrote:
> This line of arguments is stronger for the normalization of the query
> string.  Why should the core code decide what a normalized string
> should look like when it comes to the detection of the constants, if
> any?  Instead of a dollar-quoted number, we could enforce a bunch of
> things, like a '?' or a '$woozah$' at these locations.

Fair enough, though I haven't seen any extensions that do that in
practice - its reasonable to have normalization result in a query
string that's parsable again and can be passed to EXPLAIN
(GENERIC_PLAN).

> Saying that, the key point of the patch is to take a copy of the
> JumbleState locations, and recompute it for the needs of PGSS for the
> sake of the normalized query representation.  Hence, why don't we just
> do that at the end?  That should be enough to enforce the const marker
> for the JumbleState across all the loaded extensions that want to look
> at it.  This leads me to the simpler patch attached.
>
> Comments or tomatoes?  That's simpler, and I'd be OK with just that
> for v19.  That would be much better than the current state of affairs,
> where PGSS decides to enforce its own ideas in the JumbleState, ideas
> fed to anything looping into the post-parse-analyze hook after PGSS.

I'm not convinced that making the const change alone is a good idea,
without also providing some helpers to reduce the repeated code in
extensions.

What if we only put the ComputeConstantLengths (as Sami had it in v7)
in core, together with making JumbleState const?

Thanks,
Lukas

--
Lukas Fittl



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> On Thu, Mar 26, 2026 at 10:18 PM Michael Paquier <michael@paquier.xyz> wrote:
> > This line of arguments is stronger for the normalization of the query
> > string.  Why should the core code decide what a normalized string
> > should look like when it comes to the detection of the constants, if
> > any?  Instead of a dollar-quoted number, we could enforce a bunch of
> > things, like a '?' or a '$woozah$' at these locations.
>
> Fair enough, though I haven't seen any extensions that do that in
> practice - its reasonable to have normalization result in a query
> string that's parsable again and can be passed to EXPLAIN
> (GENERIC_PLAN).

with regards to generate_normalized_query, AFAICT, the most common
case is extensions are using it for is dollar quoted number, but I agree
this one is a gray area.

> What if we only put the ComputeConstantLengths (as Sami had it in v7)
> in core, together with making JumbleState const?

I agree that ComputeConstantLengths should be in core. This one is
not a gray area IMO. The query jumble already records constant locations,
but leaves the lengths unset. ComputeConstantLengths is just the
completion of that  work. There could be no other interpretation,
unlike generate_normalized_query, of what the lengths should be.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:

> On Mar 28, 2026, at 2:09, Sami Imseih <samimseih@gmail.com> wrote:
> I agree that ComputeConstantLengths should be in core. This one is
> not a gray area IMO. The query jumble already records constant locations,
> but leaves the lengths unset. ComputeConstantLengths is just the
> completion of that  work. There could be no other interpretation,
> unlike generate_normalized_query, of what the lengths should be.

This is an interesting remark. One problem with any patches presented yet is that we don’t attach the lengths as part
ofthe in-core query jumbling procedure: we plug the lengths later using the same structure as the query jumbling. It
seemsto me that this is half-baked overall. I think that we don’t want to pay the extra length computation in the core
queryjumbling at the end, then why does it make sense to include the lengths in the JumbleState structure at all?
Shouldn’twe use a different structure filled inside PGSS for this purpose rather than reuse the same thing for PGSS and
thein-core part (don’t have the code in front of me now). 
--
Michael





Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
On Fri, Mar 27, 2026 at 6:42 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> > On Mar 28, 2026, at 2:09, Sami Imseih <samimseih@gmail.com> wrote:
> > I agree that ComputeConstantLengths should be in core. This one is
> > not a gray area IMO. The query jumble already records constant locations,
> > but leaves the lengths unset. ComputeConstantLengths is just the
> > completion of that  work. There could be no other interpretation,
> > unlike generate_normalized_query, of what the lengths should be.
>
> This is an interesting remark. One problem with any patches presented yet is that we don’t attach the lengths as part
ofthe in-core query jumbling procedure: we plug the lengths later using the same structure as the query jumbling. It
seemsto me that this is half-baked overall. I think that we don’t want to pay the extra length computation in the core
queryjumbling at the end, then why does it make sense to include the lengths in the JumbleState structure at all?
Shouldn’twe use a different structure filled inside PGSS for this purpose rather than reuse the same thing for PGSS and
thein-core part (don’t have the code in front of me now). 

I see where you're coming from on that, but I don't think we can
remove anything here in practice:

typedef struct LocationLen
{
    int            location; /* Required */
    int            length; /* Set by _jumbleElements */
    bool        squashed; /* Set by RecordConstLocation called from
_jumbleElements */
    bool        extern_param; /* Set by RecordConstLocation called
from  _jumbleParam */
} LocationLen;

typedef struct JumbleState
{
    unsigned char *jumble; /* Required */
    Size        jumble_len; /* Required */
    LocationLen *clocations; /* Required to track constant locations */
    int            clocations_buf_size; /* Required to track constant
locations */
    int            clocations_count; /* Required to track constant locations */
    int            highest_extern_param_id; /* Set by _jumbleParam */
    bool        has_squashed_lists; /* Set by _jumbleElements */
    unsigned int pending_nulls; /* Required */
    Size        total_jumble_len; /* Required */
} JumbleState;

The only refactoring I could think of is to write out the
_jumbleElements information separately, then we could actually drop
length, and maybe squashed, from LocationLen. But I'm not sure that
really buys us much? It would be more clear I guess, because the mixed
locations of where length gets set is indeed a bit odd.

I still think it'd be reasonable for us to include
ComputeConstantLengths in core to complete the picture of what we're
doing with _jumbleElements and the length field already anyway. Its
basically a way to fully hydrate the partially filled out JumbleState
from the initial jumble.

Thanks,
Lukas

--
Lukas Fittl



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> > > I agree that ComputeConstantLengths should be in core. This one is
> > > not a gray area IMO. The query jumble already records constant locations,
> > > but leaves the lengths unset. ComputeConstantLengths is just the
> > > completion of that  work. There could be no other interpretation,
> > > unlike generate_normalized_query, of what the lengths should be.
> >
> > This is an interesting remark. One problem with any patches presented yet is that we don’t attach the lengths as
partof the in-core query jumbling procedure: we plug the lengths later using the same structure as the query jumbling.
Itseems to me that this is half-baked overall. I think that we don’t want to pay the extra length computation in the
corequery jumbling at the end, then why does it make sense to include the lengths in the JumbleState structure at all?
Shouldn’twe use a different structure filled inside PGSS for this purpose rather than reuse the same thing for PGSS and
thein-core part (don’t have the code in front of me now). 
>
> I see where you're coming from on that, but I don't think we can
> remove anything here in practice:

Yes. Not unless we want to rely on the parser to track the lengths in
Const, which could be invasive, but I have not looked into it.

Paying the length computation at the end of every jumble is also
going to impact performance, so that will not be a good idea.

> I still think it'd be reasonable for us to include
> ComputeConstantLengths in core to complete the picture of what we're
> doing with _jumbleElements and the length field already anyway. Its
> basically a way to fully hydrate the partially filled out JumbleState
> from the initial jumble.

I fully agree, ComputeConstantLengths is an optional post-jumble-query step
for a consumer that wishes to calculate the lengths. The length calculation
is not unique to a plug-in, so in my mind the work it's doing is core
jumbling functionality.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Mon, Mar 30, 2026 at 02:20:47PM -0500, Sami Imseih wrote:
>> I see where you're coming from on that, but I don't think we can
>> remove anything here in practice:
>
> Yes. Not unless we want to rely on the parser to track the lengths in
> Const, which could be invasive, but I have not looked into it.

Hmm.  We may not want to get down to that, still that would be
cheaper than reprocessing the parsing of the query string twice.

>> I still think it'd be reasonable for us to include
>> ComputeConstantLengths in core to complete the picture of what we're
>> doing with _jumbleElements and the length field already anyway. Its
>> basically a way to fully hydrate the partially filled out JumbleState
>> from the initial jumble.
>
> I fully agree, ComputeConstantLengths is an optional post-jumble-query step
> for a consumer that wishes to calculate the lengths. The length calculation
> is not unique to a plug-in, so in my mind the work it's doing is core
> jumbling functionality.

Okay.  I could fall into that for this release.  Marking the
JumbleState as a const is the most important piece here.  I'm +-0
regarding this routine, but I can also see your point about how it's
useful to give at least the option to extensions to have a
recomputation of the Const lengths, the same way as PGSS.  What are
the extensions that would use that?
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> >> I see where you're coming from on that, but I don't think we can
> >> remove anything here in practice:
> >
> > Yes. Not unless we want to rely on the parser to track the lengths in
> > Const, which could be invasive, but I have not looked into it.
>
> Hmm.  We may not want to get down to that, still that would be
> cheaper than reprocessing the parsing of the query string twice.

Although unless an extension, at least pg_stat_statements, should
not be doing this double work often, and if it is it is due to heavy
churn on entries.

> >> I still think it'd be reasonable for us to include
> >> ComputeConstantLengths in core to complete the picture of what we're
> >> doing with _jumbleElements and the length field already anyway. Its
> >> basically a way to fully hydrate the partially filled out JumbleState
> >> from the initial jumble.
> >
> > I fully agree, ComputeConstantLengths is an optional post-jumble-query step
> > for a consumer that wishes to calculate the lengths. The length calculation
> > is not unique to a plug-in, so in my mind the work it's doing is core
> > jumbling functionality.
>
> Okay.  I could fall into that for this release.  Marking the
> JumbleState as a const is the most important piece here.

I do agree.

> I'm +-0 regarding this routine, but I can also see your point about how it's
> useful to give at least the option to extensions to have a
> recomputation of the Const lengths, the same way as PGSS.  What are
> the extensions that would use that?

https://github.com/search?q=fill_in_constant_lengths&type=code

A few well-known extensions/tools out there based on a Github search.

--
Sami



Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
On Tue, Mar 31, 2026 at 7:54 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> > I'm +-0 regarding this routine, but I can also see your point about how it's
> > useful to give at least the option to extensions to have a
> > recomputation of the Const lengths, the same way as PGSS.  What are
> > the extensions that would use that?
>
> https://github.com/search?q=fill_in_constant_lengths&type=code
>
> A few well-known extensions/tools out there based on a Github search.

See attached a v9 that extracts ComputeConstantLengths from Sami's v7
for easier discussion.

I've done an updated test with pg_stat_monitor [0] and pg_tracing [1]
with that, and that has both extensions passing regressions tests.
I've also looked but not tested a third extension (sql_firewall) and
that looks like it should just work as well.

Its worth noting that pg_tracing had a secondary use case for its
fill_in_constant_lengths function, which is to find the first
non-comment token in a query (to skip over leading comments). Not
having that causes a small regression test difference. The maintainers
will have to decide whether its worth keeping their own function, but
I think that doesn't invalidate the utility of having this in core to
me.

Thanks,
Lukas

[0]: https://github.com/lfittl/pg_stat_monitor/commit/054f347344a380f7a59cb444685db71301669c61
[1]: https://github.com/lfittl/pg_tracing/commit/d48a36c9a2c8fc284cf63ec1161558b90d8b44ef

--
Lukas Fittl

Вложения

Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Wed, Apr 01, 2026 at 12:55:28AM -0700, Lukas Fittl wrote:
> On Tue, Mar 31, 2026 at 7:54 PM Sami Imseih <samimseih@gmail.com> wrote:
>>
>> > I'm +-0 regarding this routine, but I can also see your point about how it's
>> > useful to give at least the option to extensions to have a
>> > recomputation of the Const lengths, the same way as PGSS.  What are
>> > the extensions that would use that?
>>
>> https://github.com/search?q=fill_in_constant_lengths&type=code
>>
>> A few well-known extensions/tools out there based on a Github search.

FWIW, this search points to a lot of forks

> See attached a v9 that extracts ComputeConstantLengths from Sami's v7
> for easier discussion.
>
> I've done an updated test with pg_stat_monitor [0] and pg_tracing [1]
> with that, and that has both extensions passing regressions tests.
> I've also looked but not tested a third extension (sql_firewall) and
> that looks like it should just work as well.

Still I can see the gains here, so I guess that this version works
here.  I'm happy enough to see the const with JumbleState.
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> >> > I'm +-0 regarding this routine, but I can also see your point about how it's
> >> > useful to give at least the option to extensions to have a
> >> > recomputation of the Const lengths, the same way as PGSS.  What are
> >> > the extensions that would use that?
> >>
> >> https://github.com/search?q=fill_in_constant_lengths&type=code
> >>
> >> A few well-known extensions/tools out there based on a Github search.
>
> FWIW, this search points to a lot of forks
>
> > See attached a v9 that extracts ComputeConstantLengths from Sami's v7
> > for easier discussion.
> >
> > I've done an updated test with pg_stat_monitor [0] and pg_tracing [1]
> > with that, and that has both extensions passing regressions tests.
> > I've also looked but not tested a third extension (sql_firewall) and
> > that looks like it should just work as well.
>
> Still I can see the gains here, so I guess that this version works
> here.  I'm happy enough to see the const with JumbleState.

I took a look at v9 and it LGTM.

--
Sami



Re: Refactor query normalization into core query jumbling

От
Michael Paquier
Дата:
On Sun, Apr 05, 2026 at 05:13:40PM -0500, Sami Imseih wrote:
> I took a look at v9 and it LGTM.

I can also see that v9 had the idea to discard quite a few of the
edits I did previously.  Restored that, reworded one more place that
was refering to query normalization in ComputeConstantLengths(),
applied the result.  We're in time at the end.
--
Michael

Вложения

Re: Refactor query normalization into core query jumbling

От
Lukas Fittl
Дата:
On Mon, Apr 6, 2026 at 11:28 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sun, Apr 05, 2026 at 05:13:40PM -0500, Sami Imseih wrote:
> > I took a look at v9 and it LGTM.
>
> I can also see that v9 had the idea to discard quite a few of the
> edits I did previously.  Restored that, reworded one more place that
> was refering to query normalization in ComputeConstantLengths(),
> applied the result.  We're in time at the end.

Thanks for getting this in!

(and apologies that I dropped your edits, I was looking at Sami's v7
version when I put that together)

Thanks,
Lukas

--
Lukas Fittl



Re: Refactor query normalization into core query jumbling

От
Sami Imseih
Дата:
> On Mon, Apr 6, 2026 at 11:28 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Sun, Apr 05, 2026 at 05:13:40PM -0500, Sami Imseih wrote:
> > > I took a look at v9 and it LGTM.
> >
> > I can also see that v9 had the idea to discard quite a few of the
> > edits I did previously.  Restored that, reworded one more place that
> > was refering to query normalization in ComputeConstantLengths(),
> > applied the result.  We're in time at the end.
>
> Thanks for getting this in!

Thanks Michael and Lukas!

--
Sami