Discussion: BUG #19108: Stack overflow during query parse

BUG #19108: Stack overflow during query parse

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      19108
Logged by:          Andrey Zhidenkov
Email address:      pensnarik@gmail.com
PostgreSQL version: 15.7
Operating system:   Oracle Linux 8.4
Description:

Hello everyone.

I had a stack overflow on my PostgreSQL production installation during query
parsing:

2025-11-05 19:04:58.749 +07 [477608] LOG:  server process (PID 633135) was
terminated by signal 11: Segmentation fault

Execution stack (from the core dump):

(frames 0 through 1021 are nested calls of assign_collations_walker ->
expression_tree_walker.part)
#1022 0x00000000006077d4 in assign_collations_walker ()
#1023 0x00000000007099d3 in expression_tree_walker.part ()
#1024 0x00000000006077d4 in assign_collations_walker ()
#1025 0x00000000007099d3 in expression_tree_walker.part ()
#1026 0x00000000006077d4 in assign_collations_walker ()
#1027 0x00000000007099d3 in expression_tree_walker.part ()
#1028 0x00000000006077d4 in assign_collations_walker ()
#1029 0x00000000007099d3 in expression_tree_walker.part ()
#1030 0x00000000006077d4 in assign_collations_walker ()
#1031 0x00000000007099d3 in expression_tree_walker.part ()
#1032 0x00000000006077d4 in assign_collations_walker ()
#1033 0x00000000007099d3 in expression_tree_walker.part ()
#1034 0x00000000006077d4 in assign_collations_walker ()
#1035 0x00000000007099d3 in expression_tree_walker.part ()
#1036 0x00000000006077d4 in assign_collations_walker ()
#1037 0x00000000007099d3 in expression_tree_walker.part ()
#1038 0x00000000006077d4 in assign_collations_walker ()
#1039 0x00000000007099d3 in expression_tree_walker.part ()
#1040 0x00000000006077d4 in assign_collations_walker ()
#1041 0x00000000006073bb in assign_collations_walker ()
#1042 0x0000000000607cfd in assign_query_collations_walker ()
#1043 0x000000000070a008 in query_tree_walker ()
#1044 0x00000000005e4e86 in transformStmt ()
#1045 0x00000000005e6ea1 in parse_analyze_varparams ()
#1046 0x000000000080d67f in pg_analyze_and_rewrite_varparams ()
#1047 0x000000000080e6f1 in PostgresMain ()
#1048 0x000000000078eec0 in ServerLoop ()
#1049 0x000000000078fe74 in PostmasterMain ()
#1050 0x0000000000504d6d in main ()

Context:

#0  0x0000000000607239 in assign_collations_walker (node=0x4ec8398,
context=0x7ffe86cb50c0) at parse_collate.c:271
271             loccontext.pstate = context->pstate;
(gdb) print context
$1 = (assign_collations_context *) 0x7ffe86cb50c0
(gdb) print context->pstate
$2 = (ParseState *) 0x5593950
(gdb) print *context->pstate
$3 = {parentParseState = 0x0,
  p_sourcetext = 0x589a169 "select \"public\".\"tablename_v2\".\"uuid\",
\"public\".\"tablename_v1\".\"dispatch_direction_uuid\",
\"public\".\"tablename_v1\"..., p_rtable = 0x55940e8, p_joinexprs = 0x0,
p_joinlist = 0x55941f0, p_namespace = 0x5594240, p_lateral_active = false,
p_ctenamespace = 0x0, p_future_ctes = 0x0, p_parent_cte = 0x0,
p_target_relation = 0x0, p_target_nsitem = 0x0, p_is_insert = false,
  p_windowdefs = 0x0, p_expr_kind = EXPR_KIND_NONE, p_next_resno = 13,
p_multiassign_exprs = 0x0, p_locking_clause = 0x0, p_locked_from_parent =
false, p_resolve_unknowns = true, p_queryEnv = 0x0, p_hasAggs = false,
p_hasWindowFuncs = false,
  p_hasTargetSRFs = false, p_hasSubLinks = false, p_hasModifyingCTE = false,
p_last_srf = 0x0, p_pre_columnref_hook = 0x0, p_post_columnref_hook = 0x0,
p_paramref_hook = 0x610e10 <variable_paramref_hook>,
  p_coerce_param_hook = 0x610f70 <variable_coerce_param_hook>,
p_ref_hook_state = 0x5593a60}

postgres=# show max_stack_depth ;
 max_stack_depth
-----------------
 2MB
(1 row)
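
For reference, "max_stack_depth" is a superuser-only setting, and the server refuses values above roughly the kernel stack limit (ulimit -s) minus a safety margin. A minimal sketch of checking and raising it, assuming the OS limit allows the higher value:

-- Check the current limit and, as superuser, raise it for this session.
-- The 4MB figure is an assumption; the server rejects any value that
-- exceeds the kernel's stack rlimit minus a safety margin.
show max_stack_depth;         -- 2MB on this installation
set max_stack_depth = '4MB';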

postgres=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
20210514 (Red Hat 8.5.0-22), 64-bit
(1 row)

The query itself looks like this (the query is huge, about 114 kB long, but
it's not too complicated and, I would guess, shouldn't trigger deep recursion
in the query parser):

select "public"."tablename_v1"."uuid", "public"."tablename_v1"."col1"
from "public"."tablename_v2"
         join "public"."tablename_v2" on ("public"."tablename_v2"."uuid" =
"public"."tablename_v2"."first_uuid" and
                                         "public"."tablename_v2"."plan_date"
=
                                         "public"."tablename_v2"."col1")
where ("public"."tablename_v2"."node_from_id",
"public"."tablename_v2"."plan_date") in
      ((39073021, cast('2025-09-10 06:15:00+00' as timestamp)), (40493380,
cast('2025-10-23 06:15:00+00' as timestamp)),
       (38609801, cast('2025-08-27 04:30:00+00' as timestamp)), (39038517,
cast('2025-09-09 06:15:00+00' as timestamp)),
       (39404783, cast('2025-09-20 04:30:00+00' as timestamp)), (39304692,
cast('2025-09-17 06:15:00+00' as timestamp)),
       (39499548, cast('2025-09-23 04:30:00+00' as timestamp)), (38807610,
cast('2025-09-02 06:15:00+00' as timestamp)),
       (39961087, cast('2025-10-07 04:30:00+00' as timestamp)), (40100019,
cast('2025-10-11 06:15:00+00' as timestamp)),
       (39730246, cast('2025-09-30 04:30:00+00' as timestamp)), (38941372,
cast('2025-09-06 04:30:00+00' as timestamp)),
       (39868487, cast('2025-10-04 06:15:00+00' as timestamp)), (39501928,
cast('2025-09-23 06:15:00+00' as timestamp)),
       (40192757, cast('2025-10-14 04:30:00+00' as timestamp)), (40560028,
cast('2025-10-25 06:15:00+00' as timestamp)),
       (40422762, cast('2025-10-21 04:30:00+00' as timestamp)), (40560027,
cast('2025-10-25 06:15:00+00' as timestamp)),
       (39175076, cast('2025-09-13 06:15:00+00' as timestamp)), (40788852,
cast('2025-11-01 06:15:00+00' as timestamp)),
       (38578021, cast('2025-08-26 06:15:00+00' as timestamp)), (40263862,
cast('2025-10-16 06:15:00+00' as timestamp)),
       ... about 1000 lines like above ...
       (39172974, cast('2025-09-13 04:30:00+00' as timestamp)), (38711230,
cast('2025-08-30 04:30:00+00' as timestamp)),
       (40328761, cast('2025-10-18 04:30:00+00' as timestamp)), (39036138,
cast('2025-09-09 04:30:00+00' as timestamp)),
       (40654028, cast('2025-10-28 06:15:00+00' as timestamp)), (39070641,
cast('2025-09-10 04:30:00+00' as timestamp)),
       (40195131, cast('2025-10-14 06:15:00+00' as timestamp)), (39304691,
cast('2025-09-17 06:15:00+00' as timestamp)),
       (37917566, cast('2025-08-06 04:30:00+00' as timestamp)), (40557932,
cast('2025-10-25 04:30:00+00' as timestamp)),
       (40032608, cast('2025-10-09 06:15:00+00' as timestamp)), (40493381,
cast('2025-10-23 06:15:00+00' as timestamp)),
       (39501929, cast('2025-09-23 06:15:00+00' as timestamp)), (39637932,
cast('2025-09-27 06:15:00+00' as timestamp)),
       (39036136, cast('2025-09-09 04:30:00+00' as timestamp)), (40654030,
cast('2025-10-28 06:15:00+00' as timestamp)),
       (39404784, cast('2025-09-20 04:30:00+00' as timestamp)), (40192756,
cast('2025-10-14 04:30:00+00' as timestamp)),
       (39499547, cast('2025-09-23 04:30:00+00' as timestamp)), (40422763,
cast('2025-10-21 04:30:00+00' as timestamp)),
       (40788848, cast('2025-11-01 06:15:00+00' as timestamp)), (39961091,
cast('2025-10-07 04:30:00+00' as timestamp)),
       (40100038, cast('2025-10-11 06:15:00+00' as timestamp)), (40263876,
cast('2025-10-16 06:15:00+00' as timestamp)),
       (40788851, cast('2025-11-01 06:15:00+00' as timestamp)), (39267833,
cast('2025-09-16 04:30:00+00' as timestamp)));


Re: BUG #19108: Stack overflow during query parse

From:
Heikki Linnakangas
Date:
On 10/11/2025 13:13, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      19108
> Logged by:          Andrey Zhidenkov
> Email address:      pensnarik@gmail.com
> PostgreSQL version: 15.7
> Operating system:   Oracle Linux 8.4

15.7 is quite outdated; please upgrade to the latest minor version.

That said, I don't see any changes between 15.7 and 15.13 in the 
relevant code, so it probably won't make a difference. You're missing 
out on a bunch of other important fixes though.

> Description:
> 
> I had a stack overflow on my PostgreSQL production installation during query
> parsing:
> 
> [...]

Weird, there is a stack-depth check in expression_tree_walker() which 
should turn this into a graceful error.

Can you create a self-contained SQL script to reproduce this, and post 
it on this thread with reply-all, please?

- Heikki




Re: BUG #19108: Stack overflow during query parse

From:
Andrey Zhidenkov
Date:
> 15.7 is quite outdated; please upgrade to the latest minor version.

Thanks. I have already upgraded the database cluster to 15.14 and switched the primary to one of the new nodes.

> Can you create a self-contained SQL script to reproduce this, and post it on this thread with reply-all, please?

Unfortunately, I cannot reproduce the issue even on an existing 15.7 node (one that hasn't been upgraded; I don't know if it matters, but it is now a hot standby). However, core dumps and debuginfo packages are available there, so I can provide more data from the core dump.

P.S. We have also added RAM to all cluster nodes, since we were seeing "out of memory" errors as well.

--
Best regards, Andrey Zhidenkov.

Re: BUG #19108: Stack overflow during query parse

From:
Tom Lane
Date:
Andrey Zhidenkov <pensnarik@gmail.com> writes:
>> Can you create a self-contained SQL script to reproduce this, and post it
>> on this thread with reply-all, please?

> Unfortunately, I cannot reproduce the issue even on an existing 15.7
> node (one that hasn't been upgraded; I don't know if it matters, but it
> is now a hot standby). However, core dumps and debuginfo packages are
> available there, so I can provide more data from the core dump.

Could you at least provide a not-mangled-to-the-point-of-incorrectness
version of the query?  The posted one fails with

ERROR:  table name "tablename_v2" specified more than once

and there are other visible bugs in it.  Some clarity about the column
data types would be helpful as well.

            regards, tom lane



Re: BUG #19108: Stack overflow during query parse

From:
Andrey Zhidenkov
Date:
> Could you at least provide a not-mangled-to-the-point-of-incorrectness
> version of the query?

Sure, here it is:

select "public"."dtn_v2"."uuid", "public"."dtn_v2"."first_shipment_date_route_original_departure_plan_date"
from "public"."dtn_v2"
         join "public"."shipment_v2" on ("public"."shipment_v2"."uuid" = "public"."dtn_v2"."first_shipment_uuid" and
                                         "public"."shipment_v2"."date_route_original_departure_plan_date" =
                                         "public"."dtn_v2"."first_shipment_date_route_original_departure_plan_date")
where ("public"."shipment_v2"."shipment_node_from_id", "public"."shipment_v2"."date_route_original_departure_plan_date") in
      ((39073021, cast('2025-09-10 06:15:00+00' as timestamp)), (40493380, cast('2025-10-23 06:15:00+00' as timestamp)),
       (38609801, cast('2025-08-27 04:30:00+00' as timestamp)), (39038517, cast('2025-09-09 06:15:00+00' as timestamp)),
       ...
       (40788851, cast('2025-11-01 06:15:00+00' as timestamp)), (39267833, cast('2025-09-16 04:30:00+00' as timestamp)));

BTW, there were "out of memory" errors before the segmentation fault. For instance:

could not fork autovacuum worker process: Cannot allocate memory

But I didn't think memory exhaustion could lead to errors like a segmentation fault.

Also "dnf install ..." command was also core dumped so maybe some shared memory segments were corrupted.

Thanks!

--
Best regards, Andrey Zhidenkov.

Re: BUG #19108: Stack overflow during query parse

From:
Tom Lane
Date:
Andrey Zhidenkov <pensnarik@gmail.com> writes:
>> Could you at least provide a not-mangled-to-the-point-of-incorrectness
>> version of the query?

> Sure, here it is:
> ...

Thanks.  I made this into a full-size test case by just duplicating
one of the lines 1000 times.  (The parser is not going to notice
whether the values are equal, though possibly we'd detect that later.)
We do produce a deeply nested boolean OR tree from this WHERE
condition, but I don't see any indication that that should be a
problem.  It works just fine for me at the default max_stack_depth,
and even down to max_stack_depth = '300kB', and below that I get a
clean failure:

ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth" (currently 200kB), after ensuring the platform's stack depth limit is adequate.
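
A sketch of how such a test case can be generated directly in psql, following Tom's description; the table and column names here (t, id, plan_date) are stand-ins, not the reporter's real schema:

-- Build a WHERE ... IN query with 1000 two-column row constructors and
-- run it with \gexec. Lowering max_stack_depth (superuser only) makes
-- parsing of the generated query fail with the clean error above
-- instead of crashing.
create temp table t (id int, plan_date timestamp);
set max_stack_depth = '200kB';
select format('select * from t where (id, plan_date) in (%s)',
              string_agg(format('(%s, cast(''2025-09-10 06:15:00+00'' as timestamp))', i), ', '))
from generate_series(1, 1000) i \gexec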

So I seriously doubt that Postgres is doing anything wrong.  It
looks to me like your machine is/was under serious memory pressure
and that was causing the kernel to do strange things like refuse
to expand the stack as much as was promised by getrlimit().
You might want to compare the swap and OOM-killer configuration on
the problematic machine to ones that are okay.

Another, probably more far-fetched theory is some kind of hardware
issue on that machine --- bad RAM or the like.

            regards, tom lane