Discussion: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      19078
Logged by:          Yuri Zamyatin
Email address:      yuri@yrz.am
PostgreSQL version: 18.0
Operating system:   Debian 13.1
Description:

Hello. We are encountering segfaults from tts_minimal_store_tuple() after
upgrade. You may find the stack trace at the end of this message.

PostgreSQL:  PostgreSQL 18.0 (Debian 18.0-1.pgdg13+3) on
x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
Kernel: Linux 6.12.48+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.48-1
(2025-09-20) x86_64 GNU/Linux
OS: Debian 13.1 from deb.debian.org trixie, trixie-updates, trixie-security
(latest)

The PostgreSQL client backend intermittently crashes with a segfault (signal
11) when executing SELECT or UPDATE queries, under the following
circumstances:

- There is a set of queries that run into the segfault. Notably, they all do
a lookup on a partitioned table with pruning (100+ partitions).
- It occurs across multiple machines (same OS and Postgres version) that
handle many connections and went through pg_upgrade.
- The interval between segfaults varies from dozens of minutes to days,
depending on the size/load/configuration of the cluster.
- It happens randomly; most of the time these queries finish successfully, so
we're unable to reproduce the error in a consistent manner.
- Some of the problematic queries run on a fixed schedule, which means each
run is more likely to fail in larger clusters.

The issue appeared after migration from pg17 (latest in pgdg) to pg18 pgdg
via pg_upgradecluster --method link.
Shortly before that, the OS was upgraded from Debian 12 to Debian 13 with the
corresponding change of pgdg apt sources.
The PostgreSQL 17 cluster was shut down during this time.
Right after the cluster upgrade we updated all extensions, ran vacuumdb
--analyze-in-stages and reindexed all text-based indexes, as expected.
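
For reference, those steps were roughly the following (a sketch; the cluster
name "main", the database name and the index name are placeholders):

    # Debian: upgrade the 17 cluster to 18 in place, using hard links
    pg_upgradecluster --method link 17 main

    # rebuild planner statistics across all databases, in stages
    vacuumdb --all --analyze-in-stages

    # rebuild text-based indexes after the OS/collation change
    reindexdb --dbname=mydb --index=<text_index>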

Segmentation faults appeared after 1-5 days.

Trying to find a workaround, we did the following (example commands are
sketched after the list):

- Disabled huge pages
- Reduced checkpoint_timeout from 60min to 5min, reduced wal_max_size
- Disabled jit
- Set io_method to sync (io_uring was much slower under our workload)
- Ran REINDEX SYSTEM in each database
- Reindexed all databases
- Ran pg_repack on tables (with their children) mentioned in the problematic
queries
- Ran pg_amcheck on each database with default parameters, no corruption was
found
- Disabled enable_hashagg for some queries (just now)
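
For example, the commands involved were roughly the following (a sketch, not
our exact scripts; the database name and <query> are placeholders):

    # settings changes (huge_pages and io_method need a restart, the rest a reload)
    psql -X -c "ALTER SYSTEM SET huge_pages = off;"
    psql -X -c "ALTER SYSTEM SET checkpoint_timeout = '5min';"
    psql -X -c "ALTER SYSTEM SET jit = off;"
    psql -X -c "ALTER SYSTEM SET io_method = 'sync';"
    psql -X -c "SELECT pg_reload_conf();"

    # reindex system catalogs, then everything, in each database
    psql -X -d mydb -c "REINDEX SYSTEM;"
    psql -X -d mydb -c "REINDEX DATABASE;"

    # corruption check with default parameters
    pg_amcheck --all

    # per-session workaround for suspect queries
    psql -X -d mydb -c "SET enable_hashagg = off;" -c "<query>"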

Segmentation faults still happen on the same tables, but less frequently.
For the cluster with 100+ concurrent connections, 225GB shared_buffers,
max_connections = 2000 and 256 CPUs, the number of crashes decreased from 30
to 8 per day.

The interval between segfaults may be related to checkpoint_timeout.
Previously, that server used to crash every 60 minutes; now there are series
of crashes with 5-10 minute gaps between them.
We could not reproduce the crash by invoking CHECKPOINT manually.

Below is the query with the simplest plan among those that crash the
database (if I relaunch that query), although segfaults from it happen
rarely and we don't have a core dump for it yet.

> Update on tcv_scenes cs  (cost=1760.81..404518.63 rows=343 width=36)
(actual time=2358.546..2386.729 rows=1.00 loops=1)
>    Buffers: shared hit=19823 read=138644 dirtied=20
>    ->  Nested Loop  (cost=1760.81..404518.63 rows=343 width=36) (actual
time=2344.746..2372.927 rows=1.00 loops=1)
>          Buffers: shared hit=12241 read=138641 dirtied=1
>          ->  Bitmap Heap Scan on tcv_scenes cs  (cost=1760.39..209679.79
rows=346 width=38) (actual time=2344.280..2372.423 rows=1.00 loops=1)
>                Recheck Cond: ((state_id = 7) OR (state_id = 3))
>                Filter: (((state_id = 7) AND (date_cr < (now() -
'24:00:00'::interval)) AND (date_state_mo > (now() - '00:15:00'::interval)))
OR ((state_id = 3) AND (date_state_mo < (now() - '00:05:00'::interval))))
>                Rows Removed by Filter: 221134
>                Heap Blocks: exact=150638
>                Buffers: shared hit=12237 read=138641 dirtied=1
>                ->  BitmapOr  (cost=1760.39..1760.39 rows=210544 width=0)
(actual time=43.601..43.603 rows=0.00 loops=1)
>                      Buffers: shared hit=218 read=22
>                      ->  Bitmap Index Scan on icv_scenes__state
(cost=0.00..1755.70 rows=210151 width=0) (actual time=34.418..34.419
rows=221112.00 loops=1)
>                            Index Cond: (state_id = 7)
>                            Index Searches: 1
>                            Buffers: shared hit=194
>                      ->  Bitmap Index Scan on icv_scenes__state
(cost=0.00..4.51 rows=393 width=0) (actual time=9.181..9.181 rows=30759.00
loops=1)
>                            Index Cond: (state_id = 3)
>                            Index Searches: 1
>                            Buffers: shared hit=24 read=22
>          ->  Append  (cost=0.42..560.73 rows=239 width=658) (actual
time=0.094..0.128 rows=1.00 loops=1)
>                Buffers: shared hit=4
>                ->  Index Scan using tcv_scene_datas_0_pkey on
tcv_scene_datas_0 cd_1  (cost=0.42..2.32 rows=1 width=50) (never executed)
>                      Index Cond: (cv_scene_id = cs.id)
>                      Filter: (((cs.state_id = 7) AND (cs.date_cr < (now()
- '24:00:00'::interval)) AND (cs.date_state_mo > (now() -
'00:15:00'::interval)) AND ((stitcher_result)::text ~~ '%download%'::text))
OR ((cs.state_id = 3) AND (cs.date_state_mo < (now() -
'00:05:00'::interval))))
>                      Index Searches: 0
> ...<100+ partitions>...
>               ->  Index Scan using tcv_scene_datas_118500000_pkey on
tcv_scene_datas_118500000 cd_238  (cost=0.42..2.33 rows=1 width=1072)
(actual time=0.079..0.080 rows=1.00 loops=1)
>                      Index Cond: (cv_scene_id = cs.id)
>                      Filter: (((cs.state_id = 7) AND (cs.date_cr < (now()
- '24:00:00'::interval)) AND (cs.date_state_mo > (now() -
'00:15:00'::interval)) AND ((stitcher_result)::text ~~ '%download%'::text))
OR ((cs.state_id = 3) AND (cs.date_state_mo < (now() -
'00:05:00'::interval))))
>                      Index Searches: 1
>                      Buffers: shared hit=4
>                ->  Seq Scan on tcv_scene_datas_119000000 cd_239
(cost=0.00..0.00 rows=1 width=50) (never executed)
>                      Filter: ((cv_scene_id = cs.id) AND (((cs.state_id =
7) AND (cs.date_cr < (now() - '24:00:00'::interval)) AND (cs.date_state_mo >
(now() - '00:15:00'::interval)) AND ((stitcher_result)::text ~~
'%download%'::text)) OR ((cs.state_id = 3) AND (cs.date_state_mo < (now() -
'00:05:00'::interval)))))
>  Planning:
>    Buffers: shared hit=15775
>  Planning Time: 57.800 ms
>  Trigger for constraint tcv_scenes_new_state_id_fkey: time=0.965 calls=1
>  Execution Time: 2395.941 ms
> (982 rows)

More frequently, segfaults occur from queries with complex plans (many
levels of aggregation, subqueries and window functions); the stack trace
below is an example. We could not find a simple reproduction for those.

Overridden in postgresql.conf for that cluster:

> postgresql_effective_cache_size = 560GB
> postgresql_shared_buffers = 225GB
> temp_buffers = 128MB
> work_mem = 2GB
> maintenance_work_mem = 512MB
> vacuum_buffer_usage_limit = 128MB
> max_connections = 2000
> max_parallel_workers_per_gather = 8
> max_parallel_workers = 16
> max_parallel_maintenance_workers = 8
> max_locks_per_transaction = 128
> huge_pages = off
> io_method = sync
> file_copy_method = clone
> effective_io_concurrency = 512
> random_page_cost = 1.0
> temp_file_limit = 100GB
> wal_level = minimal
> max_wal_senders = 0
> wal_buffers = 128MB
> default_statistics_target = 1000
> checkpoint_timeout = 5min
> min_wal_size = 3GB
> max_wal_size = 3GB

Server log:

> 2025-10-08 10:36:24 UTC    LOG:  00000: client backend (PID 2380761) was
terminated by signal 11: Segmentation fault
> 2025-10-08 10:36:24 UTC    DETAIL:  Failed process was running: <query>
> 2025-10-08 10:36:24 UTC    LOCATION:  LogChildExit, postmaster.c:2853
> 2025-10-08 10:36:24 UTC    LOG:  00000: terminating any other active
server processes

Dmesg:

> [126364.743906] postgres[2380761]: segfault at 1b ip 0000555fe855f1c1 sp
00007ffe304155a0 error 4 in postgres[3531c1,555fe82f0000+5f3000] likely on
CPU 122 (core 58, socket 1)
> [126364.743931] Code: c9 31 d2 4c 89 63 48 66 89 4b 34 89 c1 49 83 ec 08
66 89 43 04 83 c9 04 66 89 53 06 c7 43 30 ff ff ff ff c7 43 68 00 00 00 00
<41> 8b 74 24 08 45 84 ed 0f 45 c1 4c 89 63 60 8d 56 08 66 89 43 04

Core:

> #0  tts_minimal_store_tuple (slot=0x55601c765bb0, mtup=0x1b,
shouldFree=false) at ./build/../src/backend/executor/execTuples.c:697
>         mslot = 0x55601c765bb0
>         mslot = <optimized out>
> #1  ExecStoreMinimalTuple (mtup=0x1b, slot=slot@entry=0x55601c765bb0,
shouldFree=shouldFree@entry=false) at
./build/../src/backend/executor/execTuples.c:1648
>         __func__ = "ExecStoreMinimalTuple"
>         __errno_location = <optimized out>
> #2  0x0000555fe8566ec2 in agg_retrieve_hash_table_in_memory
(aggstate=aggstate@entry=0x55601c7567d0) at
./build/../src/include/executor/executor.h:176
>         hashslot = 0x55601c765bb0
>         hashtable = 0x55601c182ac8
>         i = <optimized out>
>         econtext = 0x55601c756f00
>         peragg = 0x55601c765198
>         pergroup = <optimized out>
>         entry = 0x55601c182e48
>         firstSlot = 0x55601c763e48
>         result = <optimized out>
>         perhash = 0x55601c764e50
> #3  0x0000555fe8567ac8 in agg_retrieve_hash_table (aggstate=<optimized
out>) at ./build/../src/backend/executor/nodeAgg.c:2841
>         result = 0x0
>         result = <optimized out>
> #4  ExecAgg (pstate=0x55601c7567d0) at
./build/../src/backend/executor/nodeAgg.c:2261
>         node = 0x55601c7567d0
>         result = 0x0
> #5  0x0000555fe858c959 in ExecProcNode (node=0x55601c7567d0) at
./build/../src/include/executor/executor.h:315
> No locals.
> #6  spool_tuples (winstate=winstate@entry=0x55601c7561b8,
pos=pos@entry=57) at ./build/../src/backend/executor/nodeWindowAgg.c:1326
>         node = 0x55601eb9add8
>         outerPlan = 0x55601c7567d0
>         outerslot = <optimized out>
>         oldcontext = 0x55601b1a5fb0
> #7  0x0000555fe858cb20 in window_gettupleslot
(winobj=winobj@entry=0x55601c76b028, pos=57, slot=slot@entry=0x55601c766a20)
>     at ./build/../src/backend/executor/nodeWindowAgg.c:3145
>         winstate = 0x55601c7561b8
>         oldcontext = <optimized out>
>         __func__ = "window_gettupleslot"
> #8  0x0000555fe858ec94 in eval_windowaggregates (winstate=0x55601c7561b8)
at ./build/../src/backend/executor/nodeWindowAgg.c:936
>         ret = <optimized out>
>         aggregatedupto_nonrestarted = 0
>         econtext = 0x55601c7566c8
>         agg_row_slot = <optimized out>
>         peraggstate = <optimized out>
>         numaggs = <optimized out>
>         wfuncno = <optimized out>
>         numaggs_restart = <optimized out>
>         i = <optimized out>
>         oldContext = <optimized out>
>         agg_winobj = 0x55601c76b028
>         temp_slot = 0x55601c766b28
>         peraggstate = <optimized out>
>         wfuncno = <optimized out>
>         numaggs = <optimized out>
>         numaggs_restart = <optimized out>
>         i = <optimized out>
>         aggregatedupto_nonrestarted = <optimized out>
>         oldContext = <optimized out>
>         econtext = <optimized out>
>         agg_winobj = <optimized out>
>         agg_row_slot = <optimized out>
>         temp_slot = <optimized out>
>         __func__ = "eval_windowaggregates"
>         next_tuple = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
>         ok = <optimized out>
>         ret = <optimized out>
>         result = <optimized out>
>         isnull = <optimized out>
> #9  ExecWindowAgg (pstate=0x55601c7561b8) at
./build/../src/backend/executor/nodeWindowAgg.c:2300
>         winstate = 0x55601c7561b8
>         slot = <optimized out>
>         econtext = <optimized out>
>         i = <optimized out>
>         numfuncs = <optimized out>
>         __func__ = "ExecWindowAgg"
> #10 0x0000555fe856476c in ExecProcNode (node=0x55601c7561b8) at
./build/../src/include/executor/executor.h:315
> No locals.
> #11 fetch_input_tuple (aggstate=aggstate@entry=0x55601c755a88) at
./build/../src/backend/executor/nodeAgg.c:563
>         slot = <optimized out>
> #12 0x0000555fe8567ca9 in agg_retrieve_direct (aggstate=0x55601c755a88) at
./build/../src/backend/executor/nodeAgg.c:2450
>         econtext = 0x55601c7560b0
>         firstSlot = 0x55601c76b070
>         numGroupingSets = 1
>         node = 0x55601eb98630
>         tmpcontext = <optimized out>
>         peragg = 0x55601c76c218
>         outerslot = <optimized out>
>         nextSetSize = <optimized out>
>         pergroups = 0x55601c76d628
>         result = <optimized out>
>         hasGroupingSets = false
>         currentSet = <optimized out>
>         numReset = 1
>         i = <optimized out>
>         node = <optimized out>
>         econtext = <optimized out>
>         tmpcontext = <optimized out>
>         peragg = <optimized out>
>         pergroups = <optimized out>
>         outerslot = <optimized out>
>         firstSlot = <optimized out>
>         result = <optimized out>
>         hasGroupingSets = <optimized out>
>         numGroupingSets = <optimized out>
>         currentSet = <optimized out>
>         nextSetSize = <optimized out>
>         numReset = <optimized out>
>         i = <optimized out>
> #13 ExecAgg (pstate=0x55601c755a88) at
./build/../src/backend/executor/nodeAgg.c:2265
>         node = 0x55601c755a88
>         result = 0x0
> #14 0x0000555fe85877aa in ExecProcNode (node=<optimized out>) at
./build/../src/include/executor/executor.h:315
> No locals.
> #15 ExecScanSubPlan (node=0x55601f0e9610, econtext=0x55601ef91550,
isNull=0x55601ef8f395) at ./build/../src/backend/executor/nodeSubplan.c:275
>         subplan = <optimized out>
>         oldcontext = 0x55601f0e9610
>         slot = <optimized out>
>         astate = 0x0
>         planstate = <optimized out>
>         subLinkType = EXPR_SUBLINK
>         result = 0
>         found = false
>         l = <optimized out>
>         subplan = <optimized out>
>         planstate = <optimized out>
>         subLinkType = <optimized out>
>         oldcontext = <optimized out>
>         slot = <optimized out>
>         result = <optimized out>
>         found = <optimized out>
>         l = <optimized out>
>         astate = <optimized out>
>         __func__ = "ExecScanSubPlan"
>         l__state = <optimized out>
>         paramid = <optimized out>
>         tdesc = <error reading variable tdesc (Cannot access memory at
address 0x0)>
>         rowresult = <optimized out>
>         rownull = <optimized out>
>         col = <optimized out>
>         plst = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
>         plst__state = <optimized out>
>         paramid = <optimized out>
>         prmdata = <optimized out>
>         dvalue = <optimized out>
>         disnull = <optimized out>
>         __errno_location = <optimized out>
>         plst__state = <optimized out>
>         paramid = <optimized out>
>         prmdata = <optimized out>
>         l__state = <optimized out>
>         paramid = <optimized out>
>         prmdata = <optimized out>
> #16 ExecSubPlan (node=node@entry=0x55601ef91550,
econtext=econtext@entry=0x55601ef58798, isNull=0x55601ef8f395) at
./build/../src/backend/executor/nodeSubplan.c:89
>         subplan = <optimized out>
>         estate = 0x55601b1a60a8
>         dir = ForwardScanDirection
>         retval = <optimized out>
>         __func__ = "ExecSubPlan"
> #17 0x0000555fe854d169 in ExecEvalSubPlan (state=<optimized out>,
op=<optimized out>, econtext=0x55601ef58798) at
./build/../src/backend/executor/execExprInterp.c:5316
>         sstate = 0x55601ef91550
>         sstate = <optimized out>
> #18 ExecInterpExpr (state=0x55601ef8f390, econtext=0x55601ef58798,
isnull=<optimized out>) at
./build/../src/backend/executor/execExprInterp.c:2001
>         op = <optimized out>
>         resultslot = 0x55601ef8f180
>         innerslot = <optimized out>
>         outerslot = <optimized out>
>         scanslot = <optimized out>
>         oldslot = <optimized out>
>         newslot = <optimized out>
>         dispatch_table = {0x555fe854d9ce <ExecInterpExpr+4366>,
0x555fe854d9a3 <ExecInterpExpr+4323>, 0x555fe854d986 <ExecInterpExpr+4294>,
>           0x555fe854d969 <ExecInterpExpr+4265>, 0x555fe854d94c
<ExecInterpExpr+4236>, 0x555fe854d92f <ExecInterpExpr+4207>, 0x555fe854d90f
<ExecInterpExpr+4175>,
>           0x555fe854d8e0 <ExecInterpExpr+4128>, 0x555fe854d8b1
<ExecInterpExpr+4081>, 0x555fe854d882 <ExecInterpExpr+4034>, 0x555fe854d853
<ExecInterpExpr+3987>,
>           0x555fe854d821 <ExecInterpExpr+3937>, 0x555fe854d805
<ExecInterpExpr+3909>, 0x555fe854d7e9 <ExecInterpExpr+3881>, 0x555fe854d7cd
<ExecInterpExpr+3853>,
>           0x555fe854db2b <ExecInterpExpr+4715>, 0x555fe854db0c
<ExecInterpExpr+4684>, 0x555fe854daf4 <ExecInterpExpr+4660>, 0x555fe854dabf
<ExecInterpExpr+4607>,
>           0x555fe854da8a <ExecInterpExpr+4554>, 0x555fe854da55
<ExecInterpExpr+4501>, 0x555fe854da20 <ExecInterpExpr+4448>, 0x555fe854d9e8
<ExecInterpExpr+4392>,
>           0x555fe854dbca <ExecInterpExpr+4874>, 0x555fe854db91
<ExecInterpExpr+4817>, 0x555fe854db72 <ExecInterpExpr+4786>, 0x555fe854db47
<ExecInterpExpr+4743>,
>           0x555fe854d749 <ExecInterpExpr+3721>, 0x555fe854d729
<ExecInterpExpr+3689>, 0x555fe854d702 <ExecInterpExpr+3650>, 0x555fe854d6b0
<ExecInterpExpr+3568>,
>           0x555fe854de15 <ExecInterpExpr+5461>, 0x555fe854c985
<ExecInterpExpr+197>, 0x555fe854c990 <ExecInterpExpr+208>, 0x555fe854dddb
<ExecInterpExpr+5403>,
>           0x555fe854c94c <ExecInterpExpr+140>, 0x555fe854c957
<ExecInterpExpr+151>, 0x555fe854ddac <ExecInterpExpr+5356>, 0x555fe854dd92
<ExecInterpExpr+5330>,
>           0x555fe854dd5a <ExecInterpExpr+5274>, 0x555fe854dd47
<ExecInterpExpr+5255>, 0x555fe854de96 <ExecInterpExpr+5590>, 0x555fe854de76
<ExecInterpExpr+5558>,
>           0x555fe854de4c <ExecInterpExpr+5516>, 0x555fe854de2d
<ExecInterpExpr+5485>, 0x555fe854decd <ExecInterpExpr+5645>, 0x555fe854deb6
<ExecInterpExpr+5622>,
>           0x555fe854d7b9 <ExecInterpExpr+3833>, 0x555fe854d790
<ExecInterpExpr+3792>, 0x555fe854dcff <ExecInterpExpr+5183>, 0x555fe854dcd6
<ExecInterpExpr+5142>,
>           0x555fe854dcad <ExecInterpExpr+5101>, 0x555fe854dc71
<ExecInterpExpr+5041>, 0x555fe854dc59 <ExecInterpExpr+5017>, 0x555fe854dc43
<ExecInterpExpr+4995>,
>           0x555fe854dc17 <ExecInterpExpr+4951>, 0x555fe854dbf2
<ExecInterpExpr+4914>, 0x555fe854df3b <ExecInterpExpr+5755>, 0x555fe854dd28
<ExecInterpExpr+5224>,
>           0x555fe854def2 <ExecInterpExpr+5682>, 0x555fe854d69b
<ExecInterpExpr+3547>, 0x555fe854d663 <ExecInterpExpr+3491>, 0x555fe854d62b
<ExecInterpExpr+3435>,
>           0x555fe854d5a0 <ExecInterpExpr+3296>, 0x555fe854d58b
<ExecInterpExpr+3275>, 0x555fe833c321 <ExecInterpExpr.cold>, 0x555fe854d47f
<ExecInterpExpr+3007>,
>           0x555fe854d44b <ExecInterpExpr+2955>, 0x555fe854d436
<ExecInterpExpr+2934>, 0x555fe854d494 <ExecInterpExpr+3028>, 0x555fe854d404
<ExecInterpExpr+2884>,
>           0x555fe854d3cd <ExecInterpExpr+2829>, 0x555fe854d382
<ExecInterpExpr+2754>, 0x555fe854d355 <ExecInterpExpr+2709>, 0x555fe854d33d
<ExecInterpExpr+2685>,
>           0x555fe854d36a <ExecInterpExpr+2730>, 0x555fe854d325
<ExecInterpExpr+2661>, 0x555fe854d307 <ExecInterpExpr+2631>, 0x555fe854d2fe
<ExecInterpExpr+2622>,
>           0x555fe854c932 <ExecInterpExpr+114>, 0x555fe854c936
<ExecInterpExpr+118>, 0x555fe854d4f1 <ExecInterpExpr+3121>, 0x555fe854d4d1
<ExecInterpExpr+3089>,
>           0x555fe854d558 <ExecInterpExpr+3224>, 0x555fe854d543
<ExecInterpExpr+3203>, 0x555fe854d56f <ExecInterpExpr+3247>, 0x555fe854d2bf
<ExecInterpExpr+2559>,
>           0x555fe854d28c <ExecInterpExpr+2508>, 0x555fe854d257
<ExecInterpExpr+2455>, 0x555fe854d224 <ExecInterpExpr+2404>, 0x555fe854d2e6
<ExecInterpExpr+2598>,
>           0x555fe854d52e <ExecInterpExpr+3182>, 0x555fe854d516
<ExecInterpExpr+3158>, 0x555fe854d20f <ExecInterpExpr+2383>, 0x555fe854d1f7
<ExecInterpExpr+2359>,
>           0x555fe854d1e2 <ExecInterpExpr+2338>, 0x555fe854d1bf
<ExecInterpExpr+2303>, 0x555fe854d1a7 <ExecInterpExpr+2279>, 0x555fe854d125
<ExecInterpExpr+2149>,
>           0x555fe854d0fa <ExecInterpExpr+2106>, 0x555fe854d0e5
<ExecInterpExpr+2085>, 0x555fe854d0b2 <ExecInterpExpr+2034>, 0x555fe854d16e
<ExecInterpExpr+2222>,
>           0x555fe854d13a <ExecInterpExpr+2170>, 0x555fe854d186
<ExecInterpExpr+2246>, 0x555fe854c9c0 <ExecInterpExpr+256>, 0x555fe854d072
<ExecInterpExpr+1970>,
>           0x555fe854d051 <ExecInterpExpr+1937>, 0x555fe854d013
<ExecInterpExpr+1875>, 0x555fe854cfee <ExecInterpExpr+1838>, 0x555fe854cf2a
<ExecInterpExpr+1642>,
>           0x555fe854ce6f <ExecInterpExpr+1455>, 0x555fe854cdbe
<ExecInterpExpr+1278>, 0x555fe854ccb9 <ExecInterpExpr+1017>, 0x555fe854cbbf
<ExecInterpExpr+767>,
>           0x555fe854cab8 <ExecInterpExpr+504>, 0x555fe854ca98
<ExecInterpExpr+472>, 0x555fe854ca78 <ExecInterpExpr+440>, 0x555fe854ca48
<ExecInterpExpr+392>,
>           0x555fe854cba7 <ExecInterpExpr+743>, 0x555fe833c330
<ExecInterpExpr-2164112>}
> #19 0x0000555fe85664cf in ExecEvalExprNoReturn (state=0x55601ef8f390,
econtext=0x55601ef58798) at ./build/../src/include/executor/executor.h:419
>         retDatum = <optimized out>
>         retDatum = <optimized out>
> #20 ExecEvalExprNoReturnSwitchContext (state=0x55601ef8f390,
econtext=0x55601ef58798) at ./build/../src/include/executor/executor.h:460
>         oldContext = 0x55601b1a5fb0
>         oldContext = <optimized out>
> #21 ExecProject (projInfo=0x55601ef8f388) at
./build/../src/include/executor/executor.h:492
>         econtext = 0x55601ef58798
>         state = 0x55601ef8f390
>         slot = 0x55601ef8f180
> #22 project_aggregates (aggstate=<optimized out>) at
./build/../src/backend/executor/nodeAgg.c:1383
>         econtext = <optimized out>
> #23 project_aggregates (aggstate=<optimized out>) at
./build/../src/backend/executor/nodeAgg.c:1370
>         econtext = <optimized out>
> #24 0x0000555fe8567a79 in agg_retrieve_direct (aggstate=0x55601ef556c8) at
./build/../src/backend/executor/nodeAgg.c:2613
>         econtext = 0x55601ef58798
>         firstSlot = 0x55601ef8ef78
>         numGroupingSets = 1
>         node = <optimized out>
>         tmpcontext = <optimized out>
>         peragg = 0x55601ef8f8e0
>         outerslot = <optimized out>
>         nextSetSize = <optimized out>
>         pergroups = 0x55601ef8b9a0
>         result = <optimized out>
>         hasGroupingSets = false
>         currentSet = <optimized out>
>         numReset = <optimized out>
>         i = <optimized out>
>         node = <optimized out>
>         econtext = <optimized out>
>         tmpcontext = <optimized out>
>         peragg = <optimized out>
>         pergroups = <optimized out>
>         outerslot = <optimized out>
>         firstSlot = <optimized out>
>         result = <optimized out>
>         hasGroupingSets = <optimized out>
>         numGroupingSets = <optimized out>
>         currentSet = <optimized out>
>         nextSetSize = <optimized out>
>         numReset = <optimized out>
>         i = <optimized out>
> #25 ExecAgg (pstate=0x55601ef556c8) at
./build/../src/backend/executor/nodeAgg.c:2265
>         node = 0x55601ef556c8
>         result = 0x0
> #26 0x0000555fe855c23d in ExecScanFetch (node=<optimized out>,
epqstate=<optimized out>, accessMtd=<optimized out>, recheckMtd=<optimized
out>)
>     at ./build/../src/include/executor/execScan.h:126
> No locals.
> #27 ExecScanExtended (node=<optimized out>, accessMtd=0x555fe8588d50
<SubqueryNext>, recheckMtd=0x555fe8588d20 <SubqueryRecheck>, epqstate=0x0,
qual=0x0,
>     projInfo=0x55601ef9d680) at
./build/../src/include/executor/execScan.h:187
>         slot = <optimized out>
>         econtext = 0x55601ef58470
>         econtext = <optimized out>
>         slot = <optimized out>
> #28 ExecScan (node=0x55601ef58368, accessMtd=0x555fe8588d50
<SubqueryNext>, recheckMtd=0x555fe8588d20 <SubqueryRecheck>)
>     at ./build/../src/backend/executor/execScan.c:59
>         epqstate = 0x0
>         qual = 0x0
>         projInfo = 0x55601ef9d680
> #29 0x0000555fe8583f0e in ExecProcNode (node=0x55601ef58368) at
./build/../src/include/executor/executor.h:315
> No locals.
> #30 ExecNestLoop (pstate=<optimized out>) at
./build/../src/backend/executor/nodeNestloop.c:159
>         node = <optimized out>
>         nl = 0x55601b1224c8
>         innerPlan = 0x55601ef58368
>         outerPlan = <optimized out>
>         outerTupleSlot = <optimized out>
>         innerTupleSlot = <optimized out>
>         joinqual = <optimized out>
>         otherqual = <optimized out>
>         econtext = 0x55601eda5b60
>         lc = <optimized out>
> #31 0x0000555fe8586ce6 in ExecProcNode (node=0x55601eda5a58) at
./build/../src/include/executor/executor.h:315
> No locals.
> #32 ExecSort (pstate=0x55601eda5850) at
./build/../src/backend/executor/nodeSort.c:149
>         plannode = <optimized out>
>         outerNode = 0x55601eda5a58
>         tupDesc = <optimized out>
>         tuplesortopts = <optimized out>
>         node = 0x55601eda5850
>         estate = 0x55601b1a60a8
>         dir = ForwardScanDirection
>         tuplesortstate = 0x55601b15dfa8
>         slot = <optimized out>
> #33 0x0000555fe856476c in ExecProcNode (node=0x55601eda5850) at
./build/../src/include/executor/executor.h:315
> No locals.
> #34 fetch_input_tuple (aggstate=aggstate@entry=0x55601eda5130) at
./build/../src/backend/executor/nodeAgg.c:563
>         slot = <optimized out>
> #35 0x0000555fe8567ca9 in agg_retrieve_direct (aggstate=0x55601eda5130) at
./build/../src/backend/executor/nodeAgg.c:2450
>         econtext = 0x55601eda5748
>         firstSlot = 0x55601efa0970
>         numGroupingSets = 1
>         node = 0x55601b458fb8
>         tmpcontext = <optimized out>
>         peragg = 0x55601efa1f40
>         outerslot = <optimized out>
>         nextSetSize = <optimized out>
>         pergroups = 0x55601efa2148
>         result = <optimized out>
>         hasGroupingSets = false
>         currentSet = <optimized out>
>         numReset = 1
>         i = <optimized out>
>         node = <optimized out>
>         econtext = <optimized out>
>         tmpcontext = <optimized out>
>         peragg = <optimized out>
>         pergroups = <optimized out>
>         outerslot = <optimized out>
>         firstSlot = <optimized out>
>         result = <optimized out>
>         hasGroupingSets = <optimized out>
>         numGroupingSets = <optimized out>
>         currentSet = <optimized out>
>         nextSetSize = <optimized out>
>         numReset = <optimized out>
>         i = <optimized out>
> #36 ExecAgg (pstate=0x55601eda5130) at
./build/../src/backend/executor/nodeAgg.c:2265
>         node = 0x55601eda5130
>         result = 0x0
> #37 0x0000555fe8579bc9 in ExecProcNode (node=0x55601eda5130) at
./build/../src/include/executor/executor.h:315
> No locals.
> #38 ExecLimit (pstate=0x55601eda4e20) at
./build/../src/backend/executor/nodeLimit.c:95
>         node = 0x55601eda4e20
>         econtext = 0x55601eda5028
>         direction = <optimized out>
>         slot = <optimized out>
>         outerPlan = 0x55601eda5130
>         __func__ = "ExecLimit"
> #39 0x0000555fe855191b in ExecProcNode (node=0x55601eda4e20) at
./build/../src/include/executor/executor.h:315
> No locals.
> #40 ExecutePlan (queryDesc=0x55601b1a9f18, operation=CMD_SELECT,
sendTuples=true, numberTuples=0, direction=<optimized out>,
dest=0x55601aed55a0)
>     at ./build/../src/backend/executor/execMain.c:1697
>         estate = 0x55601b1a60a8
>         use_parallel_mode = <optimized out>
>         slot = <optimized out>
>         planstate = 0x55601eda4e20
>         current_tuple_count = 0
>         estate = <optimized out>
>         planstate = <optimized out>
>         use_parallel_mode = <optimized out>
>         slot = <optimized out>
>         current_tuple_count = <optimized out>
> #41 standard_ExecutorRun (queryDesc=0x55601b1a9f18, direction=<optimized
out>, count=0) at ./build/../src/backend/executor/execMain.c:366
>         estate = 0x55601b1a60a8
>         operation = CMD_SELECT
>         dest = 0x55601aed55a0
>         sendTuples = <optimized out>
>         oldcontext = 0x55601b0f7980
> #42 0x0000555fe872c2a7 in PortalRunSelect
(portal=portal@entry=0x55601afc2718, forward=forward@entry=true, count=0,
count@entry=9223372036854775807,
>     dest=dest@entry=0x55601aed55a0) at
./build/../src/backend/tcop/pquery.c:921
>         queryDesc = 0x55601b1a9f18
>         direction = <optimized out>
>         nprocessed = <optimized out>
>         __func__ = "PortalRunSelect"
> #43 0x0000555fe872d8a0 in PortalRun (portal=portal@entry=0x55601afc2718,
count=9223372036854775807, isTopLevel=isTopLevel@entry=true,
dest=dest@entry=0x55601aed55a0,
>     altdest=altdest@entry=0x55601aed55a0, qc=qc@entry=0x7ffe304161c0) at
./build/../src/backend/tcop/pquery.c:765
>         _save_exception_stack = 0x7ffe304162a0
>         _save_context_stack = 0x7ffe30416280
>         _local_sigjmp_buf = {{__jmpbuf = {3, -100083351355759034,
93871257954072, 93871256982944, 0, 0, -100083352486123962,
-6061881521657190842},
>             __mask_was_saved = 0, __saved_mask = {__val = {93870411175012,
1759917036, 832786, 140729708011496, 5232754935419077376, 140729708011568,
93870410174331,
>                 0, 93870411648949, 93871262790800, 52352, 93871256981664,
93870411676182, 140729708011568, 3, 140729708011568}}}}
>         _do_rethrow = <optimized out>
>         result = <optimized out>
>         nprocessed = <optimized out>
>         saveTopTransactionResourceOwner = 0x55601af25b28
>         saveTopTransactionContext = 0x55601afd83e0
>         saveActivePortal = 0x0
>         saveResourceOwner = 0x55601af25b28
>         savePortalContext = 0x0
>         saveMemoryContext = 0x55601aed50a0
>         __func__ = "PortalRun"
> #44 0x0000555fe872a65b in exec_execute_message (portal_name=0x55601aed5198
"", max_rows=<optimized out>) at ./build/../src/backend/tcop/postgres.c:2272
>         portal = 0x55601afc2718
>         sourceText = 0x55601b6f9160 "-- NO KILL
\nselect\n\n\tt.*\n\t\n\nfrom\n\n\t(select\t\n\t\treport_id,\n\t\tshop_id,\t\t\t\t\n\t\t\n\t\tmax(uncalc_cnt)
as uncalc_cnt,\n\t\tmax(bad_cnt) as bad_cnt,\n\t\tjsonb_agg(row_to_json(t.*)
order by ordering_path) as kpis,\n\t\tm"...
>         prepStmtName = 0x555fe88f7d3f "<unnamed>"
>         was_logged = false
>         cmdtaglen = 6
>         dest = DestRemoteExecute
>         completed = <optimized out>
>         qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
>         portalParams = 0x55601b0f7a78
>         save_log_statement_stats = false
>         is_xact_command = false
>         msec_str =
"0\371\003\000\000\000\000\000\360cA0\376\177\000\000\000\000\000\000\000\000\000\000\265\217\212\350_U\000"
>         params_data = {portalName = 0x55601afc6100 "", params =
0x55601b0f7a78}
>         params_errcxt = {previous = 0x0, callback = 0x555fe85e7c30
<ParamsErrorCallback>, arg = 0x7ffe304161d0}
>         receiver = 0x55601aed55a0
>         execute_is_fetch = false
>         cmdtagname = <optimized out>
>         lc = <optimized out>
>         dest = <optimized out>
>         receiver = <optimized out>
>         portal = <optimized out>
>         completed = <optimized out>
>         qc = <optimized out>
>         sourceText = <optimized out>
>         prepStmtName = <optimized out>
>         portalParams = <error reading variable portalParams (Cannot access
memory at address 0x0)>
>         save_log_statement_stats = <optimized out>
>         is_xact_command = <optimized out>
>         execute_is_fetch = <optimized out>
>         was_logged = <optimized out>
>         msec_str = <optimized out>
>         params_data = <optimized out>
>         params_errcxt = <optimized out>
>         cmdtagname = <optimized out>
>         cmdtaglen = <optimized out>
>         lc = <optimized out>
>         __func__ = "exec_execute_message"
>         __errno_location = <optimized out>
>         lc__state = <optimized out>
>         stmt = <optimized out>
>         lc__state = <optimized out>
>         stmt = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
>

--
Best wishes,
Yuri Zamyatin


Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From: Jeff Davis
Date:
On Thu, 2025-10-09 at 08:35 +0000, PG Bug reporting form wrote:
> - Disabled enable_hashagg for some queries (just now)

Hi,

How sure are you that the crashes are happening without HashAgg? I made
some changes in that area in v18, and a lot of the evidence points in
that direction. Please let me know if you are able to confirm that the
simpler (non-hashagg) plan you showed is actually crashing.
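
For example (a sketch; <query> stands for the statement in question):

    psql -X -c "EXPLAIN (COSTS OFF) <query>" | grep -i aggregate
    # "HashAggregate" means hashing is used; "GroupAggregate" is the
    # sort-based path that SET enable_hashagg = off should force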

Regards,
    Jeff Davis




Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From: Yuri Zamyatin
Date:
Hi. I was able to reproduce the crash with the simpler (non hash-agg) plan from the previous message.
Basically I launched it in multiple infinite loops that do BEGIN - UPDATE - ROLLBACK. Other clients could also modify
the tables during this time.
We've seen this query crash on multiple physical hosts.
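
Each loop was roughly this (a sketch; the connection string is a placeholder
and <update> stands for the query quoted below):

    while true; do
        psql "dbname=mydb" -X -q -c "BEGIN; <update>; ROLLBACK;" >/dev/null
    done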

Original query:

> update Tcv_scenes cs
> set
>     state_id=2,
>     stitching_server_id=null,
>     stitching_server_pid=null
> from
>     tcv_scene_datas cd -- Partition key: RANGE (cv_scene_id)
> where
>     cd.cv_scene_id=cs.id and
>     (
>         (cs.state_id=7 and cs.date_cr<now()-interval '24 hours' and cs.date_state_mo>now()-interval '15 minutes' and
cd.stitcher_result::text like '%download%') or
>         (cs.state_id=3 and cs.date_state_mo<now()-interval '5 minutes')
>     )
> returning cs.id

GDB Stack Trace:

> #0  0x0000555fe8678300 in PartitionDirectoryLookup (pdir=0x0, rel=0x7f14d172b288) at
./build/../src/backend/partitioning/partdesc.c:462
>         pde = <optimized out>
>         relid = 21856
>         found = 27
> #1  0x0000555fe8558b51 in InitExecPartitionPruneContexts (prunestate=<optimized out>, parent_plan=0x55601c213448,
initially_valid_subplans=<optimized out>,
>     n_total_subplans=<optimized out>) at ./build/../src/backend/executor/execPartition.c:2413
>         partkey = 0x55601b0244c0
>         partdesc = <optimized out>
>         pprune = <optimized out>
>         nparts = 239
>         k = <optimized out>
>         prunedata = 0x55601b7ea748
>         j = <optimized out>
>         estate = <optimized out>
>         new_subplan_indexes = <optimized out>
>         new_other_subplans = <optimized out>
>         i = 0
>         newidx = <optimized out>
>         fix_subplan_map = <optimized out>
>         estate = <optimized out>
>         new_subplan_indexes = <optimized out>
>         new_other_subplans = <error reading variable new_other_subplans (Cannot access memory at address 0x0)>
>         i = <optimized out>
>         newidx = <optimized out>
>         fix_subplan_map = <optimized out>
>         prunedata = <error reading variable prunedata (Cannot access memory at address 0x0)>
>         j = <optimized out>
>         pprune = <optimized out>
>         nparts = <optimized out>
>         k = <optimized out>
>         partkey = <optimized out>
>         partdesc = <optimized out>
>         oldidx = <optimized out>
>         subidx = <optimized out>
>         subprune = <optimized out>
> #2  ExecInitPartitionExecPruning (planstate=planstate@entry=0x55601c213448, n_total_subplans=<optimized out>,
part_prune_index=<optimized out>, relids=<optimized out>,
>     initially_valid_subplans=initially_valid_subplans@entry=0x7ffe30415500) at
./build/../src/backend/executor/execPartition.c:1934
>         prunestate = <optimized out>
>         estate = <optimized out>
>         pruneinfo = <optimized out>
>         __func__ = "ExecInitPartitionExecPruning"
> #3  0x0000555fe856b030 in ExecInitAppend (node=node@entry=0x55601b8531f8, estate=estate@entry=0x55601c1fb0d8,
eflags=eflags@entry=0)
>     at ./build/../src/backend/executor/nodeAppend.c:147
>         prunestate = <optimized out>
>         appendstate = 0x55601c213448
>         appendplanstates = <optimized out>
>         appendops = <optimized out>
>         validsubplans = 0x55601c213650
>         asyncplans = <optimized out>
>         nplans = <optimized out>
>         nasyncplans = <optimized out>
>         firstvalid = <optimized out>
>         i = <optimized out>
>         j = <optimized out>
> #4  0x0000555fe8559ad5 in ExecInitNode (node=0x55601b8531f8, estate=estate@entry=0x55601c1fb0d8, eflags=0) at
./build/../src/backend/executor/execProcnode.c:182
>         result = <optimized out>
>         subps = <optimized out>
>         l = <optimized out>
>         __func__ = "ExecInitNode"
> #5  0x0000555fe8584383 in ExecInitNestLoop (node=node@entry=0x55601b725a68, estate=estate@entry=0x55601c1fb0d8,
eflags=<optimized out>, eflags@entry=0)
>     at ./build/../src/backend/executor/nodeNestloop.c:301
>         nlstate = 0x55601c1fbd80
>         __func__ = "ExecInitNestLoop"
> #6  0x0000555fe85598f1 in ExecInitNode (node=node@entry=0x55601b725a68, estate=estate@entry=0x55601c1fb0d8,
eflags=eflags@entry=0)
>     at ./build/../src/backend/executor/execProcnode.c:298
>         result = <optimized out>
>         subps = <optimized out>
>         l = <optimized out>
>         __func__ = "ExecInitNode"
> #7  0x0000555fe855480f in EvalPlanQualStart (epqstate=0x55601b745d68, planTree=0x55601b725a68) at
./build/../src/backend/executor/execMain.c:3152
>         parentestate = <optimized out>
>         oldcontext = 0x55601b7e99b0
>         rtsize = <optimized out>
>         rcestate = 0x55601c1fb0d8
>         l = <optimized out>
>         parentestate = <optimized out>
>         rtsize = <optimized out>
>         rcestate = <optimized out>
>         oldcontext = <optimized out>
>         l = <optimized out>
>         i = <optimized out>
>         l__state = <optimized out>
>         subplan = <optimized out>
>         subplanstate = <optimized out>
>         l__state = <optimized out>
>         earm = <optimized out>
>         l__state = <optimized out>
>         rtindex = <optimized out>
> #8  EvalPlanQualBegin (epqstate=epqstate@entry=0x55601b745d68) at ./build/../src/backend/executor/execMain.c:2930
>         parentestate = <optimized out>
>         recheckestate = <optimized out>
> #9  0x0000555fe85549ab in EvalPlanQual (epqstate=0x55601b745d68, relation=relation@entry=0x7f14d1722d68, rti=1,
inputslot=inputslot@entry=0x55601be51480)
>     at ./build/../src/backend/executor/execMain.c:2650
>         slot = <optimized out>
>         testslot = <optimized out>
> #10 0x0000555fe858001d in ExecUpdate (context=context@entry=0x7ffe304157d0,
resultRelInfo=resultRelInfo@entry=0x55601b745e88, tupleid=tupleid@entry=0x7ffe304157aa,
>     oldtuple=oldtuple@entry=0x0, oldSlot=<optimized out>, oldSlot@entry=0x55601be50c70,
slot=slot@entry=0x55601be51078, canSetTag=true)
>     at ./build/../src/backend/executor/nodeModifyTable.c:2606
>         inputslot = 0x55601be51480
>         epqslot = <optimized out>
>         lockedtid = {ip_blkid = {bi_hi = 30, bi_lo = 53843}, ip_posid = 40}
>         estate = 0x55601b7e9aa8
>         resultRelationDesc = <optimized out>
>         updateCxt = {crossPartUpdate = false, updateIndexes = TU_None, lockmode = LockTupleNoKeyExclusive}
>         result = <optimized out>
>         __func__ = "ExecUpdate"
> #11 0x0000555fe8581fff in ExecModifyTable (pstate=0x55601b745c80) at
./build/../src/backend/executor/nodeModifyTable.c:4510
>         node = 0x55601b745c80
>         context = {mtstate = 0x55601b745c80, epqstate = 0x55601b745d68, estate = 0x55601b7e9aa8, planSlot =
0x55601be4bef0, tmfd = {ctid = {ip_blkid = {bi_hi = 30,
>                 bi_lo = 53844}, ip_posid = 13}, xmax = 2949858589, cmax = 4294967295, traversed = true},
cpDeletedSlot = 0x0, cpUpdateReturningSlot = 0x0}
>         estate = 0x55601b7e9aa8
>         operation = CMD_UPDATE
>         resultRelInfo = 0x55601b745e88
>         subplanstate = <optimized out>
>         slot = 0x55601be51078
>         oldSlot = 0x55601be50c70
>         tuple_ctid = {ip_blkid = {bi_hi = 30, bi_lo = 53844}, ip_posid = 13}
>         oldtupdata = {t_len = 2675325712, t_self = {ip_blkid = {bi_hi = 32475, bi_lo = 0}, ip_posid = 32265},
t_tableOid = 0, t_data = 0xf0}
>         oldtuple = 0x0
>         tupleid = <optimized out>
>         tuplock = false
>         __func__ = "ExecModifyTable"
> #12 0x0000555fe855954d in ExecProcNodeInstr (node=0x55601b745c80) at
./build/../src/backend/executor/execProcnode.c:485
>         result = <optimized out>
> #13 0x0000555fe855191b in ExecProcNode (node=0x55601b745c80) at ./build/../src/include/executor/executor.h:315
> No locals.
> #14 ExecutePlan (queryDesc=0x55601b737af0, operation=CMD_UPDATE, sendTuples=true, numberTuples=0,
direction=<optimized out>, dest=0x555fe8bd2ec0 <donothingDR>)
>     at ./build/../src/backend/executor/execMain.c:1697
>         estate = 0x55601b7e9aa8
>         use_parallel_mode = <optimized out>
>         slot = <optimized out>
>         planstate = 0x55601b745c80
>         current_tuple_count = 0
>         estate = <optimized out>
>         planstate = <optimized out>
>         use_parallel_mode = <optimized out>
>         slot = <optimized out>
>         current_tuple_count = <optimized out>
> #15 standard_ExecutorRun (queryDesc=0x55601b737af0, direction=<optimized out>, count=0) at
./build/../src/backend/executor/execMain.c:366
>         estate = 0x55601b7e9aa8
>         operation = CMD_UPDATE
>         dest = 0x555fe8bd2ec0 <donothingDR>
>         sendTuples = <optimized out>
>         oldcontext = 0x55601b01f490
> #16 0x0000555fe84e2e1c in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x55601b73b2d0, into=into@entry=0x0,
es=es@entry=0x55601b0218e0,
>     queryString=queryString@entry=0x55601aed5198 "explain(buffers,verbose,analyze)update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 an"..., params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0,
>     planduration=0x7ffe30415aa8, bufusage=0x7ffe30415b50, mem_counters=0x0) at
./build/../src/backend/commands/explain.c:579
>         dir = <optimized out>
>         dest = 0x555fe8bd2ec0 <donothingDR>
>         queryDesc = 0x55601b737af0
>         starttime = <optimized out>
>         totaltime = 0
>         eflags = <optimized out>
>         instrument_option = <optimized out>
>         serializeMetrics = {bytesSent = 0, timeSpent = {ticks = 0}, bufferUsage = {shared_blks_hit = <optimized out>,
shared_blks_read = <optimized out>,
>             shared_blks_dirtied = <optimized out>, shared_blks_written = <optimized out>, local_blks_hit = <optimized
out>, local_blks_read = <optimized out>,
>             local_blks_dirtied = <optimized out>, local_blks_written = <optimized out>, temp_blks_read = <optimized
out>, temp_blks_written = <optimized out>,
>             shared_blk_read_time = {ticks = <optimized out>}, shared_blk_write_time = {ticks = <optimized out>},
local_blk_read_time = {ticks = <optimized out>},
>             local_blk_write_time = {ticks = <optimized out>}, temp_blk_read_time = {ticks = <optimized out>},
temp_blk_write_time = {ticks = <optimized out>}}}
> #17 0x0000555fe84e34c4 in standard_ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0,
es=0x55601b0218e0,
>     queryString=0x55601aed5198 "explain(buffers,verbose,analyze)update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 an"..., params=0x0, queryEnv=0x0) at
./build/../src/backend/commands/explain.c:372
>         plan = 0x55601b73b2d0
>         planstart = <optimized out>
>         planduration = {ticks = 4659506}
>         bufusage_start = {shared_blks_hit = 18745, shared_blks_read = 0, shared_blks_dirtied = 0, shared_blks_written
= 0, local_blks_hit = 0, local_blks_read = 0,
>           local_blks_dirtied = 0, local_blks_written = 0, temp_blks_read = 0, temp_blks_written = 0,
shared_blk_read_time = {ticks = 0}, shared_blk_write_time = {
>             ticks = 0}, local_blk_read_time = {ticks = 0}, local_blk_write_time = {ticks = 0}, temp_blk_read_time =
{ticks = 0}, temp_blk_write_time = {ticks = 0}}
>         bufusage = {shared_blks_hit = 16, shared_blks_read = 0, shared_blks_dirtied = 0, shared_blks_written = 0,
local_blks_hit = 0, local_blks_read = 0,
>           local_blks_dirtied = 0, local_blks_written = 0, temp_blks_read = 0, temp_blks_written = 0,
shared_blk_read_time = {ticks = 0}, shared_blk_write_time = {
>             ticks = 0}, local_blk_read_time = {ticks = 0}, local_blk_write_time = {ticks = 0}, temp_blk_read_time =
{ticks = 0}, temp_blk_write_time = {ticks = 0}}
>         mem_counters = {nblocks = 93871265236088, freechunks = 93871265236088, totalspace = 139727390028424,
freespace = 230455663}
>         planner_ctx = 0x0
>         saved_ctx = 0x0
> #18 0x0000555fe84e3641 in ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=<optimized
out>, es=<optimized out>, pstate=<optimized out>,
>     params=<optimized out>) at ./build/../src/backend/commands/explain.c:309
> No locals.
> #19 0x0000555fe84e3733 in ExplainQuery (pstate=0x55601b01f728, stmt=0x55601b6b42d8, params=0x0, dest=0x55601b01f6a0)
at ./build/../src/backend/commands/explain.c:223
>         l__state = {l = <optimized out>, i = 0}
>         l = 0x55601b197008
>         es = 0x55601b0218e0
>         tstate = <optimized out>
>         jstate = <optimized out>
>         query = <optimized out>
>         rewritten = 0x55601b196ff0
> #20 0x0000555fe872f083 in standard_ProcessUtility (pstmt=0x55601b6b4370,
>     queryString=0x55601aed5198 "explain(buffers,verbose,analyze)update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 an"..., readOnlyTree=<optimized out>,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
>     queryEnv=0x0, dest=0x55601b01f6a0, qc=0x7ffe30415db0) at ./build/../src/backend/tcop/utility.c:866
>         parsetree = 0x55601b6b42d8
>         isTopLevel = <optimized out>
>         isAtomicContext = true
>         pstate = 0x55601b01f728
>         readonly_flags = <optimized out>
>         __func__ = "standard_ProcessUtility"
> #21 0x0000555fe872d231 in PortalRunUtility (portal=portal@entry=0x55601afc2718, pstmt=0x55601b6b4370,
isTopLevel=isTopLevel@entry=true,
>     setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x55601b01f6a0, qc=qc@entry=0x7ffe30415db0) at
./build/../src/backend/tcop/pquery.c:1153
> No locals.
> #22 0x0000555fe872d5ef in FillPortalStore (portal=portal@entry=0x55601afc2718, isTopLevel=isTopLevel@entry=true) at
./build/../src/backend/tcop/pquery.c:1026
>         treceiver = 0x55601b01f6a0
>         qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
>         __func__ = "FillPortalStore"
> #23 0x0000555fe872d96f in PortalRun (portal=portal@entry=0x55601afc2718, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true,
>     dest=dest@entry=0x55601b0ad0b0, altdest=altdest@entry=0x55601b0ad0b0, qc=qc@entry=0x7ffe30415fc0) at
./build/../src/backend/tcop/pquery.c:760
>         _save_exception_stack = 0x7ffe304162a0
>         _save_context_stack = 0x0
>         _local_sigjmp_buf = {{__jmpbuf = {93871257954072, -100083352534358458, 93871265235712, 140729708011456,
93871258914992, 93871265235752, -100083352490318266,
>               -6061881521657190842}, __mask_was_saved = 0, __saved_mask = {__val = {0, 140728898420737,
93869327402605, 93871257965608, 93870412007286,
>                 140729708011280, 93871257954072, 93870412007286, 1, 93871258914920, 93871265235752, 140729708011344,
93870411676182, 140729708011344, 2,
>                 140729708011344}}}}
>         _do_rethrow = <optimized out>
>         result = <optimized out>
>         nprocessed = <optimized out>
>         saveTopTransactionResourceOwner = 0x55601af27370
>         saveTopTransactionContext = 0x55601afd83e0
>         saveActivePortal = 0x0
>         saveResourceOwner = 0x55601af27370
>         savePortalContext = 0x0
>         saveMemoryContext = 0x55601afd83e0
>         __func__ = "PortalRun"
> #24 0x0000555fe8729668 in exec_simple_query (
>     query_string=0x55601aed5198 "explain(buffers,verbose,analyze)update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 an"...) at ./build/../src/backend/tcop/postgres.c:1273
>         cmdtaglen = 7
>         snapshot_set = <optimized out>
>         per_parsetree_context = 0x0
>         plantree_list = <optimized out>
>         parsetree = 0x55601b6b4300
>         commandTag = <optimized out>
>         qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
>         querytree_list = <optimized out>
>         portal = 0x55601afc2718
>         receiver = 0x55601b0ad0b0
>         format = 0
>         cmdtagname = <optimized out>
>         parsetree_item__state = {l = 0x55601b6b4328, i = 0}
>         dest = DestRemote
>         oldcontext = 0x55601afd83e0
>         parsetree_list = 0x55601b6b4328
>         parsetree_item = 0x55601b6b4340
>         save_log_statement_stats = false
>         was_logged = false
>         use_implicit_block = false
>         msec_str =
"\340\031\301\350_U\000\000Q\000\000\000\000\000\000\000\000bA0\376\177\000\000\004\000\000\000\000\000\000"
>         __func__ = "exec_simple_query"
> #25 0x0000555fe872b56d in PostgresMain (dbname=<optimized out>, username=<optimized out>) at
./build/../src/backend/tcop/postgres.c:4766
>         query_string = 0x55601aed5198 "explain(buffers,verbose,analyze)update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 an"...
>         firstchar = <optimized out>
>         input_message = {
>           data = 0x55601aed5198 "explain(buffers,verbose,analyze)update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 an"..., len = 431, maxlen = 1024, cursor = 431}
>         local_sigjmp_buf = {{__jmpbuf = {140729708012000, -100083351217347002, 2753760000, 4, 0, 1,
-100083351357856186, -6061881523619469754}, __mask_was_saved = 1,
>             __saved_mask = {__val = {4194304, 135168, 5232754935419077376, 16, 260416, 18446744073709551312, 260400,
0, 16274, 139727397133088, 139727395799228,
>                 93870411867664, 139727390638096, 0, 18446744073709551312, 93871256618032}}}}
>         send_ready_for_query = false
>         idle_in_transaction_timeout_enabled = false
>         idle_session_timeout_enabled = false
>         __func__ = "PostgresMain"
> #26 0x0000555fe8725a33 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at
./build/../src/backend/tcop/backend_startup.c:124
>         bsdata = <optimized out>
> #27 0x0000555fe8683cfd in postmaster_child_launch (child_type=B_BACKEND, child_slot=316,
startup_data=startup_data@entry=0x7ffe304164c0,
>     startup_data_len=startup_data_len@entry=24, client_sock=client_sock@entry=0x7ffe304164e0) at
./build/../src/backend/postmaster/launch_backend.c:290
>         pid = <optimized out>
> #28 0x0000555fe8687802 in BackendStartup (client_sock=0x7ffe304164e0) at
./build/../src/backend/postmaster/postmaster.c:3587
>         bn = 0x7f14d1b05b50
>         pid = <optimized out>
>         startup_data = {canAcceptConnections = CAC_OK, socket_created = 813354333582584, fork_started =
813354333582603}
>         cac = <optimized out>
>         bn = <optimized out>
>         pid = <optimized out>
>         startup_data = <optimized out>
>         cac = <optimized out>
>         __func__ = "BackendStartup"
>         __errno_location = <optimized out>
>         save_errno = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
> #29 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1702
>         s = {sock = 10, raddr = {addr = {ss_family = 2,
>               __ss_padding =
"\305\370\274|$\247\000\000\000\000\000\000\000\000K\323\352\032`U\000\000\000\000\000\000\000\000\000\000PeA0\376\177\000\000
 eA0\376\177\000\000\000\004\000\000\000\000\000\000@\323\352\032`U\000\000\213y\210\350_U", '\000' <repeats 18 times>,
"peA0\376\177\000\000x\344\217\350_U\000\000\000\000\000\000\000\000\000\000\255\226\311\321\024\177\000", __ss_align =
1}, salen = 16}}
>         i = 0
>         now = <optimized out>
>         last_lockfile_recheck_time = 1760039078
>         last_touch_time = 1760036170
>         events = {{pos = 1, events = 2, fd = 6, user_data = 0x0}, {pos = 0, events = 0, fd = 6, user_data = 0x0},
{pos = 0, events = 0, fd = 8, user_data = 0x0}, {
>             pos = 658, events = 21855, fd = 451405112, user_data = 0x400000000aa}, {pos = 0, events = 21856, fd =
451597131, user_data = 0x0}, {pos = -1303149824,
>             events = 1218345699, fd = 451413120, user_data = 0x555fe8c28f60 <errordata>}, {pos = 809592352, events =
32766, fd = -393725350, user_data = 0xf}, {
>             pos = 0, events = 0, fd = 809592432, user_data = 0x0}, {pos = 809592432, events = 32766, fd = 451930800,
user_data = 0x555fe88eca37}, {pos = -389995904,
>             events = 21855, fd = 0, user_data = 0x555fe88ce239 <pg_freeaddrinfo_all+73>}, {pos = 8, events = 0, fd =
809592672, user_data = 0x7ffe30416fa0}, {
>             pos = -396723022, events = 21855, fd = 451936489, user_data = 0x15381af000f2}, {pos = 451767480, events =
21856, fd = 809592672,
>             user_data = 0x7ffe304166bc}, {pos = 1, events = 1, fd = 451936565, user_data = 0x1e8afb0b4}, {pos =
451930800, events = 21856, fd = -393223028,
>             user_data = 0x100000001}, {pos = 1, events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = 0,
user_data = 0x7f0032333435}, {pos = -393347968,
>             events = 21855, fd = 451936712, user_data = 0x55601af001d2}, {pos = 451936739, events = 21856, fd =
15729133, user_data = 0x556000000000}, {pos = 0,
>             events = 0, fd = 0, user_data = 0x556000000000}, {pos = 0, events = 21760, fd = -771537064, user_data =
0x6e75722f7261762f}, {pos = 1936683055,
>             events = 1701996404, fd = 795636083, user_data = 0x3334352e4c515347}, {pos = -393281486, events = 21855,
fd = -393262758, user_data = 0x7ffe30416dd0}, {
>             pos = -393347879, events = 21855, fd = 0, user_data = 0x0}, {pos = 809594368, events = 32766, fd =
-393348049, user_data = 0x7ffe30416e10}, {pos = 9305135,
>             events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = -771537064, user_data = 0x0}, {pos =
-775242182, events = 32532, fd = 0,
>             user_data = 0xff0}, {pos = 0, events = 538976256, fd = -771537064, user_data = 0x5420706100000000}, {pos
= -773907776, events = 32532, fd = 255,
>             user_data = 0xfffffffffffffed0}, {pos = 0, events = 0, fd = 399, user_data = 0x55601ae7b2b0}, {pos =
-775242182, events = 32532, fd = 665957,
>             user_data = 0xdf20}, {pos = 0, events = 0, fd = 10, user_data = 0x0}, {pos = -773907776, events = 32532,
fd = 255, user_data = 0xfffffffffffffed0}, {
>             pos = -773914672, events = 32532, fd = 8, user_data = 0x7f14d1deffd0 <_IO_file_jumps>}, {pos =
-775238318, events = 32532, fd = 2996,
>             user_data = 0x55601ae7b2b0}, {pos = 4096, events = 0, fd = 809593520, user_data = 0x7f14d1deffd0
<_IO_file_jumps>}, {pos = -775389748, events = 32532,
>             fd = 26, user_data = 0x1397}, {pos = 1, events = 0, fd = 33152, user_data = 0x70}, {pos = 0, events = 0,
fd = 1, user_data = 0x100000000}, {pos = 2,
>             events = 17, fd = 0, user_data = 0x3}, {pos = 0, events = 1, fd = 0, user_data = 0x0}, {pos = 0, events =
0, fd = 0, user_data = 0x0}, {pos = 0,
>             events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = 0, user_data = 0x0}, {pos = 0, events =
0, fd = 0, user_data = 0x55601ae7b2b0}, {pos = 8,
>             events = 0, fd = -773907776, user_data = 0x802}, {pos = -304, events = 4294967295, fd = 5, user_data =
0x555fe88e5dd6}, {pos = -393256156, events = 21855,
>             fd = -775238318, user_data = 0x7ffe30416a50}, {pos = -393342935, events = 21855, fd = 32768, user_data =
0x9}, {pos = 809593680, events = 32766,
>             fd = -775001723, user_data = 0x7f0000000000}, {pos = 9, events = 0, fd = 809593648, user_data =
0x7f14d1ca90cd}, {pos = 2429, events = 0, fd = 32832,
>             user_data = 0x55601af00ad0}, {pos = 32832, events = 0, fd = 451971856, user_data = 0x7f14d1caa4f8}, {pos
= 9, events = 0, fd = 451939024,
>             user_data = 0xfffffffffffffe98}, {pos = 0, events = 0, fd = 2050, user_data = 0x7f14d1cad3c0 <free+384>},
{pos = 544854009, events = 0, fd = 1759619142,
>             user_data = 0x2079cff9}, {pos = 0, events = 0, fd = 0, user_data = 0x9}, {pos = 809593648, events =
32766, fd = 809593680, user_data = 0x55601af00ae0}, {
>             pos = -393323050, events = 21855, fd = -393256156, user_data = 0x7f14d1ce678d <closedir+13>}, {pos =
451459056, events = 21856, fd = -395355527,
>             user_data = 0x55601af00ae0}, {pos = -393322945, events = 21855, fd = 809594800, user_data =
0x555fe86f90c8 <RemovePgTempFiles+312>}, {pos = 451541152,
>             events = 21856, fd = 0, user_data = 0x7367702f65736162}, {pos = 1952410737, events = 1207988333, fd =
771766842, user_data = 0x7f14d1cabe3a}}
>         nevents = <optimized out>
>         __func__ = "ServerLoop"
> #30 0x0000555fe8689110 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x55601ae7c310) at
./build/../src/backend/postmaster/postmaster.c:1400
>         opt = <optimized out>
>         status = <optimized out>
>         userDoption = <optimized out>
>         listen_addr_saved = true
>         output_config_variable = <optimized out>
>         __func__ = "PostmasterMain"
> #31 0x0000555fe837f880 in main (argc=5, argv=0x55601ae7c310) at ./build/../src/backend/main/main.c:227
>         do_check_root = <optimized out>
>         dispatch_option = <optimized out>




Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
David Rowley
Date:
On Fri, 10 Oct 2025 at 14:34, Yuri Zamyatin <yuri@yrz.am> wrote:
>
> Hi. I was able to reproduce the crash with the simpler (non-hash-agg) plan from the previous message.
> Basically I launched it in multiple infinite loops that do BEGIN - UPDATE - ROLLBACK. Other clients could also modify the tables during this time.
 
> We've seen this query crash on multiple physical hosts.

> > #0  0x0000555fe8678300 in PartitionDirectoryLookup (pdir=0x0, rel=0x7f14d172b288) at
./build/../src/backend/partitioning/partdesc.c:462
> >         pde = <optimized out>
> >         relid = 21856
> >         found = 27

Looks like that's crashing in a different place from the last
backtrace you showed.

Are you able to test this without any extensions loaded to see if you
still get a crash?

At a wild guess, perhaps an extension has gone rogue and spawned
another thread, resulting in something like concurrent palloc requests
getting confused and causing something strange to happen when certain
palloc'd chunks are accessed. Running without extensions may help
narrow things down.
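
For illustration, here is a toy model of that hazard (my sketch, not
PostgreSQL's actual allocator code): palloc mutates per-memory-context
state with no locking, because backends are single-threaded by design,
so a second thread allocating concurrently can corrupt that state.

    #include <pthread.h>
    #include <stdio.h>

    /* Toy bump allocator standing in for a memory context. */
    static char arena[1 << 20];
    static size_t used = 0;

    static void *toy_palloc(size_t size)
    {
        void *p = arena + used;   /* racy read */
        used += size;             /* racy update: two threads may get the same p */
        return p;
    }

    static void *worker(void *arg)
    {
        (void) arg;
        for (int i = 0; i < 32768; i++)
            toy_palloc(16);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Expect 1048576; lost updates typically leave it short, i.e.
         * two "allocations" handed out overlapping chunks. */
        printf("used = %zu (expected %d)\n", used, 2 * 32768 * 16);
        return 0;
    }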

> postgresql_effective_cache_size = 560GB
> postgresql_shared_buffers = 225GB

Which extension are these GUCs from?

David



Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
Yuri Zamyatin
Date:
Hi.

Hash aggregation.

> enable_hashagg=off

Setting it fixed all the crashes in the subset of queries that used hash aggregation (the tts_minimal_store_tuple() crashes).
I'm sure about that, since I did some A/B testing over a few days.


Configuration.

> > postgresql_effective_cache_size = 560GB
> > postgresql_shared_buffers = 225GB
> Which extension are these GUCs from?

It was a typo in my message, sorry about that.
In postgresql.conf they're stated correctly as effective_cache_size and shared_buffers.


Partition lookup.

> Are you able to test this without any extensions loaded to see if you still get a crash?

Yes, I reproduced it on a machine with a clean Debian 13 & PostgreSQL 18 install
from the sources mentioned earlier. The only extension loaded is plpgsql. Changes to
postgresql.conf: max_connections=1000, work_mem=2000MB, shared_buffers=10GB, max_wal_size=10GB.

I pg_restore'd two tables (+partitions) from production into a clean cluster and created
indexes manually. The partitioned table is 2.2TB in size. I hope to narrow things down
and provide better reproduction steps tomorrow.

To cause the segfault, these queries were launched simultaneously.

> -- in 2 parallel infinite loops
> with ids as (select (118998526-random()*100000)::int id from generate_series(1,10000))
> update tcv_scene_datas set id=id where cv_scene_id in(select id from ids);
> with ids as (select (118998526-random()*100000)::int id from generate_series(1,10000))
> update tcv_scenes set id=id where id in(select id from ids);

> -- in 10 parallel infinite loops
> set jit = off;
> begin;
> -- <EXPLAIN to save query plan>
> update Tcv_scenes cs -- CRASHES
> set
>         state_id=2,
>         stitching_server_id=null,
>         stitching_server_pid=null
> from
>         tcv_scene_datas cd -- partitioned
> where
>         cd.cv_scene_id=cs.id and
>         (
>                 (state_id=7 and date_cr<now()-interval '24 hours' and date_state_mo>now()-interval '15 minutes' and cd.stitcher_result::text like '%download%') or
>                 (state_id=3 and date_state_mo<now()-interval '5 minutes')
>         )
> returning cs.id;
> rollback;

The plan stayed the same. The stack trace also looks the same (now without the ANALYZE part):

> #0  0x000055987ffde300 in PartitionDirectoryLookup (pdir=0x0, rel=0x7f3069be8c98) at
./build/../src/backend/partitioning/partdesc.c:462
>         pde = <optimized out>
>         relid = 0
>         found = false
> #1  0x000055987febeb51 in InitExecPartitionPruneContexts (prunestate=<optimized out>, parent_plan=0x55988a30b7d0,
initially_valid_subplans=<optimized out>,
>     n_total_subplans=<optimized out>) at ./build/../src/backend/executor/execPartition.c:2413
>         partkey = 0x5598892dd5a0
>         partdesc = <optimized out>
>         pprune = <optimized out>
>         nparts = 239
>         k = <optimized out>
>         prunedata = 0x559889a8f7b8
>         j = <optimized out>
>         estate = <optimized out>
>         new_subplan_indexes = <optimized out>
>         new_other_subplans = <optimized out>
>         i = 0
>         newidx = <optimized out>
>         fix_subplan_map = <optimized out>
>         estate = <optimized out>
>         new_subplan_indexes = <optimized out>
>         new_other_subplans = <error reading variable new_other_subplans (Cannot access memory at address 0x0)>
>         i = <optimized out>
>         newidx = <optimized out>
>         fix_subplan_map = <optimized out>
>         prunedata = <error reading variable prunedata (Cannot access memory at address 0x0)>
>         j = <optimized out>
>         pprune = <optimized out>
>         nparts = <optimized out>
>         k = <optimized out>
>         partkey = <optimized out>
>         partdesc = <optimized out>
>         oldidx = <optimized out>
>         subidx = <optimized out>
>         subprune = <optimized out>
> #2  ExecInitPartitionExecPruning (planstate=planstate@entry=0x55988a30b7d0, n_total_subplans=<optimized out>,
part_prune_index=<optimized out>, relids=<optimized out>,
>     initially_valid_subplans=initially_valid_subplans@entry=0x7ffefb55a2e0) at
./build/../src/backend/executor/execPartition.c:1934
>         prunestate = <optimized out>
>         estate = <optimized out>
>         pruneinfo = <optimized out>
>         __func__ = "ExecInitPartitionExecPruning"
> #3  0x000055987fed1030 in ExecInitAppend (node=node@entry=0x7f2dcff14420, estate=estate@entry=0x55988a309b08,
eflags=eflags@entry=0)
>     at ./build/../src/backend/executor/nodeAppend.c:147
>         prunestate = <optimized out>
>         appendstate = 0x55988a30b7d0
>         appendplanstates = <optimized out>
>         appendops = <optimized out>
>         validsubplans = 0x55988a325470
>         asyncplans = <optimized out>
>         nplans = <optimized out>
>         nasyncplans = <optimized out>
>         firstvalid = <optimized out>
>         i = <optimized out>
>         j = <optimized out>
> #4  0x000055987febfad5 in ExecInitNode (node=0x7f2dcff14420, estate=estate@entry=0x55988a309b08, eflags=0) at
./build/../src/backend/executor/execProcnode.c:182
>         result = <optimized out>
>         subps = <optimized out>
>         l = <optimized out>
>         __func__ = "ExecInitNode"
> #5  0x000055987feea383 in ExecInitNestLoop (node=node@entry=0x7f2dd0006f28, estate=estate@entry=0x55988a309b08,
eflags=<optimized out>, eflags@entry=0)
>     at ./build/../src/backend/executor/nodeNestloop.c:301
>         nlstate = 0x55988a30a7b0
>         __func__ = "ExecInitNestLoop"
> #6  0x000055987febf8f1 in ExecInitNode (node=node@entry=0x7f2dd0006f28, estate=estate@entry=0x55988a309b08,
eflags=eflags@entry=0)
>     at ./build/../src/backend/executor/execProcnode.c:298
>         result = <optimized out>
>         subps = <optimized out>
>         l = <optimized out>
>         __func__ = "ExecInitNode"
> #7  0x000055987feba80f in EvalPlanQualStart (epqstate=0x559889383d08, planTree=0x7f2dd0006f28) at
./build/../src/backend/executor/execMain.c:3152
>         parentestate = <optimized out>
>         oldcontext = 0x559889a8ea20
>         rtsize = <optimized out>
>         rcestate = 0x55988a309b08
>         l = <optimized out>
>         parentestate = <optimized out>
>         rtsize = <optimized out>
>         rcestate = <optimized out>
>         oldcontext = <optimized out>
>         l = <optimized out>
>         i = <optimized out>
>         l__state = <optimized out>
>         subplan = <optimized out>
>         subplanstate = <optimized out>
>         l__state = <optimized out>
>         earm = <optimized out>
>         l__state = <optimized out>
>         rtindex = <optimized out>
> #8  EvalPlanQualBegin (epqstate=epqstate@entry=0x559889383d08) at ./build/../src/backend/executor/execMain.c:2930
>         parentestate = <optimized out>
>         recheckestate = <optimized out>
> #9  0x000055987feba9ab in EvalPlanQual (epqstate=0x559889383d08, relation=relation@entry=0x7f3069be4710, rti=1,
inputslot=inputslot@entry=0x559889fc01f0)
>     at ./build/../src/backend/executor/execMain.c:2650
>         slot = <optimized out>
>         testslot = <optimized out>
> #10 0x000055987fee601d in ExecUpdate (context=context@entry=0x7ffefb55a5b0,
resultRelInfo=resultRelInfo@entry=0x559889383e28, tupleid=tupleid@entry=0x7ffefb55a58a,
>     oldtuple=oldtuple@entry=0x0, oldSlot=<optimized out>, oldSlot@entry=0x559889fb3178,
slot=slot@entry=0x559889fb3580, canSetTag=true)
>     at ./build/../src/backend/executor/nodeModifyTable.c:2606
>         inputslot = 0x559889fc01f0
>         epqslot = <optimized out>
>         lockedtid = {ip_blkid = {bi_hi = 31, bi_lo = 16528}, ip_posid = 88}
>         estate = 0x559889a8eb18
>         resultRelationDesc = <optimized out>
>         updateCxt = {crossPartUpdate = false, updateIndexes = TU_None, lockmode = LockTupleNoKeyExclusive}
>         result = <optimized out>
>         __func__ = "ExecUpdate"
> #11 0x000055987fee7fff in ExecModifyTable (pstate=0x559889383c20) at
./build/../src/backend/executor/nodeModifyTable.c:4510
>         node = 0x559889383c20
>         context = {mtstate = 0x559889383c20, epqstate = 0x559889383d08, estate = 0x559889a8eb18, planSlot =
0x559889fac818, tmfd = {ctid = {ip_blkid = {bi_hi = 31,
>                 bi_lo = 16528}, ip_posid = 62}, xmax = 17203, cmax = 4294967295, traversed = true}, cpDeletedSlot =
0x0, cpUpdateReturningSlot = 0x7ffefb55a620}
>         estate = 0x559889a8eb18
>         operation = CMD_UPDATE
>         resultRelInfo = 0x559889383e28
>         subplanstate = <optimized out>
>         slot = 0x559889fb3580
>         oldSlot = 0x559889fb3178
>         tuple_ctid = {ip_blkid = {bi_hi = 31, bi_lo = 16528}, ip_posid = 62}
>         oldtupdata = {t_len = 240, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 50744}, t_tableOid =
402685741, t_data = 0x559889facf00}
>         oldtuple = 0x0
>         tupleid = <optimized out>
>         tuplock = false
>         __func__ = "ExecModifyTable"
> #12 0x000055987feb791b in ExecProcNode (node=0x559889383c20) at ./build/../src/include/executor/executor.h:315
> No locals.
> #13 ExecutePlan (queryDesc=0x5598892beb88, operation=CMD_UPDATE, sendTuples=true, numberTuples=0,
direction=<optimized out>, dest=0x5598892beb00)
>     at ./build/../src/backend/executor/execMain.c:1697
>         estate = 0x559889a8eb18
>         use_parallel_mode = <optimized out>
>         slot = <optimized out>
>         planstate = 0x559889383c20
>         current_tuple_count = 536
>         estate = <optimized out>
>         planstate = <optimized out>
>         use_parallel_mode = <optimized out>
>         slot = <optimized out>
>         current_tuple_count = <optimized out>
> #14 standard_ExecutorRun (queryDesc=0x5598892beb88, direction=<optimized out>, count=0) at
./build/../src/backend/executor/execMain.c:366
>         estate = 0x559889a8eb18
>         operation = CMD_UPDATE
>         dest = 0x5598892beb00
>         sendTuples = <optimized out>
>         oldcontext = 0x5598892be8f0
> #15 0x0000559880092774 in ProcessQuery (plan=0x7f2dd001cb00,
>     sourceText=0x559889170158 "update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 and date_cr<now()-interval '24 hou"..., params=0x0,
queryEnv=0x0, dest=0x5598892beb00, qc=0x7ffefb55a780)
>     at ./build/../src/backend/tcop/pquery.c:161
>         queryDesc = 0x5598892beb88
> #16 0x0000559880093421 in PortalRunMulti (portal=portal@entry=0x559889267908, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=true,
>     dest=dest@entry=0x5598892beb00, altdest=0x559880538ec0 <donothingDR>, qc=qc@entry=0x7ffefb55a780) at
./build/../src/backend/tcop/pquery.c:1272
>         pstmt = 0x7f2dd001cb00
>         stmtlist_item__state = {l = 0x7f2dd00225c0, i = 0}
>         active_snapshot_set = true
>         stmtlist_item = 0x7f2dd00225d8
> #17 0x000055988009358f in FillPortalStore (portal=portal@entry=0x559889267908, isTopLevel=isTopLevel@entry=true) at
./build/../src/backend/tcop/pquery.c:1021
>         treceiver = 0x5598892beb00
>         qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
>         __func__ = "FillPortalStore"
> #18 0x000055988009396f in PortalRun (portal=portal@entry=0x559889267908, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true,
>     dest=dest@entry=0x7f2dd000be20, altdest=altdest@entry=0x7f2dd000be20, qc=qc@entry=0x7ffefb55a990) at
./build/../src/backend/tcop/pquery.c:760
>         _save_exception_stack = 0x7ffefb55ac70
>         _save_context_stack = 0x0
>         _local_sigjmp_buf = {{__jmpbuf = {94113624389896, 9025151889858365403, 94113630543472, 140733115115920,
139834739965472, 94113630543512, 9025151889659135963,
>               3022814109072099291}, __mask_was_saved = 0, __saved_mask = {__val = {0, 140728898420737,
94114140538477, 94113624401432, 94113473390454, 140733115115744,
>                 94113624389896, 94113473390454, 1, 139834740057536, 94113630543512, 140733115115808, 94113473059350,
140733115115808, 2, 140733115115808}}}}
>         _do_rethrow = <optimized out>
>         result = <optimized out>
>         nprocessed = <optimized out>
>         saveTopTransactionResourceOwner = 0x5598892157f8
>         saveTopTransactionContext = 0x55988927c5c0
>         saveActivePortal = 0x0
>         saveResourceOwner = 0x5598892157f8
>         savePortalContext = 0x0
>         saveMemoryContext = 0x55988927c5c0
>         __func__ = "PortalRun"
> #19 0x000055988008f668 in exec_simple_query (
>     query_string=0x559889170158 "update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 and date_cr<now()-interval '24 hou"...) at
./build/../src/backend/tcop/postgres.c:1273
>         cmdtaglen = 6
>         snapshot_set = <optimized out>
>         per_parsetree_context = 0x0
>         plantree_list = <optimized out>
>         parsetree = 0x559889845e70
>         commandTag = <optimized out>
>         qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
>         querytree_list = <optimized out>
>         portal = 0x559889267908
>         receiver = 0x7f2dd000be20
>         format = 0
>         cmdtagname = <optimized out>
>         parsetree_item__state = {l = 0x559889845e98, i = 0}
>         dest = DestRemote
>         oldcontext = 0x55988927c5c0
>         parsetree_list = 0x559889845e98
>         parsetree_item = 0x559889845eb0
>         save_log_statement_stats = false
>         was_logged = false
>         use_implicit_block = false
>         msec_str =
"\340yW\200\230U\000\000Q\000\000\000\000\000\000\000ЫU\373\376\177\000\000\004\000\000\000\000\000\000"
>         __func__ = "exec_simple_query"
> #20 0x000055988009156d in PostgresMain (dbname=<optimized out>, username=<optimized out>) at
./build/../src/backend/tcop/postgres.c:4766
>         query_string = 0x559889170158 "update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 and date_cr<now()-interval '24 hou"...
>         firstchar = <optimized out>
>         input_message = {
>           data = 0x559889170158 "update Tcv_scenes
cs\nset\n\tstate_id=2,\n\tstitching_server_id=null,\n\tstitching_server_pid=null\nfrom\n\ttcv_scene_datas
cd\nwhere\n\tcd.cv_scene_id=cs.id and\n\t(\n\t\t(state_id=7 and date_cr<now()-interval '24 hou"..., len = 381, maxlen =
1024, cursor = 381}
>         local_sigjmp_buf = {{__jmpbuf = {140733115116464, 9025151890000971739, 293935616, 4, 0, 1,
9025151889751410651, 3022814102951954395}, __mask_was_saved = 1,
>             __saved_mask = {__val = {4194304, 143360, 12259252146692762112, 16, 132672, 18446744073709551312, 132656,
0, 8290, 139845915581216, 139845914247356,
>                 94113473250832, 139845913448464, 2047, 18446744073709551312, 94113623012352}}}}
>         send_ready_for_query = false
>         idle_in_transaction_timeout_enabled = false
>         idle_session_timeout_enabled = false
>         __func__ = "PostgresMain"
> #21 0x000055988008ba33 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at
./build/../src/backend/tcop/backend_startup.c:124
>         bsdata = <optimized out>
> #22 0x000055987ffe9cfd in postmaster_child_launch (child_type=B_BACKEND, child_slot=6,
startup_data=startup_data@entry=0x7ffefb55ae90,
>     startup_data_len=startup_data_len@entry=24, client_sock=client_sock@entry=0x7ffefb55aeb0) at
./build/../src/backend/postmaster/launch_backend.c:290
>         pid = <optimized out>
> #23 0x000055987ffed802 in BackendStartup (client_sock=0x7ffefb55aeb0) at
./build/../src/backend/postmaster/postmaster.c:3587
>         bn = 0x5598891e6700
>         pid = <optimized out>
>         startup_data = {canAcceptConnections = CAC_OK, socket_created = 813767320273791, fork_started =
813767320273794}
>         cac = <optimized out>
>         bn = <optimized out>
>         pid = <optimized out>
>         startup_data = <optimized out>
>         cac = <optimized out>
>         __func__ = "BackendStartup"
>         __errno_location = <optimized out>
>         save_errno = <optimized out>
>         __errno_location = <optimized out>
>         __errno_location = <optimized out>
> #24 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1702
>         s = {sock = 10, raddr = {addr = {ss_family = 1,
>               __ss_padding = "\2230\"\026\237\276\000\000\000\000\000\000\000\000\v\203\024\211\230U\000\000\000\000\000\000\000\000\000\000 \257U\373\376\177\000\000\360\256U\373\376\177\000\000\000\004\000\000\000\000\000\000\000\203\024\211\230U\000\000\213\331\036\200\230U", '\000' <repeats 18 times>, "@\257U\373\376\177\000\000\236(&\200\230U\000\000\000\000\000\000\000\000\000\000\255\226\tj0\177\000", __ss_align = 1}, salen = 2}}
>         i = 0
>         now = <optimized out>
>         last_lockfile_recheck_time = 1760452076
>         last_touch_time = 1760450375
>         events = {{pos = 3, events = 2, fd = 8, user_data = 0x0}, {pos = 0, events = 0, fd = 8, user_data = 0x0}, {pos = -1995336688, events = 21912, fd = 0,
>             user_data = 0x559880252a37}, {pos = 0, events = 0, fd = -1995336440, user_data = 0x40000000069}, {pos = 0, events = 21912, fd = -1995144437,
>             user_data = 0x0}, {pos = 126953984, events = 2854329568, fd = -1995328432, user_data = 0x55988058ef60 <errordata>}, {pos = -78270480, events = 32766,
>             fd = -2145478054, user_data = 0xf}, {pos = 0, events = 0, fd = -78270400, user_data = 0x0}, {pos = -78270400, events = 32766, fd = -1994499072,
>             user_data = 0x559880252a37}, {pos = -2141748608, events = 21912, fd = 0, user_data = 0x559880234239 <pg_freeaddrinfo_all+73>}, {pos = 8, events = 0,
>             fd = -78270160, user_data = 0x7ffefb55b970}, {pos = 2146491570, events = 21912, fd = 0, user_data = 0x153800000000}, {pos = -1994974088, events = 21912,
>             fd = -78270160, user_data = 0x7ffefb55b08c}, {pos = 1, events = 1, fd = -1994516715, user_data = 0x1891e171c}, {pos = -1994499072, events = 21912,
>             fd = -1994516682, user_data = 0x100000001}, {pos = 1, events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = 0, user_data = 0x7f0032333435}, {
>             pos = -2145100672, events = 21912, fd = -1994516568, user_data = 0x5598891e17b2}, {pos = -1994516541, events = 21912, fd = -1994516531,
>             user_data = 0x5598891e17de}, {pos = -1994516506, events = 21912, fd = -1994516443, user_data = 0x5598891e182b}, {pos = -1994516425, events = 21760,
>             fd = 1782978392, user_data = 0x6e75722f7261762f}, {pos = 1936683055, events = 1701996404, fd = 795636083, user_data = 0x3334352e4c515347}, {
>             pos = -2145058766, events = 21912, fd = -2145015462, user_data = 0x7ffefb55b7a0}, {pos = -2145100583, events = 21912, fd = 0, user_data = 0x0}, {
>             pos = -78268464, events = 32766, fd = -2145100753, user_data = 0x7ffefb55b7e0}, {pos = 2382895, events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0,
>             fd = 1782978392, user_data = 0x0}, {pos = 1779089596, events = 32560, fd = 0, user_data = 0x5598891e38b0}, {pos = 0, events = 0, fd = 1782978392,
>             user_data = 0x8}, {pos = 1780423360, events = 32560, fd = 255, user_data = 0xfffffffffffffed0}, {pos = 0, events = 0, fd = 399,
>             user_data = 0x5598891e5420}, {pos = 1779088954, events = 32560, fd = -1994499536, user_data = 0x570}, {pos = 0, events = 0, fd = 10, user_data = 0x0}, {
>             pos = 1780423360, events = 32560, fd = 255, user_data = 0xfffffffffffffed0}, {pos = 1780416464, events = 32560, fd = 8,
>             user_data = 0x7f306a1effd0 <_IO_file_jumps>}, {pos = 1779092818, events = 32560, fd = 2996, user_data = 0x5598891e5420}, {pos = 4096, events = 0,
>             fd = -78269312, user_data = 0x7f306a1effd0 <_IO_file_jumps>}, {pos = 1778941388, events = 32560, fd = 25, user_data = 0x21fa}, {pos = 1, events = 0,
>             fd = 33152, user_data = 0x68}, {pos = 0, events = 0, fd = 1, user_data = 0x100000000}, {pos = 2, events = 17, fd = 0, user_data = 0x3}, {pos = 0,
>             events = 1, fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = 0, user_data = 0x0}, {pos = 0, events = 0,
>             fd = 0, user_data = 0x0}, {pos = 0, events = 0, fd = -1994501088, user_data = 0x5598891e5420}, {pos = 8, events = 0, fd = 1780423360, user_data = 0x802}, {
>             pos = -304, events = 4294967295, fd = 5, user_data = 0x55988024bdd6}, {pos = -2145008860, events = 21912, fd = 1779092818, user_data = 0x7ffefb55b420}, {
>             pos = -2145095639, events = 21912, fd = 32768, user_data = 0x9}, {pos = -78269152, events = 32766, fd = 1779329413, user_data = 0x7f0000000000}, {pos = 9,
>             events = 0, fd = -78269184, user_data = 0x7f306a0e69ff}, {pos = 2346, events = 0, fd = 16318532, user_data = 0x5598891fed30}, {pos = 126953984,
>             events = 2854329568, fd = -1994363536, user_data = 0x9}, {pos = -1994396368, events = 21912, fd = -360, user_data = 0x9}, {pos = -1994396368,
>             events = 21912, fd = -360, user_data = 0x7f306a0ad3c0 <free+384>}, {pos = 926193527, events = 0, fd = 1760102962, user_data = 0x37349777}, {pos = 0,
>             events = 0, fd = 0, user_data = 0x9}, {pos = -78269184, events = 32766, fd = -78269152, user_data = 0x5598891fed40}, {pos = -2145075754, events = 21912,
>             fd = -2145008860, user_data = 0x7f306a0e678d <closedir+13>}, {pos = -1995282496, events = 21912, fd = -2147108231, user_data = 0x5598891fed40}, {
>             pos = -2145075649, events = 21912, fd = -78268032, user_data = 0x55988005f0c8 <RemovePgTempFiles+312>}, {pos = -2141655518, events = 21912, fd = -78268820,
>             user_data = 0x7367702f65736162}, {pos = 1952410737, events = 28781, fd = 771766842, user_data = 0x7f306a0abe3a}}
>         nevents = <optimized out>
>         __func__ = "ServerLoop"
> #25 0x000055987ffef110 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x5598891172e0) at
./build/../src/backend/postmaster/postmaster.c:1400
>         opt = <optimized out>
>         status = <optimized out>
>         userDoption = <optimized out>
>         listen_addr_saved = true
>         output_config_variable = <optimized out>
>         __func__ = "PostmasterMain"
> #26 0x000055987fce5880 in main (argc=5, argv=0x5598891172e0) at ./build/../src/backend/main/main.c:227
>         do_check_root = <optimized out>
>         dispatch_option = <optimized out>

__
Yuri Zamyatin




Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
David Rowley
Date:
On Wed, 15 Oct 2025 at 04:51, Yuri Zamyatin <yuri@yrz.am> wrote:
> To cause the segfault, these queries were launched simultaneously.
>
> > -- in 2 parallel infinite loops
> > with ids as (select (118998526-random()*100000)::int id from generate_series(1,10000))
> > update tcv_scene_datas set id=id where cv_scene_id in(select id from ids);
> > with ids as (select (118998526-random()*100000)::int id from generate_series(1,10000))
> > update tcv_scenes set id=id where id in(select id from ids);

Are you able to mock this up using the schema and some test data, then
share the script to populate the database?

David



Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
Yuri Zamyatin
Date:
I found a much easier way.

create.sql:
> drop database if exists segtest;
> create database segtest;
> \c segtest;
> create table tcv_scene_datas(cv_scene_id bigint primary key) partition by range (cv_scene_id);
> do $$
> declare
>     i int;
>     range_start bigint;
>     range_end bigint;
>     partition_name text;
> begin
>     for i in 0..100 loop
>         range_start := 1 + (i * 10000);
>         range_end := range_start + 10000;
>         partition_name := 'tcv_scene_datas_' || LPAD(i::TEXT, 3, '0');
>         execute format(
>             'create table %I partition of tcv_scene_datas for values from (%s) to (%s)',
>             partition_name,
>             range_start,
>             range_end
>         );
>     end loop;
> end $$;
> insert into tcv_scene_datas(cv_scene_id) select id from generate_series(1,1_000_000) id;

crash.sql:
> \c segtest
> with ids as (select (random()*1_000_000)::int id from generate_series(1,1000))
> update tcv_scene_datas set cv_scene_id=cv_scene_id where cv_scene_id in(select id from ids);

Launch crash.sql in 16 parallel infinite loops:
> seq 16 | xargs -P 16 -I {} sh -c 'while true; do psql -f crash.sql; done'

Within 1-2 minutes, 5 processes died with a segfault.
I also expected deadlocks with such a query; strangely, the database did not report any.

Let me know if you need more data.

__
Best wishes, Yuri




Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
David Rowley
Date:
On Thu, 16 Oct 2025 at 03:21, Yuri Zamyatin <yuri@yrz.am> wrote:
> create.sql:
> > drop database if exists segtest;
> > create database segtest;
> > \c segtest;
> > create table tcv_scene_datas(cv_scene_id bigint primary key) partition by range (cv_scene_id);
> > do $$
> > declare
> >     i int;
> >     range_start bigint;
> >     range_end bigint;
> >     partition_name text;
> > begin
> >     for i in 0..100 loop
> >         range_start := 1 + (i * 10000);
> >         range_end := range_start + 10000;
> >         partition_name := 'tcv_scene_datas_' || LPAD(i::TEXT, 3, '0');
> >         execute format(
> >             'create table %I partition of tcv_scene_datas for values from (%s) to (%s)',
> >             partition_name,
> >             range_start,
> >             range_end
> >         );
> >     end loop;
> > end $$;
> > insert into tcv_scene_datas(cv_scene_id) select id from generate_series(1,1_000_000) id;
>
> crash.sql:
> > \c segtest
> > with ids as (select (random()*1_000_000)::int id from generate_series(1,1000))
> > update tcv_scene_datas set cv_scene_id=cv_scene_id where cv_scene_id in(select id from ids);
>
> Launch crash.sql in 16 threads of infinite loops:
> > seq 16 | xargs -P 16 -I {} sh -c 'while true; do psql -f crash.sql; done'
>
> In 1-2 minutes, 5 processes died with segfault.

Perfect. Thank you.

It seems to be some more forgotten EPQ stuff from d47cbf474.  Amit got
some of these in 8741e48e5, but evidently the test case didn't do
pruning during execution (only init-plan pruning), so the partition
directory wasn't needed.
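
For context, frame #0 shows PartitionDirectoryLookup(pdir=0x0, ...): the
executor state built for the EPQ recheck has no PartitionDirectory
attached by the time runtime pruning does lookups under it. Roughly, the
missing guard has this shape (a sketch of the idea only, not the
committed patch):

    /* Sketch only: make sure the EPQ EState has a PartitionDirectory
     * before runtime pruning performs lookups, so that
     * PartitionDirectoryLookup() is never handed a NULL directory. */
    if (estate->es_partition_directory == NULL)
        estate->es_partition_directory =
            CreatePartitionDirectory(estate->es_query_cxt, false);

    partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel);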

The attached seems to fix it for me.

David

Attachments

Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
David Rowley
Date:
On Thu, 16 Oct 2025 at 11:45, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 16 Oct 2025 at 03:21, Yuri Zamyatin <yuri@yrz.am> wrote:
> > In 1-2 minutes, 5 processes died with segfault.
>
> Perfect. Thank you.
>
> It seems to be some more forgotten EPQ stuff from d47cbf474.  Amit got
> some of these in 8741e48e5, but evidently the test case didn't do
> pruning during execution (only init-plan pruning), so the partition
> directory wasn't needed.

I forgot to mention, this isn't the same thing as the
tts_minimal_store_tuple() issue you first reported, so if there is a
problem there, this one has nothing to do with it.

Any chance of a self-contained test case for the enable_hashagg=on crash?

David



Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
Yuri Zamyatin
Date:
Thank you very much.

I just tested pruning on the original case with the patch
you sent and can confirm the segfaults went away.

Regarding hash aggregation, I'll try to find a test case and
follow up within a day or so (cloning a huge db right now).



Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
Amit Langote
Date:
On Thu, Oct 16, 2025 at 7:51 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 16 Oct 2025 at 11:45, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > On Thu, 16 Oct 2025 at 03:21, Yuri Zamyatin <yuri@yrz.am> wrote:
> > > In 1-2 minutes, 5 processes died with segfault.

Thanks Yuri for the report and the test case.

> >
> > Perfect. Thank you.
> >
> > It seems to be some more forgotten EPQ stuff from d47cbf474.  Amit got
> > some of these in 8741e48e5, but evidently the test case didn't do
> > pruning during execution (only init-plan pruning), so the partition
> > directory wasn't needed.
>
> I forgot to mention, this isn't the same thing as the
> tts_minimal_store_tuple() issue you first reported, so if there is a
> problem there, this one has nothing to do with it.

Thanks again, David.

I've attached an updated patch with a test case.

--
Thanks, Amit Langote

Attachments

Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
David Rowley
Date:
On Thu, 16 Oct 2025 at 16:29, Amit Langote <amitlangote09@gmail.com> wrote:
> I've attached an updated patch with a test case.

Looks good to me. Nice simple test.

David



Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
Amit Langote
Date:
On Thu, Oct 16, 2025 at 1:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 16 Oct 2025 at 16:29, Amit Langote <amitlangote09@gmail.com> wrote:
> > I've attached an updated patch with a test case.
>
> Looks good to me. Nice simple test.

Thanks for checking. Pushed.

--
Thanks, Amit Langote



Re: BUG #19078: Segfaults in tts_minimal_store_tuple() following pg_upgrade

From
Jeff Davis
Date:
On Thu, 2025-10-16 at 11:51 +1300, David Rowley wrote:
> I forgot to mention, this isn't the same thing as the
> tts_minimal_store_tuple() issue you first reported, so if there is a
> problem there, this one has nothing to do with it.

I investigated, but came up empty so far. Any additional info on the
hashagg crash would be appreciated.

I appended my raw notes below in case someone notices a mistake.

Regards,
    Jeff Davis


Raw notes:

* Somehow entry->firstTuple==0x1b, which is obviously wrong.

* The entry structure lives in the bucket array, allocated in the
metacxt using MCXT_ALLOC_ZERO, so there's no uninitialized memory
floating around in the bucket array.

* The metacxt (aggstate->hash_metacxt) is an AllocSet, and it's never
reset. It contains the bucket array as well as some ExprStates and an
ExprContext for evaluating hash functions.

* Hash entries are never deleted, but between batches the entire hash
table is reset (which memsets the entire bucket array to zero).

* The entry->firstTuple is assigned only in one place, from
ExecCopySlotMinimalTupleExtra(). The 'extra' argument is a multiple of
16.

* ExecCopySlotMinimalTupleExtra() does some interesting pointer math,
but I didn't find any path that could plausibly return something like
0x1b. The memory is allocated with palloc/palloc0, which cannot return
zero, and 0x1b is not a multiple of 16 so seems unrelated to the extra
argument.
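
(For reference, that pointer math has roughly this shape -- a simplified
sketch from memory, not the verbatim source:)

    /* Simplified sketch of the ExecCopySlotMinimalTupleExtra() allocation:
     * 'extra' leading bytes are reserved and the returned tuple pointer is
     * offset past them.  palloc cannot return NULL, and 'extra' is a
     * multiple of 16 here, so no step of this math can yield 0x1b. */
    static MinimalTuple
    copy_minimal_tuple_extra_sketch(MinimalTuple src, Size extra)
    {
        char       *buf = palloc(src->t_len + extra);
        MinimalTuple copy = (MinimalTuple) (buf + extra);

        memcpy(copy, src, src->t_len);
        return copy;
    }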

* JIT does not seem to be involved, because it's going through
ExecInterpExpr().

* When the hash table grows, it invalidates previously-returned entry
pointers. But, given the site of the crash, I don't see that as a
problem in this case.
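
(To spell out that last hazard -- a toy model, not simplehash's actual
code: growing reallocates the bucket array, so an entry pointer saved
across the grow dangles afterwards.)

    #include <stdlib.h>

    typedef struct { unsigned key; void *firstTuple; } Entry;
    typedef struct { Entry *buckets; int nbuckets; } Table;

    static void
    grow(Table *t)
    {
        Entry *newbuckets = calloc((size_t) t->nbuckets * 2, sizeof(Entry));

        /* ...rehash the old entries into newbuckets... */
        free(t->buckets);         /* previously returned Entry pointers now dangle */
        t->buckets = newbuckets;
        t->nbuckets *= 2;
    }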