v14.0 segfaults on certain memoize query plans

From: Markus Zucker
Subject: v14.0 segfaults on certain memoize query plans
Msg-id: 1183ab53-6a4c-b798-5cfb-b33e71ddf0dd@enospc.net
Responses: Re: v14.0 segfaults on certain memoize query plans  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-bugs

Hi everyone,

I started to experience occasional segfaults after upgrading to Postgres 14.

The issue is semi-reproducible with the official Docker image and the attached DB dump:

postgres=# SELECT version();
                                                           version
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.0 (Debian 14.0-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)


Schema:

CREATE TABLE clusters (
     id bigint,
     dbt_valid_from timestamp with time zone,
     dbt_valid_to timestamp with time zone
);

CREATE TABLE ias (
     id bigint,
     event_id bigint,
     created_at timestamp with time zone
);


Query:

SELECT 1
FROM ias
LEFT JOIN clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;


Log output:

2021-10-19 11:27:38.712 UTC [1] LOG:  server process (PID 97) was terminated by signal 11: Segmentation fault
2021-10-19 11:27:38.712 UTC [1] DETAIL:  Failed process was running: SELECT 1
         FROM debug.ias ias
         LEFT JOIN debug.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;


The problem only occurs if the query plan contains a "Memoize" node:

postgres=# EXPLAIN
SELECT 1
FROM public.ias
LEFT JOIN public.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
  Nested Loop Left Join  (cost=0.43..2303.35 rows=361 width=4)
    ->  Seq Scan on ias  (cost=0.00..6.61 rows=361 width=16)
    ->  Memoize  (cost=0.43..6.76 rows=1 width=24)
          Cache Key: ias.created_at, ias.event_id
          ->  Index Scan using ix_id on clusters cs  (cost=0.42..6.75 rows=1 width=24)
                Index Cond: (id = ias.event_id)
                Filter: (tstzrange(dbt_valid_from, dbt_valid_to, '[)'::text) @> ias.created_at)
(7 rows)


The query runs as expected with the following plan:

postgres=# EXPLAIN
SELECT 1
FROM public.ias
LEFT JOIN public.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
  Nested Loop Left Join  (cost=0.42..38299.10 rows=1570 width=4)
    ->  Seq Scan on ias  (cost=0.00..25.70 rows=1570 width=16)
    ->  Index Scan using ix_id on clusters cs  (cost=0.42..24.34 rows=4 width=24)
          Index Cond: (id = ias.event_id)
          Filter: (tstzrange(dbt_valid_from, dbt_valid_to, '[)'::text) @> ias.created_at)
(5 rows)
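
A minimal sketch for comparing the two plans within one session, assuming the v14 `enable_memoize` planner setting is what decides whether a Memoize node can be chosen. Whether turning it off avoids the crash is not confirmed here; the sketch only shows how the plan without Memoize could be requested.

```sql
-- Assumption: enable_memoize (planner setting in v14) governs Memoize nodes.
-- Turn it off for this session only, then inspect the resulting plan.
SET enable_memoize = off;

EXPLAIN
SELECT 1
FROM public.ias
LEFT JOIN public.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;

-- Restore the default so later statements are planned normally.
RESET enable_memoize;
```

With the setting off, the plan should show the Index Scan directly under the Nested Loop Left Join, as in the second plan above.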


Backtrace:

(gdb) bt full
#0  0x0000564ab6686d83 in pg_detoast_datum (datum=0x2707cb3db0b4a) at ./build/../src/backend/utils/fmgr/fmgr.c:1724
No locals.
#1  0x0000564ab660744c in hash_range (fcinfo=0x7ffd30af2910) at ./build/../src/backend/utils/adt/rangetypes.c:1314
         r = <optimized out>
         result = <optimized out>
         typcache = <optimized out>
         scache = <optimized out>
         lower = {val = 139813693468928, infinite = 88, inclusive = 55, lower = 135}
         upper = {val = 94878908251905, infinite = 224, inclusive = 40, lower = 175}
         empty = false
         flags = <optimized out>
         lower_hash = <optimized out>
         upper_hash = <optimized out>
         __func__ = "hash_range"
#2  0x0000564ab6685dcd in FunctionCall1Coll (flinfo=flinfo@entry=0x564ab7ac3240, collation=<optimized out>, arg1=<optimized out>) at ./build/../src/backend/utils/fmgr/fmgr.c:1138
         fcinfodata = {fcinfo = {flinfo = 0x564ab7ac3240, context = 0x0, resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 1, args = 0x7ffd30af2930},
           fcinfo_data = "@2\254\267JV", '\000' <repeats 24 times>, "\001\000J\v۳|p\002\000\000\000\000\000\000\000\000"}
         fcinfo = 0x7ffd30af2910
         result = <optimized out>
         __func__ = "FunctionCall1Coll"
         __errno_location = <optimized out>
#3  0x0000564ab6407f69 in MemoizeHash_hash (key=0x0, tb=<optimized out>, tb=<optimized out>) at ./build/../src/backend/executor/nodeMemoize.c:175
         hkey = <optimized out>
         i = <optimized out>
         mstate = <optimized out>
         pslot = 0x564ab7ac3190
         hashkey = 0
         numkeys = 2
         hashfunctions = <optimized out>
         collations = <optimized out>
#4  0x0000564ab64086d0 in memoize_insert (key=0x0, found=<synthetic pointer>, tb=0x564ab7abf9c8) at ./build/../src/include/lib/simplehash.h:758
         hash = <optimized out>
         hash = <optimized out>
#5  cache_lookup (found=<synthetic pointer>, mstate=0x564ab7a8be08) at ./build/../src/backend/executor/nodeMemoize.c:423
         key = <optimized out>
         entry = <optimized out>
         oldcontext = <optimized out>
         key = <optimized out>
         entry = <optimized out>
         oldcontext = <optimized out>
#6  ExecMemoize (pstate=0x564ab7a8be08) at ./build/../src/backend/executor/nodeMemoize.c:609
         entry = <optimized out>
         outerslot = <optimized out>
         found = <optimized out>
         node = 0x564ab7a8be08
         outerNode = <optimized out>
         slot = <optimized out>
         __func__ = "ExecMemoize"
#7  0x0000564ab640eb25 in ExecProcNode (node=0x564ab7a8be08) at ./build/../src/include/executor/executor.h:257
No locals.
#8  ExecNestLoop (pstate=0x564ab7a8b708) at ./build/../src/backend/executor/nodeNestloop.c:160
         node = <optimized out>
         nl = 0x564ab7ac8040
         innerPlan = <optimized out>
         outerPlan = 0x564ab7a8b8f8
         outerTupleSlot = <optimized out>
         innerTupleSlot = <optimized out>
         joinqual = 0x0
         otherqual = 0x0
         econtext = <optimized out>
         lc = <optimized out>
#9  0x0000564ab63e29cd in ExecProcNode (node=0x564ab7a8b708) at ./build/../src/include/executor/executor.h:257
No locals.
#10 ExecutePlan (execute_once=<optimized out>, dest=0x564ab7ac8ad0, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x564ab7a8b708,
     estate=0x564ab7a8b478) at ./build/../src/backend/executor/execMain.c:1551
         slot = <optimized out>
         current_tuple_count = 0
         slot = <optimized out>
         current_tuple_count = <optimized out>
#11 standard_ExecutorRun (queryDesc=0x564ab79f3b08, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:361
         estate = 0x564ab7a8b478
         operation = CMD_SELECT
         dest = 0x564ab7ac8ad0
         sendTuples = <optimized out>
         oldcontext = 0x564ab79f39f0
         __func__ = "standard_ExecutorRun"
#12 0x0000564ab655b61b in PortalRunSelect (portal=0x564ab7a39868, forward=<optimized out>, count=0, dest=<optimized out>) at ./build/../src/backend/tcop/pquery.c:919
         queryDesc = 0x564ab79f3b08
         direction = <optimized out>
         nprocessed = <optimized out>
         __func__ = "PortalRunSelect"
#13 0x0000564ab655caab in PortalRun (portal=portal@entry=0x564ab7a39868, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x564ab7ac8ad0,
     altdest=altdest@entry=0x564ab7ac8ad0, qc=0x7ffd30af2d30) at ./build/../src/backend/tcop/pquery.c:763
         _save_exception_stack = 0x7ffd30af3020
         _save_context_stack = 0x0
         _local_sigjmp_buf = {{__jmpbuf = {1, 9035177361156350739, 94878908520552, 140725420240176, 94878909106896, 94878908520552, 9035177361200390931, 3318141397421531923}, __mask_was_saved = 0, __saved_mask = {__val = {94875827568640,
                 544, 140725420240039, 94878889478997, 94878908234224, 2, 2, 1, 94878908097440, 112, 2, 177, 94878908520552, 140725420240048, 94878887997419, 2}}}}
         _do_rethrow = <optimized out>
         result = <optimized out>
         nprocessed = <optimized out>
         saveTopTransactionResourceOwner = 0x564ab79fef60
         saveTopTransactionContext = 0x564ab79f7ea0
         saveActivePortal = 0x0
         saveResourceOwner = 0x564ab79fef60
         savePortalContext = 0x0
         saveMemoryContext = 0x564ab79f7ea0
         __func__ = "PortalRun"
#14 0x0000564ab6558bed in exec_simple_query (query_string=0x564ab79d24b8 "SELECT 1\nFROM public.ias ias\nLEFT JOIN public.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;")
     at ./build/../src/backend/tcop/postgres.c:1214
         snapshot_set = <optimized out>
         per_parsetree_context = 0x0
         plantree_list = <optimized out>
         parsetree = 0x564ab79d3d18
         commandTag = <optimized out>
         qc = {commandTag = CMDTAG_UNKNOWN, nprocessed = 0}
         querytree_list = <optimized out>
         portal = 0x564ab7a39868
         receiver = 0x564ab7ac8ad0
         format = 0
         parsetree_item__state = {l = 0x564ab79d3d48, i = <optimized out>}
         dest = DestRemote
         oldcontext = 0x564ab79f7ea0
         parsetree_list = 0x564ab79d3d48
         parsetree_item = <optimized out>
         save_log_statement_stats = false
         was_logged = false
         use_implicit_block = false
         msec_str = "`-\257\060\375\177\000\000\000 ", '\000' <repeats 14 times>, "Q\000\000\000\000\000\000"
         __func__ = "exec_simple_query"
#15 0x0000564ab655ab2c in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffd30af3200, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4486
         query_string = 0x564ab79d24b8 "SELECT 1\nFROM public.ias ias\nLEFT JOIN public.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;"
         firstchar = <optimized out>
         input_message = {data = 0x564ab79d24b8 "SELECT 1\nFROM public.ias ias\nLEFT JOIN public.clusters cs ON cs.id = ias.event_id AND tstzrange(cs.dbt_valid_from, cs.dbt_valid_to, '[)') @> ias.created_at;", len = 157, maxlen = 1024,
           cursor = 157}
         local_sigjmp_buf = {{__jmpbuf = {0, 9035177362068611859, 140725420240768, 1, 94878908274760, 3, 9035177361124893459, 3318141400352432915}, __mask_was_saved = 1, __saved_mask = {__val = {4194304, 139637976727552, 94878888117972,
                 140725420241212, 11760129689661931776, 140725420241264, 94878888147487, 206158430240, 140725420241280, 140725420241088, 11760129689661931776, 94878907979296, 0, 9, 94878908103080, 94878908103080}}}}
         send_ready_for_query = false
         idle_in_transaction_timeout_enabled = false
         idle_session_timeout_enabled = false
         __func__ = "PostgresMain"
#16 0x0000564ab64d8998 in BackendRun (port=<optimized out>, port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4506
         av = {0x564ab66d5a57 "postgres", 0x0}
         ac = 1
         av = {<optimized out>, <optimized out>}
         ac = <optimized out>
#17 BackendStartup (port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4228
         bn = <optimized out>
         pid = <optimized out>
         bn = <optimized out>
         pid = <optimized out>
         __func__ = "BackendStartup"
         __errno_location = <optimized out>
         __errno_location = <optimized out>
         save_errno = <optimized out>
         __errno_location = <optimized out>
         __errno_location = <optimized out>
#18 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1745
         port = <optimized out>
         i = <optimized out>
         rmask = {fds_bits = {128, 0 <repeats 15 times>}}
         selres = <optimized out>
         now = <optimized out>
         readmask = {fds_bits = {224, 0 <repeats 15 times>}}
         nSockets = 8
         last_lockfile_recheck_time = 1634646999
         last_touch_time = 1634643984
         __func__ = "ServerLoop"
#19 0x0000564ab64d9804 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x564ab79c6d20) at ./build/../src/backend/postmaster/postmaster.c:1417
         opt = <optimized out>
         status = <optimized out>
         userDoption = <optimized out>
         listen_addr_saved = true
         i = <optimized out>
         output_config_variable = <optimized out>
         __func__ = "PostmasterMain"
#20 0x0000564ab624ee29 in main (argc=1, argv=0x564ab79c6d20) at ./build/../src/backend/main/main.c:209
         do_check_root = <optimized out>

I hope this information is enough to isolate the problem. Thanks for looking into this!