Обсуждение: Missing CFI in hlCover()?
Greetings, I'm looking into an issue that we're seeing on the PG archives server with runaway queries that don't seem to ever want to end- and ignore signals. This is PG11, 11.8-1.pgdg100+1 specifically on Debian/buster and what we're seeing is the loop in hlCover() (wparser_def.c:2071 to 2093) is lasting an awful long time without any CFI call. It's possible the CFI call should actually go elsewhere, but the complete lack of any CFI in wparser_def.c or tsvector_op.c seems a bit concerning. I'm suspicious there's something else going on here that's causing this to take a long time but I don't have any smoking gun as yet. Thanks, Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > I'm looking into an issue that we're seeing on the PG archives server > with runaway queries that don't seem to ever want to end- and ignore > signals. > This is PG11, 11.8-1.pgdg100+1 specifically on Debian/buster and what > we're seeing is the loop in hlCover() (wparser_def.c:2071 to 2093) is > lasting an awful long time without any CFI call. It's possible the CFI > call should actually go elsewhere, but the complete lack of any CFI in > wparser_def.c or tsvector_op.c seems a bit concerning. > I'm suspicious there's something else going on here that's causing this > to take a long time but I don't have any smoking gun as yet. Hm. I'd vote for a CFI within the recursion in TS_execute(), if there's not one there yet. Maybe hlFirstIndex needs one too --- if there are a lot of words in the query, maybe that could be slow? Did you pin the blame down any more precisely than hlCover? regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I'm looking into an issue that we're seeing on the PG archives server > > with runaway queries that don't seem to ever want to end- and ignore > > signals. > > > This is PG11, 11.8-1.pgdg100+1 specifically on Debian/buster and what > > we're seeing is the loop in hlCover() (wparser_def.c:2071 to 2093) is > > lasting an awful long time without any CFI call. It's possible the CFI > > call should actually go elsewhere, but the complete lack of any CFI in > > wparser_def.c or tsvector_op.c seems a bit concerning. > > > I'm suspicious there's something else going on here that's causing this > > to take a long time but I don't have any smoking gun as yet. > > Hm. I'd vote for a CFI within the recursion in TS_execute(), if there's > not one there yet. Maybe hlFirstIndex needs one too --- if there are > a lot of words in the query, maybe that could be slow? Did you pin the > blame down any more precisely than hlCover? I've definitely seen hlFirstIndex take a few seconds to run (while running this under gdb and stepping through), so that could be a good choice to place one (perhaps even additionally to this...). I have to admit to wondering if we shouldn't consider having one in check_stack_depth() to try and reduce the risk of us forgetting to have one in sensible places, though I've not really looked at all the callers and that might not be reasonable in some cases (though I wonder if maybe we consider having a 'default' version that has a CFI, and an alternate that doesn't...). The depth of recursion for TS_execute_recurse() is actually not all that bad either though (only 6 or so, as the query string here is: "ERROR: The required file is not available"), so maybe that isn't really the right thing to be thinking here. Down in checkcondition_HL(), checkval->len is 213601, and it seems to pretty much always end up with a result of TS_NO, but doesn't seem to take all that long to arrive at that. Over in hlFirstIndex(): hlFirstIndex (prs=0x7ffc2d4b16c0, prs=0x7ffc2d4b16c0, pos=219518, query=0x559619f81528) at ./build/../src/backend/tsearch/wparser_def.c:2013 2013 hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) (gdb) n 2026 if (item->type == QI_VAL && (gdb) 2029 item++; (gdb) p pos $72 = 219518 (gdb) p prs->curwords $73 = 583766 (gdb) Here's a full backtrace down to the checkcondition_HL(): (gdb) i s #0 checkcondition_HL (opaque=0x7ffc2d4b11f0, val=0x559619f815c0, data=0x0) at ./build/../src/backend/tsearch/wparser_def.c:1981 #1 0x0000559617eced2b in TS_execute_recurse (curitem=0x559619f815c0, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1872 #2 0x0000559617ecedd1 in TS_execute_recurse(curitem=0x559619f815a8, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120<checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892 #3 0x0000559617ecedd1 in TS_execute_recurse (curitem=0x559619f81590, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892 #4 0x0000559617ecedd1 in TS_execute_recurse (curitem=0x559619f81578, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892 #5 0x0000559617ecedd1 in TS_execute_recurse (curitem=0x559619f81560, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892 #6 0x0000559617ecedd1 in TS_execute_recurse (curitem=0x559619f81548, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892 #7 0x0000559617ecedd1 in TS_execute_recurse (curitem=curitem@entry=0x559619f81530, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892 #8 0x0000559617ed26d9 in TS_execute (curitem=curitem@entry=0x559619f81530, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0,chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1854 #9 0x0000559617df041e in hlCover (prs=prs@entry=0x7ffc2d4b16c0, query=query@entry=0x559619f81528, p=p@entry=0x7ffc2d4b12a0,q=q@entry=0x7ffc2d4b12a4) at ./build/../src/backend/tsearch/wparser_def.c:2075 #10 0x0000559617df1a2din mark_hl_words (max_words=35, min_words=15, shortword=3, highlightall=<optimized out>, query=<optimizedout>, prs=0x7ffc2d4b16c0) at ./build/../src/backend/tsearch/wparser_def.c:2393 #11 prsd_headline (fcinfo=<optimized out>) at ./build/../src/backend/tsearch/wparser_def.c:2614 #12 0x0000559617f0cdab in FunctionCall3Coll (flinfo=flinfo@entry=0x559619fe1d90, collation=collation@entry=0, arg1=arg1@entry=140721068381888,arg2=<optimized out>, arg3=arg3@entry=94103169144104) at ./build/../src/backend/utils/fmgr/fmgr.c:1170 #13 0x0000559617def48b in ts_headline_byid_opt (fcinfo=fcinfo@entry=0x7ffc2d4b1740) at ./build/../src/backend/tsearch/wparser.c:336 #14 0x0000559617f0c484 in DirectFunctionCall4Coll (func=0x559617def380 <ts_headline_byid_opt>, collation=<optimized out>,arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=<optimized out>) at ./build/../src/backend/utils/fmgr/fmgr.c:877 #15 0x0000559617c8ad41 in ExecInterpExpr (state=0x559619f9b170, econtext=0x559619fd2ae0, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:678 #16 0x0000559617cb51da in ExecEvalExprSwitchContext (isNull=0x7ffc2d4b1b97, econtext=0x559619fd2ae0, state=0x559619f9b170)at ./build/../src/include/executor/executor.h:313 #17 ExecProject (projInfo=0x559619f9b168) at ./build/../src/include/executor/executor.h:347 #18 ExecResult (pstate=<optimized out>) at ./build/../src/backend/executor/nodeResult.c:136 #19 0x0000559617c955d9 in ExecProcNodeInstr (node=0x559619fd29d0) at ./build/../src/backend/executor/execProcnode.c:461 #20 0x0000559617cadc08 in ExecProcNode (node=0x559619fd29d0) at ./build/../src/include/executor/executor.h:247 #21 ExecLimit (pstate=0x559619fd27e0) at ./build/../src/backend/executor/nodeLimit.c:149 #22 0x0000559617c955d9 in ExecProcNodeInstr (node=0x559619fd27e0) at ./build/../src/backend/executor/execProcnode.c:461 #23 0x0000559617c8e37b in ExecProcNode (node=0x559619fd27e0) at ./build/../src/include/executor/executor.h:247 #24 ExecutePlan (execute_once=<optimized out>, dest=0x559618187460 <donothingDR>, direction=<optimized out>, numberTuples=0,sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x559619fd27e0, estate=0x559619fd25d0) at ./build/../src/backend/executor/execMain.c:1723 #25 standard_ExecutorRun (queryDesc=0x559619f92af8, direction=<optimized out>, count=0, execute_once=<optimized out>) at./build/../src/backend/executor/execMain.c:364 #26 0x00007f607e5e5045 in pgss_ExecutorRun (queryDesc=0x559619f92af8, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:892 #27 0x0000559617c251d6 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x559619f937e8, into=into@entry=0x0, es=es@entry=0x559619fda528, queryString=queryString@entry=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti,plainto_tsquery('public.pg', 'ERROR: The required file is not available')), ts _headline(bodytxt, plainto_tsquery('pub"..., params=params@entry=0x0, queryEnv=<optimized out>, planduration=0x7ffc2d4b1e70)at ./build/../src/backend/commands/explain.c:536 #28 0x0000559617c254ee in ExplainOneQuery (queryEnv=<optimized out>, params=0x0, queryString=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt, plainto_tsquery('pub"..., es=0x559619fda528, into=0x0, cursorOptions=<optimized out>, query=<optimized out>) at ./build/../src/backend/commands/explain.c:372 #29 ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0, es=0x559619fda528, queryString=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt, plainto_tsquery('pub"..., params=0x0, queryEnv=0x0) at ./build/../src/backend/commands/explain.c:340 #30 0x0000559617c25a2d in ExplainQuery (pstate=pstate@entry=0x559619f8c0e8, stmt=stmt@entry=0x55961a045650, queryString=queryString@entry=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti,plainto_tsquery('public.pg', 'ERROR: The required file is not available')), ts _headline(bodytxt, plainto_tsquery('pub"..., params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x559619f8c058)at ./build/../src/backend/commands/explain.c:255 #31 0x0000559617deb94b in standard_ProcessUtility (pstmt=pstmt@entry=0x55961a045700, queryString=queryString@entry=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti,plainto_tsquery('public.pg', 'ERROR: The required file is not available')), ts _headline(bodytxt, plainto_tsquery('pub"..., context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,dest=dest@entry=0x559619f8c058, completionTag=0x7ffc2d4b22b0 "") at ./build/../src/backend/tcop/utility.c:675 #32 0x00007f607e5e7347 in pgss_ProcessUtility (pstmt=0x55961a045700, queryString=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt, plainto_tsquery('pub"..., context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x559619f8c058, completionTag=0x7ffc2d4b22b0"") at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1005 #33 0x0000559617de83d9 in PortalRunUtility (portal=0x559619f33de0, pstmt=0x55961a045700, isTopLevel=<optimized out>, setHoldSnapshot=<optimizedout>, dest=0x559619f8c058, completionTag=0x7ffc2d4b22b0 "") at ./build/../src/backend/tcop/pquery.c:1178 #34 0x0000559617de91ea in FillPortalStore (portal=0x559619f33de0, isTopLevel=<optimized out>) at ./build/../src/include/nodes/pg_list.h:79 #35 0x0000559617de9dc7 in PortalRun (portal=portal@entry=0x559619f33de0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,run_once=run_once@entry=true, dest=dest@entry=0x559619fd7c08, altdest=altdest@entry=0x559619fd7c08, completionTag=0x7ffc2d4b24f0 "") at ./build/../src/backend/tcop/pquery.c:768 #36 0x0000559617de5a3e in exec_simple_query ( query_string=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt , plainto_tsquery('pub"...) at ./build/../src/backend/tcop/postgres.c:1145 #37 0x0000559617de72a6 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x559619efa418, dbname=<optimized out>, username=<optimizedout>) at ./build/../src/backend/tcop/postgres.c:4193 #38 0x0000559617d709e2 in BackendRun (port=0x559619eee4c0) at ./build/../src/backend/postmaster/postmaster.c:4364 #39 BackendStartup (port=0x559619eee4c0) at ./build/../src/backend/postmaster/postmaster.c:4036 #40 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1707 #41 0x0000559617d71886 in PostmasterMain (argc=5, argv=0x559619e9e930) at ./build/../src/backend/postmaster/postmaster.c:1380 #42 0x0000559617aecdc9 in main (argc=5, argv=0x559619e9e930) at ./build/../src/backend/main/main.c:228 and down to the hlFirstIndex(): #0 hlFirstIndex (prs=0x7ffc2d4b16c0, prs=0x7ffc2d4b16c0, pos=219518, query=0x559619f81528) at ./build/../src/backend/tsearch/wparser_def.c:2029 #1 hlCover (prs=prs@entry=0x7ffc2d4b16c0, query=query@entry=0x559619f81528, p=p@entry=0x7ffc2d4b12a0, q=q@entry=0x7ffc2d4b12a4)at ./build/../src/backend/tsearch/wparser_def.c:2083 #2 0x0000559617df1a2d in mark_hl_words (max_words=35, min_words=15, shortword=3, highlightall=<optimized out>, query=<optimizedout>, prs=0x7ffc2d4b16c0) at ./build/../src/backend/tsearch/wparser_def.c:2393 #3 prsd_headline (fcinfo=<optimized out>) at ./build/../src/backend/tsearch/wparser_def.c:2614 #4 0x0000559617f0cdab in FunctionCall3Coll (flinfo=flinfo@entry=0x559619fe1d90, collation=collation@entry=0, arg1=arg1@entry=140721068381888,arg2=<optimized out>, arg3=arg3@entry=94103169144104) at ./build/../src/backend/utils/fmgr/fmgr.c:1170 #5 0x0000559617def48b in ts_headline_byid_opt (fcinfo=fcinfo@entry=0x7ffc2d4b1740) at ./build/../src/backend/tsearch/wparser.c:336 #6 0x0000559617f0c484 in DirectFunctionCall4Coll (func=0x559617def380 <ts_headline_byid_opt>, collation=<optimized out>,arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=<optimized out>) at ./build/../src/backend/utils/fmgr/fmgr.c:877 #7 0x0000559617c8ad41 in ExecInterpExpr (state=0x559619f9b170, econtext=0x559619fd2ae0, isnull=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:678 #8 0x0000559617cb51da in ExecEvalExprSwitchContext (isNull=0x7ffc2d4b1b97, econtext=0x559619fd2ae0, state=0x559619f9b170)at ./build/../src/include/executor/executor.h:313 #9 ExecProject (projInfo=0x559619f9b168) at ./build/../src/include/executor/executor.h:347 #10 ExecResult (pstate=<optimized out>) at ./build/../src/backend/executor/nodeResult.c:136 #11 0x0000559617c955d9 in ExecProcNodeInstr (node=0x559619fd29d0) at ./build/../src/backend/executor/execProcnode.c:461 #12 0x0000559617cadc08 in ExecProcNode (node=0x559619fd29d0) at ./build/../src/include/executor/executor.h:247 #13 ExecLimit (pstate=0x559619fd27e0) at ./build/../src/backend/executor/nodeLimit.c:149 #14 0x0000559617c955d9 in ExecProcNodeInstr (node=0x559619fd27e0) at ./build/../src/backend/executor/execProcnode.c:461 #15 0x0000559617c8e37b in ExecProcNode (node=0x559619fd27e0) at ./build/../src/include/executor/executor.h:247 #16 ExecutePlan (execute_once=<optimized out>, dest=0x559618187460 <donothingDR>, direction=<optimized out>, numberTuples=0,sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x559619fd27e0, estate=0x559619fd25d0) at ./build/../src/backend/executor/execMain.c:1723 #17 standard_ExecutorRun (queryDesc=0x559619f92af8, direction=<optimized out>, count=0, execute_once=<optimized out>) at./build/../src/backend/executor/execMain.c:364 #18 0x00007f607e5e5045 in pgss_ExecutorRun (queryDesc=0x559619f92af8, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:892 #19 0x0000559617c251d6 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x559619f937e8, into=into@entry=0x0, es=es@entry=0x559619fda528, queryString=queryString@entry=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti,plainto_tsquery('public.pg', 'ERROR: The required file is not available')), ts _headline(bodytxt, plainto_tsquery('pub"..., params=params@entry=0x0, queryEnv=<optimized out>, planduration=0x7ffc2d4b1e70)at ./build/../src/backend/commands/explain.c:536 #20 0x0000559617c254ee in ExplainOneQuery (queryEnv=<optimized out>, params=0x0, queryString=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt, plainto_tsquery('pub"..., es=0x559619fda528, into=0x0, cursorOptions=<optimized out>, query=<optimized out>) at ./build/../src/backend/commands/explain.c:372 #21 ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0, es=0x559619fda528, queryString=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt, plainto_tsquery('pub"..., params=0x0, queryEnv=0x0) at ./build/../src/backend/commands/explain.c:340 #22 0x0000559617c25a2d in ExplainQuery (pstate=pstate@entry=0x559619f8c0e8, stmt=stmt@entry=0x55961a045650, queryString=queryString@entry=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti,plainto_tsquery('public.pg', 'ERROR: The required file is not available')), ts _headline(bodytxt, plainto_tsquery('pub"..., params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x559619f8c058)at ./build/../src/backend/commands/explain.c:255 #23 0x0000559617deb94b in standard_ProcessUtility (pstmt=pstmt@entry=0x55961a045700, queryString=queryString@entry=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti,plainto_tsquery('public.pg', 'ERROR: The required file is not available')), ts _headline(bodytxt, plainto_tsquery('pub"..., context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0,dest=dest@entry=0x559619f8c058, completionTag=0x7ffc2d4b22b0 "") at ./build/../src/backend/tcop/utility.c:675 #24 0x00007f607e5e7347 in pgss_ProcessUtility (pstmt=0x55961a045700, queryString=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt, plainto_tsquery('pub"..., context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x559619f8c058, completionTag=0x7ffc2d4b22b0"") at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:1005 #25 0x0000559617de83d9 in PortalRunUtility (portal=0x559619f33de0, pstmt=0x55961a045700, isTopLevel=<optimized out>, setHoldSnapshot=<optimizedout>, dest=0x559619f8c058, completionTag=0x7ffc2d4b22b0 "") at ./build/../src/backend/tcop/pquery.c:1178 #26 0x0000559617de91ea in FillPortalStore (portal=0x559619f33de0, isTopLevel=<optimized out>) at ./build/../src/include/nodes/pg_list.h:79 #27 0x0000559617de9dc7 in PortalRun (portal=portal@entry=0x559619f33de0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,run_once=run_once@entry=true, dest=dest@entry=0x559619fd7c08, altdest=altdest@entry=0x559619fd7c08, completionTag=0x7ffc2d4b24f0 "") at ./build/../src/backend/tcop/pquery.c:768 #28 0x0000559617de5a3e in exec_simple_query ( query_string=0x559619ea3e80 "explain (analyze, buffers) SELECT messageid, date, subject, _from, ts_rank_cd(fti, plainto_tsquery('public.pg','ERROR: The required file is not available')), ts_headline(bodytxt , plainto_tsquery('pub"...) at ./build/../src/backend/tcop/postgres.c:1145 #29 0x0000559617de72a6 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x559619efa418, dbname=<optimized out>, username=<optimizedout>) at ./build/../src/backend/tcop/postgres.c:4193 #30 0x0000559617d709e2 in BackendRun (port=0x559619eee4c0) at ./build/../src/backend/postmaster/postmaster.c:4364 #31 BackendStartup (port=0x559619eee4c0) at ./build/../src/backend/postmaster/postmaster.c:4036 #32 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1707 #33 0x0000559617d71886 in PostmasterMain (argc=5, argv=0x559619e9e930) at ./build/../src/backend/postmaster/postmaster.c:1380 #34 0x0000559617aecdc9 in main (argc=5, argv=0x559619e9e930) at ./build/../src/backend/main/main.c:228 Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Hm. I'd vote for a CFI within the recursion in TS_execute(), if there's >> not one there yet. Maybe hlFirstIndex needs one too --- if there are >> a lot of words in the query, maybe that could be slow? Did you pin the >> blame down any more precisely than hlCover? > I've definitely seen hlFirstIndex take a few seconds to run (while > running this under gdb and stepping through), so that could be a good > choice to place one (perhaps even additionally to this...). I have to > admit to wondering if we shouldn't consider having one in > check_stack_depth() to try and reduce the risk of us forgetting to have > one in sensible places, though I've not really looked at all the callers > and that might not be reasonable in some cases (though I wonder if maybe > we consider having a 'default' version that has a CFI, and an alternate > that doesn't...). Adding it to check_stack_depth doesn't really seem like a reasonable proposal to me; aside from failing to separate concerns, running a long time is quite distinct from taking a lot of stack. The reason I'm eyeing TS_execute is that it involves callbacks to functions that might be pretty complex in themselves, eg during index scans. So that would guard a lot of territory besides hlCover. But hlFirstIndex could use a CFI too, if you've seen it take that long. (I wonder if we need to try to make it faster. I'd supposed that the loop was cheap enough to be a non-problem, but with large enough documents maybe not? It seems like converting to a hash table could be worthwhile for a large doc.) regards, tom lane
I tried to duplicate a multiple-second ts_headline call here, and failed to, so there must be something I'm missing. Can you provide a concrete example? I'd like to do some analysis with perf. regards, tom lane
I wrote: > (I wonder if we need to try to make it faster. I'd supposed that the > loop was cheap enough to be a non-problem, but with large enough > documents maybe not? It seems like converting to a hash table could > be worthwhile for a large doc.) OK, I dug into Stephen's test case off-list. While some CFIs would be a good idea, that's just band-aid'ing the symptom. The actual problem is that hlCover() is taking way too much time. The test case boils down to "common_word & rare_word" matched to a very long document, wherein the rare_word appears only near the front. Once we're past that match, hlCover() tries all the remaining matches for common_word, of which there are plenty ... and for each one, it scans clear to the end of the document, looking vainly for a substring that will satisfy the "common_word & rare_word" query. So obviously, this is O(N^2) in the length of the document :-(. I have to suppose that I introduced this problem in c9b0c678d, since AFAIR we weren't getting ts_headline-takes-forever complaints before that. It's not immediately obvious why the preceding algorithm doesn't have a similar issue, but then there's not anything at all that was obvious about the preceding algorithm. The most obvious way to fix the O(N^2) hazard is to put a limit on the length of "cover" (matching substring) that we'll consider. For the mark_hl_words caller, we won't highlight more than max_words words anyway, so it would be reasonable to bound covers to that length or some small multiple of it. The other caller mark_hl_fragments is willing to highlight up to max_fragments of up to max_words each, and there can be some daylight between the fragments, so it's not quite clear what the longest reasonable match is. Still, I doubt it's useful to show a headline consisting of a few words from the start of the document and a few words from thousands of words later, so a limit of max_fragments times max_words times something would probably be reasonable. We could hard-code a rule like that, or we could introduce a new explicit parameter for the maximum cover length. The latter would be more flexible, but we need something back-patchable and I'm concerned about the compatibility hazards of adding a new parameter in minor releases. So on the whole I propose hard-wiring a multiplier of, say, 10 for both these cases. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > We could hard-code a rule like that, or we could introduce a new > explicit parameter for the maximum cover length. The latter would be > more flexible, but we need something back-patchable and I'm concerned > about the compatibility hazards of adding a new parameter in minor > releases. So on the whole I propose hard-wiring a multiplier of, > say, 10 for both these cases. That sounds alright to me, though I do think we should probably still toss a CFI (or two) in this path somewhere as we don't know how long some of these functions might take... Thanks, Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> We could hard-code a rule like that, or we could introduce a new >> explicit parameter for the maximum cover length. The latter would be >> more flexible, but we need something back-patchable and I'm concerned >> about the compatibility hazards of adding a new parameter in minor >> releases. So on the whole I propose hard-wiring a multiplier of, >> say, 10 for both these cases. > That sounds alright to me, though I do think we should probably still > toss a CFI (or two) in this path somewhere as we don't know how long > some of these functions might take... Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> We could hard-code a rule like that, or we could introduce a new > >> explicit parameter for the maximum cover length. The latter would be > >> more flexible, but we need something back-patchable and I'm concerned > >> about the compatibility hazards of adding a new parameter in minor > >> releases. So on the whole I propose hard-wiring a multiplier of, > >> say, 10 for both these cases. > > > That sounds alright to me, though I do think we should probably still > > toss a CFI (or two) in this path somewhere as we don't know how long > > some of these functions might take... > > Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. Works for me. Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Stephen Frost <sfrost@snowman.net> writes: >>> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>>> We could hard-code a rule like that, or we could introduce a new >>>> explicit parameter for the maximum cover length. The latter would be >>>> more flexible, but we need something back-patchable and I'm concerned >>>> about the compatibility hazards of adding a new parameter in minor >>>> releases. So on the whole I propose hard-wiring a multiplier of, >>>> say, 10 for both these cases. >>> That sounds alright to me, though I do think we should probably still >>> toss a CFI (or two) in this path somewhere as we don't know how long >>> some of these functions might take... >> Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. > Works for me. Here's a proposed patch along that line. regards, tom lane diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c index 76b6f9aef0..1df1c0362d 100644 --- a/src/backend/tsearch/wparser_def.c +++ b/src/backend/tsearch/wparser_def.c @@ -2028,8 +2028,10 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) /* * hlCover: try to find a substring of prs' word list that satisfies query * - * At entry, *p must be the first word index to consider (initialize this to - * zero, or to the next index after a previous successful search). + * At entry, *p must be the first word index to consider (initialize this + * to zero, or to the next index after a previous successful search). + * We will consider all substrings starting at or after that word, and + * containing no more than max_cover words. * * On success, sets *p to first word index and *q to last word index of the * cover substring, and returns true. @@ -2038,7 +2040,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) * words used in the query. */ static bool -hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) +hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover, + int *p, int *q) { int pmin, pmax, @@ -2084,7 +2087,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) nextpmin = nextpmax; pmax = nextpmax; } - while (pmax >= 0); + while (pmax >= 0 && pmax - pmin < max_cover); /* No luck here, so try next feasible startpoint */ pmin = nextpmin; } @@ -2186,7 +2189,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos, static void mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, int shortword, int min_words, - int max_words, int max_fragments) + int max_words, int max_fragments, int max_cover) { int32 poslen, curlen, @@ -2213,7 +2216,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, covers = palloc(maxcovers * sizeof(CoverPos)); /* get all covers */ - while (hlCover(prs, query, &p, &q)) + while (hlCover(prs, query, max_cover, &p, &q)) { startpos = p; endpos = q; @@ -2368,7 +2371,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, */ static void mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, - int shortword, int min_words, int max_words) + int shortword, int min_words, int max_words, int max_cover) { int p = 0, q = 0; @@ -2386,7 +2389,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, if (!highlightall) { /* examine all covers, select a headline using the best one */ - while (hlCover(prs, query, &p, &q)) + while (hlCover(prs, query, max_cover, &p, &q)) { /* * Count words (curlen) and interesting words (poslen) within @@ -2542,6 +2545,7 @@ prsd_headline(PG_FUNCTION_ARGS) int shortword = 3; int max_fragments = 0; bool highlightall = false; + int max_cover; ListCell *l; /* Extract configuration option values */ @@ -2581,6 +2585,15 @@ prsd_headline(PG_FUNCTION_ARGS) defel->defname))); } + /* + * We might eventually make max_cover a user-settable parameter, but for + * now, just compute a reasonable value based on max_words and + * max_fragments. + */ + max_cover = Max(max_words * 10, 100); + if (max_fragments > 0) + max_cover *= max_fragments; + /* in HighlightAll mode these parameters are ignored */ if (!highlightall) { @@ -2605,10 +2618,10 @@ prsd_headline(PG_FUNCTION_ARGS) /* Apply appropriate headline selector */ if (max_fragments == 0) mark_hl_words(prs, query, highlightall, shortword, - min_words, max_words); + min_words, max_words, max_cover); else mark_hl_fragments(prs, query, highlightall, shortword, - min_words, max_words, max_fragments); + min_words, max_words, max_fragments, max_cover); /* Fill in default values for string options */ if (!prs->startsel) diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index f01b1ee253..756a48a167 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -1868,6 +1868,9 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags, /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); + /* ... and let's check for query cancel while we're at it */ + CHECK_FOR_INTERRUPTS(); + if (curitem->type == QI_VAL) return chkcond(arg, (QueryOperand *) curitem, NULL /* don't need position info */ );
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Stephen Frost <sfrost@snowman.net> writes: > >>> * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >>>> We could hard-code a rule like that, or we could introduce a new > >>>> explicit parameter for the maximum cover length. The latter would be > >>>> more flexible, but we need something back-patchable and I'm concerned > >>>> about the compatibility hazards of adding a new parameter in minor > >>>> releases. So on the whole I propose hard-wiring a multiplier of, > >>>> say, 10 for both these cases. > > >>> That sounds alright to me, though I do think we should probably still > >>> toss a CFI (or two) in this path somewhere as we don't know how long > >>> some of these functions might take... > > >> Yeah, of course. I'm still leaning to doing that in TS_execute_recurse. > > > Works for me. > > Here's a proposed patch along that line. I've back-patched this to 11 (which was just a bit of fuzz) and tested it out with a couple of different queries that were causing issues previously on the archive server, and they finish in a much more reasonable time and react faster to cancel requests/signals. If you'd like to play with it more, the PG11 installed on ark2 now has this patch built into it. So, looks good to me, and would certainly be nice to get this into the next set of releases, so the archive server doesn't get stuck anymore. :) Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Here's a proposed patch along that line. > I've back-patched this to 11 (which was just a bit of fuzz) and tested > it out with a couple of different queries that were causing issues > previously on the archive server, and they finish in a much more > reasonable time and react faster to cancel requests/signals. Yeah, I'd tried this locally using the data from the one test case you showed me, and it seemed to fix that. > So, looks good to me, and would certainly be nice to get this into the > next set of releases, so the archive server doesn't get stuck anymore. I'll push this tomorrow if nobody has objected to it. BTW, I had noticed last night that hlFirstIndex is being unreasonably stupid. Many of the "words" have null item pointers and hence can't possibly match any query item (I think because we have "words" for inter-word spaces/punctuation as well as the actual words). Checking that, as in the attached v2 patch, makes things a bit faster yet. regards, tom lane diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c index 76b6f9aef0..92517e4c17 100644 --- a/src/backend/tsearch/wparser_def.c +++ b/src/backend/tsearch/wparser_def.c @@ -2011,13 +2011,18 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) for (i = pos; i < prs->curwords; i++) { /* ... scan the query to see if this word matches any operand */ - QueryItem *item = GETQUERY(query); + QueryOperand *worditem = prs->words[i].item; + QueryItem *item; int j; + if (worditem == NULL) + continue; /* certainly not a match */ + + item = GETQUERY(query); for (j = 0; j < query->size; j++) { if (item->type == QI_VAL && - prs->words[i].item == &item->qoperand) + worditem == &item->qoperand) return i; item++; } @@ -2028,8 +2033,14 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) /* * hlCover: try to find a substring of prs' word list that satisfies query * - * At entry, *p must be the first word index to consider (initialize this to - * zero, or to the next index after a previous successful search). + * At entry, *p must be the first word index to consider (initialize this + * to zero, or to the next index after a previous successful search). + * We will consider all substrings starting at or after that word, and + * containing no more than max_cover words. (We need a length limit to + * keep this from taking O(N^2) time for a long document with many query + * words but few complete matches. Actually, since checkcondition_HL is + * roughly O(N) in the length of the substring being checked, it's even + * worse than that.) * * On success, sets *p to first word index and *q to last word index of the * cover substring, and returns true. @@ -2038,7 +2049,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos) * words used in the query. */ static bool -hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) +hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover, + int *p, int *q) { int pmin, pmax, @@ -2084,7 +2096,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q) nextpmin = nextpmax; pmax = nextpmax; } - while (pmax >= 0); + while (pmax >= 0 && pmax - pmin < max_cover); /* No luck here, so try next feasible startpoint */ pmin = nextpmin; } @@ -2186,7 +2198,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos, static void mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, int shortword, int min_words, - int max_words, int max_fragments) + int max_words, int max_fragments, int max_cover) { int32 poslen, curlen, @@ -2213,7 +2225,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, covers = palloc(maxcovers * sizeof(CoverPos)); /* get all covers */ - while (hlCover(prs, query, &p, &q)) + while (hlCover(prs, query, max_cover, &p, &q)) { startpos = p; endpos = q; @@ -2368,7 +2380,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall, */ static void mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, - int shortword, int min_words, int max_words) + int shortword, int min_words, int max_words, int max_cover) { int p = 0, q = 0; @@ -2386,7 +2398,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall, if (!highlightall) { /* examine all covers, select a headline using the best one */ - while (hlCover(prs, query, &p, &q)) + while (hlCover(prs, query, max_cover, &p, &q)) { /* * Count words (curlen) and interesting words (poslen) within @@ -2542,6 +2554,7 @@ prsd_headline(PG_FUNCTION_ARGS) int shortword = 3; int max_fragments = 0; bool highlightall = false; + int max_cover; ListCell *l; /* Extract configuration option values */ @@ -2581,6 +2594,15 @@ prsd_headline(PG_FUNCTION_ARGS) defel->defname))); } + /* + * We might eventually make max_cover a user-settable parameter, but for + * now, just compute a reasonable value based on max_words and + * max_fragments. + */ + max_cover = Max(max_words * 10, 100); + if (max_fragments > 0) + max_cover *= max_fragments; + /* in HighlightAll mode these parameters are ignored */ if (!highlightall) { @@ -2605,10 +2627,10 @@ prsd_headline(PG_FUNCTION_ARGS) /* Apply appropriate headline selector */ if (max_fragments == 0) mark_hl_words(prs, query, highlightall, shortword, - min_words, max_words); + min_words, max_words, max_cover); else mark_hl_fragments(prs, query, highlightall, shortword, - min_words, max_words, max_fragments); + min_words, max_words, max_fragments, max_cover); /* Fill in default values for string options */ if (!prs->startsel) diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c index f01b1ee253..756a48a167 100644 --- a/src/backend/utils/adt/tsvector_op.c +++ b/src/backend/utils/adt/tsvector_op.c @@ -1868,6 +1868,9 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags, /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); + /* ... and let's check for query cancel while we're at it */ + CHECK_FOR_INTERRUPTS(); + if (curitem->type == QI_VAL) return chkcond(arg, (QueryOperand *) curitem, NULL /* don't need position info */ );
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Here's a proposed patch along that line. > > > I've back-patched this to 11 (which was just a bit of fuzz) and tested > > it out with a couple of different queries that were causing issues > > previously on the archive server, and they finish in a much more > > reasonable time and react faster to cancel requests/signals. > > Yeah, I'd tried this locally using the data from the one test case you > showed me, and it seemed to fix that. Good stuff. > > So, looks good to me, and would certainly be nice to get this into the > > next set of releases, so the archive server doesn't get stuck anymore. > > I'll push this tomorrow if nobody has objected to it. Sounds good. > BTW, I had noticed last night that hlFirstIndex is being unreasonably > stupid. Many of the "words" have null item pointers and hence can't > possibly match any query item (I think because we have "words" for > inter-word spaces/punctuation as well as the actual words). Checking > that, as in the attached v2 patch, makes things a bit faster yet. Nice, looks good to me. Thanks! Stephen