Re: Row pattern recognition

Поиск
Список
Период
Сортировка
От Tatsuo Ishii
Тема Re: Row pattern recognition
Дата
Msg-id 20260215.180652.676783304925959084.ishii@postgresql.org
обсуждение исходный текст
Ответ на Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
Ответы Re: Row pattern recognition
Re: Row pattern recognition
Список pgsql-hackers
Hi Henson,

Thanks for the patches! I applied them on top of v42 and rebased
against current master.  Attached are the v43 patches.

> Hi Tatsuo,
> 
> This round focused on reviewing nodeWindowAgg.c -- the NFA executor
> -- which was the last piece remaining on my review list.  I've also
> added a comprehensive NFA runtime test suite.
> 
> The attached patch continues the incremental series on top of v42.
> Here are the changes:
> 
> 
> FIXME issues documented:
> 
> These require non-trivial structural changes, so I've documented
> them as FIXME comments for now rather than attempting a fix.
> 
> - altPriority tracks only the last ALT choice, so repeated or nested
>   ALTs like (A|B)+ cannot correctly implement SQL standard lexical
>   ordering.  A full-path classifier structure is needed.
> 
> - Cycle prevention condition (count == 0 && min == 0) is insufficient
>   for patterns like (A*)* where cycles occur at count > 0.  Currently
>   relies on implicit duplicate detection in nfa_add_state_unique.
> 
> 
> Bugs fixed:
> 
> - Fix nfa_advance_begin() routing order: enter group first (lexically
>   first), then skip path (lexically second).
> 
> - Add ALT scope boundary check in nfa_advance_alt() and
>   computeAbsorbabilityRecursive() to stop branch traversal when
>   element depth exits ALT scope.
> 
> - Disallow RANGE and GROUPS frame types with row pattern
>   recognition, as the SQL standard only requires ROWS.
>   Also remove the now-dead frame-type determination code.
> 
> 
> NFA executor refactoring (nodeWindowAgg.c):
> 
> - Simplify nfa_process_row() phase 1: remove nfa_advance() call
>   after forced mismatch at frame boundary, since all states are
>   already at VAR positions and mismatch removes them.  Replace
>   the redundant phase 3 frame boundary check with Assert.
> 
> - Simplify update_reduced_frame() result registration: mismatch
>   path now returns early, and the post-processing SKIP mode cleanup
>   block (nfa_remove_contexts_up_to vs nfa_context_free) is removed
>   since eager pruning handles SKIP PAST LAST ROW and both paths
>   now simply call nfa_context_free().
> 
> - Replace nfa_remove_contexts_up_to() with eager pruning inside
>   nfa_add_matched_state().  When matchEndRow is extended during
>   greedy matching, pending contexts within the match range are
>   pruned incrementally instead of after match completion.  For
>   SKIP PAST LAST ROW patterns like START UP+, this reduces the
>   number of live contexts at each row from O(n) to O(1), avoiding
>   O(n^2) per-row processing of contexts that would be skipped
>   anyway.
> 
> - Optimize nfa_states_equal() to compare counts only up to the
>   current element's depth instead of the full maxDepth.
> 
> - Rename nfa_state_clone -> nfa_state_create,
>   nfa_find_context_for_pos -> nfa_get_head_context for clarity.
> 
> - Add nfa_record_context_success/skipped/absorbed statistics helpers.
> 
> - Remove unused RPRPattern parameter from
>   nfa_update_absorption_flags().
> 
> 
> Absorbability refactoring (rpr.c):
> 
> - Remove parentDepth parameter from isUnboundedStart() and
>   computeAbsorbabilityRecursive().  Scope boundaries are now
>   determined by element depth comparison instead of an explicit
>   parent depth parameter.
> 
> - Simplify isUnboundedStart(): check simple VAR case first, then
>   GROUP case with a depth-bounded loop instead of a FIN-terminated
>   traversal with multiple break conditions.
> 
> 
> Test changes:
> 
> - Add rpr_nfa.sql: comprehensive NFA runtime test suite covering
>   quantifier boundaries (min/max/exact), alternation priority, nested
>   patterns, frame boundary variations, INITIAL mode (syntax error
>   expected), pathological patterns, context absorption, and FIXME
>   reproduction cases.
> 
> - Add nth_value out-of-frame tests, ReScan/LATERAL test, and
>   nth_value IGNORE NULLS test to rpr.sql.
> 
> - Change selected EXPLAIN test queries in rpr_explain.sql from
>   EXPLAIN (COSTS OFF) to EXPLAIN (ANALYZE, ...) to verify actual
>   NFA execution statistics.
> 
> - Fix stale comments across rpr.sql, rpr_base.sql, and rpr_nfa.sql:
>   remove resolved BUG annotations, update error messages to match
>   actual output, correct optimization result descriptions, and
>   standardize Expected comment placement to after SQL statements.
> 
> 
> Other changes:
> 
> - Run pgindent on RPR source files.  Add /*----------*/ comment
>   guards to protect structured comments from reformatting.
>
> Coverage:
> 
> I ran gcov on the modified lines (diff-only coverage).  The attached
> coverage.zip contains an HTML report.  Summary:
> 
> - 18 files, 124 functions, 2238 modified lines analyzed
> - Overall: 92.1% line coverage (2061/2238)
> - Core files (nodeWindowAgg.c, rpr.c, explain.c, parse_rpr.c):
>   98.0-98.5% each
> - Node serialization (outfuncs.c, readfuncs.c, equalfuncs.c):
>   0% -- these implement RPRPattern serialization for plan caching,
>   but regression tests don't exercise prepared statements with RPR
> - ruleutils.c: 96.3% -- untested lines are the reluctant quantifier
>   display path ({1}?), which is currently rejected at parse time

Thanks for the report.

> The node serialization functions (141 lines, 0% coverage) are the
> largest untested area.  I'm not sure how to trigger these paths
> in the regression test framework.  Any suggestions?

I think we can leave it as it is, until reluctant quantifier is
implemented.

> I'll send a separate email within a few days listing the FIXME
> issues and other unresolved items from the mailing list discussion
> for your review.

Looking forward to reading your email.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Вложения

В списке pgsql-hackers по дате отправления: