Обсуждение: Re: Row pattern recognition

Поиск
Список
Период
Сортировка

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> Attached are the v30 patches, just adding a patch to change the
> default io_method parameter to sync, expecting this affects something
> to the recent CFbot failure.
> https://commitfest.postgresql.org/patch/4460/
> https://cirrus-ci.com/task/6078653164945408
> which is similar to:
> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org

CFBot has just run tests against v30 patches and now it turns to green
again!  I guess io_method = sync definitely did the trick.  Note that
previously only the Windows Server 2019, VS 2019 - Neason & ninja test
had failed.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> Attached are the v30 patches, just adding a patch to change the
> default io_method parameter to sync, expecting this affects something
> to the recent CFbot failure.
> https://commitfest.postgresql.org/patch/4460/
> https://cirrus-ci.com/task/6078653164945408
> which is similar to:
> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
> 
> (Once the issue resolved, the patch should be removed, of course)

It seems this has turned to green since May 2, 2025.
https://commitfest.postgresql.org/patch/5679/.

The last time it turned to red was April 16, 2025. Maybe something
committed between the period solved the cause of red, but I don't know
exactly which commit.

Anyway v31 patches now remove the change to default io_method
parameter to see if it passes Windows Server 2019, VS 2019 - Meson &
ninja test.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Вложения

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
>> Attached are the v30 patches, just adding a patch to change the
>> default io_method parameter to sync, expecting this affects something
>> to the recent CFbot failure.
>> https://commitfest.postgresql.org/patch/4460/
>> https://cirrus-ci.com/task/6078653164945408
>> which is similar to:
>> https://www.postgresql.org/message-id/20250422.111139.1502127917165838403.ishii%40postgresql.org
>> 
>> (Once the issue resolved, the patch should be removed, of course)
> 
> It seems this has turned to green since May 2, 2025.
> https://commitfest.postgresql.org/patch/5679/.
> 
> The last time it turned to red was April 16, 2025. Maybe something
> committed between the period solved the cause of red, but I don't know
> exactly which commit.
> 
> Anyway v31 patches now remove the change to default io_method
> parameter to see if it passes Windows Server 2019, VS 2019 - Meson &
> ninja test.

Now I see it passes the test.
https://cirrus-ci.com/build/5671612202090496

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Chao Li
Дата:
Hi Tatsuo-san,

I just reviewed 0006 docs changes and 0001. I plan to slowly review the patch, maybe one commit a day.

> On Nov 18, 2025, at 10:33, Tatsuo Ishii <ishii@postgresql.org> wrote:
>
> Attached are the v35 patches for Row pattern recognition.  In v34-0001
> gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
> patches are not changed.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
<v35-0001-Row-pattern-recognition-patch-for-raw-parser.patch><v35-0002-Row-pattern-recognition-patch-parse-analysis.patch><v35-0003-Row-pattern-recognition-patch-rewriter.patch><v35-0004-Row-pattern-recognition-patch-planner.patch><v35-0005-Row-pattern-recognition-patch-executor.patch><v35-0006-Row-pattern-recognition-patch-docs.patch><v35-0007-Row-pattern-recognition-patch-tests.patch><v35-0008-Row-pattern-recognition-patch-typedefs.list.patch>

I got a few comments, maybe just questions:

1 - 0001 - kwlist.h
```
+PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
```

Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
```
<programlisting>
SELECT company, tdate, price,
 first_value(price) OVER w,
 max(price) OVER w,
 count(price) OVER w
FROM stock
 WINDOW w AS (
 PARTITION BY company
 ORDER BY tdate
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);
</programlisting>
```

PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
```
PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
```

Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?

So I just want to clarify.

2 - 0001 - gram.y
```
opt_row_pattern_initial_or_seek:
            INITIAL        { $$ = true; }
            | SEEK
                {
                    ereport(ERROR,
                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                             errmsg("SEEK is not supported"),
                             errhint("Use INITIAL instead."),
                             parser_errposition(@1)));
                }
            | /*EMPTY*/        { $$ = true; }
        ;
```

As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to defer
thesemantic check to later parse phase, which would future work easier. 

3 - 0001 - parsenodes.h
```
+/*
+ * RowPatternCommonSyntax - raw representation of row pattern common syntax
+ *
+ */
+typedef struct RPCommonSyntax
+{
+    NodeTag        type;
+    RPSkipTo    rpSkipTo;        /* Row Pattern AFTER MATCH SKIP type */
+    bool        initial;        /* true if <row pattern initial or seek> is
+                                 * initial */
+    List       *rpPatterns;        /* PATTERN variables (list of A_Expr) */
+    List       *rpDefs;            /* row pattern definitions clause (list of
+                                 * ResTarget) */
+} RPCommonSyntax;
+
 /*
  * WindowDef - raw representation of WINDOW and OVER clauses
  *
@@ -593,6 +618,7 @@ typedef struct WindowDef
     char       *refname;        /* referenced window name, if any */
     List       *partitionClause;    /* PARTITION BY expression list */
     List       *orderClause;    /* ORDER BY (list of SortBy) */
+    RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
```

RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly define
RPfields in WindowRef as well? 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 18, 2025, at 13:03, Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi Tatsuo-san,
>
> I just reviewed 0006 docs changes and 0001. I plan to slowly review the patch, maybe one commit a day.
>
>> On Nov 18, 2025, at 10:33, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>
>> Attached are the v35 patches for Row pattern recognition.  In v34-0001
>> gram.y patch, %type for RPR were misplaced. v35-0001 fixes this. Other
>> patches are not changed.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
<v35-0001-Row-pattern-recognition-patch-for-raw-parser.patch><v35-0002-Row-pattern-recognition-patch-parse-analysis.patch><v35-0003-Row-pattern-recognition-patch-rewriter.patch><v35-0004-Row-pattern-recognition-patch-planner.patch><v35-0005-Row-pattern-recognition-patch-executor.patch><v35-0006-Row-pattern-recognition-patch-docs.patch><v35-0007-Row-pattern-recognition-patch-tests.patch><v35-0008-Row-pattern-recognition-patch-typedefs.list.patch>
>
> I got a few comments, maybe just questions:
>
> 1 - 0001 - kwlist.h
> ```
> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
> ```
>
> Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
> ```
> <programlisting>
> SELECT company, tdate, price,
> first_value(price) OVER w,
> max(price) OVER w,
> count(price) OVER w
> FROM stock
> WINDOW w AS (
> PARTITION BY company
> ORDER BY tdate
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (LOWPRICE UP+ DOWN+)
> DEFINE
>  LOWPRICE AS price <= 100,
>  UP AS price > PREV(price),
>  DOWN AS price < PREV(price)
> );
> </programlisting>
> ```
>
> PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
> ```
> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
> ```
>
> Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?
>
> So I just want to clarify.
>
> 2 - 0001 - gram.y
> ```
> opt_row_pattern_initial_or_seek:
> INITIAL { $$ = true; }
> | SEEK
> {
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("SEEK is not supported"),
> errhint("Use INITIAL instead."),
> parser_errposition(@1)));
> }
> | /*EMPTY*/ { $$ = true; }
> ;
> ```
>
> As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to
deferthe semantic check to later parse phase, which would future work easier. 
>
> 3 - 0001 - parsenodes.h
> ```
> +/*
> + * RowPatternCommonSyntax - raw representation of row pattern common syntax
> + *
> + */
> +typedef struct RPCommonSyntax
> +{
> + NodeTag type;
> + RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */
> + bool initial; /* true if <row pattern initial or seek> is
> + * initial */
> + List   *rpPatterns; /* PATTERN variables (list of A_Expr) */
> + List   *rpDefs; /* row pattern definitions clause (list of
> + * ResTarget) */
> +} RPCommonSyntax;
> +
> /*
>  * WindowDef - raw representation of WINDOW and OVER clauses
>  *
> @@ -593,6 +618,7 @@ typedef struct WindowDef
> char   *refname; /* referenced window name, if any */
> List   *partitionClause; /* PARTITION BY expression list */
> List   *orderClause; /* ORDER BY (list of SortBy) */
> + RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
> ```
>
> RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly
defineRP fields in WindowRef as well? 
>

4 - 0001 - parsenodes.h
```
+    /* Row Pattern AFTER MACH SKIP clause */
```

Typo: MACH -> MATCH

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Vik Fearing
Дата:
On 18/11/2025 06:03, Chao Li wrote:
> 1 - 0001 - kwlist.h
> ```
> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
> ```
>
> Why do we add “define” as a reserved keyword? From the SQL example you put in 0006:
> ```
> <programlisting>
> SELECT company, tdate, price,
>   first_value(price) OVER w,
>   max(price) OVER w,
>   count(price) OVER w
> FROM stock
>   WINDOW w AS (
>   PARTITION BY company
>   ORDER BY tdate
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   AFTER MATCH SKIP PAST LAST ROW
>   INITIAL
>   PATTERN (LOWPRICE UP+ DOWN+)
>   DEFINE
>    LOWPRICE AS price <= 100,
>    UP AS price > PREV(price),
>    DOWN AS price < PREV(price)
> );
> </programlisting>
> ```
>
> PARTITION is at the same level as DEFINE, but it’s not defined as a reserved keyword:
> ```
> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
> ```
>
> Even in this patch,”initial”,”past”, “pattern” and “seek” are defined as unreserved, why?
>
> So I just want to clarify.


Because of position. Without making DEFINE a reserved keyword, how do 
you know that it isn't another variable in the PATTERN clause?

-- 

Vik Fearing




Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 18, 2025, at 19:19, Vik Fearing <vik@postgresfriends.org> wrote:
>
>
> Because of position. Without making DEFINE a reserved keyword, how do you know that it isn't another variable in the
PATTERNclause? 
>

Ah, thanks for the clarification. Now I got it.

I’m continue to review 0002.

5 - 0002 - parse_clause.c
```
+        ereport(ERROR,
+                (errcode(ERRCODE_SYNTAX_ERROR),
+                 errmsg("FRAME must start at current row when row patttern recognition is used")));
```

Nit typo: patttern -> pattern

6 - 0002 - parse_clause.c
```
+    /* DEFINE variable name initials */
+    static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz”;
```

This string can also be const, so “static const char *”

7 - 0002 - parse_clause.c
```
+    /*
+     * Create list of row pattern DEFINE variable name's initial. We assign
+     * [a-z] to them (up to 26 variable names are allowed).
+     */
+    restargets = NIL;
+    i = 0;
+    initialLen = strlen(defineVariableInitials);
+
+    foreach(lc, windef->rpCommonSyntax->rpDefs)
+    {
+        char        initial[2];
+
+        restarget = (ResTarget *) lfirst(lc);
+        name = restarget->name;
```

Looks like “name” is not used inside “foreach”.

8 - 0002 - parse_clause.c
```
+    /*
+     * Create list of row pattern DEFINE variable name's initial. We assign
+     * [a-z] to them (up to 26 variable names are allowed).
+     */
+    restargets = NIL;
+    i = 0;
+    initialLen = strlen(defineVariableInitials);
+
+    foreach(lc, windef->rpCommonSyntax->rpDefs)
+    {
+        char        initial[2];
```

I guess this “foreach” for build initial list can be combined into the previous foreach loop of checking duplication.
Ifan def has no dup, then assign an initial to it. 

9 - 0002 - parse_clause.c
```
+static void
+transformPatternClause(ParseState *pstate, WindowClause *wc,
+                       WindowDef *windef)
+{
+    ListCell   *lc;
+
+    /*
+     * Row Pattern Common Syntax clause exists?
+     */
+    if (windef->rpCommonSyntax == NULL)
+        return;
```

Checking  (windef->rpCommonSyntax == NULL) seems a duplicate, because transformPatternClause() is only called by
transformRPR(),and transformRPR() has done the check. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> On 18/11/2025 06:03, Chao Li wrote:
>> 1 - 0001 - kwlist.h
>> ```
>> +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
>> ```
>>
>> Why do we add “define” as a reserved keyword? From the SQL example
>> you put in 0006:
>> ```
>> <programlisting>
>> SELECT company, tdate, price,
>>   first_value(price) OVER w,
>>   max(price) OVER w,
>>   count(price) OVER w
>> FROM stock
>>   WINDOW w AS (
>>   PARTITION BY company
>>   ORDER BY tdate
>>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>>   AFTER MATCH SKIP PAST LAST ROW
>>   INITIAL
>>   PATTERN (LOWPRICE UP+ DOWN+)
>>   DEFINE
>>    LOWPRICE AS price <= 100,
>>    UP AS price > PREV(price),
>>    DOWN AS price < PREV(price)
>> );
>> </programlisting>
>> ```
>>
>> PARTITION is at the same level as DEFINE, but it’s not defined as a
>> reserved keyword:
>> ```
>> PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL)
>> ```
>>
>> Even in this patch,”initial”,”past”, “pattern” and “seek” are
>> defined as unreserved, why?
>>
>> So I just want to clarify.
> 
> 
> Because of position. Without making DEFINE a reserved keyword, how do
> you know that it isn't another variable in the PATTERN clause?

I think we don't need to worry about this because PATTERN_P is in the
$nonassoc list in the patch, which gives PATTERN different precedence
from DEFINE.

@@ -888,6 +896,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %nonassoc    UNBOUNDED NESTED /* ideally would have same precedence as IDENT */
 %nonassoc    IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
             SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH
+            AFTER INITIAL SEEK PATTERN_P

And I think we could change DEFINE to an unreserved keyword.  Attached
is a patch to do that, on top of v35-0001.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index fccc26964a0..8ec9f1f265c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -17982,6 +17982,7 @@ unreserved_keyword:
             | DECLARE
             | DEFAULTS
             | DEFERRED
+            | DEFINE
             | DEFINER
             | DELETE_P
             | DELIMITER
@@ -18402,7 +18403,6 @@ reserved_keyword:
             | CURRENT_USER
             | DEFAULT
             | DEFERRABLE
-            | DEFINE
             | DESC
             | DISTINCT
             | DO
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c60b9b44a8..89dc2a4b95a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -129,7 +129,7 @@ PG_KEYWORD("default", DEFAULT, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("defaults", DEFAULTS, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("deferrable", DEFERRABLE, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("deferred", DEFERRED, UNRESERVED_KEYWORD, BARE_LABEL)
-PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("define", DEFINE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("definer", DEFINER, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("delete", DELETE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("delimiter", DELIMITER, UNRESERVED_KEYWORD, BARE_LABEL)

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> 2 - 0001 - gram.y
> ```
> opt_row_pattern_initial_or_seek:
>             INITIAL        { $$ = true; }
>             | SEEK
>                 {
>                     ereport(ERROR,
>                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                              errmsg("SEEK is not supported"),
>                              errhint("Use INITIAL instead."),
>                              parser_errposition(@1)));
>                 }
>             | /*EMPTY*/        { $$ = true; }
>         ;
> ```
> 
> As SEEK is specially listed here, I guess it might be supported in future. If that is true, would it be better to
deferthe semantic check to later parse phase, which would future work easier.
 

From a comment in gram.y:

/*
 * If we see PARTITION, RANGE, ROWS, GROUPS, AFTER, INITIAL, SEEK or PATTERN_P
 * as the first token after the '(' of a window_specification, we want the
 * assumption to be that there is no existing_window_name; but those keywords
 * are unreserved and so could be ColIds.  We fix this by making them have the

For this purpose, we want INITIAL and SEEK to be unreserved keywords.

> 3 - 0001 - parsenodes.h
> ```
> +/*
> + * RowPatternCommonSyntax - raw representation of row pattern common syntax
> + *
> + */
> +typedef struct RPCommonSyntax
> +{
> +    NodeTag        type;
> +    RPSkipTo    rpSkipTo;        /* Row Pattern AFTER MATCH SKIP type */
> +    bool        initial;        /* true if <row pattern initial or seek> is
> +                                 * initial */
> +    List       *rpPatterns;        /* PATTERN variables (list of A_Expr) */
> +    List       *rpDefs;            /* row pattern definitions clause (list of
> +                                 * ResTarget) */
> +} RPCommonSyntax;
> +
>  /*
>   * WindowDef - raw representation of WINDOW and OVER clauses
>   *
> @@ -593,6 +618,7 @@ typedef struct WindowDef
>      char       *refname;        /* referenced window name, if any */
>      List       *partitionClause;    /* PARTITION BY expression list */
>      List       *orderClause;    /* ORDER BY (list of SortBy) */
> +    RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */
> ```
> 
> RP fields are directly defined in WindowClause, then why do we need a wrapper of RPCommonSyntax? Can we directly
defineRP fields in WindowRef as well?
 

The row pattern common syntax defined in the standard is not only used
in the WINDOW clause, but in the FROM clause. If we would support RPR
in FROM clause in the future, it would be better to use the same code
of row pattern common syntax in WINDOW clause as much as
possible. That's the reason I created RPCommonSyntax. In the
parse/analysis phase, I am not sure how the parse/analysis code would
be in FROM clause at this point. So I did not define yet another
struct for it.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> 4 - 0001 - parsenodes.h
> ```
> +    /* Row Pattern AFTER MACH SKIP clause */
> ```
> 
> Typo: MACH -> MATCH

Thanks for pointing it out. Will fix.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 19, 2025, at 12:14, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> 9 - 0002 - parse_clause.c

I am continuing to review 0003

10 - 0003
```
+    Assert(list_length(patternVariables) == list_length(defineClause));
```

Is this assert true? From what I learned, pattern length doesn’t have to equal to define length. For example, I got an
examplefrom [1]: 

```
Example 4
-- This example has three different patterns.
-- Comment two of them, to get error-free query.
SELECT company, price_date, price
  FROM stock_price
    MATCH_RECOGNIZE (
       partition by company
       order by price_date
       all rows per match
       pattern ( limit_50  up   up* down  down*  )
       define
           limit_50 as price <= 50.00,
           up   as price > prev(price),
           down as price < prev(price)
    )
   WHERE company = 'B'
   ORDER BY price_date;
```

In this example, pattern has 5 elements and define has only 3 elements.

11 - 0004 - plannodes.h
```
+    /* Row Pattern Recognition AFTER MACH SKIP clause */
+    RPSkipTo    rpSkipTo;        /* Row Pattern Skip To type */
```

Typo: MACH -> MATCH

I’d stop here, and continue 0005 tomorrow.


[1] https://link.springer.com/article/10.1007/s13222-022-00404-3

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 20, 2025, at 15:33, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> I’d stop here, and continue 0005 tomorrow.
>

I reviewed 0005 today. I'm still not very familiar with the executor code, so going through 0005 has also been a
valuablelearning process for me. 

12 - 0005 - nodeWindowAgg.c
```
+    if (string_set_get_size(str_set) == 0)
+    {
+        /* no match found in the first row */
+        register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
+        destroyStringInfo(encoded_str);
+        return;
+    }
```

encoded_str should also be destroyed if not entering the “if” clause.

13 - 0005 - nodeWindowAgg.c
```
static
int
do_pattern_match(char *pattern, char *encoded_str, int len)
{
static regex_t *regcache = NULL;
static regex_t preg;
static char patbuf[1024]; /* most recent 'pattern' is cached here */
```

Using static variable in executor seems something I have never seen, which may persistent data across queries. I think
usuallyper query data are stored in state structures. 

14 - 0005 - nodeWindowAgg.c
```
+        MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
+
+        if (regcache != NULL)
+            pg_regfree(regcache);    /* free previous re */
+
+        /* we need to convert to char to pg_wchar */
+        plen = strlen(pattern);
+        data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
+        data_len = pg_mb2wchar_with_len(pattern, data, plen);
+        /* compile re */
+        sts = pg_regcomp(&preg, /* compiled re */
+                         data,    /* target pattern */
+                         data_len,    /* length of pattern */
+                         cflags,    /* compile option */
+                         C_COLLATION_OID    /* collation */
+            );
+        pfree(data);
+
+        MemoryContextSwitchTo(oldContext);
```

Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the
commentof pg_regcomp: 
```
/*
* pg_regcomp - compile regular expression
*
* Note: on failure, no resources remain allocated, so pg_regfree()
* need not be applied to re.
*/
int
pg_regcomp(regex_t *re,
const chr *string,
size_t len,
int flags,
Oid collation)
```

“preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory
leak.

Okay, I’d stop here and continue to review 0006 next week.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Chao,

>> On Nov 18, 2025, at 19:19, Vik Fearing <vik@postgresfriends.org> wrote:
>> 
>> 
>> Because of position. Without making DEFINE a reserved keyword, how do you know that it isn't another variable in the
PATTERNclause?
 
>> 
> 
> Ah, thanks for the clarification. Now I got it.
> 
> I’m continue to review 0002.

Thanks for the review!

> 5 - 0002 - parse_clause.c
> ```
> +        ereport(ERROR,
> +                (errcode(ERRCODE_SYNTAX_ERROR),
> +                 errmsg("FRAME must start at current row when row patttern recognition is used")));
> ```
> 
> Nit typo: patttern -> pattern

Will fix (this involves regression test change because this changes
the ERROR messages in the expected file).

> 6 - 0002 - parse_clause.c
> ```
> +    /* DEFINE variable name initials */
> +    static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz”;
> ```
> 
> This string can also be const, so “static const char *”

Agreed. Will fix.

> 7 - 0002 - parse_clause.c
> ```
> +    /*
> +     * Create list of row pattern DEFINE variable name's initial. We assign
> +     * [a-z] to them (up to 26 variable names are allowed).
> +     */
> +    restargets = NIL;
> +    i = 0;
> +    initialLen = strlen(defineVariableInitials);
> +
> +    foreach(lc, windef->rpCommonSyntax->rpDefs)
> +    {
> +        char        initial[2];
> +
> +        restarget = (ResTarget *) lfirst(lc);
> +        name = restarget->name;
> ```
> 
> Looks like “name” is not used inside “foreach”.

Oops. Will fix.

> 8 - 0002 - parse_clause.c
> ```
> +    /*
> +     * Create list of row pattern DEFINE variable name's initial. We assign
> +     * [a-z] to them (up to 26 variable names are allowed).
> +     */
> +    restargets = NIL;
> +    i = 0;
> +    initialLen = strlen(defineVariableInitials);
> +
> +    foreach(lc, windef->rpCommonSyntax->rpDefs)
> +    {
> +        char        initial[2];
> ```
> 
> I guess this “foreach” for build initial list can be combined into the previous foreach loop of checking duplication.
Ifan def has no dup, then assign an initial to it.
 

You are right. Will change.

> 9 - 0002 - parse_clause.c
> ```
> +static void
> +transformPatternClause(ParseState *pstate, WindowClause *wc,
> +                       WindowDef *windef)
> +{
> +    ListCell   *lc;
> +
> +    /*
> +     * Row Pattern Common Syntax clause exists?
> +     */
> +    if (windef->rpCommonSyntax == NULL)
> +        return;
> ```
> 
> Checking  (windef->rpCommonSyntax == NULL) seems a duplicate, because transformPatternClause() is only called by
transformRPR(),and transformRPR() has done the check.
 

Yeah. transformDefineClause() already does the similar check using
Assert. What about using Assert in transPatternClause() as well?

    Assert(windef->rpCommonSyntax != NULL);

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 21, 2025, at 13:25, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> Okay, I’d stop here and continue to review 0006 next week.
>

I just finished reviewing 0006, and see some problems:

15 - 0006 - select.sgml
```
+[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
```

row_pattern_common_syntax doesn’t look like a good name. I searched over the docs, and couldn't find a name suffixed by
“_syntax”.I think we can just name it as “row_pattern_recognition_clause” or a shorter name just “row_pattern”. 

16 - 0006 - select.sgml
```
+ <synopsis>
+ [ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
``

I think the two values are mutually exclusive, so curly braces should added surround them:

[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]

[] means optional, and {} means choose one from all alternatives.

17 - 0006 - select.sgml
```
PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
```

PATTERN definition should be placed inside (), here you missed ()

18 - 0006 - select.sgml
The same code as 17, <replaceable class="parameter">pattern_variable_name</replaceable>[+], do you only put “+” here, I
thinkSQL allows a lot of pattern quantifiers. From 0001, gram.y, looks like +, * and ? Are supported in this patch. So,
maybewe can do: 

```
PATTERN ( pattern_element, [pattern_element …] )

and pattern_element = variable name optionally followed by quantifier (reference to somewhere introducing pattern
quantifier).
```

19 - 0006 - select.sgml

I don’t see INITIAL and SEEK are described. Even if SEEK is not supported currently, it’s worthy mentioning that.

20 - 0006 - select.sgml
```
+    after a match found. With <literal>AFTER MATCH SKIP PAST LAST
+    ROW</literal> (the default) next row position is next to the last row of
```

21 - 0006 - select.sgml
```
 [ <replaceable class="parameter">frame_clause</replaceable> ]
+[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
```

Looks like row_pattern_common_syntax belongs to frame_clause, and you have lately added row_pattern_common_syntax to
frame_clause:
```
 <synopsis>
-{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
-{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [
<replaceable>frame_exclusion</replaceable>] 
+{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
[row_pattern_common_syntax]
+{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [
<replaceable>frame_exclusion</replaceable>] [row_pattern_common_syntax] 
 </synopsis>
```

So I guess row_pattern_common_syntax after frame_clause should not be added.

22 - 0006 - advance.sgml
```
<programlisting>
DEFINE
 LOWPRICE AS price <= 100,
 UP AS price > PREV(price),
 DOWN AS price < PREV(price)
</programlisting>

    Note that <function>PREV</function> returns the price column in the
```

In the “Note” line, “price” refers to a column from above example, so I think it should be tagged by <literal>. This
commentapplies to multiple places. 

I will proceed 0007 tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Chao,

Thank you for the review!

>> On Nov 20, 2025, at 15:33, Chao Li <li.evan.chao@gmail.com> wrote:
>> 
>> 
>> I’d stop here, and continue 0005 tomorrow.
>> 
> 
> I reviewed 0005 today. I'm still not very familiar with the executor code, so going through 0005 has also been a
valuablelearning process for me.
 
> 
> 12 - 0005 - nodeWindowAgg.c
> ```
> +    if (string_set_get_size(str_set) == 0)
> +    {
> +        /* no match found in the first row */
> +        register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED);
> +        destroyStringInfo(encoded_str);
> +        return;
> +    }
> ```
> 
> encoded_str should also be destroyed if not entering the “if” clause.

Subsequent string_set_discard() will free encode_str since encoded_str
is part of str_set. So no need to call destroyStringInfo(encoded_str)
in not entering "if" clause.

    string_set_discard(str_set);

So I think this is Ok.

> 13 - 0005 - nodeWindowAgg.c
> ```
> static
> int
> do_pattern_match(char *pattern, char *encoded_str, int len)
> {
> static regex_t *regcache = NULL;
> static regex_t preg;
> static char patbuf[1024]; /* most recent 'pattern' is cached here */
> ```
> 
> Using static variable in executor seems something I have never seen, which may persistent data across queries. I
thinkusually per query data are stored in state structures.
 

Yeah, such a usage maybe rare. But I hesitate to store the data
(regex_t) in WindowAggState because:

(1) it bloats WindowAggState which we want to avoid if possible
(2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)

> 14 - 0005 - nodeWindowAgg.c
> ```
> +        MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
> +
> +        if (regcache != NULL)
> +            pg_regfree(regcache);    /* free previous re */
> +
> +        /* we need to convert to char to pg_wchar */
> +        plen = strlen(pattern);
> +        data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
> +        data_len = pg_mb2wchar_with_len(pattern, data, plen);
> +        /* compile re */
> +        sts = pg_regcomp(&preg, /* compiled re */
> +                         data,    /* target pattern */
> +                         data_len,    /* length of pattern */
> +                         cflags,    /* compile option */
> +                         C_COLLATION_OID    /* collation */
> +            );
> +        pfree(data);
> +
> +        MemoryContextSwitchTo(oldContext);
> ```
> 
> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the
commentof pg_regcomp:
 
> ```
> /*
> * pg_regcomp - compile regular expression
> *
> * Note: on failure, no resources remain allocated, so pg_regfree()
> * need not be applied to re.
> */
> int
> pg_regcomp(regex_t *re,
> const chr *string,
> size_t len,
> int flags,
> Oid collation)
> ```
> 
> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory
leak.

I admit current patch leaves the memory unfreed even after a query
finishes. What about adding something like:

static void do_pattern_match_end(void)
{
    if (regcache != NULL)
        pg_regfree(regcache);
}

and let ExecEndWindowAgg() call it?

> Okay, I’d stop here and continue to review 0006 next week.
> 
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
> 
> 
> 
> 



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Chao,

>> On Nov 21, 2025, at 13:25, Chao Li <li.evan.chao@gmail.com> wrote:
>> 
>> 
>> Okay, I’d stop here and continue to review 0006 next week.
>> 
> 
> I just finished reviewing 0006, and see some problems:
> 
> 15 - 0006 - select.sgml
> ```
> +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
> ```
> 
> row_pattern_common_syntax doesn’t look like a good name. I searched over the docs, and couldn't find a name suffixed
by“_syntax”. I think we can just name it as “row_pattern_recognition_clause” or a shorter name just “row_pattern”.
 

I believe these names are based on the SQL standard. All syntax
components do not necessary be suffixed by "clause". For example
in select.sgml,

[ WITH [ RECURSIVE ] <replaceable class="parameter">with_query</replaceable> [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replaceable> [, ...] ) ] ]
    [ { * | <replaceable class="parameter">expression</replaceable> [ [ AS ] <replaceable
class="parameter">output_name</replaceable>] } [, ...] ]
 
    [ FROM <replaceable class="parameter">from_item</replaceable> [, ...] ]
    [ WHERE <replaceable class="parameter">condition</replaceable> ]
    [ GROUP BY { ALL | [ ALL | DISTINCT ] <replaceable class="parameter">grouping_element</replaceable> [, ...] } ]
    [ HAVING <replaceable class="parameter">condition</replaceable> ]
    [ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceable
class="parameter">window_definition</replaceable>) [, ...] ]
 

"window_definition" comes from the standard and does not suffixed by
"clause". If you look into the window clause definition in the
standard, you will see:

<window clause> ::=
WINDOW <window definition list>
<window definition list> ::=
<window definition> [ { <comma> <window definition> }... ]

So the usage of terms in our document matches the standard. Likewise,
we see the definition of row pattern common syntax in the stabdard:

<window clause> ::=
WINDOW <window definition list>
<window definition list> ::=
<window definition> [ { <comma> <window definition> }... ]
<window definition> ::=
<new window name> AS <window specification>
<new window name> ::=
<window name>
<window specification> ::=
<left paren> <window specification details> <right paren>
<window specification details> ::=
[ <existing window name> ]
[ <window partition clause> ]
[ <window order clause> ]
[ <window frame clause> ]
:
:
<window frame clause> ::=
[ <row pattern measures> ]
<window frame units> <window frame extent>
[ <window frame exclusion> ]
[ <row pattern common syntax> ]

So I think "row pattern common syntax" is perfectly appropriate
name. If we change it to “row_pattern_recognition_clause”, it will
just bring confusion.  In the standard “row pattern recognition
clause” is defined RPR in FROM clause.

SELECT ... FROM table MATCH RECOGNIZE (....);

Here MATCH RECOGNIZE (....) is the “row pattern recognition clause”.

> 16 - 0006 - select.sgml
> ```
> + <synopsis>
> + [ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
> ``
> 
> I think the two values are mutually exclusive, so curly braces should added surround them:
> 
> [ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
> 
> [] means optional, and {} means choose one from all alternatives.

Agreed. Will fix.

> 17 - 0006 - select.sgml
> ```
> PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
> ```
> 
> PATTERN definition should be placed inside (), here you missed ()

Good catch. Will fix.

> 18 - 0006 - select.sgml
> The same code as 17, <replaceable class="parameter">pattern_variable_name</replaceable>[+], do you only put “+” here,
Ithink SQL allows a lot of pattern quantifiers. From 0001, gram.y, looks like +, * and ? Are supported in this patch.
So,maybe we can do:
 
> 
> ```
> PATTERN ( pattern_element, [pattern_element …] )
> 
> and pattern_element = variable name optionally followed by quantifier (reference to somewhere introducing pattern
quantifier).
> ```

Currently only *, + and ? are supported and I don't think it's worth
to invent "pattern_element".  (Actually the standard defines much more
complex syntax for PATTERN. I think we can revisit this once the basic
support for quantifier *,+ and ? are brought in the core PostgreSQL
code.

> 19 - 0006 - select.sgml
> 
> I don’t see INITIAL and SEEK are described. Even if SEEK is not supported currently, it’s worthy mentioning that.

Agreed. Will fix.

> 20 - 0006 - select.sgml
> ```
> +    after a match found. With <literal>AFTER MATCH SKIP PAST LAST
> +    ROW</literal> (the default) next row position is next to the last row of
> ```
> 
> 21 - 0006 - select.sgml
> ```
>  [ <replaceable class="parameter">frame_clause</replaceable> ]
> +[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
> ```
> 
> Looks like row_pattern_common_syntax belongs to frame_clause, and you have lately added row_pattern_common_syntax to
frame_clause:
> ```
>  <synopsis>
> -{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
> -{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [
<replaceable>frame_exclusion</replaceable>]
 
> +{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ]
[row_pattern_common_syntax]
> +{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [
<replaceable>frame_exclusion</replaceable>] [row_pattern_common_syntax]
 
>  </synopsis>
> ```
> 
> So I guess row_pattern_common_syntax after frame_clause should not be added.

Yes, row_pattern_common_syntax belongs to frame_clause. Will fix.

> 22 - 0006 - advance.sgml
> ```
> <programlisting>
> DEFINE
>  LOWPRICE AS price <= 100,
>  UP AS price > PREV(price),
>  DOWN AS price < PREV(price)
> </programlisting>
> 
>     Note that <function>PREV</function> returns the price column in the
> ```
> 
> In the “Note” line, “price” refers to a column from above example, so I think it should be tagged by <literal>. This
commentapplies to multiple places.
 

You are right. Will fix.

> I will proceed 0007 tomorrow.

Thanks!

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 24, 2025, at 08:59, Chao Li <li.evan.chao@gmail.com> wrote:
>
> I will proceed 0007 tomorrow.

I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL feature,
fromchanging the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot from it.
Thanks!

23 - 0007

As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.

24 - 0007

I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).

25 - 0007
```
+-- basic test with none greedy pattern
```

Typo: “none greedy” -> non-greedy

No comment for 0008.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL
feature,from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot
fromit. Thanks!
 

You are welcome!

> 23 - 0007
> 
> As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.

Good catch. I will add the test case. In the mean time I found a bug
with gram.y part which handles '?' quantifier.  gram.y can be
successfully compiled but there's no way to put '?' quantifier in a
SQL statement.  We cannot write directly "ColId '?'" like other
quantifier '*' and '+' in the grammer because '?' is not a "self"
token.  So we let '?' matches Op first then check if it's '?'  or
not. 

            | ColId Op
                {
                    if (strcmp("?", $2))
                        ereport(ERROR,
                                errcode(ERRCODE_SYNTAX_ERROR),
                                errmsg("unsupported quantifier"),
                                parser_errposition(@2));

                    $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
                }

Another bug was with executor (nodeWindowAgg.c). The code to check
greedy quantifiers missed the case '?'.

> 24 - 0007
> 
> I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).

I will add such test cases.

> 25 - 0007
> ```
> +-- basic test with none greedy pattern
> ```
> 
> Typo: “none greedy” -> non-greedy

Will fix.

> No comment for 0008.

Thanks again for the review. I will post v35 patch set soon.  Attached
is the diff from v34.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 7514f2a3848..eec2a0a9346 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -559,13 +559,14 @@ DEFINE
  DOWN AS price < PREV(price)
 </programlisting>
 
-    Note that <function>PREV</function> returns the price column in the
-    previous row if it's called in a context of row pattern recognition. Thus
-    in the second line the definition variable "UP" is <literal>TRUE</literal>
-    when the price column in the current row is greater than the price column
-    in the previous row. Likewise, "DOWN" is <literal>TRUE</literal> when the
-    price column in the current row is lower than the price column in the
-    previous row.
+    Note that <function>PREV</function> returns the <literal>price</literal>
+    column in the previous row if it's called in a context of row pattern
+    recognition. Thus in the second line the definition variable "UP"
+    is <literal>TRUE</literal> when the price column in the current row is
+    greater than the price column in the previous row. Likewise, "DOWN"
+    is <literal>TRUE</literal> when the
+    <literal>price</literal> column in the current row is lower than
+    the <literal>price</literal> column in the previous row.
    </para>
    <para>
     Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be
@@ -624,10 +625,10 @@ FROM stock
    <para>
     When a query involves multiple window functions, it is possible to write
     out each one with a separate <literal>OVER</literal> clause, but this is
-    duplicative and error-prone if the same windowing behavior is wanted
-    for several functions.  Instead, each windowing behavior can be named
-    in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>.
-    For example:
+    duplicative and error-prone if the same windowing behavior is wanted for
+    several functions.  Instead, each windowing behavior can be named in
+    a <literal>WINDOW</literal> clause and then referenced
+    in <literal>OVER</literal>.  For example:
 
 <programlisting>
 SELECT sum(salary) OVER w, avg(salary) OVER w
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 428bd4f7372..aebc797545e 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -937,7 +937,6 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl
 [ PARTITION BY <replaceable class="parameter">expression</replaceable> [, ...] ]
 [ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable
class="parameter">operator</replaceable>] [ NULLS { FIRST | LAST } ] [, ...] ]
 
 [ <replaceable class="parameter">frame_clause</replaceable> ]
-[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
 </synopsis>
    </para>
 
@@ -1097,8 +1096,9 @@ EXCLUDE NO OTHERS
     includes following subclauses.
 
 <synopsis>
-[ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
-PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
+[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
+[ INITIAL | SEEK ]
+PATTERN ( <replaceable class="parameter">pattern_variable_name</replaceable>[*|+|?] [, ...] )
 DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <replaceable
class="parameter">expression</replaceable>[, ...]
 
 </synopsis>
     <literal>AFTER MATCH SKIP PAST LAST ROW</literal> or <literal>AFTER MATCH
@@ -1107,9 +1107,16 @@ DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <
     ROW</literal> (the default) next row position is next to the last row of
     previous match. On the other hand, with <literal>AFTER MATCH SKIP TO NEXT
     ROW</literal> next row position is always next to the last row of previous
-    match. <literal>DEFINE</literal> defines definition variables along with a
-    boolean expression. <literal>PATTERN</literal> defines a sequence of rows
-    that satisfies certain conditions using variables defined
+    match. <literal>INITIAL</literal> or <literal>SEEK</literal> defines how a
+    successful pattern matching starts from which row in a
+    frame. If <literal>INITIAL</literal> is specified, the match must start
+    from the first row in the frame. If <literal>SEEK</literal> is specified,
+    the set of matching rows do not necessarily start from the first row. The
+    default is <literal>INITIAL</literal>. Currently
+    only <literal>INITIAL</literal> is supported. <literal>DEFINE</literal>
+    defines definition variables along with a boolean
+    expression. <literal>PATTERN</literal> defines a sequence of rows that
+    satisfies certain conditions using variables defined
     in <literal>DEFINE</literal> clause. If the variable is not defined in
     the <literal>DEFINE</literal> clause, it is implicitly assumed following
     is defined in the <literal>DEFINE</literal> clause.
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 9ce072434b4..d230f43e607 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -4754,7 +4754,7 @@ update_reduced_frame(WindowObject winobj, int64 pos)
     {
         char       *quantifier = strVal(lfirst(lc1));
 
-        if (*quantifier == '+' || *quantifier == '*')
+        if (*quantifier == '+' || *quantifier == '*' || *quantifier == '?')
         {
             greedy = true;
             break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index fccc26964a0..fdfe14d54e5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -16825,7 +16825,21 @@ row_pattern_term:
             ColId    { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "", (Node *)makeString($1), NULL, @1); }
             | ColId '*'    { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "*", (Node *)makeString($1), NULL, @1); }
             | ColId '+'    { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", (Node *)makeString($1), NULL, @1); }
-            | ColId '?'    { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1); }
+/*
+ * '?' quantifier. We cannot write this directly "ColId '?'" because '?' is
+ * not a "self" token.  So we let '?' matches Op first then check if it's '?'
+ * or not.
+ */
+            | ColId Op
+                {
+                    if (strcmp("?", $2))
+                        ereport(ERROR,
+                                errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("unsupported quantifier"),
+                                parser_errposition(@2));
+
+                    $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
+                }
         ;
 
 row_pattern_definition_list:
@@ -17982,6 +17996,7 @@ unreserved_keyword:
             | DECLARE
             | DEFAULTS
             | DEFERRED
+            | DEFINE
             | DEFINER
             | DELETE_P
             | DELIMITER
@@ -18402,7 +18417,6 @@ reserved_keyword:
             | CURRENT_USER
             | DEFAULT
             | DEFERRABLE
-            | DEFINE
             | DESC
             | DISTINCT
             | DO
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 3521d2780e2..8e4dc732861 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -3920,7 +3920,7 @@ transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef,
     if ((wc->frameOptions & FRAMEOPTION_START_CURRENT_ROW) == 0)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
-                 errmsg("FRAME must start at current row when row patttern recognition is used")));
+                 errmsg("FRAME must start at current row when row pattern recognition is used")));
 
     /* Transform AFTER MACH SKIP TO clause */
     wc->rpSkipTo = windef->rpCommonSyntax->rpSkipTo;
@@ -3949,7 +3949,7 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
                       List **targetlist)
 {
     /* DEFINE variable name initials */
-    static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
+    static const char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
 
     ListCell   *lc,
                *l;
@@ -3959,7 +3959,7 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
     List       *defineClause;
     char       *name;
     int            initialLen;
-    int            i;
+    int            numinitials;
 
     /*
      * If Row Definition Common Syntax exists, DEFINE clause must exist. (the
@@ -4030,8 +4030,12 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
      * equivalent.
      */
     restargets = NIL;
+    numinitials = 0;
+    initialLen = strlen(defineVariableInitials);
     foreach(lc, windef->rpCommonSyntax->rpDefs)
     {
+        char        initial[2];
+
         restarget = (ResTarget *) lfirst(lc);
         name = restarget->name;
 
@@ -4071,26 +4075,12 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
                                 name),
                          parser_errposition(pstate, exprLocation((Node *) r))));
         }
-        restargets = lappend(restargets, restarget);
-    }
-    list_free(restargets);
-
-    /*
-     * Create list of row pattern DEFINE variable name's initial. We assign
-     * [a-z] to them (up to 26 variable names are allowed).
-     */
-    restargets = NIL;
-    i = 0;
-    initialLen = strlen(defineVariableInitials);
-
-    foreach(lc, windef->rpCommonSyntax->rpDefs)
-    {
-        char        initial[2];
 
-        restarget = (ResTarget *) lfirst(lc);
-        name = restarget->name;
-
-        if (i >= initialLen)
+        /*
+         * Create list of row pattern DEFINE variable name's initial. We
+         * assign [a-z] to them (up to 26 variable names are allowed).
+         */
+        if (numinitials >= initialLen)
         {
             ereport(ERROR,
                     (errcode(ERRCODE_SYNTAX_ERROR),
@@ -4099,12 +4089,16 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
                      parser_errposition(pstate,
                                         exprLocation((Node *) restarget))));
         }
-        initial[0] = defineVariableInitials[i++];
+        initial[0] = defineVariableInitials[numinitials++];
         initial[1] = '\0';
         wc->defineInitial = lappend(wc->defineInitial,
                                     makeString(pstrdup(initial)));
+
+        restargets = lappend(restargets, restarget);
     }
+    list_free(restargets);
 
+    /* turns a list of ResTarget's into a list of TargetEntry's */
     defineClause = transformTargetList(pstate, windef->rpCommonSyntax->rpDefs,
                                        EXPR_KIND_RPR_DEFINE);
 
@@ -4120,6 +4114,8 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
 /*
  * transformPatternClause
  *        Process PATTERN clause and return PATTERN clause in the raw parse tree
+ *
+ * windef->rpCommonSyntax must exist.
  */
 static void
 transformPatternClause(ParseState *pstate, WindowClause *wc,
@@ -4127,11 +4123,7 @@ transformPatternClause(ParseState *pstate, WindowClause *wc,
 {
     ListCell   *lc;
 
-    /*
-     * Row Pattern Common Syntax clause exists?
-     */
-    if (windef->rpCommonSyntax == NULL)
-        return;
+    Assert(windef->rpCommonSyntax != NULL);
 
     wc->patternVariable = NIL;
     wc->patternRegexp = NIL;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 9e17abbaec5..3cc0ce720b2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1621,7 +1621,7 @@ typedef struct WindowClause
     Index        winref;            /* ID referenced by window functions */
     /* did we copy orderClause from refname? */
     bool        copiedOrder pg_node_attr(query_jumble_ignore);
-    /* Row Pattern AFTER MACH SKIP clause */
+    /* Row Pattern AFTER MATCH SKIP clause */
     RPSkipTo    rpSkipTo;        /* Row Pattern Skip To type */
     bool        initial;        /* true if <row pattern initial or seek> is
                                  * initial */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c60b9b44a8..89dc2a4b95a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -129,7 +129,7 @@ PG_KEYWORD("default", DEFAULT, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("defaults", DEFAULTS, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("deferrable", DEFERRABLE, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("deferred", DEFERRED, UNRESERVED_KEYWORD, BARE_LABEL)
-PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("define", DEFINE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("definer", DEFINER, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("delete", DELETE_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("delimiter", DELIMITER, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out
index 59cfed180e7..312b62fb489 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -165,7 +165,45 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
  company2 | 07-10-2023 |  1300 |             |            | 
 (20 rows)
 
--- basic test with none greedy pattern
+-- basic test using PREV. Use '?'
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP? DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(price),
+  DOWN AS price < PREV(price)
+);
+ company  |   tdate    | price | first_value | last_value | nth_second 
+----------+------------+-------+-------------+------------+------------
+ company1 | 07-01-2023 |   100 |         100 |        140 | 07-02-2023
+ company1 | 07-02-2023 |   200 |             |            | 
+ company1 | 07-03-2023 |   150 |             |            | 
+ company1 | 07-04-2023 |   140 |             |            | 
+ company1 | 07-05-2023 |   150 |         150 |         90 | 07-06-2023
+ company1 | 07-06-2023 |    90 |             |            | 
+ company1 | 07-07-2023 |   110 |         110 |        120 | 07-08-2023
+ company1 | 07-08-2023 |   130 |             |            | 
+ company1 | 07-09-2023 |   120 |             |            | 
+ company1 | 07-10-2023 |   130 |             |            | 
+ company2 | 07-01-2023 |    50 |          50 |       1400 | 07-02-2023
+ company2 | 07-02-2023 |  2000 |             |            | 
+ company2 | 07-03-2023 |  1500 |             |            | 
+ company2 | 07-04-2023 |  1400 |             |            | 
+ company2 | 07-05-2023 |  1500 |        1500 |         60 | 07-06-2023
+ company2 | 07-06-2023 |    60 |             |            | 
+ company2 | 07-07-2023 |  1100 |        1100 |       1200 | 07-08-2023
+ company2 | 07-08-2023 |  1300 |             |            | 
+ company2 | 07-09-2023 |  1200 |             |            | 
+ company2 | 07-10-2023 |  1300 |             |            | 
+(20 rows)
+
+-- basic test with none-greedy pattern
 SELECT company, tdate, price, count(*) OVER w
  FROM stock
  WINDOW w AS (
@@ -927,7 +965,7 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
 ERROR:  aggregate functions are not allowed in DEFINE
 LINE 11:   LOWPRICE AS price < count(*)
                                ^
--- FRAME must start at current row when row patttern recognition is used
+-- FRAME must start at current row when row pattern recognition is used
 SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
  FROM stock
  WINDOW w AS (
@@ -941,7 +979,7 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
   UP AS price > PREV(price),
   DOWN AS price < PREV(price)
 );
-ERROR:  FRAME must start at current row when row patttern recognition is used
+ERROR:  FRAME must start at current row when row pattern recognition is used
 -- SEEK is not supported
 SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
  FROM stock
@@ -977,3 +1015,38 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
   DOWN AS price < PREV(1)
 );
 ERROR:  row pattern navigation operation's argument must include at least one column reference
+-- Unsupported quantifier
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP~ DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(1),
+  DOWN AS price < PREV(1)
+);
+ERROR:  unsupported quantifier
+LINE 9:  PATTERN (START UP~ DOWN+)
+                          ^
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP+? DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(1),
+  DOWN AS price < PREV(1)
+);
+ERROR:  unsupported quantifier
+LINE 9:  PATTERN (START UP+? DOWN+)
+                          ^
diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
index 47e67334994..ffcf5476198 100644
--- a/src/test/regress/sql/rpr.sql
+++ b/src/test/regress/sql/rpr.sql
@@ -75,7 +75,22 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
   DOWN AS price < PREV(price)
 );
 
--- basic test with none greedy pattern
+-- basic test using PREV. Use '?'
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP? DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(price),
+  DOWN AS price < PREV(price)
+);
+
+-- basic test with none-greedy pattern
 SELECT company, tdate, price, count(*) OVER w
  FROM stock
  WINDOW w AS (
@@ -438,7 +453,7 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
   LOWPRICE AS price < count(*)
 );
 
--- FRAME must start at current row when row patttern recognition is used
+-- FRAME must start at current row when row pattern recognition is used
 SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
  FROM stock
  WINDOW w AS (
@@ -484,3 +499,34 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
   UP AS price > PREV(1),
   DOWN AS price < PREV(1)
 );
+
+-- Unsupported quantifier
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP~ DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(1),
+  DOWN AS price < PREV(1)
+);
+
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP+? DOWN+)
+ DEFINE
+  START AS TRUE,
+  UP AS price > PREV(1),
+  DOWN AS price < PREV(1)
+);

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Chao,

Any comment on this?

>> 13 - 0005 - nodeWindowAgg.c
>> ```
>> static
>> int
>> do_pattern_match(char *pattern, char *encoded_str, int len)
>> {
>> static regex_t *regcache = NULL;
>> static regex_t preg;
>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>> ```
>> 
>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I
thinkusually per query data are stored in state structures.
 
>
>Yeah, such a usage maybe rare. But I hesitate to store the data
>(regex_t) in WindowAggState because:
>
>(1) it bloats WindowAggState which we want to avoid if possible
>(2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
>
>> 14 - 0005 - nodeWindowAgg.c
>> ```
>> +        MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>> +
>> +        if (regcache != NULL)
>> +            pg_regfree(regcache);    /* free previous re */
>> +
>> +        /* we need to convert to char to pg_wchar */
>> +        plen = strlen(pattern);
>> +        data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>> +        data_len = pg_mb2wchar_with_len(pattern, data, plen);
>> +        /* compile re */
>> +        sts = pg_regcomp(&preg, /* compiled re */
>> +                         data,    /* target pattern */
>> +                         data_len,    /* length of pattern */
>> +                         cflags,    /* compile option */
>> +                         C_COLLATION_OID    /* collation */
>> +            );
>> +        pfree(data);
>> +
>> +        MemoryContextSwitchTo(oldContext);
>> ```
>> 
>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the
commentof pg_regcomp:
 
>> ```
>> /*
>> * pg_regcomp - compile regular expression
>> *
>> * Note: on failure, no resources remain allocated, so pg_regfree()
>> * need not be applied to re.
>> */
>> int
>> pg_regcomp(regex_t *re,
>> const chr *string,
>> size_t len,
>> int flags,
>> Oid collation)
>> ```
>> 
>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory
leak.
>
>I admit current patch leaves the memory unfreed even after a query
>finishes. What about adding something like:
>
>static void do_pattern_match_end(void)
>{
>    if (regcache != NULL)
>        pg_regfree(regcache);
>}
>
>and let ExecEndWindowAgg() call it?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Chao Li
Дата:

> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
>
> Hi Chao,
>
> Any comment on this?
>
>>> 13 - 0005 - nodeWindowAgg.c
>>> ```
>>> static
>>> int
>>> do_pattern_match(char *pattern, char *encoded_str, int len)
>>> {
>>> static regex_t *regcache = NULL;
>>> static regex_t preg;
>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>>> ```
>>>
>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I
thinkusually per query data are stored in state structures. 
>>
>> Yeah, such a usage maybe rare. But I hesitate to store the data
>> (regex_t) in WindowAggState because:
>>
>> (1) it bloats WindowAggState which we want to avoid if possible
>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)

With the static function-scope variables, those data persist across queries, which is error prone. I am afraid that
evenif I let this pass, other reviewers might have the same concern. 

Maybe define a sub structure, say WindowAggCache, and put a pointer of WindowAggCache in WindowAggState, and only
allocatememory when a query needs to. 

>>
>>> 14 - 0005 - nodeWindowAgg.c
>>> ```
>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>>> +
>>> + if (regcache != NULL)
>>> + pg_regfree(regcache); /* free previous re */
>>> +
>>> + /* we need to convert to char to pg_wchar */
>>> + plen = strlen(pattern);
>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
>>> + /* compile re */
>>> + sts = pg_regcomp(&preg, /* compiled re */
>>> + data, /* target pattern */
>>> + data_len, /* length of pattern */
>>> + cflags, /* compile option */
>>> + C_COLLATION_OID /* collation */
>>> + );
>>> + pfree(data);
>>> +
>>> + MemoryContextSwitchTo(oldContext);
>>> ```
>>>
>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the
commentof pg_regcomp: 
>>> ```
>>> /*
>>> * pg_regcomp - compile regular expression
>>> *
>>> * Note: on failure, no resources remain allocated, so pg_regfree()
>>> * need not be applied to re.
>>> */
>>> int
>>> pg_regcomp(regex_t *re,
>>> const chr *string,
>>> size_t len,
>>> int flags,
>>> Oid collation)
>>> ```
>>>
>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory
leak.
>>
>> I admit current patch leaves the memory unfreed even after a query
>> finishes. What about adding something like:
>>
>> static void do_pattern_match_end(void)
>> {
>> if (regcache != NULL)
>> pg_regfree(regcache);
>> }
>>
>> and let ExecEndWindowAgg() call it?
>

I’m not sure. But I think if we move the data into WindowAggState, then I guess we don’t have to use TopmemoryContext
here.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Chao,

Sorry, I missed this email.

>> 9 - 0002 - parse_clause.c
> 
> I am continuing to review 0003
> 
> 10 - 0003
> ```
> +    Assert(list_length(patternVariables) == list_length(defineClause));
> ```
> 
> Is this assert true? From what I learned, pattern length doesn’t have to equal to define length. For example, I got
anexample from [1]:
 

You are right. If PATTERN clause uses the same pattern variable more
than once (which is perfectly valid), the assertion fails. I will
remove the assersion and fix the subsequent forboth loop.

> ```
> Example 4
> -- This example has three different patterns.
> -- Comment two of them, to get error-free query.
> SELECT company, price_date, price
>   FROM stock_price
>     MATCH_RECOGNIZE (
>        partition by company
>        order by price_date
>        all rows per match
>        pattern ( limit_50  up   up* down  down*  )
>        define
>            limit_50 as price <= 50.00,
>            up   as price > prev(price),
>            down as price < prev(price)
>     )
>    WHERE company = 'B'
>    ORDER BY price_date;
> ```
> 
> In this example, pattern has 5 elements and define has only 3 elements.

Yes.

> 11 - 0004 - plannodes.h
> ```
> +    /* Row Pattern Recognition AFTER MACH SKIP clause */
> +    RPSkipTo    rpSkipTo;        /* Row Pattern Skip To type */
> ```
> 
> Typo: MACH -> MATCH

Will fix/

> I’d stop here, and continue 0005 tomorrow.

Thanks!

> [1] https://link.springer.com/article/10.1007/s13222-022-00404-3
> 
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
> 
> 
> 
> 

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
>> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> 
>> Hi Chao,
>> 
>> Any comment on this?
>> 
>>>> 13 - 0005 - nodeWindowAgg.c
>>>> ```
>>>> static
>>>> int
>>>> do_pattern_match(char *pattern, char *encoded_str, int len)
>>>> {
>>>> static regex_t *regcache = NULL;
>>>> static regex_t preg;
>>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>>>> ```
>>>> 
>>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I
thinkusually per query data are stored in state structures.
 
>>> 
>>> Yeah, such a usage maybe rare. But I hesitate to store the data
>>> (regex_t) in WindowAggState because:
>>> 
>>> (1) it bloats WindowAggState which we want to avoid if possible
>>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
> 
> With the static function-scope variables, those data persist across queries, which is error prone. I am afraid that
evenif I let this pass, other reviewers might have the same concern.
 
> 
> Maybe define a sub structure, say WindowAggCache, and put a pointer of WindowAggCache in WindowAggState, and only
allocatememory when a query needs to.
 
> 
>>> 
>>>> 14 - 0005 - nodeWindowAgg.c
>>>> ```
>>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>>>> +
>>>> + if (regcache != NULL)
>>>> + pg_regfree(regcache); /* free previous re */
>>>> +
>>>> + /* we need to convert to char to pg_wchar */
>>>> + plen = strlen(pattern);
>>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
>>>> + /* compile re */
>>>> + sts = pg_regcomp(&preg, /* compiled re */
>>>> + data, /* target pattern */
>>>> + data_len, /* length of pattern */
>>>> + cflags, /* compile option */
>>>> + C_COLLATION_OID /* collation */
>>>> + );
>>>> + pfree(data);
>>>> +
>>>> + MemoryContextSwitchTo(oldContext);
>>>> ```
>>>> 
>>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the
commentof pg_regcomp:
 
>>>> ```
>>>> /*
>>>> * pg_regcomp - compile regular expression
>>>> *
>>>> * Note: on failure, no resources remain allocated, so pg_regfree()
>>>> * need not be applied to re.
>>>> */
>>>> int
>>>> pg_regcomp(regex_t *re,
>>>> const chr *string,
>>>> size_t len,
>>>> int flags,
>>>> Oid collation)
>>>> ```
>>>> 
>>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a
memoryleak.
 
>>> 
>>> I admit current patch leaves the memory unfreed even after a query
>>> finishes. What about adding something like:
>>> 
>>> static void do_pattern_match_end(void)
>>> {
>>> if (regcache != NULL)
>>> pg_regfree(regcache);
>>> }
>>> 
>>> and let ExecEndWindowAgg() call it?
>> 
> 
> I’m not sure. But I think if we move the data into WindowAggState, then I guess we don’t have to use TopmemoryContext
here.

I decided to add new fields to WindowAggState:

    /* regular expression compiled result cache. Used for RPR. */
    char       *patbuf;            /* pattern to be compiled */
    regex_t        preg;            /* compiled re pattern */

Then allocate the memory for them using
winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
>> I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL
feature,from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot
fromit. Thanks!
 
> 
> You are welcome!
> 
>> 23 - 0007
>> 
>> As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.
> 
> Good catch. I will add the test case. In the mean time I found a bug
> with gram.y part which handles '?' quantifier.  gram.y can be
> successfully compiled but there's no way to put '?' quantifier in a
> SQL statement.  We cannot write directly "ColId '?'" like other
> quantifier '*' and '+' in the grammer because '?' is not a "self"
> token.  So we let '?' matches Op first then check if it's '?'  or
> not. 
> 
>             | ColId Op
>                 {
>                     if (strcmp("?", $2))
>                         ereport(ERROR,
>                                 errcode(ERRCODE_SYNTAX_ERROR),
>                                 errmsg("unsupported quantifier"),
>                                 parser_errposition(@2));
> 
>                     $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
>                 }
> 
> Another bug was with executor (nodeWindowAgg.c). The code to check
> greedy quantifiers missed the case '?'.
> 
>> 24 - 0007
>> 
>> I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).
> 
> I will add such test cases.
> 
>> 25 - 0007
>> ```
>> +-- basic test with none greedy pattern
>> ```
>> 
>> Typo: “none greedy” -> non-greedy
> 
> Will fix.
> 
>> No comment for 0008.
> 
> Thanks again for the review. I will post v35 patch set soon.  Attached
> is the diff from v34.

Attached are the v35 patches for Row pattern recognition.  Chao Li
reviewed v34 thoroughly. Thank you! v35 reflects the review
comments. Major differences from v34 include:

- Make "DEFINE" an unreserved keyword. Previously it was a reserved keyword.
- Refactor transformDefineClause() to make two foreach loops into single foreach loop.
- Make '?' quantifier in PATTERN work as advertised. Test for '?' quantifier is added too.
- Unsupported quantifier test added.
- Fix get_rule_define().
- Fix memory leak related to regcomp.
- Move regcomp compiled result cache from static data to WindowAggState.
- Fix several typos.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Вложения

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:

Re: Row pattern recognition

От
Henson Choi
Дата:
SQL/RPR Patch Review Report
============================

Patch: Row Pattern Recognition (SQL:2016)
Commitfest: https://commitfest.postgresql.org/patch/4460

Review Methodology:
  This review focused on quality assessment, not line-by-line code audit.
  Key code paths and quality issues were examined with surrounding context
  when concerns arose. Documentation files were reviewed with AI-assisted
  grammar and typo checking. Code coverage was measured using gcov and
  custom analysis tools.

Limitations:
  - Not a security audit or formal verification
  - Parser and executor internals reviewed at module level, not exhaustively
  - Performance characteristics not benchmarked


TABLE OF CONTENTS
-----------------

1. Executive Summary
2. Issues Found
   2.1 Critical / Major
   2.2 Minor Issues (Review Needed)
   2.3 Minor Issues (Style)
   2.4 Suggestions for Discussion
3. Test Coverage
   3.1 Covered Areas
   3.2 Untested Items
   3.3 Unimplemented Features (No Test Needed)
4. Code Coverage Analysis
   4.1 Overall Coverage
   4.2 Coverage by File
   4.3 Uncovered Code Risk Assessment
5. Commit Analysis
6. Recommendations
7. Conclusion


1. EXECUTIVE SUMMARY
--------------------

Overall assessment: GOOD

The SQL/RPR patch demonstrates solid implementation quality within its
defined scope. Code follows PostgreSQL coding standards (with minor style
issues), test coverage is comprehensive at 96.4%, and documentation is
thorough with only minor typos.

Rating by Area:
- Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes recommended)
- Test Coverage:    Good (96.4% line coverage, 28 test cases)
- Documentation:    Good (Complete, 1 minor typo)
- Build/Regress:    Pass (make check-world, 753 tests passed)


2. ISSUES FOUND
---------------

2.1 Critical / Major

#1 [Code] Greedy pattern combinatorial explosion
   Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
   Impact: Memory exhaustion or infinite wait
   Recommendation: Add work_mem-based memory limit (error on exceed)

   Evidence - No memory limit in current code:
   - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
   - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc() unlimited
   - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit

   Only work_mem usage in RPR (nodeWindowAgg.c:1297):
     winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
   -> For tuple buffer, unrelated to pattern combination memory (StringSet)

2.2 Minor Issues (Review Needed)

#1 [Code] nodeWindowAgg.c:5849,5909,5912
   pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN elements

   Reproduction:
   - PATTERN (A B C ... Z A) (27 elements) -> OK
   - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid char: b"

2.3 Minor Issues (Style)

#1 [Code] nodeWindowAgg.c (25 blocks)
   #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove

#2 [Code] src/backend/executor/nodeWindowAgg.c
   static keyword on separate line (26 functions)

#3 [Code] src/backend/utils/adt/ruleutils.c
   Whitespace typo: "regexp !=NULL" -> "regexp != NULL"

#4 [Code] nodeWindowAgg.c:4619
   Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)

#5 [Doc] doc/src/sgml/ref/select.sgml:1128
   Typo: "maximu" -> "maximum"

2.4 Suggestions for Discussion

#1 Incremental matching with streaming NFA redesign
   Benefits:
   - O(n) time complexity per row (current: exponential in worst case)
   - Bounded memory via state merging and context absorption
   - Natural extension for OR patterns, {n,m} quantifiers, nested groups
   - Enables MEASURES clause with incremental aggregates
   - Proven approach in CEP engines (Flink, Esper)


3. TEST COVERAGE
----------------

3.1 Covered Areas

- PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
- DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
- PREV/NEXT functions: Single argument (line 44, 173)
- AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
- Aggregate integration: sum, avg, count, max, min (line 258-277)
- Window function integration: first_value, last_value, nth_value (line 34-35)
- JOIN/CTE: JOIN (line 313), WITH (line 324)
- View: VIEW creation, pg_get_viewdef (line 390-406)
- Error cases: 7 error scenarios (line 409-532)
- Large partition: 5000 row smoke test (line 360-387)
- ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)

3.2 Untested Items

Feature                  Priority   Recommendation
-------------------------------------------------------------------------------
26 variable limit        Medium     Test 26 variables success, 27th variable error
NULL value handling      Low        Test PREV(col) where col or previous row is NULL

Sample test for 26 variable limit:

    -- Should fail with 27th variable (parser error, no table needed)
    SELECT * FROM (SELECT 1 AS x) t
    WINDOW w AS (
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
      PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
      DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS TRUE,
             g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS TRUE,
             m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS TRUE,
             s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS TRUE,
             y AS TRUE, z AS TRUE, aa AS TRUE
    );
    -- ERROR: number of row pattern definition variable names exceeds 26

Sample test for NULL handling:

    CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
    INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
    INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in middle
    INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
    INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);

    SELECT company, tdate, price, count(*) OVER w AS match_count
    FROM stock_null
    WINDOW w AS (
      PARTITION BY company
      ORDER BY tdate
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
      PATTERN (START UP DOWN)
      DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price < PREV(price)
    );
    -- Result: all rows show match_count = 0 (NULL breaks pattern matching)

3.3 Unimplemented Features (No Test Needed)

Per documentation (select.sgml:1133-1136):
- MEASURES clause: Not implemented
- SUBSET clause: Not implemented
- AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported

Not in documentation, verified by testing:
- {n,m} quantifier: Parser error ("syntax error at or near {")
- (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
- (A B)+ compound repetition: Parser accepts, but execution returns 0 matches


4. CODE COVERAGE ANALYSIS
-------------------------

4.1 Overall Coverage

96.4% (732 / 759 lines)

4.2 Coverage by File (major RPR-modified files)

nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
ruleutils.c:        93.1% (54/58) - pg_get_viewdef output

4.3 Uncovered Code Risk Assessment

Total: 27 lines uncovered

Medium Risk (2 items) - Test addition recommended (see section 3.2):
- parse_clause.c:4093: transformDefineClause - 27th variable error
- nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first row

Low Risk (25 items) - Defensive code / Unreachable:
- nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
- nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
- nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
- nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
- nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
- nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
- nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
- nodeWindowAgg.c:5007: search_str_set - NULL return continue
- nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
- nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
- nodeWindowAgg.c:5700: pattern_initial - Variable not found
- nodeWindowAgg.c:5776: string_set_get - Index range check
- nodeWindowAgg.c:5850: variable_pos_register - a-z range check
- nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
- nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
- parse_clause.c:3989: transformDefineClause - A_Expr type check
- parse_clause.c:4145: transformPatternClause - A_Expr type check
- ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output

Conclusion: Most uncovered code consists of defensive error handling or
code unreachable due to parser pre-validation. No security or functional risk.


5. COMMIT ANALYSIS
------------------

8 sequential commits:

Commit  Area            Files   +/-         Key Content
-------------------------------------------------------------------------------
1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
3       Rewriter        1       +109/-0     pg_get_viewdef extension
4       Planner         5       +73/-3      WindowAgg plan node extension
5       Executor        4       +1,942/-11  CORE: Pattern matching engine (+1,850)
6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
8       typedefs        1       +6/-0       pgindent support

Code Change Statistics:
- Total files: 25
- Lines added: 4,358
- Lines deleted: 39
- Net increase: +4,319 lines


6. RECOMMENDATIONS
------------------

6.1 Combinatorial Explosion Prevention (Major, Required)

Add work_mem-based memory limit for StringSet allocation.
Location: string_set_add() in nodeWindowAgg.c:5741-5750
Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)

6.2 Code Review Required (Minor, Decision Needed)

Location: nodeWindowAgg.c:5849,5909,5912
Issue: pos > NUM_ALPHABETS check intent unclear
Current: PATTERN with 28 elements causes error
Question: Is 27 element limit intentional?

6.3 Code Style Fixes (Minor)

- #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
- static keyword on separate line: 26 functions to fix
- Whitespace: "regexp !=NULL" -> "regexp != NULL"
- Error message case: "Unrecognized" -> "unrecognized"

6.4 Documentation Fixes (Minor)

- select.sgml: "maximu" -> "maximum"

6.5 Test Additions (Optional)

Black-box Tests (Functional):

Feature              Test Case                                    Priority
-------------------------------------------------------------------------------
Variable limit       26 variables success, 27 error               Medium
NULL boundary        PREV at partition first row                  Medium

White-box Tests (Coverage) - covered by above black-box tests:

File:Line                          Target Branch
-------------------------------------------------------------------------------
parse_clause.c:4093                Limit error branch (Variable limit test)
nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)


7. CONCLUSION
-------------

Test Quality: GOOD

Core functionality is thoroughly tested with comprehensive error case coverage.

The patch is well-implemented within its defined scope. Identified issues include
one major concern (combinatorial explosion) and minor style/documentation items.

Recommended actions before commit:
1. Add work_mem-based memory limit for pattern combinations (required)
2. Clarify pos > NUM_ALPHABETS check intent (review needed)
3. Fix code style issues (optional but recommended)
4. Fix documentation typo (trivial)
5. Add tests for variable limit and NULL handling (optional)

Points for discussion (optional):
6. Incremental matching with streaming NFA redesign


Attachment:
- coverage.tgz (gcov HTML report, RPR-modified code only)


---
End of Report

2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
Attached are the v33 patches for Row pattern recognition.  The
difference from v32 is that the raw parse tree printing patch is not
included in v33. This is because now that we have
debug_print_raw_parse.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Вложения

Re: Row pattern recognition

От
Henson Choi
Дата:
Hi hackers,

I ran Valgrind memcheck against the RPR (Row Pattern Recognition) implementation
to verify there are no memory issues introduced by the new Row Pattern Recognition feature.

== Test Environment ==

- PostgreSQL: 19devel (master branch)
- Valgrind: 3.22.0
- Platform: Linux x86_64 (RHEL 9)
- Tests executed: rpr.sql

== Executive Summary ==

  +----------------------------------------------------------+
  | RPR Implementation: NO MEMORY ISSUES DETECTED            |
  +----------------------------------------------------------+

All detected leaks originate from existing PostgreSQL core components.
No RPR-related functions appear in any leak stack traces.

== Backend Process 541453: Itemized Leak Analysis ==

Total: 37 bytes definitely lost in 1 block, 1 unique leak context

+================================================================================+
| #  | Bytes | Blocks | Allocator | Root Cause       | RPR? | Verdict           |
+================================================================================+
| 1  | 37    | 1      | strdup    | Postmaster init  | NO   | HARMLESS          |
+================================================================================+
     TOTAL: 37 bytes in 1 block
     RPR-RELATED LEAKS: 0

== RPR Verification ==

All leak stack traces were examined. Key functions NOT present:

  - ExecInitWindowAgg (RPR path)
  - row_pattern_initial, row_pattern_final
  - Any function from src/backend/executor/nodeWindowAgg.c (RPR code paths)
  - Any RPR-related parser functions (PATTERN, DEFINE)

The RPR tests exercised window functions with PATTERN, DEFINE clauses
(e.g., PATTERN (START UP+ DOWN+)), yet none of these code paths appear in leaks.

== Process Summary ==

+--------+----------------+-----------------+----------+--------------------+
| PID    | Type           | Definitely Lost | Errors   | Notes              |
+--------+----------------+-----------------+----------+--------------------+
| 541303 | postmaster     | 37 bytes        | 1        | strdup at startup  |
| 541453 | backend        | 37 bytes        | 12       | See above analysis |
| 541314 | checkpointer   | 37 bytes        | 1        | strdup at startup  |
| 541315 | bgwriter       | 37 bytes        | 1        | strdup at startup  |
| 541316 | walwriter      | 37 bytes        | 1        | strdup at startup  |
| 541317 | autovacuum     | 37 bytes        | 1        | strdup at startup  |
| 541318 | logical rep    | 37 bytes        | 1        | strdup at startup  |
| 541319 | syslogger      | 37 bytes        | 1        | strdup at startup  |
| Others | aux workers    | 37 bytes each   | 1 each   | strdup at startup  |
+--------+----------------+-----------------+----------+--------------------+

== Possibly Lost (LLVM JIT) ==

Additionally, ~16KB in ~165 blocks reported as "possibly lost" from LLVM JIT:
  - llvm::MDTuple, llvm::User allocations
  - Origin: llvm_compile_expr -> jit_compile_expr -> ExecReadyExpr
  - These are LLVM's internal caches, not PostgreSQL leaks

== Other Memory Errors ==

+--------------------------------------------------------------------------+
| NO memory access errors detected:                                        |
|   - Invalid read/write: 0                                                |
|   - Use of uninitialized value: 0                                        |
|   - Conditional jump on uninitialized value: 0                           |
|   - Syscall param errors: 0                                              |
|   - Overlap in memcpy/strcpy: 0                                          |
+--------------------------------------------------------------------------+

Suppressed errors (via valgrind.supp):
  - 541453 (backend): 46 errors from 3 contexts suppressed
  - Other processes: ~0 errors each suppressed

Suppression rules in valgrind.supp (12 categories):
  1. Padding        - write() syscalls with uninitialized struct padding
  2. CRC            - CRC32 calculation on padded buffers
  3. Overlap        - memcpy overlap in bootstrap/relmap
  4. glibc          - Dynamic loader internals
  5. Regex/Pattern  - RE_compile_and_cache, patternsel
  6. Planner        - query_planner memcpy overlap
  7. Cache          - RelCache, CatCache, TypeCache possible leaks
  8. XLogReader     - WAL reader allocations
  9. Parallel       - ParallelWorkerMain allocations
  10. AutoVacuum    - AutoVacWorkerMain allocations
  11. GUC/Backend   - set_config_option, InitPostgres
  12. PL/pgSQL      - plpgsql_compile, SPI_prepare, CachedPlan

All suppressed errors are known, harmless PostgreSQL behaviors.
See attached valgrind.supp for details.

== Conclusion ==

The RPR implementation introduces NO memory leaks. All detected issues
are pre-existing PostgreSQL behaviors related to:

  1. Postmaster startup allocations
  2. LLVM JIT internal caching

These are by-design patterns that trade minimal memory for performance.

== Files Attached ==

- valgrind.supp: Suppression file used for this test
- 541303.log: Postmaster log
- 541453.log: Backend log with full leak details (56KB)
- 5413xx.log: Auxiliary process logs

--
Regards


2025년 12월 23일 (화) PM 10:28, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
Attached are the v36 patches, just for rebasing.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Вложения

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Henson Choi,

> Hi hackers,
> 
> I ran Valgrind memcheck against the RPR (Row Pattern Recognition)
> implementation
> to verify there are no memory issues introduced by the new Row Pattern
> Recognition feature.

Thanks for the test!

> == Conclusion ==
> 
> The RPR implementation introduces NO memory leaks.

Glad to hear that.
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Henson,

Thank you for the review! I will take care the issues (it will take
sometime).

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> SQL/RPR Patch Review Report
> ============================
> 
> Patch: Row Pattern Recognition (SQL:2016)
> Commitfest: https://commitfest.postgresql.org/patch/4460
> 
> Review Methodology:
>   This review focused on quality assessment, not line-by-line code audit.
>   Key code paths and quality issues were examined with surrounding context
>   when concerns arose. Documentation files were reviewed with AI-assisted
>   grammar and typo checking. Code coverage was measured using gcov and
>   custom analysis tools.
> 
> Limitations:
>   - Not a security audit or formal verification
>   - Parser and executor internals reviewed at module level, not exhaustively
>   - Performance characteristics not benchmarked
> 
> 
> TABLE OF CONTENTS
> -----------------
> 
> 1. Executive Summary
> 2. Issues Found
>    2.1 Critical / Major
>    2.2 Minor Issues (Review Needed)
>    2.3 Minor Issues (Style)
>    2.4 Suggestions for Discussion
> 3. Test Coverage
>    3.1 Covered Areas
>    3.2 Untested Items
>    3.3 Unimplemented Features (No Test Needed)
> 4. Code Coverage Analysis
>    4.1 Overall Coverage
>    4.2 Coverage by File
>    4.3 Uncovered Code Risk Assessment
> 5. Commit Analysis
> 6. Recommendations
> 7. Conclusion
> 
> 
> 1. EXECUTIVE SUMMARY
> --------------------
> 
> Overall assessment: GOOD
> 
> The SQL/RPR patch demonstrates solid implementation quality within its
> defined scope. Code follows PostgreSQL coding standards (with minor style
> issues), test coverage is comprehensive at 96.4%, and documentation is
> thorough with only minor typos.
> 
> Rating by Area:
> - Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes
> recommended)
> - Test Coverage:    Good (96.4% line coverage, 28 test cases)
> - Documentation:    Good (Complete, 1 minor typo)
> - Build/Regress:    Pass (make check-world, 753 tests passed)
> 
> 
> 2. ISSUES FOUND
> ---------------
> 
> 2.1 Critical / Major
> 
> #1 [Code] Greedy pattern combinatorial explosion
>    Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
>    Impact: Memory exhaustion or infinite wait
>    Recommendation: Add work_mem-based memory limit (error on exceed)
> 
>    Evidence - No memory limit in current code:
>    - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
>    - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc()
> unlimited
>    - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
> 
>    Only work_mem usage in RPR (nodeWindowAgg.c:1297):
>      winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
>    -> For tuple buffer, unrelated to pattern combination memory (StringSet)
> 
> 2.2 Minor Issues (Review Needed)
> 
> #1 [Code] nodeWindowAgg.c:5849,5909,5912
>    pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
> elements
> 
>    Reproduction:
>    - PATTERN (A B C ... Z A) (27 elements) -> OK
>    - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid
> char: b"
> 
> 2.3 Minor Issues (Style)
> 
> #1 [Code] nodeWindowAgg.c (25 blocks)
>    #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove
> 
> #2 [Code] src/backend/executor/nodeWindowAgg.c
>    static keyword on separate line (26 functions)
> 
> #3 [Code] src/backend/utils/adt/ruleutils.c
>    Whitespace typo: "regexp !=NULL" -> "regexp != NULL"
> 
> #4 [Code] nodeWindowAgg.c:4619
>    Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)
> 
> #5 [Doc] doc/src/sgml/ref/select.sgml:1128
>    Typo: "maximu" -> "maximum"
> 
> 2.4 Suggestions for Discussion
> 
> #1 Incremental matching with streaming NFA redesign
>    Benefits:
>    - O(n) time complexity per row (current: exponential in worst case)
>    - Bounded memory via state merging and context absorption
>    - Natural extension for OR patterns, {n,m} quantifiers, nested groups
>    - Enables MEASURES clause with incremental aggregates
>    - Proven approach in CEP engines (Flink, Esper)
> 
> 
> 3. TEST COVERAGE
> ----------------
> 
> 3.1 Covered Areas
> 
> - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
> - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
> - PREV/NEXT functions: Single argument (line 44, 173)
> - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
> - Aggregate integration: sum, avg, count, max, min (line 258-277)
> - Window function integration: first_value, last_value, nth_value (line
> 34-35)
> - JOIN/CTE: JOIN (line 313), WITH (line 324)
> - View: VIEW creation, pg_get_viewdef (line 390-406)
> - Error cases: 7 error scenarios (line 409-532)
> - Large partition: 5000 row smoke test (line 360-387)
> - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
> 
> 3.2 Untested Items
> 
> Feature                  Priority   Recommendation
> -------------------------------------------------------------------------------
> 26 variable limit        Medium     Test 26 variables success, 27th
> variable error
> NULL value handling      Low        Test PREV(col) where col or previous
> row is NULL
> 
> Sample test for 26 variable limit:
> 
>     -- Should fail with 27th variable (parser error, no table needed)
>     SELECT * FROM (SELECT 1 AS x) t
>     WINDOW w AS (
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
>       DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
> TRUE,
>              g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
> TRUE,
>              m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
> TRUE,
>              s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
> TRUE,
>              y AS TRUE, z AS TRUE, aa AS TRUE
>     );
>     -- ERROR: number of row pattern definition variable names exceeds 26
> 
> Sample test for NULL handling:
> 
>     CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in
> middle
>     INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
> 
>     SELECT company, tdate, price, count(*) OVER w AS match_count
>     FROM stock_null
>     WINDOW w AS (
>       PARTITION BY company
>       ORDER BY tdate
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (START UP DOWN)
>       DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
> PREV(price)
>     );
>     -- Result: all rows show match_count = 0 (NULL breaks pattern matching)
> 
> 3.3 Unimplemented Features (No Test Needed)
> 
> Per documentation (select.sgml:1133-1136):
> - MEASURES clause: Not implemented
> - SUBSET clause: Not implemented
> - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
> 
> Not in documentation, verified by testing:
> - {n,m} quantifier: Parser error ("syntax error at or near {")
> - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
> - (A B)+ compound repetition: Parser accepts, but execution returns 0
> matches
> 
> 
> 4. CODE COVERAGE ANALYSIS
> -------------------------
> 
> 4.1 Overall Coverage
> 
> 96.4% (732 / 759 lines)
> 
> 4.2 Coverage by File (major RPR-modified files)
> 
> nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
> parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
> ruleutils.c:        93.1% (54/58) - pg_get_viewdef output
> 
> 4.3 Uncovered Code Risk Assessment
> 
> Total: 27 lines uncovered
> 
> Medium Risk (2 items) - Test addition recommended (see section 3.2):
> - parse_clause.c:4093: transformDefineClause - 27th variable error
> - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
> row
> 
> Low Risk (25 items) - Defensive code / Unreachable:
> - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
> - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
> - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
> - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
> - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
> - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
> - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
> - nodeWindowAgg.c:5007: search_str_set - NULL return continue
> - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
> - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
> - nodeWindowAgg.c:5700: pattern_initial - Variable not found
> - nodeWindowAgg.c:5776: string_set_get - Index range check
> - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
> - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
> - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
> - parse_clause.c:3989: transformDefineClause - A_Expr type check
> - parse_clause.c:4145: transformPatternClause - A_Expr type check
> - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
> 
> Conclusion: Most uncovered code consists of defensive error handling or
> code unreachable due to parser pre-validation. No security or functional
> risk.
> 
> 
> 5. COMMIT ANALYSIS
> ------------------
> 
> 8 sequential commits:
> 
> Commit  Area            Files   +/-         Key Content
> -------------------------------------------------------------------------------
> 1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
> 2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
> 3       Rewriter        1       +109/-0     pg_get_viewdef extension
> 4       Planner         5       +73/-3      WindowAgg plan node extension
> 5       Executor        4       +1,942/-11  CORE: Pattern matching engine
> (+1,850)
> 6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
> 7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
> 8       typedefs        1       +6/-0       pgindent support
> 
> Code Change Statistics:
> - Total files: 25
> - Lines added: 4,358
> - Lines deleted: 39
> - Net increase: +4,319 lines
> 
> 
> 6. RECOMMENDATIONS
> ------------------
> 
> 6.1 Combinatorial Explosion Prevention (Major, Required)
> 
> Add work_mem-based memory limit for StringSet allocation.
> Location: string_set_add() in nodeWindowAgg.c:5741-5750
> Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)
> 
> 6.2 Code Review Required (Minor, Decision Needed)
> 
> Location: nodeWindowAgg.c:5849,5909,5912
> Issue: pos > NUM_ALPHABETS check intent unclear
> Current: PATTERN with 28 elements causes error
> Question: Is 27 element limit intentional?
> 
> 6.3 Code Style Fixes (Minor)
> 
> - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
> - static keyword on separate line: 26 functions to fix
> - Whitespace: "regexp !=NULL" -> "regexp != NULL"
> - Error message case: "Unrecognized" -> "unrecognized"
> 
> 6.4 Documentation Fixes (Minor)
> 
> - select.sgml: "maximu" -> "maximum"
> 
> 6.5 Test Additions (Optional)
> 
> Black-box Tests (Functional):
> 
> Feature              Test Case                                    Priority
> -------------------------------------------------------------------------------
> Variable limit       26 variables success, 27 error               Medium
> NULL boundary        PREV at partition first row                  Medium
> 
> White-box Tests (Coverage) - covered by above black-box tests:
> 
> File:Line                          Target Branch
> -------------------------------------------------------------------------------
> parse_clause.c:4093                Limit error branch (Variable limit test)
> nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)
> 
> 
> 7. CONCLUSION
> -------------
> 
> Test Quality: GOOD
> 
> Core functionality is thoroughly tested with comprehensive error case
> coverage.
> 
> The patch is well-implemented within its defined scope. Identified issues
> include
> one major concern (combinatorial explosion) and minor style/documentation
> items.
> 
> Recommended actions before commit:
> 1. Add work_mem-based memory limit for pattern combinations (required)
> 2. Clarify pos > NUM_ALPHABETS check intent (review needed)
> 3. Fix code style issues (optional but recommended)
> 4. Fix documentation typo (trivial)
> 5. Add tests for variable limit and NULL handling (optional)
> 
> Points for discussion (optional):
> 6. Incremental matching with streaming NFA redesign
> 
> 
> Attachment:
> - coverage.tgz (gcov HTML report, RPR-modified code only)
> 
> 
> ---
> End of Report
> 
> 2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
> 
>> Attached are the v33 patches for Row pattern recognition.  The
>> difference from v32 is that the raw parse tree printing patch is not
>> included in v33. This is because now that we have
>> debug_print_raw_parse.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>



Re: Row pattern recognition

От
Henson Choi
Дата:

Hi Ishii-san,

Glad the review was useful!

2026년 1월 2일 (금) AM 11:16, Tatsuo Ishii <ishii@postgresql.org>님이 작성:

Hi Henson,

Thank you for the review! I will take care the issues (it will take
sometime).


If you have other patches that need pre-analysis (coverage, style, valgrind, etc.), feel free to ask anytime. I enjoy this kind of work.

 
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> SQL/RPR Patch Review Report
> ============================
>
> Patch: Row Pattern Recognition (SQL:2016)
> Commitfest: https://commitfest.postgresql.org/patch/4460
>
> Review Methodology:
>   This review focused on quality assessment, not line-by-line code audit.
>   Key code paths and quality issues were examined with surrounding context
>   when concerns arose. Documentation files were reviewed with AI-assisted
>   grammar and typo checking. Code coverage was measured using gcov and
>   custom analysis tools.
>
> Limitations:
>   - Not a security audit or formal verification
>   - Parser and executor internals reviewed at module level, not exhaustively
>   - Performance characteristics not benchmarked
>
>
> TABLE OF CONTENTS
> -----------------
>
> 1. Executive Summary
> 2. Issues Found
>    2.1 Critical / Major
>    2.2 Minor Issues (Review Needed)
>    2.3 Minor Issues (Style)
>    2.4 Suggestions for Discussion
> 3. Test Coverage
>    3.1 Covered Areas
>    3.2 Untested Items
>    3.3 Unimplemented Features (No Test Needed)
> 4. Code Coverage Analysis
>    4.1 Overall Coverage
>    4.2 Coverage by File
>    4.3 Uncovered Code Risk Assessment
> 5. Commit Analysis
> 6. Recommendations
> 7. Conclusion
>
>
> 1. EXECUTIVE SUMMARY
> --------------------
>
> Overall assessment: GOOD
>
> The SQL/RPR patch demonstrates solid implementation quality within its
> defined scope. Code follows PostgreSQL coding standards (with minor style
> issues), test coverage is comprehensive at 96.4%, and documentation is
> thorough with only minor typos.
>
> Rating by Area:
> - Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes
> recommended)
> - Test Coverage:    Good (96.4% line coverage, 28 test cases)
> - Documentation:    Good (Complete, 1 minor typo)
> - Build/Regress:    Pass (make check-world, 753 tests passed)
>
>
> 2. ISSUES FOUND
> ---------------
>
> 2.1 Critical / Major
>
> #1 [Code] Greedy pattern combinatorial explosion
>    Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
>    Impact: Memory exhaustion or infinite wait
>    Recommendation: Add work_mem-based memory limit (error on exceed)
>
>    Evidence - No memory limit in current code:
>    - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
>    - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc()
> unlimited
>    - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
>
>    Only work_mem usage in RPR (nodeWindowAgg.c:1297):
>      winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
>    -> For tuple buffer, unrelated to pattern combination memory (StringSet)
>
> 2.2 Minor Issues (Review Needed)
>
> #1 [Code] nodeWindowAgg.c:5849,5909,5912
>    pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
> elements
>
>    Reproduction:
>    - PATTERN (A B C ... Z A) (27 elements) -> OK
>    - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid
> char: b"
>
> 2.3 Minor Issues (Style)
>
> #1 [Code] nodeWindowAgg.c (25 blocks)
>    #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove
>
> #2 [Code] src/backend/executor/nodeWindowAgg.c
>    static keyword on separate line (26 functions)
>
> #3 [Code] src/backend/utils/adt/ruleutils.c
>    Whitespace typo: "regexp !=NULL" -> "regexp != NULL"
>
> #4 [Code] nodeWindowAgg.c:4619
>    Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)
>
> #5 [Doc] doc/src/sgml/ref/select.sgml:1128
>    Typo: "maximu" -> "maximum"
>
> 2.4 Suggestions for Discussion
>
> #1 Incremental matching with streaming NFA redesign
>    Benefits:
>    - O(n) time complexity per row (current: exponential in worst case)
>    - Bounded memory via state merging and context absorption
>    - Natural extension for OR patterns, {n,m} quantifiers, nested groups
>    - Enables MEASURES clause with incremental aggregates
>    - Proven approach in CEP engines (Flink, Esper)
>
>
> 3. TEST COVERAGE
> ----------------
>
> 3.1 Covered Areas
>
> - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
> - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
> - PREV/NEXT functions: Single argument (line 44, 173)
> - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
> - Aggregate integration: sum, avg, count, max, min (line 258-277)
> - Window function integration: first_value, last_value, nth_value (line
> 34-35)
> - JOIN/CTE: JOIN (line 313), WITH (line 324)
> - View: VIEW creation, pg_get_viewdef (line 390-406)
> - Error cases: 7 error scenarios (line 409-532)
> - Large partition: 5000 row smoke test (line 360-387)
> - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
>
> 3.2 Untested Items
>
> Feature                  Priority   Recommendation
> -------------------------------------------------------------------------------
> 26 variable limit        Medium     Test 26 variables success, 27th
> variable error
> NULL value handling      Low        Test PREV(col) where col or previous
> row is NULL
>
> Sample test for 26 variable limit:
>
>     -- Should fail with 27th variable (parser error, no table needed)
>     SELECT * FROM (SELECT 1 AS x) t
>     WINDOW w AS (
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
>       DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
> TRUE,
>              g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
> TRUE,
>              m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
> TRUE,
>              s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
> TRUE,
>              y AS TRUE, z AS TRUE, aa AS TRUE
>     );
>     -- ERROR: number of row pattern definition variable names exceeds 26
>
> Sample test for NULL handling:
>
>     CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in
> middle
>     INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
>
>     SELECT company, tdate, price, count(*) OVER w AS match_count
>     FROM stock_null
>     WINDOW w AS (
>       PARTITION BY company
>       ORDER BY tdate
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (START UP DOWN)
>       DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
> PREV(price)
>     );
>     -- Result: all rows show match_count = 0 (NULL breaks pattern matching)
>
> 3.3 Unimplemented Features (No Test Needed)
>
> Per documentation (select.sgml:1133-1136):
> - MEASURES clause: Not implemented
> - SUBSET clause: Not implemented
> - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
>
> Not in documentation, verified by testing:
> - {n,m} quantifier: Parser error ("syntax error at or near {")
> - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
> - (A B)+ compound repetition: Parser accepts, but execution returns 0
> matches
>
>
> 4. CODE COVERAGE ANALYSIS
> -------------------------
>
> 4.1 Overall Coverage
>
> 96.4% (732 / 759 lines)
>
> 4.2 Coverage by File (major RPR-modified files)
>
> nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
> parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
> ruleutils.c:        93.1% (54/58) - pg_get_viewdef output
>
> 4.3 Uncovered Code Risk Assessment
>
> Total: 27 lines uncovered
>
> Medium Risk (2 items) - Test addition recommended (see section 3.2):
> - parse_clause.c:4093: transformDefineClause - 27th variable error
> - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
> row
>
> Low Risk (25 items) - Defensive code / Unreachable:
> - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
> - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
> - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
> - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
> - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
> - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
> - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
> - nodeWindowAgg.c:5007: search_str_set - NULL return continue
> - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
> - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
> - nodeWindowAgg.c:5700: pattern_initial - Variable not found
> - nodeWindowAgg.c:5776: string_set_get - Index range check
> - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
> - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
> - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
> - parse_clause.c:3989: transformDefineClause - A_Expr type check
> - parse_clause.c:4145: transformPatternClause - A_Expr type check
> - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
>
> Conclusion: Most uncovered code consists of defensive error handling or
> code unreachable due to parser pre-validation. No security or functional
> risk.
>
>
> 5. COMMIT ANALYSIS
> ------------------
>
> 8 sequential commits:
>
> Commit  Area            Files   +/-         Key Content
> -------------------------------------------------------------------------------
> 1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
> 2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
> 3       Rewriter        1       +109/-0     pg_get_viewdef extension
> 4       Planner         5       +73/-3      WindowAgg plan node extension
> 5       Executor        4       +1,942/-11  CORE: Pattern matching engine
> (+1,850)
> 6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
> 7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
> 8       typedefs        1       +6/-0       pgindent support
>
> Code Change Statistics:
> - Total files: 25
> - Lines added: 4,358
> - Lines deleted: 39
> - Net increase: +4,319 lines
>
>
> 6. RECOMMENDATIONS
> ------------------
>
> 6.1 Combinatorial Explosion Prevention (Major, Required)
>
> Add work_mem-based memory limit for StringSet allocation.
> Location: string_set_add() in nodeWindowAgg.c:5741-5750
> Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)
>
> 6.2 Code Review Required (Minor, Decision Needed)
>
> Location: nodeWindowAgg.c:5849,5909,5912
> Issue: pos > NUM_ALPHABETS check intent unclear
> Current: PATTERN with 28 elements causes error
> Question: Is 27 element limit intentional?
>
> 6.3 Code Style Fixes (Minor)
>
> - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
> - static keyword on separate line: 26 functions to fix
> - Whitespace: "regexp !=NULL" -> "regexp != NULL"
> - Error message case: "Unrecognized" -> "unrecognized"
>
> 6.4 Documentation Fixes (Minor)
>
> - select.sgml: "maximu" -> "maximum"
>
> 6.5 Test Additions (Optional)
>
> Black-box Tests (Functional):
>
> Feature              Test Case                                    Priority
> -------------------------------------------------------------------------------
> Variable limit       26 variables success, 27 error               Medium
> NULL boundary        PREV at partition first row                  Medium
>
> White-box Tests (Coverage) - covered by above black-box tests:
>
> File:Line                          Target Branch
> -------------------------------------------------------------------------------
> parse_clause.c:4093                Limit error branch (Variable limit test)
> nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)
>
>
> 7. CONCLUSION
> -------------
>
> Test Quality: GOOD
>
> Core functionality is thoroughly tested with comprehensive error case
> coverage.
>
> The patch is well-implemented within its defined scope. Identified issues
> include
> one major concern (combinatorial explosion) and minor style/documentation
> items.
>
> Recommended actions before commit:
> 1. Add work_mem-based memory limit for pattern combinations (required)
> 2. Clarify pos > NUM_ALPHABETS check intent (review needed)
> 3. Fix code style issues (optional but recommended)
> 4. Fix documentation typo (trivial)
> 5. Add tests for variable limit and NULL handling (optional)
>
> Points for discussion (optional):
> 6. Incremental matching with streaming NFA redesign
>
>
> Attachment:
> - coverage.tgz (gcov HTML report, RPR-modified code only)
>
>
> ---
> End of Report
>
> 2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
>
>> Attached are the v33 patches for Row pattern recognition.  The
>> difference from v32 is that the raw parse tree printing patch is not
>> included in v33. This is because now that we have
>> debug_print_raw_parse.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>

Best regards,

Henson 

Re: Row pattern recognition

От
Henson Choi
Дата:
Hi hackers,

I've been working on an alternative implementation approach for Row
Pattern Recognition (RPR) that addresses some limitations in the
current regex-based patch.

Key differences from the existing approach:

1. Streaming NFA instead of regex engine
   - Process rows incrementally without accumulating encoded strings
   - Avoids O(V^N) combinatorial explosion in multi-variable scenarios

2. No 26-variable limit
   - Variables identified by index, not alphabet encoding

3. Proper Lexical Order support
   - Respects PATTERN alternative order for ONE ROW PER MATCH

4. GREEDY/RELUCTANT quantifier support
   - Longest vs shortest match semantics

5. Incremental MEASURES computation
   - Aggregate values computed during matching, no rescan needed

The attached document describes the design in detail, including:
- Data structures (Pattern, MatchState, MatchContext)
- Execution flow and state transitions
- Context absorption optimization for greedy patterns
- AST optimization passes

This is still a work in progress. I'd appreciate any feedback on
the approach before proceeding with PostgreSQL integration.

Best regards,
Henson

------------------------------------------------------------------------
                         Attachment
------------------------------------------------------------------------


========================================================================
      RPR Streaming NFA Pattern Matching System Overview (DRAFT)
========================================================================

0. Motivation and Approach
------------------------------------------------------------------------

0.1 Background: Limitations of PostgreSQL RPR Patch

    The existing PostgreSQL RPR implementation (RPR-base branch) uses
    a regex-based approach:

    ┌─────────────────────────────────────────────────────────────────┐
    │  Existing Implementation (Regex-based)                          │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  1. Evaluate all DEFINE conditions for each row                 │
    │     → Encode matching variables as alphabets (a, b, c, ...)     │
    │                                                                 │
    │  2. Accumulate encoded string                                   │
    │     "aabbc" (A match, A match, B match, B match, C match)       │
    │                                                                 │
    │  3. Convert PATTERN to regex                                    │
    │     PATTERN (A+ B+ C) → "^a+b+c"                                │
    │                                                                 │
    │  4. Match using PostgreSQL regex engine                         │
    │     pg_regexec(&preg, encoded_str, ...)                         │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

    Limitations (based on nodeWindowAgg.c analysis):

    ┌──────────────────┬──────────────────────────────────────────────┐
    │ Limitation       │ Cause                                        │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Combinatorial    │ Cartesian product per row in                 │
    │ Explosion O(V^N) │ generate_patterns(). 3 vars, 20 rows → 3.4B  │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ 26 Variable      │ a-z alphabet encoding (NUM_ALPHABETS=26)     │
    │ Limit            │                                              │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Lexical Order    │ Encoded by DEFINE declaration order,         │
    │ Not Guaranteed   │ PATTERN alternative order ignored.           │
    │                  │ In (B|A) pattern, A encoded first            │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ No RELUCTANT     │ Only greedy flag exists, no shortest match   │
    │ Support          │ logic                                        │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ MEASURES Not     │ Rescan-based workaround, no incremental      │
    │ Implemented      │ aggregation                                  │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ O(N^2) Retry on  │ No intermediate state maintained. On failure │
    │ Match Failure    │ at row N, retry from row K+1 from scratch    │
    └──────────────────┴──────────────────────────────────────────────┘


0.2 Approach: Streaming NFA-based Pattern Matching

    We adopt an NFA-based approach similar to Apache Flink CEP:

    ┌─────────────────────────────────────────────────────────────────┐
    │  NFA-based Approach                                             │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  Core Ideas:                                                    │
    │    - Compile pattern to flat array (PatternElement[])           │
    │    - Perform state transitions per row (no regex matching)      │
    │    - Merge identical states to prevent redundant computation    │
    │                                                                 │
    │  Time Complexity: O(N x S x E)                                  │
    │    N: Number of input rows                                      │
    │    S: Number of concurrent active states (bounded by merge)     │
    │    E: Number of pattern elements                                │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

    Resolving Limitations:

    ┌──────────────────┬──────────────────────────────────────────────┐
    │ Original Limit   │ Resolution with Streaming NFA                │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Combinatorial    │ State transition based, identical state merge│
    │ Explosion O(V^N) │ → O(N×S×E), S proportional to pattern size   │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ 26 Variable      │ Integer varId, size adjustable as needed     │
    │ Limit            │                                              │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Lexical Order    │ #ALT branches generated in PATTERN order     │
    │ Not Guaranteed   │ → Alternative order preserved in paths[][]   │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ No RELUCTANT     │ Custom NFA enables shortest/longest match    │
    │ Support          │ control                                      │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ MEASURES Not     │ Incremental aggregation computes SUM, COUNT  │
    │ Implemented      │ etc. in real-time                            │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ O(N^2) Retry on  │ Multiple Contexts maintained concurrently    │
    │ Match Failure    │ → Parallel tracking per start point, O(N)    │
    └──────────────────┴──────────────────────────────────────────────┘

    New Advantages:

    ┌──────────────────┬──────────────────────────────────────────────┐
    │ Advantage        │ Description                                  │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ ALL ROWS Mode    │ Track all possible match paths concurrently  │
    │                  │ → Each Context branches/completes separately │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Context          │ Merge Contexts upon reaching identical state │
    │ Absorption       │ → Eliminate redundant computation, memory    │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Incremental      │ Compute SUM, COUNT etc. during matching      │
    │ Aggregation      │ → No rescan needed after completion          │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Flat Array       │ Contiguous memory layout for states[],       │
    │ Structure        │ contexts[] → Good cache locality, no pointer │
    ├──────────────────┼──────────────────────────────────────────────┤
    │ Path Sharing     │ Chunk tree + hash table for path sharing     │
    │                  │ → O(1) reference on fork, memory dedup       │
    └──────────────────┴──────────────────────────────────────────────┘


0.3 Difference from Flink: History Elimination

    ┌─────────────────────────────────────────────────────────────────┐
    │  Flink CEP (Streaming)              PostgreSQL (Batch)          │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  Original data flows away           Materialized data           │
    │  → Cannot re-access                 → Rescan anytime            │
    │  → Store row data in History        → No History needed         │
    │  → Pointer tree + ref counting      → Store match_start/end only│
    │                                                                 │
    │  Fork: pointer copy O(1)            Fork: copy counts[]         │
    │  MEASURES: backtrack History        MEASURES: rescan range      │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

    Our Implementation Choices:
      - Adopt Incremental Aggregation
      - Compute SUM, COUNT etc. in real-time during matching
      - Maintain only aggregate values without History pointers
      - Store path info as varId arrays in paths[][]


0.4 Design Goals

    ┌──────────────┬──────────────────────────────────────────────────┐
    │ Goal         │ Description                                      │
    ├──────────────┼──────────────────────────────────────────────────┤
    │ Linear       │ O(N×S×E), no combinatorial explosion             │
    │ Complexity   │                                                  │
    ├──────────────┼──────────────────────────────────────────────────┤
    │ Standard     │ SQL:2016 RPR semantics (GREEDY, RELUCTANT, SKIP) │
    │ Compliance   │                                                  │
    ├──────────────┼──────────────────────────────────────────────────┤
    │ Extensibility│ Future extensions: MEASURES, SUBSET, etc.        │
    ├──────────────┼──────────────────────────────────────────────────┤
    │ Simplicity   │ Flat array pattern, clear state transition rules │
    └──────────────┴──────────────────────────────────────────────────┘


0.5 Path Storage Optimization

    Risk of memory explosion when paths[][] are copied on Context fork.
    Share identical paths via chunk tree + hash table:

    ┌─────────────────────────────────────────────────────────────────┐
    │  Chunk Tree Structure                                           │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  Chunk: fixed-size(2) array + parent pointer + ref count        │
    │                                                                 │
    │  Example: Two paths [A,A,C,D] and [A,A,C,E]                     │
    │                                                                 │
    │      Chunk1[A,A] ← Chunk2[C,D]  (Path1)                         │
    │         RC:2    ↖                                               │
    │                   Chunk3[C,E]  (Path2)                          │
    │                                                                 │
    │      → Share parent chunk (Chunk1), create new chunk from fork  │
    │                                                                 │
    │  Hash table: (parent, values) → reuse identical chunks          │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘


1. System Architecture
------------------------------------------------------------------------

┌──────────────────────────────────────────────────────────────────────┐
│                        Processing Pipeline                             │
├─────────────────────────────────────────────────────────────────────
─┤
│                                                                        │
│  Pattern String   Parser      Pattern          Executor         Result │
│  "A+ B | C" ───▶ (AST) ───▶ (elements) ───▶ (Contexts) ───▶ Match  │
│                                                                        │
│  ┌──────────────────┐  ┌───────────────────┐  ┌───────────────────┐    │
│  │      Parser      │  │      Context      │  │     Executor      │    │
│  │                  │  │                   │  │                   │    │
│  │  Tokenize        │  │  MatchState       │  │  NFAExecutor      │    │
│  │  Build AST       │  │  MatchContext     │  │  Multi-Context    │    │
│  │  Optimize        │  │  State Transition │  │  Absorb/SKIP      │    │
│  │  Compile         │  │  Greedy/Reluctant │  │  Output Mgmt      │    │
│  └──────────────────┘  └───────────────────┘  └───────────────────┘    │
│                                                                        │
└─────────────────────────────────────────────────────────────────────


2. Document Structure
------------------------------------------------------------------------

┌──────────────┬────────────────────────────────────────────────────┐
│ Parser       │ Pattern string → PatternElement[] transformation   │
│              │  - Tokenizer: string → tokens                      │
│              │  - Parser: tokens → AST                            │
│              │  - Optimizer: AST simplification                   │
│              │  - Compiler: AST → PatternElement[]                │
├──────────────┼────────────────────────────────────────────────────┤
│ Context      │ Single Context internal behavior                   │
│              │  - MatchState: current position + counters + path  │
│              │  - MatchContext: collection of States              │
│              │  - State transition: variable match, #ALT, #END,   │
│              │    #FIN handling                                   │
│              │  - Greedy/Reluctant quantifier behavior            │
├──────────────┼────────────────────────────────────────────────────┤
│ Executor     │ Multi-Context management                           │
│              │  - NFAExecutor: main execution engine              │
│              │  - Context creation/absorption/removal             │
│              │  - SKIP modes (PAST LAST / TO NEXT)                │
│              │  - Output rows (ONE ROW / ALL ROWS)                │
├──────────────┼────────────────────────────────────────────────────┤
│ Improvements │ Future enhancements                                │
│              │  - Path storage optimization (chunk tree + hash)   │
└──────────────┴────────────────────────────────────────────────────┘


3. Key Data Structures
------------------------------------------------------------------------

3.1 Pattern
    The entire compiled pattern.

    Fields:
      - maxDepth: maximum nesting depth
      - elements[]: PatternElement array
      - variables[]: variable name array (variables[varId] → name)
      - firstMatchIsGreedy: whether first match is Greedy
                            (Context absorption condition)

3.2 PatternElement
    A single element of the compiled pattern.

    Fields:
      - varId: variable identifier or special marker (0+ variable,
               negative special)
      - reluctant: flag (true = shortest, false = longest)
      - min, max: quantifier range
      - depth: nesting depth
      - next: next element index
      - jump: alternative/repeat jump target

3.3 MatchState
    A single state during NFA execution.

    Fields:
      - elementIndex: current PatternElement position
      - counts[]: per-depth repeat counters
      - summaries[]: Summary array (creation order preserved)

    State equivalence: identical if elementIndex + counts[] match
    On merge, summaries are combined

3.4 Summary
    Aggregate values and path information.

    Fields:
      - aggregates{}: incremental aggregates (SUM, COUNT, FIRST,
                      LAST, MIN, MAX)
      - paths[][]: matched variable paths (creation order preserved)

    Merge rules:
      - If aggregate values identical → merge Summaries, combine paths
      - summaries[0].paths[0] = Lexical Order priority path

3.5 MatchContext
    Collection of States with the same start point.

    Fields:
      - matchStart: match start row (unique identifier)
      - matchEnd: match end row (stores currentRow when matchedState
                  updated)
      - states[]: active States (creation order preserved, all branch
                  paths)
      - matchedState: MatchState | null (for Greedy fallback)

3.6 NFAExecutor
    Main execution engine.

    Fields:
      - pattern: compiled pattern
      - contexts[]: active Contexts (creation order preserved)
      - completedContexts[]: completed Contexts (sorted by start,
                             pending emit)
      - skipMode: PAST_LAST / TO_NEXT
      - currentRow: currently processing row number
      - history[]: per-row execution snapshots (for debugging)


4. Execution Flow
------------------------------------------------------------------------

┌─────────────────────────────────────────────────────────────────────┐
│                         Per-Row Processing                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  FOR EACH row IN data:                                              │
│      │                                                              │
│      ├─▶ 1. tryStartNewContext(row)                                │
│      │       └─ Create Context if new match can start               │
│      │                                                              │
│      ├─▶ 2. FOR EACH context:                                      │
│      │       └─ processContext(context, row)                        │
│      │           └─ State transitions + incremental aggregation     │
│      │                                                              │
│      ├─▶ 3. mergeStates() + absorbContexts()                       │
│      │       ├─ State Merge (same elementIndex+counts)              │
│      │       ├─ Summary Merge (same aggregates → combine paths)     │
│      │       └─ Absorb later Context into earlier one               │
│      │                                                              │
│      ├─▶ 4. Remove dead/completed Contexts                         │
│      │                                                              │
│      └─▶ 5. emitRows()                                             │
│              ├─ contexts[1+] complete → queue in completedContexts  │
│              └─ contexts[0] complete → emit immediately, process    │
│                  queue                                              │
│                  └─ Iterate items where start < contexts[0].start   │
│                      ├─ PAST LAST: discard if start <= prev end     │
│                      └─ TO NEXT: end >= ctx[0].start → stop (wait)  │
│                                  end < ctx[0].start → emit          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘


5. Core Concepts
------------------------------------------------------------------------

5.1 State Transitions

    Variable element:  If condition true, move to next, add to paths
    #ALT:              Branch to all alternatives (next, jump chain)
    #END:              Check counter, repeat (jump) or escape (next)
    #FIN:              Match complete

5.2 Matching Modes (3 Axes)

    ┌──────────────────────────────────────────────────────────────────┐
    │  [Axis 1] Output Rows:    ONE ROW / ALL ROWS                     │
    │  [Axis 2] Quantifier:     GREEDY / RELUCTANT                     │
    │  [Axis 3] SKIP Option:    PAST LAST / TO NEXT                    │
    └──────────────────────────────────────────────────────────────────┘

    Output Rows:
      - ONE ROW:   Output paths[0] (Lexical Order priority path)
      - ALL ROWS:  Output all paths

    Quantifier Greediness:
      - GREEDY:    Longest match first, preserve completion and continue
      - RELUCTANT: Shortest match first, return immediately on completion

    SKIP Option:
      - PAST LAST: Multiple Contexts active, discard overlaps on emit
      - TO NEXT:   Start possible every row, allow overlapping matches

5.3 Merge Rules

    State Merge (within Context):
      - Condition: same elementIndex + counts[]
      - Action: combine summaries (preserve creation order)
      - Result: reduce State count at same position, prevent redundant
                computation

    Summary Merge (within State):
      - Condition: same aggregates{} values
      - Action: combine paths (preserve creation order)
      - Result: summaries[0].paths[0] = Lexical Order priority path

5.4 Context Absorption

    Condition: First match in pattern is Greedy (max=Infinity AND
               reluctant=false)
               (same elementIndex AND earlier.counts >= later.counts)
    Action: Remove later Context
    Result: Earlier continues with longer match, prevent redundant
            computation

5.5 Emit Flow

    - contexts[0] complete → emit immediately
    - contexts[1+] complete → queue in completedContexts
    - After contexts[0] emits, process queue (by start order):
        1. start >= contexts[0].start → stop (not yet eligible)
        2. PAST LAST: start <= previous emit's end → discard, continue
        3. TO NEXT: end >= contexts[0].start (overlaps with ctx[0])
                    → stop (this item waits until ctx[0] clears)
        4. TO NEXT: end < contexts[0].start (no overlap)
                    → emit, continue to next item


6. Module Dependencies
------------------------------------------------------------------------

  Parser ────────────────────────────────────────────────────────────
    Output: PatternElement[]
    → Context: State references PatternElement
    → Executor: passed as pattern field

  Context ───────────────────────────────────────────────────────────
    Input: PatternElement[] (read-only)
    Output: MatchContext (completed/in-progress status)
    → Executor: State transitions in processContext()

  Executor ──────────────────────────────────────────────────────────
    Input: Pattern, data rows
    Internal: Context creation/management
    Output: Match results (completedContexts)


7. Context Absorption
------------------------------------------------------------------------

    When the pattern starts with a Greedy quantifier (e.g., A+), later
    Contexts can be absorbed into earlier ones if they reach the same
    state with fewer repetitions. This eliminates redundant computation.

┌─────────────────────────────────────────────────────────────────────┐
│                     Context Absorption                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Pattern: A+ B     (A is Greedy: max=Infinity, reluctant=false)     │
│                                                                     │
│  Row 1: A matched                                                   │
│    contexts[0]: start=1, elementIndex=A, counts=[1]                 │
│                                                                     │
│  Row 2: A matched                                                   │
│    contexts[0]: start=1, elementIndex=A, counts=[2]  ← continue     │
│    contexts[1]: start=2, elementIndex=A, counts=[1]  ← new Context  │
│                                                                     │
│  Absorption check:                                                  │
│    - First element is Greedy? YES (A+)                              │
│    - Same elementIndex? YES (both at A)                             │
│    - earlier.counts[0] >= later.counts[0]? YES (2 >= 1)             │
│    → Absorb: Remove contexts[1]                                     │
│                                                                     │
│  Result:                                                            │
│    contexts[0]: start=1, counts=[2]  (only survivor)                │
│                                                                     │
│  Why: Later Context's path is subset of earlier's longer match      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Condition:
      1. Pattern's first match is Greedy (max=Infinity AND reluctant=false)
         - Simple: A+ B
         - Group:  (A B)+ C  (group repeat is also Greedy)
      2. Both Contexts at same elementIndex
      3. earlier.counts[depth] >= later.counts[depth]

    Action: Remove later Context

    Result: Earlier Context continues with longer match,
            prevents redundant computation


8. State Fork and Merge
------------------------------------------------------------------------

    States fork when multiple paths are possible (e.g., at #ALT or when
    min <= count < max at #END). States merge when they reach identical
    positions, combining their paths while preventing redundant work.

┌─────────────────────────────────────────────────────────────────────┐
│                        State Fork                                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Pattern: A{1,2} B    (A can match 1 or 2 times)                    │
│                                                                     │
│  At #END element (after A matched once, count=1):                   │
│    - count(1) >= min(1)? YES → can escape to B (next)               │
│    - count(1) < max(2)?  YES → can repeat A (jump)                  │
│                                                                     │
│  Fork into two States:                                              │
│    State1: elementIndex=B, counts=[1]     ← escaped (try B)         │
│    State2: elementIndex=A, counts=[1]     ← repeat (try more A)     │
│                                                                     │
│  Both States continue in parallel within same Context               │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Fork Trigger:
      - #END element where min <= count < max
      - #ALT element (branch to all alternatives)

    Fork Action: Create new State for each branch path

    Copy-on-Write Optimization:
      - Forked States share summaries[] via refcount
      - Only copied when one branch modifies (adds path/updates aggregate)
      - Reduces memory allocation during fork-heavy patterns


┌─────────────────────────────────────────────────────────────────────┐
│                        State Merge                                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Pattern: (A | B) C                                                 │
│                                                                     │
│  After row matches both A and B conditions:                         │
│    State1: path=[A], elementIndex=C, counts=[1]                     │
│    State2: path=[B], elementIndex=C, counts=[1]                     │
│                                                                     │
│  Merge check:                                                       │
│    - Same elementIndex? YES (both at C)                             │
│    - Same counts[]? YES ([1] == [1])                                │
│    → Merge States                                                   │
│                                                                     │
│  Merged State:                                                      │
│    elementIndex=C, counts=[1]                                       │
│    summaries: [                                                     │
│      { paths: [[A]] },    ← from State1 (creation order)            │
│      { paths: [[B]] }     ← from State2                             │
│    ]                                                                │
│                                                                     │
│  Result: Single State, multiple paths preserved                     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Merge Condition:
      - Same elementIndex
      - Same counts[] array

    Merge Action: Combine summaries (preserve creation order)

    Result: Reduces State count, prevents redundant computation
            summaries[0].paths[0] = Lexical Order priority path


┌─────────────────────────────────────────────────────────────────────┐
│              Summary Array Merge (during State Merge)               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Same State = Same Future                                           │
│    States with identical (elementIndex + counts[]) will produce     │
│    identical future paths. Only past paths differ.                  │
│                                                                     │
│  Before State merge:                                                │
│    State1: elementIndex=C, counts=[1]                               │
│            summaries: [{ agg:{sum:10}, paths:[[A,C]] }]             │
│    State2: elementIndex=C, counts=[1]                               │
│            summaries: [{ agg:{sum:20}, paths:[[B,C]] }]             │
│            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                             │
│            same state = same future                                 │
│                                                                     │
│  After State merge:                                                 │
│    Merged: elementIndex=C, counts=[1]                               │
│            summaries: [                                             │
│              { agg:{sum:10}, paths:[[A,C]] },                       │
│              { agg:{sum:20}, paths:[[B,C]] }                        │
│            ]                                                        │
│                                                                     │
│  Why: Merge States now, they will have identical continuations      │
│       summaries[] combined as array (preserve creation order)       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────┐
│              Summary Merge (= Path Merge)                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  When: Two Summaries have identical aggregates{}                    │
│        → Merge Summaries, which merges their paths                  │
│                                                                     │
│  Before merge:                                                      │
│    Summary1: { aggregates: {sum:10}, paths: [[A,A]] }               │
│    Summary2: { aggregates: {sum:10}, paths: [[A,B]] }               │
│                                                                     │
│  After merge:                                                       │
│    Summary: { aggregates: {sum:10}, paths: [[A,A], [A,B]] }         │
│                                                                     │
│  Why: Same aggregate = same "value", only paths differ              │
│       Merging Summaries automatically combines their paths          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Summary Merge Condition:
      - Same aggregates{} values (SUM, COUNT, etc. all equal)

    Summary Merge Action:
      - Merge two Summaries into one
      - paths[][] arrays combined (preserve creation order)
      - paths[0] = Lexical Order priority path


┌─────────────────────────────────────────────────────────────────────┐
│                     Path Deduplication                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  When: Duplicate paths exist after merging                          │
│                                                                     │
│  Before dedup:                                                      │
│    paths: [[A,B,C], [A,B,C], [A,B,D]]                               │
│            ~~~~~~   ~~~~~~                                          │
│            duplicate                                                │
│                                                                     │
│  After dedup:                                                       │
│    paths: [[A,B,C], [A,B,D]]                                        │
│                                                                     │
│  Why: Duplicate paths provide no additional information             │
│       Remove to reduce memory and output redundancy                 │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Deduplication Condition:
      - Identical variable sequence

    Deduplication Action:
      - Keep first occurrence (preserve creation order)
      - Discard duplicates


9. Completion Handling Algorithm
------------------------------------------------------------------------

    When a State reaches #FIN (elementIndex = -1), the pattern has matched.
    Whether to finalize or continue depends on quantifier mode. GREEDY
    preserves completion and continues seeking longer matches, while
    RELUCTANT completes immediately with the shortest match.

┌──────────────────────────────────────────────────────────────────────┐
│                    Completion Handling Algorithm                     │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│              Completion State Occurs (elementIndex = -1)             │
│                               │                                      │
│                               ▼                                      │
│                ┌──────────────────────────────┐                      │
│                │   Active states remain AND   │                      │
│                │   can match pattern input?   │                      │
│                └──────────────┬───────────────┘                      │
│                               │                                      │
│            YES ┌──────────────┴─────────────────┐ NO                 │
│                │                                │                    │
│                ▼                                ▼                    │
│   ┌─────────────────────────┐    ┌─────────────────────────────┐     │
│   │    Can continue more    │    │    Cannot continue more     │     │
│   └────────────┬────────────┘    └──────────────┬──────────────┘     │
│                │                                │                    │
│                ▼                                ▼                    │
│   ┌─────────────────────────┐    ┌─────────────────────────────┐     │
│   │  Is GREEDY quantifier?  │    │     Finalize result         │     │
│   └───────────┬─────────────┘    │                             │     │
│               │                  │   If preserved → fallback   │     │
│               │             NO   │   If none → match failed    │     │
│         YES ┌─┴──────────────┐   │   → Output Lexical Order 1  │     │
│             │                │   └─────────────────────────────┘     │
│             ▼                ▼                                       │
│   ┌────────────────────┐  ┌───────────────────────┐                  │
│   │ Preserve, continue │  │  Complete immediately │                  │
│   │   (for fallback)   │  │    (RELUCTANT)        │                  │
│   └────────────────────┘  └───────────────────────┘                  │
│                                                                      │
├──────────────────────────────────────────────────────────────────────┤
│  Preservation Rule (GREEDY):                                         │
│    1. Preserve 1 State with longest completion path                  │
│    2. If same length, choose Lexical Order priority State            │
│    3. Replace when new longer completion occurs                      │
│                                                                      │
│  Preserved Content:                                                  │
│    - state: completed MatchState (elementIndex=-1, ...)              │
│    - matchEnd: currentRow at matchedState update time                │
│                                                                      │
│  Fallback Condition (GREEDY):                                        │
│    - All active states dead (input unmatched)                        │
│    - matchedState exists → finalize with preserved State             │
│    - Use preserved matchEnd (saved at update time)                   │
└──────────────────────────────────────────────────────────────────────┘


10. AST Optimization
------------------------------------------------------------------------

    After parsing, the AST undergoes three optimization passes to simplify
    structure before compilation. These reduce PatternElement count,
    improving execution performance.

┌─────────────────────────────────────────────────────────────────────┐
│              1. unwrapGroups (Unnecessary Group Removal)            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Case 1: Remove {1,1} Group wrapper                                 │
│                                                                     │
│    Before:  ((A))                   After:  A                       │
│                                                                     │
│      GROUP {1,1}                      VAR:A {1,1}                   │
│       └── GROUP {1,1}                                               │
│            └── VAR:A {1,1}                                          │
│                                                                     │
│  Case 2: Flatten nested SEQ                                         │
│                                                                     │
│    Before:  (A B) C                 After:  A B C                   │
│                                                                     │
│      SEQ                              SEQ                           │
│       ├── GROUP {1,1}                  ├── VAR:A {1,1}              │
│       │    └── SEQ                     ├── VAR:B {1,1}              │
│       │         ├── VAR:A              └── VAR:C {1,1}              │
│       │         └── VAR:B                                           │
│       └── VAR:C                                                     │
│                                                                     │
│  Case 3: Unwrap single-item SEQ/ALT                                 │
│                                                                     │
│    Before:  SEQ[A]                  After:  A                       │
│    Before:  ALT[A]                  After:  A                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────┐
│              2. removeDuplicates (Duplicate Alternative Removal)    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Before:  A | B | A                 After:  A | B                   │
│                                                                     │
│    ALT                                ALT                           │
│     ├── VAR:A {1,1}                    ├── VAR:A {1,1}              │
│     ├── VAR:B {1,1}                    └── VAR:B {1,1}              │
│     └── VAR:A {1,1}  ← duplicate                                    │
│                                                                     │
│  Comparison: astEqual() for structural equality                     │
│                                                                     │
│  Before:  (A B) | (A B) | C         After:  (A B) | C               │
│                                                                     │
│    ALT                                ALT                           │
│     ├── SEQ[A,B]                       ├── SEQ[A,B]                 │
│     ├── SEQ[A,B]  ← duplicate          └── VAR:C                    │
│     └── VAR:C                                                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────┐
│              3. optimizeQuantifiers (Quantifier Optimization)       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Case 1: Merge consecutive identical variables                      │
│                                                                     │
│    Before:  A A A B                 After:  A{3} B                  │
│                                                                     │
│      SEQ                              SEQ                           │
│       ├── VAR:A {1,1}                  ├── VAR:A {3,3}              │
│       ├── VAR:A {1,1}                  └── VAR:B {1,1}              │
│       ├── VAR:A {1,1}                                               │
│       └── VAR:B {1,1}                                               │
│                                                                     │
│  Case 2: Merge nested quantifiers (when one is fixed)               │
│                                                                     │
│    Before:  (A{2}){3}               After:  A{6}                    │
│                                                                     │
│      GROUP {3,3}                      VAR:A {6,6}                   │
│       └── VAR:A {2,2}                                               │
│                                                                     │
│    Calculation: min = 2*3 = 6, max = 2*3 = 6                        │
│                                                                     │
│  Note: Only safe when inner or outer quantifier has min == max      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Optimization Order:
      1. unwrapGroups    - Remove unnecessary structure
      2. removeDuplicates - Eliminate redundant alternatives
      3. optimizeQuantifiers - Merge quantifiers

    Combined Example:

      Input: "((A | A) B B)"

      Step 1 (unwrapGroups):
        GROUP{1,1} -> SEQ, inner GROUP{1,1} -> ALT
        Result: SEQ[ALT[A,A], B, B]

      Step 2 (removeDuplicates):
        ALT[A,A] -> A
        Result: SEQ[A, B, B]

      Step 3 (optimizeQuantifiers):
        B B -> B{2}
        Result: SEQ[A, B{2}]


11. AST to PatternElement[] Compilation
------------------------------------------------------------------------

    The optimized AST is flattened into a linear PatternElement array.
    Each element contains pointers (next, jump) for navigation and depth
    for counter indexing. This flat structure enables efficient state
    transitions during NFA execution.

┌─────────────────────────────────────────────────────────────────────┐
│                    AST Flattening Example                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Pattern: "(A | B)+ C"                                              │
│                                                                     │
│  AST:                          PatternElement[]:                    │
│                                                                     │
│    SEQ                         [0] #ALT  next:1              d:1    │
│     ├── GROUP {1,∞}            [1] A     next:3, jump:2      d:2    │
│     │    └── ALT               [2] B     next:3, jump:-1     d:2    │
│     │         ├── VAR:A        [3] #END  next:4, jump:0      d:0    │
│     │         └── VAR:B                  min:1, max:∞               │
│     └── VAR:C {0,1}            [4] C     next:5, min:0, max:1 d:0   │
│                                [5] #FIN  next:-1                    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

    Pointer Rules:

      next pointer:
        - Sequential elements: index of next element
        - Last element before #FIN: points to #FIN
        - #FIN: -1 (terminal)

      jump pointer:
        - #END: group start index (loop back)
        - ALT branch first element: next alternative start
        - Last alternative: -1

      depth:
        - Top level: 0
        - Inside GROUP/ALT: parent depth + 1
        - #END: parent depth (for repeat counter indexing)

    Flattening Process:

      1. Traverse AST recursively
      2. For each node type:
         - VAR: emit PatternElement with varId, min, max, depth
         - GROUP: flatten content, emit #END with min/max/jump
         - ALT: emit #ALT, flatten each alternative with jump chain
         - SEQ: flatten each item sequentially
      3. Emit #FIN at end
      4. Resolve next pointers (forward references)


┌─────────────────────────────────────────────────────────────────────┐
│                    Pointer Resolution Example                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Pattern: "A | B | C"                                               │
│                                                                     │
│  [0] #ALT  next:1, jump:-1                                          │
│       │                                                             │
│       ├──▶ [1] A  next:4, jump:2  ──┐                              │
│       │                              │                              │
│       ├──▶ [2] B  next:4, jump:3  ──┤ (all point to #FIN)          │
│       │                              │                              │
│       └──▶ [3] C  next:4, jump:-1 ──┘                              │
│                                                                     │
│  [4] #FIN  next:-1                                                  │
│                                                                     │
│  #ALT.next = first branch (A)                                       │
│  Each branch.jump = next branch (or -1 for last)                    │
│  Each branch.next = element after ALT (#FIN)                        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Henson,

Thank you for the detailed design documentation. While learning the
document, just a quick comment and question.

> Hi hackers,
> 
> I've been working on an alternative implementation approach for Row
> Pattern Recognition (RPR) that addresses some limitations in the
> current regex-based patch.
> 
> Key differences from the existing approach:
> 
> 1. Streaming NFA instead of regex engine
>    - Process rows incrementally without accumulating encoded strings
>    - Avoids O(V^N) combinatorial explosion in multi-variable scenarios

Sounds good. As you already pointed out, current approach has a memory
space problem. If Streaming NFA could solve the issue, it would be
really good.

> 2. No 26-variable limit
>    - Variables identified by index, not alphabet encoding

Sounds good.

> 3. Proper Lexical Order support
>    - Respects PATTERN alternative order for ONE ROW PER MATCH

Here do you have "ONE ROW PER MATCH" option of RPR in your mind? In my
understanding it's in MATCH_RECOGNIZE clause, but not in RPR in Window
clause. (ALL ROWS PER MATCH is not applicable either in RPR in
Window clause).

> 4. GREEDY/RELUCTANT quantifier support
>    - Longest vs shortest match semantics

Sounds good.

> 5. Incremental MEASURES computation
>    - Aggregate values computed during matching, no rescan needed
> 
> The attached document describes the design in detail, including:
> - Data structures (Pattern, MatchState, MatchContext)
> - Execution flow and state transitions
> - Context absorption optimization for greedy patterns
> - AST optimization passes
> 
> This is still a work in progress. I'd appreciate any feedback on
> the approach before proceeding with PostgreSQL integration.

I understand you are still in the design phase and sorry for jumping
into an implementation question. but If you have an idea, please
advice:

How do you handle '{' and '}' in the PATTERN clause in the raw parser?
I am not a parser expert but it seems it's not easy to handle '{' and
'}' in gram.y.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Henson Choi
Дата:
Hi Ishii-san,

Thank you for the quick feedback and helpful spec clarifications!

2026년 1월 3일 (토) AM 10:05, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
Hi Henson,

Thank you for the detailed design documentation. While learning the
document, just a quick comment and question.

> Hi hackers,
>
> I've been working on an alternative implementation approach for Row
> Pattern Recognition (RPR) that addresses some limitations in the
> current regex-based patch.
>
> Key differences from the existing approach:
>
> 1. Streaming NFA instead of regex engine
>    - Process rows incrementally without accumulating encoded strings
>    - Avoids O(V^N) combinatorial explosion in multi-variable scenarios

Sounds good. As you already pointed out, current approach has a memory
space problem. If Streaming NFA could solve the issue, it would be
really good.

> 2. No 26-variable limit
>    - Variables identified by index, not alphabet encoding

Sounds good.

> 3. Proper Lexical Order support
>    - Respects PATTERN alternative order for ONE ROW PER MATCH

Here do you have "ONE ROW PER MATCH" option of RPR in your mind? In my
understanding it's in MATCH_RECOGNIZE clause, but not in RPR in Window
clause. (ALL ROWS PER MATCH is not applicable either in RPR in
Window clause).


You're absolutely right - thank you for the correction!

I'm currently exploring what this NFA structure could theoretically
support, rather than strictly binding to the spec yet. Since I'm
still learning from Oracle manuals and your code without access to
the full SQL:2016 standard, I mixed up MATCH_RECOGNIZE syntax with
Window clause RPR - sorry for the confusion!

Once I have a clearer understanding of the spec, I'll properly
define how it integrates with PostgreSQL's grammar.

Your spec knowledge is invaluable as I continue learning. Thank you!
 
> 4. GREEDY/RELUCTANT quantifier support
>    - Longest vs shortest match semantics

Sounds good.

> 5. Incremental MEASURES computation
>    - Aggregate values computed during matching, no rescan needed
>
> The attached document describes the design in detail, including:
> - Data structures (Pattern, MatchState, MatchContext)
> - Execution flow and state transitions
> - Context absorption optimization for greedy patterns
> - AST optimization passes
>
> This is still a work in progress. I'd appreciate any feedback on
> the approach before proceeding with PostgreSQL integration.

I understand you are still in the design phase and sorry for jumping
into an implementation question. but If you have an idea, please
advice:


Not at all - implementation questions are valuable advice that
help catch what I might miss in the design phase!
 
How do you handle '{' and '}' in the PATTERN clause in the raw parser?
I am not a parser expert but it seems it's not easy to handle '{' and
'}' in gram.y.


You're right - it's not straightforward. Both gram.y and scan.l
need to be modified together.

I've attached a rough patch showing one possible approach. However,
since the OR handling has also been modified in SQL/PGQ, there will
likely be conflicts that need to be resolved when integrating.
 
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Вложения

Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Henson,

>> Here do you have "ONE ROW PER MATCH" option of RPR in your mind? In my
>> understanding it's in MATCH_RECOGNIZE clause, but not in RPR in Window
>> clause. (ALL ROWS PER MATCH is not applicable either in RPR in
>> Window clause).
>>
>>
> You're absolutely right - thank you for the correction!
> 
> I'm currently exploring what this NFA structure could theoretically
> support, rather than strictly binding to the spec yet. Since I'm
> still learning from Oracle manuals and your code without access to
> the full SQL:2016 standard, I mixed up MATCH_RECOGNIZE syntax with
> Window clause RPR - sorry for the confusion!

No problem at all. MATCH_RECOGNIZE and Window clause RPR are quite
similar. In the mean time "Trino" manual maybe useful for you if you
don't have access to the SQL standard.
https://trino.io/docs/current/sql/pattern-recognition-in-window.html

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Henson,

Thank you for the comprehensive review!

> SQL/RPR Patch Review Report
> ============================
> 
> Patch: Row Pattern Recognition (SQL:2016)
> Commitfest: https://commitfest.postgresql.org/patch/4460
> 
> Review Methodology:
>   This review focused on quality assessment, not line-by-line code audit.
>   Key code paths and quality issues were examined with surrounding context
>   when concerns arose. Documentation files were reviewed with AI-assisted
>   grammar and typo checking. Code coverage was measured using gcov and
>   custom analysis tools.
> 
> Limitations:
>   - Not a security audit or formal verification
>   - Parser and executor internals reviewed at module level, not exhaustively
>   - Performance characteristics not benchmarked
> 
> 
> TABLE OF CONTENTS
> -----------------
> 
> 1. Executive Summary
> 2. Issues Found
>    2.1 Critical / Major
>    2.2 Minor Issues (Review Needed)
>    2.3 Minor Issues (Style)
>    2.4 Suggestions for Discussion
> 3. Test Coverage
>    3.1 Covered Areas
>    3.2 Untested Items
>    3.3 Unimplemented Features (No Test Needed)
> 4. Code Coverage Analysis
>    4.1 Overall Coverage
>    4.2 Coverage by File
>    4.3 Uncovered Code Risk Assessment
> 5. Commit Analysis
> 6. Recommendations
> 7. Conclusion
> 
> 
> 1. EXECUTIVE SUMMARY
> --------------------
> 
> Overall assessment: GOOD

Glad to hear that.

> The SQL/RPR patch demonstrates solid implementation quality within its
> defined scope. Code follows PostgreSQL coding standards (with minor style
> issues), test coverage is comprehensive at 96.4%, and documentation is
> thorough with only minor typos.
> 
> Rating by Area:
> - Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes
> recommended)
> - Test Coverage:    Good (96.4% line coverage, 28 test cases)
> - Documentation:    Good (Complete, 1 minor typo)
> - Build/Regress:    Pass (make check-world, 753 tests passed)
> 
> 
> 2. ISSUES FOUND
> ---------------
> 
> 2.1 Critical / Major
> 
> #1 [Code] Greedy pattern combinatorial explosion
>    Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
>    Impact: Memory exhaustion or infinite wait
>    Recommendation: Add work_mem-based memory limit (error on exceed)

Not sure it's a good solution. I think it's acceptable for users that
the query execution getting slow down when there's not enough
work_mem, but ending up with ERROR would not be acceptable. In order
to run the query without ERROR (and the query getting slow) when
there's not enough memory, RPR needs to use temporary files. Problem
is, if the access pattern is random, accessing files in random manner
would be extremely slow. Unfortunately RPR's pattern match access
pattern is surely random in the current patch.  Thus I think adding
memory usage check is not worth the trouble. What do you think?

Instead we need to look for alternative algorithm to perform the
search.  I hope streaming NFA is a good candidate.

>    Evidence - No memory limit in current code:
>    - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
>    - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc()
> unlimited
>    - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
> 
>    Only work_mem usage in RPR (nodeWindowAgg.c:1297):
>      winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
>    -> For tuple buffer, unrelated to pattern combination memory (StringSet)
>
> 2.2 Minor Issues (Review Needed)
> 
> #1 [Code] nodeWindowAgg.c:5849,5909,5912
>    pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
> elements

Yeah, in variable_pos_fetch() and variable_pos_register(), following
lines are wrong.
    if (pos < 0 || pos > NUM_ALPHABETS)
    if (index < 0 || index > NUM_ALPHABETS)

They should have been:
    if (pos < 0 || pos >= NUM_ALPHABETS)
    if (index < 0 || index >= NUM_ALPHABETS)

Will fix.

>    Reproduction:
>    - PATTERN (A B C ... Z A) (27 elements) -> OK
>    - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid
> char: b"
> 
> 2.3 Minor Issues (Style)
> 
> #1 [Code] nodeWindowAgg.c (25 blocks)
>    #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove

I replaced each of them to "printf...". This way, each debug line is
not accompanied with a query, which is good for developers.

> #2 [Code] src/backend/executor/nodeWindowAgg.c
>    static keyword on separate line (26 functions)

Fixed.

> #3 [Code] src/backend/utils/adt/ruleutils.c
>    Whitespace typo: "regexp !=NULL" -> "regexp != NULL"

Fixed.

> #4 [Code] nodeWindowAgg.c:4619
>    Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)

Fixed.

> #5 [Doc] doc/src/sgml/ref/select.sgml:1128
>    Typo: "maximu" -> "maximum"

Fixed.

> 2.4 Suggestions for Discussion
> 
> #1 Incremental matching with streaming NFA redesign
>    Benefits:
>    - O(n) time complexity per row (current: exponential in worst case)
>    - Bounded memory via state merging and context absorption
>    - Natural extension for OR patterns, {n,m} quantifiers, nested groups
>    - Enables MEASURES clause with incremental aggregates
>    - Proven approach in CEP engines (Flink, Esper)

I looking forward to join the discussion.

> 3. TEST COVERAGE
> ----------------
> 
> 3.1 Covered Areas
> 
> - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
> - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
> - PREV/NEXT functions: Single argument (line 44, 173)
> - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
> - Aggregate integration: sum, avg, count, max, min (line 258-277)
> - Window function integration: first_value, last_value, nth_value (line
> 34-35)
> - JOIN/CTE: JOIN (line 313), WITH (line 324)
> - View: VIEW creation, pg_get_viewdef (line 390-406)
> - Error cases: 7 error scenarios (line 409-532)
> - Large partition: 5000 row smoke test (line 360-387)
> - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
> 
> 3.2 Untested Items
> 
> Feature                  Priority   Recommendation
> -------------------------------------------------------------------------------
> 26 variable limit        Medium     Test 26 variables success, 27th
> variable error
> NULL value handling      Low        Test PREV(col) where col or previous
> row is NULL
> 
> Sample test for 26 variable limit:
> 
>     -- Should fail with 27th variable (parser error, no table needed)
>     SELECT * FROM (SELECT 1 AS x) t
>     WINDOW w AS (
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
>       DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
> TRUE,
>              g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
> TRUE,
>              m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
> TRUE,
>              s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
> TRUE,
>              y AS TRUE, z AS TRUE, aa AS TRUE
>     );
>     -- ERROR: number of row pattern definition variable names exceeds 26

Good cacth. I will add the test. BTW, the sample test SQL can be
simplied like this:

SELECT * FROM (SELECT 1 AS x) t
 WINDOW w AS (
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
 DEFINE a AS TRUE
);

Because if a pattern varible (for example "b") is not in the DEFINE
clause, it is treated as if being defined "b AS TRUE" per spec. Same
thing can be said to other pattern variables c ... aa).

> Sample test for NULL handling:
> 
>     CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in
> middle
>     INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
>
>     SELECT company, tdate, price, count(*) OVER w AS match_count
>     FROM stock_null
>     WINDOW w AS (
>       PARTITION BY company
>       ORDER BY tdate
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (START UP DOWN)
>       DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
> PREV(price)
>     );
>     -- Result: all rows show match_count = 0 (NULL breaks pattern matching)

Thanks. I will add this test.

> 3.3 Unimplemented Features (No Test Needed)
> 
> Per documentation (select.sgml:1133-1136):
> - MEASURES clause: Not implemented
> - SUBSET clause: Not implemented
> - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
> 
> Not in documentation, verified by testing:
> - {n,m} quantifier: Parser error ("syntax error at or near {")
> - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
> - (A B)+ compound repetition: Parser accepts, but execution returns 0
> matches
> 
> 
> 4. CODE COVERAGE ANALYSIS
> -------------------------
> 
> 4.1 Overall Coverage
> 
> 96.4% (732 / 759 lines)
> 
> 4.2 Coverage by File (major RPR-modified files)
> 
> nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
> parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
> ruleutils.c:        93.1% (54/58) - pg_get_viewdef output
> 
> 4.3 Uncovered Code Risk Assessment
> 
> Total: 27 lines uncovered
> 
> Medium Risk (2 items) - Test addition recommended (see section 3.2):
> - parse_clause.c:4093: transformDefineClause - 27th variable error
> - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
> row
> 
> Low Risk (25 items) - Defensive code / Unreachable:
> - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
> - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
> - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
> - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
> - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
> - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
> - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
> - nodeWindowAgg.c:5007: search_str_set - NULL return continue
> - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
> - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
> - nodeWindowAgg.c:5700: pattern_initial - Variable not found
> - nodeWindowAgg.c:5776: string_set_get - Index range check
> - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
> - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
> - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
> - parse_clause.c:3989: transformDefineClause - A_Expr type check
> - parse_clause.c:4145: transformPatternClause - A_Expr type check
> - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
> 
> Conclusion: Most uncovered code consists of defensive error handling or
> code unreachable due to parser pre-validation. No security or functional
> risk.
> 
> 
> 5. COMMIT ANALYSIS
> ------------------
> 
> 8 sequential commits:
> 
> Commit  Area            Files   +/-         Key Content
> -------------------------------------------------------------------------------
> 1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
> 2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
> 3       Rewriter        1       +109/-0     pg_get_viewdef extension
> 4       Planner         5       +73/-3      WindowAgg plan node extension
> 5       Executor        4       +1,942/-11  CORE: Pattern matching engine
> (+1,850)
> 6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
> 7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
> 8       typedefs        1       +6/-0       pgindent support
> 
> Code Change Statistics:
> - Total files: 25
> - Lines added: 4,358
> - Lines deleted: 39
> - Net increase: +4,319 lines
> 
> 
> 6. RECOMMENDATIONS
> ------------------
> 
> 6.1 Combinatorial Explosion Prevention (Major, Required)
> 
> Add work_mem-based memory limit for StringSet allocation.
> Location: string_set_add() in nodeWindowAgg.c:5741-5750
> Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)

See the comment above.

> 6.2 Code Review Required (Minor, Decision Needed)
> 
> Location: nodeWindowAgg.c:5849,5909,5912
> Issue: pos > NUM_ALPHABETS check intent unclear
> Current: PATTERN with 28 elements causes error
> Question: Is 27 element limit intentional?

I think these are addressed in the attached v37 patch.

> 6.3 Code Style Fixes (Minor)
> 
> - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
> - static keyword on separate line: 26 functions to fix
> - Whitespace: "regexp !=NULL" -> "regexp != NULL"
> - Error message case: "Unrecognized" -> "unrecognized"

Fixed in v37.

> 6.4 Documentation Fixes (Minor)
> 
> - select.sgml: "maximu" -> "maximum"

Fixed in v37.

> 6.5 Test Additions (Optional)
> 
> Black-box Tests (Functional):
> 
> Feature              Test Case                                    Priority
> -------------------------------------------------------------------------------
> Variable limit       26 variables success, 27 error               Medium
> NULL boundary        PREV at partition first row                  Medium
> 
> White-box Tests (Coverage) - covered by above black-box tests:
> 
> File:Line                          Target Branch
> -------------------------------------------------------------------------------
> parse_clause.c:4093                Limit error branch (Variable limit test)
> nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)

Test added in the v37 patch.

> 7. CONCLUSION
> -------------
> 
> Test Quality: GOOD
> 
> Core functionality is thoroughly tested with comprehensive error case
> coverage.
> 
> The patch is well-implemented within its defined scope. Identified issues
> include
> one major concern (combinatorial explosion) and minor style/documentation
> items.
> 
> Recommended actions before commit:
> 1. Add work_mem-based memory limit for pattern combinations (required)
> 2. Clarify pos > NUM_ALPHABETS check intent (review needed)
> 3. Fix code style issues (optional but recommended)
> 4. Fix documentation typo (trivial)
> 5. Add tests for variable limit and NULL handling (optional)
> 
> Points for discussion (optional):
> 6. Incremental matching with streaming NFA redesign

Definitely we need discussion on this.

> Attachment:
> - coverage.tgz (gcov HTML report, RPR-modified code only)
> 
> 
> ---
> End of Report

Again thank you for the review. Attached are the v37 patches. This
time I added a short description about the patch to the 001 patch. In
the file description about what are implemented and waht not
implemented. Hope this helps to understand the current status of the
patches.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Вложения

Re: Row pattern recognition

От
Henson Choi
Дата:
Hi Ishii-san,

I've been working on an NFA implementation prototype.

This is a very early prototype - still buggy and incomplete,
but shows the basic approach. Attached HTML file - open in
browser (runs locally, no server needed).

I used JavaScript for quick prototyping and visualization.
The actual PostgreSQL implementation would be in C, but the
algorithm logic remains the same.

To test:
1. Use default pattern
2. Enter variables sequentially: A, B, C, D, E, F, G, H, I
3. Try different patterns and options in the UI

Implemented:
- Patterns: *, +, ?, (A|B), (A B)+
- Reluctant quantifiers: A+?, A*? (switches to reluctant mode)
  Note: This part probably has issues
- Execution trace with state visualization
- Context management: multiple matches, absorption
- Options: SKIP modes, output modes

Many rough edges, especially reluctant matching.
Just sharing the direction.

My responses may be slow due to company work during weekdays.

Best regards,
Henson
Вложения

Re: Row pattern recognition

От
Jacob Champion
Дата:
On Sun, Jan 4, 2026 at 7:50 AM Henson Choi <assam258@gmail.com> wrote:
> I've been working on an NFA implementation prototype.

Neat! In case it helps, you may also be interested in my (now very out
of date) mails at [1] and [2].

At one point I was also working on an independent implementation of
the preferment rules, to make it easier for developers to intuit how
matches were made, but I can't find that email at the moment. This may
have some overlap with "state visualization" for the engine? I have a
3k-line patch based on top of v22 of the patchset as of September
2024, if there is interest in that, but no promises on quality -- I
would have posted it if I thought it was ready for review :D

Thanks,
--Jacob

[1] https://postgr.es/m/96b4d3c8-1415-04db-5629-d2617ac7a156%40timescale.com
[2] https://postgr.es/m/CAGu%3Du8gcyAQAsDLp-01R21%3DM4bpAiGP9PZz0OyiMo5_bM30PMA%40mail.gmail.com



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
> On Sun, Jan 4, 2026 at 7:50 AM Henson Choi <assam258@gmail.com> wrote:
>> I've been working on an NFA implementation prototype.
> 
> Neat! In case it helps, you may also be interested in my (now very out
> of date) mails at [1] and [2].
> 
> At one point I was also working on an independent implementation of
> the preferment rules, to make it easier for developers to intuit how
> matches were made, but I can't find that email at the moment.

Maybe this?

https://www.postgresql.org/message-id/CAOYmi%2Bns3kHjC83ap_BCfJCL0wfO5BJ_sEByOEpgNOrsPhqQTg%40mail.gmail.com

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Row pattern recognition

От
Jacob Champion
Дата:
On Mon, Jan 5, 2026 at 3:26 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
> Maybe this?
>
> https://www.postgresql.org/message-id/CAOYmi%2Bns3kHjC83ap_BCfJCL0wfO5BJ_sEByOEpgNOrsPhqQTg%40mail.gmail.com

That's the one, thanks!

--Jacob



Re: Row pattern recognition

От
Tatsuo Ishii
Дата:
Hi Henson,

Thank you for the design proposal. I'm still learning it, and I have a
few questions for now.

> 3. Proper Lexical Order support
>    - Respects PATTERN alternative order for ONE ROW PER MATCH

RPR in WINDOW clause does not allow to specify "ONE ROW PER MATCH".
(nor ALL ROWS PER MATCH). So I am not sure what you mean here.

> 4. GREEDY/RELUCTANT quantifier support
>    - Longest vs shortest match semantics
> 
> 5. Incremental MEASURES computation
>    - Aggregate values computed during matching, no rescan needed

In my understanding MEASURES does not directly connect to Aggregate
computation with rescan. Can you elaborate why implementing MEASURES
allows to avoid recan for aggregate computation?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp