Re: Row pattern recognition

Поиск
Список
Период
Сортировка
От Tatsuo Ishii
Тема Re: Row pattern recognition
Дата
Msg-id 20230721.151648.412762379013769790.t-ishii@sranhm.sra.co.jp
обсуждение исходный текст
Ответ на Re: Row pattern recognition  (Jacob Champion <jchampion@timescale.com>)
Ответы Re: Row pattern recognition  (Jacob Champion <jchampion@timescale.com>)
Список pgsql-hackers
Hi,

> Hi Ishii-san,
> 
> On 7/19/23 22:15, Tatsuo Ishii wrote:
>> Currently my patch has a limitation for the sake of simple
>> implementation: a pattern like "A+" is parsed and analyzed in the raw
>> parser. This makes subsequent process much easier because the pattern
>> element variable (in this case "A") and the quantifier (in this case
>> "+") is already identified by the raw parser. However there are much
>> more cases are allowed in the standard as you already pointed out. For
>> those cases probably we should give up to parse PATTERN items in the
>> raw parser, and instead the raw parser just accepts the elements as
>> Sconst?
> 
> Is there a concern that the PATTERN grammar can't be represented in
> Bison? I thought it was all context-free...

I don't know at this point. I think context-free is not enough to be
repsented in Bison. The grammer also needs to be LALR(1).  Moreover,
adding the grammer to existing parser may generate shift/reduce
errors.

> Or is the concern that the
> parse tree of the pattern is hard to feed into a regex engine?

One small concern is how to convert pattern variables to regex
expression which our regex enginer understands. Suppose,

PATTERN UP+

Currently I convert "UP+" to "U+" so that it can be fed to the regexp
engine. In order to do that, we need to know which part of the pattern
(UP+) is the pattern variable ("UP"). For "UP+" it's quite easy. But
for more complex regular expressions it would be not, unless PATTERN
grammer can be analyzed by our parser to know which part is the
pattern variable.

>> Any comments, especially on the PREV/NEXT implementation part is
>> welcome. Currently the DEFINE expression like "price > PREV(price)" is
>> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
>> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR.  Then
>> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
>> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
>> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
>> hack and should be gotten ride of before v1 is committed. Better idea?
> 
> I'm not familiar enough with this code yet to offer very concrete
> suggestions, sorry... But at some point in the future, we need to be
> able to skip forward and backward from arbitrary points, like
> 
>     DEFINE B AS B.price > PREV(FIRST(A.price), 3)
> 
> so there won't be just one pair of "previous and next" tuples.

Yes, I know.

> Maybe
> that can help clarify the design? It feels like it'll need to eventually
> be a "real" function that operates on the window state, even if it
> doesn't support all of the possible complexities in v1.

Unfortunately an window function can not call other window functions.

> Taking a closer look at the regex engine:
> 
> It looks like the + qualifier has trouble when it meets the end of the
> frame. For instance, try removing the last row of the 'stock' table in
> rpr.sql; some of the final matches will disappear unexpectedly. Or try a
> pattern like
> 
>     PATTERN ( a+ )
>      DEFINE a AS TRUE
> 
> which doesn't seem to match anything in my testing.
> 
> There's also the issue of backtracking in the face of reclassification,
> as I think Vik was alluding to upthread. The pattern
> 
>     PATTERN ( a+ b+ )
>      DEFINE a AS col = 2,
>             b AS col = 2
> 
> doesn't match a sequence of values (2 2 ...) with the current
> implementation, even with a dummy row at the end to avoid the
> end-of-frame bug.
> 
> (I've attached two failing tests against v2, to hopefully better
> illustrate, along with what I _think_ should be the correct results.)

Thanks. I will look into this.

> I'm not quite understanding the match loop in evaluate_pattern(). It
> looks like we're building up a string to pass to the regex engine, but
> by the we call regexp_instr, don't we already know whether or not the
> pattern will match based on the expression evaluation we've done?

For "+" yes. But for more complex regular expression like '{n}', we
need to call our regexp engine to check if the pattern matches.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Junwang Zhao
Дата:
Сообщение: table_open/table_close with different lock mode
Следующее
От: Gurjeet Singh
Дата:
Сообщение: Re: MERGE ... RETURNING