Re: Row pattern recognition
| From | Tatsuo Ishii |
|---|---|
| Subject | Re: Row pattern recognition |
| Date | |
| Msg-id | 20260111.094131.1067216778241374352.ishii@postgresql.org |
| In response to | Re: Row pattern recognition (Henson Choi <assam258@gmail.com>) |
| List | pgsql-hackers |
>> 3. Proper Lexical Order support
>> >    - Respects PATTERN alternative order for ONE ROW PER MATCH
>>
>> RPR in the WINDOW clause does not allow specifying "ONE ROW PER MATCH"
>> (nor ALL ROWS PER MATCH). So I am not sure what you mean here.
>
>
> You're absolutely right to point this out. I've been working without access
> to the SQL:2016 standard, relying on Oracle manuals and your implementation,
> which led me to incorrectly treat Window clause RPR as a variant of
> MATCH_RECOGNIZE.
>
> I now realize there are fundamental differences between R010 (RPR in window
> functions) and R020 (MATCH_RECOGNIZE), and I was conflating the two. My
> company is supportive of this work, and we're planning to purchase the
> standard next week so I can properly understand the spec requirements.
>
> Thank you for catching this - it's exactly the kind of spec guidance I need
> as I continue learning.

Glad to hear that you are planning to purchase the standard. Vik and
Jacob advised me to do the same, and I really appreciate their
suggestion. Without access to 19075-5, I think it's not possible to
implement RPR.

> > 5. Incremental MEASURES computation
> >
>> >    - Aggregate values computed during matching, no rescan needed
>>
>> In my understanding MEASURES does not directly connect to Aggregate
>> computation with rescan. Can you elaborate why implementing MEASURES
>> allows avoiding the rescan for aggregate computation?
>>
>
> Let me clarify what I meant by "incremental aggregation" and "rescan":
> In the NFA design, I'm building infrastructure for incremental aggregate
> computation during pattern matching - maintaining SUM, COUNT, etc. as the
> match progresses. When a match completes, if only aggregate functions are
> needed, the result can be produced without accessing the original rows
> again.

Oh, ok.

> I used "rescan" to contrast this with what I assumed was the existing
> approach: match first, then aggregate over the matched row range afterward.
> However, I haven't studied your implementation carefully enough to know if
> this assumption is correct.

As far as RPR is concerned, you are correct. The current implementation
does the "rescan".

> Could you clarify how aggregates are currently computed after pattern
> matching
> in your implementation? This would help me understand whether the
> incremental
> approach actually provides a benefit, or if I'm solving a problem that
> doesn't
> exist.

Yes, if RPR is enabled, we restart the aggregate computation from the
head of the "reduced frame" (that is, the starting row of the pattern
match) and run it to the end of the matching rows. Without RPR,
PostgreSQL reuses the previous aggregate result under some conditions
to avoid the restart (see the long comment in
eval_windowaggregates()). But currently I think it's not possible for
RPR to avoid it. The reason was explained in the v17 patch (posted on
2024/4/28):

- In 0005 executor patch, aggregation in RPR always restarts for each
  row. This is necessary to run aggregates on no matching (due to
  skipping) or empty matching (due to no pattern variables matches)
  rows to produce NULL (most aggregates) or 0 (count) properly. In v16
  I had a hack using a flag to force the aggregation results to be
  NULL in case of no match or empty match in
  finalize_windowaggregate(). v17 eliminates the dirty hack.

> Regarding MEASURES - I incorrectly connected it to this aggregation
> discussion.
> As you noted, MEASURES is a separate R020 feature, not part of R010. The
> incremental aggregation infrastructure would support both cases, but they're
> distinct features.

I think R010 has MEASURES too.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
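
The difference between restarting aggregation over the reduced frame and aggregating incrementally during matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the executor code from the patch: the function names are hypothetical, the pattern is hard-coded as START UP+ (a row followed by one or more strictly rising prices), and AFTER MATCH SKIP PAST LAST ROW behaviour is assumed. Rows with no match, or rows skipped inside an earlier match, yield NULL for sum and 0 for count, as described above.

```python
from typing import List, Optional, Tuple

Result = Tuple[Optional[float], int]  # (sum, count); sum is None for SQL NULL


def match_length(prices: List[float], start: int) -> int:
    """Length of a START UP+ match beginning at `start`, or 0 if none.

    Hypothetical stand-in for a real pattern matcher.
    """
    end = start + 1
    while end < len(prices) and prices[end] > prices[end - 1]:
        end += 1
    length = end - start
    return length if length >= 2 else 0  # UP+ needs at least one UP row


def aggregate_with_restart(prices: List[float]) -> List[Result]:
    """Match first, then restart aggregation from the head of the reduced frame."""
    results: List[Result] = []
    skip_until = 0  # rows before this index were consumed by an earlier match
    for row in range(len(prices)):
        length = match_length(prices, row) if row >= skip_until else 0
        if length == 0:
            results.append((None, 0))  # no match: sum is NULL, count is 0
            continue
        frame = prices[row:row + length]  # second pass over the matched rows
        results.append((sum(frame), len(frame)))
        skip_until = row + length
    return results


def aggregate_incrementally(prices: List[float]) -> List[Result]:
    """Maintain sum and count while the match advances; no second pass."""
    results: List[Result] = []
    skip_until = 0
    for row in range(len(prices)):
        if row < skip_until:
            results.append((None, 0))  # skipped row inside an earlier match
            continue
        total, count = prices[row], 1  # START always matches
        end = row + 1
        while end < len(prices) and prices[end] > prices[end - 1]:
            total += prices[end]  # update aggregates as each UP row matches
            count += 1
            end += 1
        if count < 2:
            results.append((None, 0))  # pattern did not match at this row
        else:
            results.append((total, count))
            skip_until = end
    return results


prices = [100, 110, 120, 90, 95, 80]
restart_results = aggregate_with_restart(prices)
incremental_results = aggregate_incrementally(prices)
```

Both functions return the same per-row results; they differ only in whether the matched rows are read a second time. A real implementation would also have to handle aggregates that cannot be maintained this way and the interaction with the existing window frame logic, which this sketch ignores.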