Обсуждение: Patch: Add tsmatch JSONPath operator for granular Full Text Search
Hi,
in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
but never tackle it. Until yesterday that is.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB structures—
solving issues with structural ambiguity and query complexity.
Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.
SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.
The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
Вложения
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
Hi,in real-life I work a lot with json & fts search, here's a feature I've always wished I had,but never tackle it. Until yesterday that is.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');This patch introduces a tsmatch boolean operator to the JSONPath engine.By integrating FTS natively into path expressions,this operator allows for high-precision filtering of nested JSONB structures—solving issues with structural ambiguity and query complexity.Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:- Imprecise Global Indexingjsonb_to_tsvector aggregates text into a flat vector.This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").- Complex SQL WorkaroundsAchieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicateswithin the same JSON object - something jsonb_to_tsvector cannot do.SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');In a flat vector, the association between "Alice" and "performance" is lost.tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,tsmatch is architecturally modeled after the standard-defined like_regex.The implementation follows the like_regex precedent:it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.Caching is scoped to the JsonPathExecContext,ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.This initial implementation uses plainto_tsquery.However, the grammar is designed to support a "mode" flag (similar to like_regex flags)in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")
Вложения
> On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>
>
>
> On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
> Hi,
>
> in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> but never tackle it. Until yesterday that is.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
>
> This patch introduces a tsmatch boolean operator to the JSONPath engine.
> By integrating FTS natively into path expressions,
> this operator allows for high-precision filtering of nested JSONB structures—
> solving issues with structural ambiguity and query complexity.
>
> Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> - Imprecise Global Indexing
> jsonb_to_tsvector aggregates text into a flat vector.
> This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
>
> - Complex SQL Workarounds
> Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
>
> One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> within the same JSON object - something jsonb_to_tsvector cannot do.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
>
> In a flat vector, the association between "Alice" and "performance" is lost.
> tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
>
> While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> tsmatch is architecturally modeled after the standard-defined like_regex.
>
> The implementation follows the like_regex precedent:
> it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> Caching is scoped to the JsonPathExecContext,
> ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
>
> This initial implementation uses plainto_tsquery.
> However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
>
> Here's a v2, that implements the tsqparser clause
>
> So this should now work too
>
> select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser
"w")<v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
Hi Florents,
Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current
master.Also, the CF reported a failure test case, please take a look.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
> On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>
>
>
> On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
> Hi,
>
> in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> but never tackle it. Until yesterday that is.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
>
> This patch introduces a tsmatch boolean operator to the JSONPath engine.
> By integrating FTS natively into path expressions,
> this operator allows for high-precision filtering of nested JSONB structures—
> solving issues with structural ambiguity and query complexity.
>
> Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> - Imprecise Global Indexing
> jsonb_to_tsvector aggregates text into a flat vector.
> This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
>
> - Complex SQL Workarounds
> Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
>
> One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> within the same JSON object - something jsonb_to_tsvector cannot do.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
>
> In a flat vector, the association between "Alice" and "performance" is lost.
> tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
>
> While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> tsmatch is architecturally modeled after the standard-defined like_regex.
>
> The implementation follows the like_regex precedent:
> it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> Caching is scoped to the JsonPathExecContext,
> ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
>
> This initial implementation uses plainto_tsquery.
> However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
>
> Here's a v2, that implements the tsqparser clause
>
> So this should now work too
>
> select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
Hi Florents,
Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
Hi Evan,
thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .
I noticed the failure too, but I'm having a hard time reproducing it tbh.
This fails for Debian Trixie with Meson. The same with Autoconf passes...
Вложения
> On Feb 27, 2026, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>
>
> On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> > On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
> >
> >
> >
> >
> > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
> > Hi,
> >
> > in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> > but never tackle it. Until yesterday that is.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > This patch introduces a tsmatch boolean operator to the JSONPath engine.
> > By integrating FTS natively into path expressions,
> > this operator allows for high-precision filtering of nested JSONB structures—
> > solving issues with structural ambiguity and query complexity.
> >
> > Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> > - Imprecise Global Indexing
> > jsonb_to_tsvector aggregates text into a flat vector.
> > This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> > appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
> >
> > - Complex SQL Workarounds
> > Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> > This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
> >
> > One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> > within the same JSON object - something jsonb_to_tsvector cannot do.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > In a flat vector, the association between "Alice" and "performance" is lost.
> > tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
> >
> > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> > tsmatch is architecturally modeled after the standard-defined like_regex.
> >
> > The implementation follows the like_regex precedent:
> > it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> > Caching is scoped to the JsonPathExecContext,
> > ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared
statements.
> >
> > This initial implementation uses plainto_tsquery.
> > However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> > in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
> >
> > Here's a v2, that implements the tsqparser clause
> >
> > So this should now work too
> >
> > select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car"
tsqparser"w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
>
> Hi Florents,
>
> Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current
master.Also, the CF reported a failure test case, please take a look.
>
> Hi Evan,
> thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .
>
> I noticed the failure too, but I'm having a hard time reproducing it tbh.
> This fails for Debian Trixie with Meson. The same with Autoconf passes...
>
> https://github.com/Florents-Tselai/postgres/runs/65098077968
>
>
>
>
> <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
I have reviewed v3 and traced a few test cases. Here comes my review comments:
1
```
+ <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
```
For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional>
```
2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
```
Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column
asthe J in the first line.
3 - jsonpath_gram.y
```
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
```
Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version
isa bit longer in terms of number of lines. So, up to you.
4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
```
makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR).
5 - jsonpath.h
```
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
```
tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in
thesame structure?
6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"
static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```
Why don’t put these includes to the header section together with other includes?
7 - jsonpath_exec.c
```
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser flag")));
+ }
```
This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert
here,or pg_unreachable().
8 - jsonpath_exec.c
```
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
```
While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
?column?
----------
(1 row)
```
I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across
arrayitems. Given the same tsconfig should apply to all array items, I think cxt should be reused.
9 - jsonpath_exec.c
```
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```
Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”.
10 - jsonpath_gram.y
```
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
```
When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
@? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR: invalid tsquery_parser value: "pss@"
LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
^
HINT: Valid values are "pl", "ph", and "w".
```
You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL
terminated.I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
```
This change is also included in the attached diff file.
11 - jsonpath.c
```
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
```
Duplicate code. Looks like a copy-pasto.
12 - jsonpath.c
```
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
```
I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of
appendBinaryStringInfo,it says that a trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Вложения
On Mon, Mar 2, 2026 at 5:44 AM Chao Li <li.evan.chao@gmail.com> wrote:
> On Feb 27, 2026, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>
>
> On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> > On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
> >
> >
> >
> >
> > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
> > Hi,
> >
> > in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> > but never tackle it. Until yesterday that is.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > This patch introduces a tsmatch boolean operator to the JSONPath engine.
> > By integrating FTS natively into path expressions,
> > this operator allows for high-precision filtering of nested JSONB structures—
> > solving issues with structural ambiguity and query complexity.
> >
> > Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> > - Imprecise Global Indexing
> > jsonb_to_tsvector aggregates text into a flat vector.
> > This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> > appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
> >
> > - Complex SQL Workarounds
> > Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> > This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
> >
> > One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> > within the same JSON object - something jsonb_to_tsvector cannot do.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > In a flat vector, the association between "Alice" and "performance" is lost.
> > tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
> >
> > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> > tsmatch is architecturally modeled after the standard-defined like_regex.
> >
> > The implementation follows the like_regex precedent:
> > it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> > Caching is scoped to the JsonPathExecContext,
> > ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
> >
> > This initial implementation uses plainto_tsquery.
> > However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> > in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
> >
> > Here's a v2, that implements the tsqparser clause
> >
> > So this should now work too
> >
> > select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
>
> Hi Florents,
>
> Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
>
> Hi Evan,
> thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .
>
> I noticed the failure too, but I'm having a hard time reproducing it tbh.
> This fails for Debian Trixie with Meson. The same with Autoconf passes...
>
> https://github.com/Florents-Tselai/postgres/runs/65098077968
>
>
>
>
> <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
I have reviewed v3 and traced a few test cases. Here comes my review comments:
1
```
+ <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
```
For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional>
```
2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
```
Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column as the J in the first line.
3 - jsonpath_gram.y
```
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
```
Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version is a bit longer in terms of number of lines. So, up to you.
4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
```
makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR).
5 - jsonpath.h
```
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
```
tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in the same structure?
6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"
static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```
Why don’t put these includes to the header section together with other includes?
7 - jsonpath_exec.c
```
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser flag")));
+ }
```
This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().
8 - jsonpath_exec.c
```
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
```
While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
?column?
----------
(1 row)
```
I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across array items. Given the same tsconfig should apply to all array items, I think cxt should be reused.
9 - jsonpath_exec.c
```
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```
Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”.
10 - jsonpath_gram.y
```
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
```
When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR: invalid tsquery_parser value: "pss @"
LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
^
HINT: Valid values are "pl", "ph", and "w".
```
You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL terminated. I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
```
This change is also included in the attached diff file.
11 - jsonpath.c
```
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
```
Duplicate code. Looks like a copy-pasto.
12 - jsonpath.c
```
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
```
I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of appendBinaryStringInfo, it says that a trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```
Here's a v4 which incorporates most of Evan's comments & feedback
- shifted tsquery compilation logic to use persistent cache within JsonPathExecContext
- Fixed a binary serialization alignment issue which caused Dixie to fail earlier
- I've refactor and simplified the grammar per Evan's input by adding a tsmatch_opts rule.
- Also updated the docs per Evan's comments
Вложения
Here's a v5 which is v4 + pgindent
On Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <florents.tselai@gmail.com> wrote:
On Mon, Mar 2, 2026 at 5:44 AM Chao Li <li.evan.chao@gmail.com> wrote:
> On Feb 27, 2026, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>
>
> On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> > On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
> >
> >
> >
> >
> > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
> > Hi,
> >
> > in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> > but never tackle it. Until yesterday that is.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > This patch introduces a tsmatch boolean operator to the JSONPath engine.
> > By integrating FTS natively into path expressions,
> > this operator allows for high-precision filtering of nested JSONB structures—
> > solving issues with structural ambiguity and query complexity.
> >
> > Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> > - Imprecise Global Indexing
> > jsonb_to_tsvector aggregates text into a flat vector.
> > This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> > appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
> >
> > - Complex SQL Workarounds
> > Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> > This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
> >
> > One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> > within the same JSON object - something jsonb_to_tsvector cannot do.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > In a flat vector, the association between "Alice" and "performance" is lost.
> > tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
> >
> > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> > tsmatch is architecturally modeled after the standard-defined like_regex.
> >
> > The implementation follows the like_regex precedent:
> > it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> > Caching is scoped to the JsonPathExecContext,
> > ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
> >
> > This initial implementation uses plainto_tsquery.
> > However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> > in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
> >
> > Here's a v2, that implements the tsqparser clause
> >
> > So this should now work too
> >
> > select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
>
> Hi Florents,
>
> Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
>
> Hi Evan,
> thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .
>
> I noticed the failure too, but I'm having a hard time reproducing it tbh.
> This fails for Debian Trixie with Meson. The same with Autoconf passes...
>
> https://github.com/Florents-Tselai/postgres/runs/65098077968
>
>
>
>
> <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
I have reviewed v3 and traced a few test cases. Here comes my review comments:
1
```
+ <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
```
For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional>
```
2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
```
Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column as the J in the first line.
3 - jsonpath_gram.y
```
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
```
Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version is a bit longer in terms of number of lines. So, up to you.
4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
```
makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR).
5 - jsonpath.h
```
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
```
tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in the same structure?
6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"
static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```
Why don’t put these includes to the header section together with other includes?
7 - jsonpath_exec.c
```
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser flag")));
+ }
```
This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().
8 - jsonpath_exec.c
```
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
```
While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
?column?
----------
(1 row)
```
I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across array items. Given the same tsconfig should apply to all array items, I think cxt should be reused.
9 - jsonpath_exec.c
```
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```
Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”.
10 - jsonpath_gram.y
```
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
```
When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR: invalid tsquery_parser value: "pss @"
LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
^
HINT: Valid values are "pl", "ph", and "w".
```
You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL terminated. I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
```
This change is also included in the attached diff file.
11 - jsonpath.c
```
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
```
Duplicate code. Looks like a copy-pasto.
12 - jsonpath.c
```
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
```
I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of appendBinaryStringInfo, it says that a trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```Here's a v4 which incorporates most of Evan's comments & feedback- shifted tsquery compilation logic to use persistent cache within JsonPathExecContext- Fixed a binary serialization alignment issue which caused Dixie to fail earlier- I've refactor and simplified the grammar per Evan's input by adding a tsmatch_opts rule.- Also updated the docs per Evan's comments
Вложения
On Sat, Apr 4, 2026 at 12:38 PM Florents Tselai <florents.tselai@gmail.com> wrote:
Here's a v5 which is v4 + pgindentOn Fri, Apr 3, 2026 at 12:56 PM Florents Tselai <florents.tselai@gmail.com> wrote:On Mon, Mar 2, 2026 at 5:44 AM Chao Li <li.evan.chao@gmail.com> wrote:
> On Feb 27, 2026, at 13:59, Florents Tselai <florents.tselai@gmail.com> wrote:
>
>
>
> On Thu, Feb 26, 2026 at 8:48 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> > On Feb 1, 2026, at 19:02, Florents Tselai <florents.tselai@gmail.com> wrote:
> >
> >
> >
> >
> > On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents.tselai@gmail.com> wrote:
> > Hi,
> >
> > in real-life I work a lot with json & fts search, here's a feature I've always wished I had,
> > but never tackle it. Until yesterday that is.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > This patch introduces a tsmatch boolean operator to the JSONPath engine.
> > By integrating FTS natively into path expressions,
> > this operator allows for high-precision filtering of nested JSONB structures—
> > solving issues with structural ambiguity and query complexity.
> >
> > Currently, users must choose between two suboptimal paths for FTS-ing nested JSON:
> > - Imprecise Global Indexing
> > jsonb_to_tsvector aggregates text into a flat vector.
> > This ignores JSON boundaries, leading to false positives when the same key (e.g., "body")
> > appears in different contexts (e.g., a "Product Description" vs. a "Customer Review").
> >
> > - Complex SQL Workarounds
> > Achieving 100% precision requires unnesting the document via jsonb_array_elements and LATERAL joins.
> > This leads to verbose SQL and high memory overhead from generating intermediate heap tuples.
> >
> > One of the most significant advantages of tsmatch is its ability to participate in multi-condition predicates
> > within the same JSON object - something jsonb_to_tsvector cannot do.
> >
> > SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body tsmatch "performance")');
> >
> > In a flat vector, the association between "Alice" and "performance" is lost.
> > tsmatch preserves this link by evaluating the FTS predicate in-place during path traversal.
> >
> > While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an FTS operator,
> > tsmatch is architecturally modeled after the standard-defined like_regex.
> >
> > The implementation follows the like_regex precedent:
> > it is a non-indexable predicate that relies on GIN path-matching for pruning and heap re-checks for precision.
> > Caching is scoped to the JsonPathExecContext,
> > ensuring 'compile-once' efficiency per execution without violating the stability requirements of prepared statements.
> >
> > This initial implementation uses plainto_tsquery.
> > However, the grammar is designed to support a "mode" flag (similar to like_regex flags)
> > in future iterations to toggle between to_tsquery, websearch_to_tsquery, and phraseto_tsquery.
> >
> > Here's a v2, that implements the tsqparser clause
> >
> > So this should now work too
> >
> > select jsonb_path_query_array('["fast car", "slow car", "fast and furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w") <v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
>
> Hi Florents,
>
> Grant pinged me about this. I can review it in coming days. Can you please rebase it? I failed to apply to current master. Also, the CF reported a failure test case, please take a look.
>
> Hi Evan,
> thanks for having a look. The conflict was due to the intro of pg_fallthrough. Not related to this patch .
>
> I noticed the failure too, but I'm having a hard time reproducing it tbh.
> This fails for Debian Trixie with Meson. The same with Autoconf passes...
>
> https://github.com/Florents-Tselai/postgres/runs/65098077968
>
>
>
>
> <v3-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch>
I have reviewed v3 and traced a few test cases. Here comes my review comments:
1
```
+ <replaceable>string</replaceable> <literal>tsmatch</literal> <replaceable>string</replaceable>
+ <optional> <literal>tsconfig</literal> <replaceable>string</replaceable> </optional>
+ <optional> <literal>tsqparser</literal> <replaceable>string</replaceable> </optional>
```
For all “replaceable”, instead of “string”, would it be better to use something more descriptive? For example:
```
<replaceable>json_string</replaceable> <literal>tsmatch</literal> <replaceable>query</replaceable>
<optional> <literal>tsconfig</literal> <replaceable>config_name</replaceable> </optional>
<optional> <literal>tsqparser</literal> <replaceable>parser_mode</replaceable> </optional>
```
2 - jsonpath_gram.y
```
+static bool makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem ** result,
+ struct Node *escontext);
```
Format Nit: Looking at the existing code, the J in the second and following lines, should be placed in the same column as the J in the first line.
3 - jsonpath_gram.y
```
+ | expr TSMATCH_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) and NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, NULL, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsquery_parser (4th) */
+ if (! makeItemTsMatch($1, &$3, &$5, NULL, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ /* Pass NULL for tsconfig (3rd) */
+ if (! makeItemTsMatch($1, &$3, NULL, &$5, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
+ | expr TSMATCH_P STRING_P TSCONFIG_P STRING_P TSQUERYPARSER_P STRING_P
+ {
+ JsonPathParseItem *jppitem;
+ if (! makeItemTsMatch($1, &$3, &$5, &$7, &jppitem, escontext))
+ YYABORT;
+ $$ = jppitem;
+ }
```
Feels a little redundant, repeatedly calls makeItemTsMatch. See the attached diff for a simplification. But my version is a bit longer in terms of number of lines. So, up to you.
4 - jsonpath_gram.y
```
+static bool
+makeItemTsMatch(JsonPathParseItem *doc,
+ JsonPathString *tsquery,
+ JsonPathString *tsconfig,
+ JsonPathString *tsquery_parser,
+ JsonPathParseItem **result,
+ struct Node *escontext)
```
makeItemTsMatch doesn’t need to return a bool. Actually, now it never returns false, instead, it just ereport(ERROR).
5 - jsonpath.h
```
+ struct
+ {
+ int32 doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ int32 tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
+ struct
+ {
+ JsonPathParseItem *doc;
+ char *tsquery;
+ uint32 tsquerylen;
+ JsonPathParseItem *tsconfig;
+ char *tsqparser;
+ uint32 tsqparser_len;
+ } tsmatch;
} value;
```
tsquerylen doesn’t have _ before len, and tsqparser_len, would it be better to make naming conventions consistent in the same structure?
6 - jsonpath_exec.c
```
#include "tsearch/ts_utils.h"
#include "tsearch/ts_cache.h"
#include "utils/regproc.h"
#include "catalog/namespace.h"
static JsonPathBool
executeTsMatch(JsonPathItem *jsp, JsonbValue *str, JsonbValue *rarg,
void *param)
```
Why don’t put these includes to the header section together with other includes?
7 - jsonpath_exec.c
```
+ else
+ {
+ /*
+ * Fallback or Error for unknown flags (should be caught by
+ * parser)
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized tsqparser flag")));
+ }
```
This “else” should never be entered as the same check has been done by makeItemTsMatch. So, maybe just use an Assert here, or pg_unreachable().
8 - jsonpath_exec.c
```
+ /* Setup Context (Run ONLY once per predicate) */
+ if (!cxt->initialized)
```
While tracing this SQL:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb
evantest-# @@ '$.tags[*] ? (@ tsmatch "run" tsconfig "english")';
?column?
----------
(1 row)
```
I noticed that, when process “jogging”, cxt->initialized is still false, meaning that, the cxt is not reused across array items. Given the same tsconfig should apply to all array items, I think cxt should be reused.
9 - jsonpath_exec.c
```
+ /* Select Parser and Compile Query */
+ parser_mode = jsp->content.tsmatch.tsqparser;
+ parser_len = jsp->content.tsmatch.tsqparser_len;
+
+ if (parser_len > 0)
+ {
+ /* Dispatch based on flag */
+ if (pg_strncasecmp(parser_mode, "pl", parser_len) == 0)
```
Nit: parser_mode is only used inside if (parser_len > 0), it can be defined inside the “if”.
10 - jsonpath_gram.y
```
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("invalid tsquery_parser value: \"%s\"", tsquery_parser->val),
+ errhint("Valid values are \"pl\", \"ph\", and \"w\".")));
```
When tested a case with an invalid parser, I got:
```
evantest=# SELECT '{"tags": ["running", "jogging"]}'::jsonb @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqparser "pss")';
ERROR: invalid tsquery_parser value: "pss @"
LINE 2: @? '$.tags[*] ? (@ tsmatch "run" tsconfig "english" tsqpar...
^
HINT: Valid values are "pl", "ph", and "w".
```
You can see the it shows a bad looking invalid value. I think that’s because tsquery_parser->val is not NULL terminated. I fixed this problem with:
```
errmsg("invalid tsquery_parser value: \"%.*s\"", (int) tsquery_parser->len, tsquery_parser->val),
```
This change is also included in the attached diff file.
11 - jsonpath.c
```
+ if (printBracketes)
+ appendStringInfoChar(buf, ')');
+ break;
+
if (printBracketes)
appendStringInfoChar(buf, ')');
```
Duplicate code. Looks like a copy-pasto.
12 - jsonpath.c
```
+ /* Write the Main Query String */
+ appendBinaryStringInfo(buf,
+ &item->value.tsmatch.tsquerylen,
+ sizeof(item->value.tsmatch.tsquerylen));
+ appendBinaryStringInfo(buf,
+ item->value.tsmatch.tsquery,
+ item->value.tsmatch.tsquerylen);
+ appendStringInfoChar(buf, '\0');
```
I don’t think we need to manually append ‘\0’ after appendBinaryStringInfo. Looking at the header comment of appendBinaryStringInfo, it says that a trailing null will be added.
```
/*
* appendBinaryStringInfo
*
* Append arbitrary binary data to a StringInfo, allocating more space
* if necessary. Ensures that a trailing null byte is present.
*/
void
appendBinaryStringInfo(StringInfo str, const void *data, int datalen)
```Here's a v4 which incorporates most of Evan's comments & feedback- shifted tsquery compilation logic to use persistent cache within JsonPathExecContext- Fixed a binary serialization alignment issue which caused Dixie to fail earlier- I've refactor and simplified the grammar per Evan's input by adding a tsmatch_opts rule.- Also updated the docs per Evan's comments
Andrew was kind enough to give some comments off-list.
Here's a v6
- Fixed soft-error path in makeItemTsMatch - ereport replaced with errsave, return value now checked at the grammar call site with YYABORT
- Fixed pg_strncasecmp prefix match bug in parser-mode validation - "p" was being accepted as "pl"
- Fixed get_ts_config_oid to throw on unknown tsconfig instead of passing InvalidOid downstream
- Grammar restructured to make tsconfig and tsqparser order-independent
- Removed unused JsonTsMatchContext struct and some misplaced #includes