Обсуждение: tsearch patch and namespace pollution
I find the following additions to pg_proc in the current tsearch2 patch:
proc | prorettype
------------------------------------------+------------pg_ts_parser_is_visible(oid) |
booleanpg_ts_dict_is_visible(oid) | booleanpg_ts_template_is_visible(oid) |
booleanpg_ts_config_is_visible(oid) | booleantsvectorin(cstring) |
tsvectortsvectorout(tsvector) | cstringtsvectorsend(tsvector) |
byteatsqueryin(cstring) | tsquerytsqueryout(tsquery) |
cstringtsquerysend(tsquery) | byteagtsvectorin(cstring) |
gtsvectorgtsvectorout(gtsvector) | cstringtsvector_lt(tsvector,tsvector) |
booleantsvector_le(tsvector,tsvector) | booleantsvector_eq(tsvector,tsvector) |
booleantsvector_ne(tsvector,tsvector) | booleantsvector_ge(tsvector,tsvector) |
booleantsvector_gt(tsvector,tsvector) | booleantsvector_cmp(tsvector,tsvector) |
integerlength(tsvector) | integerstrip(tsvector) |
tsvectorsetweight(tsvector,"char") | tsvectortsvector_concat(tsvector,tsvector) |
tsvectorvq_exec(tsvector,tsquery) | booleanqv_exec(tsquery,tsvector) |
booleantt_exec(text,text) | booleanct_exec(character varying,text) |
booleantq_exec(text,tsquery) | booleancq_exec(character varying,tsquery) |
booleantsquery_lt(tsquery,tsquery) | booleantsquery_le(tsquery,tsquery) |
booleantsquery_eq(tsquery,tsquery) | booleantsquery_ne(tsquery,tsquery) |
booleantsquery_ge(tsquery,tsquery) | booleantsquery_gt(tsquery,tsquery) |
booleantsquery_cmp(tsquery,tsquery) | integertsquery_and(tsquery,tsquery) |
tsquerytsquery_or(tsquery,tsquery) | tsquerytsquery_not(tsquery) |
tsquerytsq_mcontains(tsquery,tsquery) | booleantsq_mcontained(tsquery,tsquery) |
booleannumnode(tsquery) | integerquerytree(tsquery) |
textrewrite(tsquery,tsquery,tsquery) | tsqueryrewrite(tsquery,text) |
tsqueryrewrite_accum(tsquery,tsquery[]) | tsqueryrewrite_finish(tsquery) |
tsqueryrewrite(tsquery[]) | tsquerystat(text) |
recordstat(text,text) | recordrank(real[],tsvector,tsquery,integer) |
realrank(real[],tsvector,tsquery) | realrank(tsvector,tsquery,integer) | realrank(tsvector,tsquery)
| realrank_cd(real[],tsvector,tsquery,integer) | realrank_cd(real[],tsvector,tsquery) |
realrank_cd(tsvector,tsquery,integer) | realrank_cd(tsvector,tsquery) | realtoken_type(oid)
| recordtoken_type(text) | recordparse(oid,text) |
recordparse(text,text) | recordlexize(oid,text) |
text[]lexize(text,text) | text[]headline(oid,text,tsquery,text) |
textheadline(oid,text,tsquery) | textheadline(text,text,tsquery,text) |
textheadline(text,text,tsquery) | textheadline(text,tsquery,text) | textheadline(text,tsquery)
| textto_tsvector(oid,text) | tsvectorto_tsvector(text,text) |
tsvectorto_tsquery(oid,text) | tsqueryto_tsquery(text,text) |
tsqueryplainto_tsquery(oid,text) | tsqueryplainto_tsquery(text,text) |
tsqueryto_tsvector(text) | tsvectorto_tsquery(text) |
tsqueryplainto_tsquery(text) | tsquerytsvector_update_trigger() |
triggerget_ts_config_oid(text) | oidget_current_ts_config() | oid
(82 rows)
(This list omits functions with INTERNAL arguments, as those are of
no particular concern to users.)
While most of these are probably OK, I'm disturbed by the prospect
that we are commandeering names as generic as "parse" or "stat"
with argument types as generic as "text". I think we need to put
a "ts_" prefix on some of these. Specifically, I find these names
totally unacceptable without a ts_ prefix:
stat(text) | recordstat(text,text) | record
token_type(oid) | recordtoken_type(text) | record
parse(oid,text) | recordparse(text,text) | record
lexize(oid,text) | text[]lexize(text,text) | text[]
These guys might be all right given that some of their arguments are
tsvector or tsquery, but it's not completely convincing --- think about
the case where an argument is given as an undecorated literal string.
It's also not all that clear that they are related to text searching.
I'm for putting a ts_ prefix on them too:
rank(real[],tsvector,tsquery,integer) | realrank(real[],tsvector,tsquery) |
realrank(tsvector,tsquery,integer) | realrank(tsvector,tsquery) |
realrank_cd(real[],tsvector,tsquery,integer)| realrank_cd(real[],tsvector,tsquery) |
realrank_cd(tsvector,tsquery,integer) | realrank_cd(tsvector,tsquery) | real
rewrite(tsquery,tsquery,tsquery) | tsqueryrewrite(tsquery,text) |
tsqueryrewrite_accum(tsquery,tsquery[]) | tsqueryrewrite_finish(tsquery) |
tsqueryrewrite(tsquery[]) | tsquery
headline(oid,text,tsquery,text) | textheadline(oid,text,tsquery) |
textheadline(text,text,tsquery,text) | textheadline(text,text,tsquery) |
textheadline(text,tsquery,text) | textheadline(text,tsquery) | text
These guys are just plain badly named, as it's completely unobvious that
they have anything to do with tsearch (or what they do at all, actually).
Furthermore the "varchar" variants seem entirely redundant with the
"text" ones:
vq_exec(tsvector,tsquery) | booleanqv_exec(tsquery,tsvector) | booleantt_exec(text,text)
| booleanct_exec(character varying,text) | booleantq_exec(text,tsquery)
|booleancq_exec(character varying,tsquery) | boolean
Comments, suggestions?
regards, tom lane
Tom Lane wrote: > I find the following additions to pg_proc in the current tsearch2 patch: It seems a lot of these are useless and just bloat. I will mark a few: > proc | prorettype > ------------------------------------------+------------ > pg_ts_parser_is_visible(oid) | boolean > pg_ts_dict_is_visible(oid) | boolean > pg_ts_template_is_visible(oid) | boolean > pg_ts_config_is_visible(oid) | boolean Why would anyone look these up via OID rather than name? > tsvectorin(cstring) | tsvector > tsvectorout(tsvector) | cstring > tsvectorsend(tsvector) | bytea > tsqueryin(cstring) | tsquery > tsqueryout(tsquery) | cstring > tsquerysend(tsquery) | bytea > gtsvectorin(cstring) | gtsvector > gtsvectorout(gtsvector) | cstring > tsvector_lt(tsvector,tsvector) | boolean > tsvector_le(tsvector,tsvector) | boolean > tsvector_eq(tsvector,tsvector) | boolean > tsvector_ne(tsvector,tsvector) | boolean > tsvector_ge(tsvector,tsvector) | boolean > tsvector_gt(tsvector,tsvector) | boolean > tsvector_cmp(tsvector,tsvector) | integer > length(tsvector) | integer > strip(tsvector) | tsvector > setweight(tsvector,"char") | tsvector > tsvector_concat(tsvector,tsvector) | tsvector > vq_exec(tsvector,tsquery) | boolean > qv_exec(tsquery,tsvector) | boolean > tt_exec(text,text) | boolean > ct_exec(character varying,text) | boolean > tq_exec(text,tsquery) | boolean > cq_exec(character varying,tsquery) | boolean > tsquery_lt(tsquery,tsquery) | boolean > tsquery_le(tsquery,tsquery) | boolean > tsquery_eq(tsquery,tsquery) | boolean > tsquery_ne(tsquery,tsquery) | boolean > tsquery_ge(tsquery,tsquery) | boolean > tsquery_gt(tsquery,tsquery) | boolean > tsquery_cmp(tsquery,tsquery) | integer > tsquery_and(tsquery,tsquery) | tsquery > tsquery_or(tsquery,tsquery) | tsquery > tsquery_not(tsquery) | tsquery > tsq_mcontains(tsquery,tsquery) | boolean > tsq_mcontained(tsquery,tsquery) | boolean > numnode(tsquery) | integer > querytree(tsquery) | text > rewrite(tsquery,tsquery,tsquery) | tsquery > rewrite(tsquery,text) | tsquery > rewrite_accum(tsquery,tsquery[]) | tsquery > rewrite_finish(tsquery) | tsquery > rewrite(tsquery[]) | tsquery > stat(text) | record > stat(text,text) | record > rank(real[],tsvector,tsquery,integer) | real > rank(real[],tsvector,tsquery) | real > rank(tsvector,tsquery,integer) | real > rank(tsvector,tsquery) | real > rank_cd(real[],tsvector,tsquery,integer) | real > rank_cd(real[],tsvector,tsquery) | real > rank_cd(tsvector,tsquery,integer) | real > rank_cd(tsvector,tsquery) | real Do we realy need this many ranking functions? > token_type(oid) | record Again, why by OID? > token_type(text) | record > parse(oid,text) | record > parse(text,text) | record > lexize(oid,text) | text[] > lexize(text,text) | text[] > headline(oid,text,tsquery,text) | text > headline(oid,text,tsquery) | text > headline(text,text,tsquery,text) | text > headline(text,text,tsquery) | text > headline(text,tsquery,text) | text > headline(text,tsquery) | text > to_tsvector(oid,text) | tsvector > to_tsvector(text,text) | tsvector > to_tsquery(oid,text) | tsquery Why OID again for the configuration? I just don't see the use case and it is bloat and causes confusion. > to_tsquery(text,text) | tsquery > plainto_tsquery(oid,text) | tsquery > plainto_tsquery(text,text) | tsquery Again, OID. I asked Oleg about this and he said: > Bruce, just remove oid argument specification from documentation. so I think we can go ahead and remove cases where the configuration name or object is specified by oid. I have already removed them from the documentation and I though the patch had them removed too, but I guess not. Admittedly this API has been in flux. > to_tsvector(text) | tsvector > to_tsquery(text) | tsquery > plainto_tsquery(text) | tsquery > tsvector_update_trigger() | trigger > get_ts_config_oid(text) | oid > get_current_ts_config() | oid > (82 rows) > > (This list omits functions with INTERNAL arguments, as those are of > no particular concern to users.) Also @@ accepts TEXT @@ TEXT, at least according to the documentation. Is it clear to anyone which is tsquery and which tsvector? Is this something we want to support? > While most of these are probably OK, I'm disturbed by the prospect > that we are commandeering names as generic as "parse" or "stat" > with argument types as generic as "text". I think we need to put > a "ts_" prefix on some of these. Specifically, I find these names > totally unacceptable without a ts_ prefix: > > stat(text) | record > stat(text,text) | record > > token_type(oid) | record > token_type(text) | record > > parse(oid,text) | record > parse(text,text) | record > > lexize(oid,text) | text[] > lexize(text,text) | text[] > > These guys might be all right given that some of their arguments are > tsvector or tsquery, but it's not completely convincing --- think about > the case where an argument is given as an undecorated literal string. > It's also not all that clear that they are related to text searching. > I'm for putting a ts_ prefix on them too: > > rank(real[],tsvector,tsquery,integer) | real > rank(real[],tsvector,tsquery) | real > rank(tsvector,tsquery,integer) | real > rank(tsvector,tsquery) | real > rank_cd(real[],tsvector,tsquery,integer) | real > rank_cd(real[],tsvector,tsquery) | real > rank_cd(tsvector,tsquery,integer) | real > rank_cd(tsvector,tsquery) | real > > rewrite(tsquery,tsquery,tsquery) | tsquery > rewrite(tsquery,text) | tsquery > rewrite_accum(tsquery,tsquery[]) | tsquery > rewrite_finish(tsquery) | tsquery > rewrite(tsquery[]) | tsquery > > headline(oid,text,tsquery,text) | text > headline(oid,text,tsquery) | text > headline(text,text,tsquery,text) | text > headline(text,text,tsquery) | text > headline(text,tsquery,text) | text > headline(text,tsquery) | text > > These guys are just plain badly named, as it's completely unobvious that > they have anything to do with tsearch (or what they do at all, actually). > Furthermore the "varchar" variants seem entirely redundant with the > "text" ones: > > vq_exec(tsvector,tsquery) | boolean > qv_exec(tsquery,tsvector) | boolean > tt_exec(text,text) | boolean > ct_exec(character varying,text) | boolean > tq_exec(text,tsquery) | boolean > cq_exec(character varying,tsquery) | boolean > > Comments, suggestions? I would be happy if all text search functions began with 'ts', 'ts_', or 'to_ts', and if we don't clean this up now, it is going to be harder in the future. I think users can expect some migration for text search in 8.3 as a benefit of getting into core and be dump-able. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > I would be happy if all text search functions began with 'ts', 'ts_', or > 'to_ts', and if we don't clean this up now, it is going to be harder in > the future. +1 from me. \df is also much more useful then. > I think users can expect some migration for text search in > 8.3 as a benefit of getting into core and be dump-able. I guess so. Especially if you change some functions, they will have to change source code anyway. So you can as well cleanup all functions that don't fit into a sound naming schema. Best Regards Michael Paesold