Discussion: pgsql: Phrase full text search.
Phrase full text search.

Patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out formats of tsquery are backward compatible.
It has two side effects:
- the sort order of tsquery changes, so users who have a btree index over
  tsquery should reindex it
- tsquery output contains fewer parentheses, so it becomes more readable

Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/bb140506df605fab58f48926ee1db1f80bdafb59

Modified Files
--------------
contrib/tsearch2/expected/tsearch2.out | 56 ++---
doc/src/sgml/datatype.sgml | 9 +-
doc/src/sgml/func.sgml | 39 ++++
doc/src/sgml/textsearch.sgml | 182 ++++++++++++++-
src/backend/tsearch/to_tsany.c | 187 +++++++--------
src/backend/tsearch/ts_parse.c | 15 +-
src/backend/tsearch/ts_selfuncs.c | 3 +-
src/backend/tsearch/wparser_def.c | 31 ++-
src/backend/utils/adt/tsginidx.c | 57 +++--
src/backend/utils/adt/tsgistidx.c | 4 +-
src/backend/utils/adt/tsquery.c | 311 +++++++++++++++++++------
src/backend/utils/adt/tsquery_cleanup.c | 362 +++++++++++++++++++++++++++--
src/backend/utils/adt/tsquery_op.c | 54 ++++-
src/backend/utils/adt/tsquery_util.c | 11 +-
src/backend/utils/adt/tsrank.c | 263 ++++++++++++++-------
src/backend/utils/adt/tsvector.c | 2 +-
src/backend/utils/adt/tsvector_op.c | 326 +++++++++++++++++++++++---
src/backend/utils/adt/tsvector_parser.c | 10 +-
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_operator.h | 3 +
src/include/catalog/pg_proc.h | 7 +
src/include/tsearch/ts_public.h | 22 +-
src/include/tsearch/ts_type.h | 30 ++-
src/include/tsearch/ts_utils.h | 15 +-
src/test/regress/expected/tsdicts.out | 36 ++-
src/test/regress/expected/tsearch.out | 395 +++++++++++++++++++++++++++++---
src/test/regress/expected/tstypes.out | 369 ++++++++++++++++++++++++++++-
src/test/regress/sql/tsdicts.sql | 3 +
src/test/regress/sql/tsearch.sql | 101 ++++++++
src/test/regress/sql/tstypes.sql | 75 +++++-
30 files changed, 2536 insertions(+), 444 deletions(-)
Teodor Sigaev <teodor@sigaev.ru> writes:
> Phrase full text search.
Hasn't this patch broken on-disk compatibility of type tsquery by
renumbering the values of QueryOperator.operator? I'm looking at
the patch delta in ts_type.h.
regards, tom lane
I wrote:
> ... I'm looking at the patch delta in ts_type.h.
BTW, while I'm looking at that: comparePos() was a perfectly OK
name for a static function within tsvector.c, but it seems like a
pretty horrid name for a globally exposed linker symbol. Please
rename it to something less generic.
regards, tom lane
>> Phrase full text search.
>
> Hasn't this patch broken on-disk compatibility of type tsquery by
> renumbering the values of QueryOperator.operator? I'm looking at
> the patch delta in ts_type.h.
The distance field is placed exactly in the hole between the two uint8_t fields and the uint32_t
field; as far as I know, every platform we support uses 4-byte alignment for
the int32 type. Am I wrong? If so, I will move distance to the end of the struct.
The QueryOperator struct isn't written to disk directly; it is used in the union
QueryItem.
sizeof(QueryItem) = 12
sizeof(QueryOperator) = 8, so we can add distance to the end without growing
the size of QueryItem.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
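The layout argument in the message above can be checked with a small sketch. The two structs below are hypothetical stand-ins, not the real definitions from ts_type.h: they only assume two one-byte fields followed by a 32-bit field, with a 16-bit distance added into the padding. The function names are likewise invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-patch layout: two 1-byte fields, then a 32-bit field. */
typedef struct
{
    int8_t   type;
    int8_t   oper;
    /* the compiler normally inserts 2 padding bytes here */
    uint32_t left;
} OldOperator;

/* Hypothetical post-patch layout: distance sits where the padding was. */
typedef struct
{
    int8_t   type;
    int8_t   oper;
    int16_t  distance;
    uint32_t left;
} NewOperator;

/* True when adding distance changed neither the size nor the offset of left. */
bool layout_unchanged(void)
{
    return sizeof(OldOperator) == sizeof(NewOperator) &&
           offsetof(OldOperator, left) == offsetof(NewOperator, left);
}

/* Aborts if distance does not start right after the two 1-byte fields. */
void check_distance_offset(void)
{
    assert(offsetof(NewOperator, distance) == 2);
}
```

On a platform where uint32_t is 4-byte aligned, layout_unchanged() should return true. Note that this only covers the struct layout; it says nothing about the values stored in the oper field.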
Teodor Sigaev <teodor@sigaev.ru> writes:
>> Hasn't this patch broken on-disk compatibility of type tsquery by
>> renumbering the values of QueryOperator.operator? I'm looking at
>> the patch delta in ts_type.h.
> The distance field is placed exactly in the hole between the two uint8_t fields and the uint32_t
> field; as far as I know, every platform we support uses 4-byte alignment for
> the int32 type. Am I wrong?
No, I'm worried about the fact that you changed the OP_xxx constants.
Won't that cause a pre-existing tsquery operator to be read incorrectly?
Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
they were before, which means you need to give up on the assumption that
the numerical values of the OP_xxx constants correspond directly to their
syntactic priority. But that assumption was never going to survive the
next tsquery expansion anyway. I'd suggest a static const array mapping
the OP values into their syntactic priorities.
regards, tom lane
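The suggestion above, keeping the stored operator codes fixed and looking priorities up in a table, might be sketched as follows. The numeric codes, the priority ordering, and the names op_priority and priority_of are illustrative assumptions, not taken from the patch or from ts_type.h.

```c
#include <assert.h>

/* Stored operator codes. Illustrative values, assumed to be frozen on disk. */
#define OP_NOT    1
#define OP_AND    2
#define OP_OR     3
#define OP_PHRASE 4   /* new operator appended after the existing codes */
#define OP_COUNT  4

/*
 * Syntactic priority indexed by (code - 1); a larger number binds tighter.
 * The ordering here is an assumption chosen for illustration only.
 */
static const int op_priority[OP_COUNT] = {
    4,  /* OP_NOT */
    2,  /* OP_AND */
    1,  /* OP_OR */
    3   /* OP_PHRASE */
};

/* Look up the priority of a stored operator code. */
int priority_of(int op)
{
    assert(op >= OP_NOT && op <= OP_COUNT);
    return op_priority[op - 1];
}
```

With a table like this, a new operator can take the next free code without renumbering the existing ones, and the parser and output code compare priority_of(a) with priority_of(b) rather than comparing the codes themselves.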
> Assuming that I'm right, you need to revert OP_AND/OP_OR/OP_NOT to what
> they were before, which means you need to give up on the assumption that
> the numerical values of the OP_xxx constants correspond directly to their
> syntactic priority. But that assumption was never going to survive the
> next tsquery expansion anyway. I'd suggest a static const array mapping
> the OP values into their syntactic priorities.
Oh, I see. Will fix.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/