Re: Simplifying Text Search

From: Simon Riggs
Subject: Re: Simplifying Text Search
Msg-id: 1194936519.2644.261.camel@ebony.site
In response to: Re: Simplifying Text Search  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: Simplifying Text Search  ("Pavel Stehule" <pavel.stehule@gmail.com>)
List: pgsql-hackers

On Mon, 2007-11-12 at 23:03 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote:
> > > Simon Riggs <simon@2ndquadrant.com> writes:
> > > > So we end up with a normal sounding function that is overloaded to
> > > > provide all of the various goodies.
> > > 
> > > As best I can tell, @@ does exactly this already.  This is just a
> > > different spelling of the same capability, and I don't actually
> > > find it better.  Why is "text_search(x,y)" better than "x @@ y"?
> > > We don't recommend that people write "texteq(x,y)" instead of
> > > "x = y".
> > 
> > Most people don't understand those differences. x = y means "make sure
> > they are the same" to most people. They don't see what you (and I) see:
> > function and operator interchangeability. So text_search() is better
> > than @@ and = is better than texteq(). Life ain't neat...
> > 
> > Right now, Full Text Search SQL looks like complete gibberish and it
> > dissuades many people from using what is an awesome set of features. I
> > just want to add a little sugar to help people get started.
> 
> I realized this when editing the documentation but not clearly.  I
> noticed that:
> 
>     http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING
> 
>     tsvector @@ tsquery
>     tsquery  @@ tsvector
>     text @@ tsquery
>     text @@ text
> 
>     The first two of these we saw already. The form text @@ tsquery  is
>     equivalent to to_tsvector(x) @@ y. The form text @@ text  is equivalent
>     to to_tsvector(x) @@ plainto_tsquery(y).
> 
> was quite odd, especially the "text @@ text" case, and in fact it makes
> casting almost required unless you can remember which one is a query and
> which is a vector (hint, the vector is first).  What really adds to the
> confusion is that the operator is two _identical_ characters, meaning
> the operator is symmetric, and it behaves symmetrically if you cast one side,
> but as vector @@ query if you don't.

I'm thinking we can have an inlinable function

contains(text, text) returns int

Return values are limited to 0, 1, or NULL, as with SQL/MM.
It's close to SQL/MM, but not an exact match.

contains(sourceText, searchText) is a macro for

case to_tsvector(default_text_search_config, sourceText) @@
to_tsquery(default_text_search_config, searchText)
when true then 1
when false then 0
else null
end

that allows us to write indexable queries like this

WHERE contains(sourceText, searchText) > 0

where we must still have built the index on a constant config.
I have not checked that this still works yet; it may not, in which case
we need something slightly more complex to make sure it's still
indexable. This is the difficult part.
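
For illustration only, here is a minimal sketch of how such a function might be written as a plain SQL function. It assumes a PostgreSQL release that accepts named parameters in SQL functions (9.2 or later). The parameter names source_text and search_text, the docs table, its body column, and the index name are all made up for this example. The one-argument forms of to_tsvector() and to_tsquery() use default_text_search_config. Whether the planner can still match the index through the CASE wrapper is exactly the untested part, so treat the final query as a shape to experiment with, not as a confirmed indexable form.

```sql
-- Hypothetical wrapper: maps the boolean result of @@ to 1, 0, or NULL.
-- STABLE because the one-argument to_tsvector()/to_tsquery() depend on
-- the default_text_search_config setting.
CREATE FUNCTION contains(source_text text, search_text text)
RETURNS integer
LANGUAGE sql
STABLE
AS $$
    SELECT CASE to_tsvector(source_text) @@ to_tsquery(search_text)
               WHEN true  THEN 1
               WHEN false THEN 0
               ELSE NULL
           END;
$$;

-- Hypothetical table and expression index built on a constant config.
CREATE TABLE docs (id serial PRIMARY KEY, body text);
CREATE INDEX docs_body_fts ON docs USING gin (to_tsvector('english', body));

-- Query in the style proposed above; check the plan with EXPLAIN to see
-- whether the index is actually used.
SELECT id FROM docs WHERE contains(body, 'cat & mat') > 0;
```

The function is kept to a single SELECT so that it stays a candidate for inlining.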

So the changes are:
- add the SQL function
- simplify the first 2 pages of the docs using this function

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com


