Re: Updated tsearch documentation

From Bruce Momjian
Subject Re: Updated tsearch documentation
Date
Msg-id 200706202219.l5KMJiH05570@momjian.us
In reply to Re: Updated tsearch documentation  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Updated tsearch documentation  (Oleg Bartunov <oleg@sai.msu.su>)
Re: Updated tsearch documentation  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Oleg Bartunov wrote:
> On Wed, 20 Jun 2007, Bruce Momjian wrote:
> >> Comments to editorial work of Bruce Momjian.
> >>
> >> fulltext-intro.sgml:
> >>
> >> it is useful to have a predefined list of lexemes.
> >>
> >> Bruce, here should be list of types of lexemes !
> >
> > Agreed.  Are the list of lexemes parser-specific?
> >
> 
> yes, it is the parser which defines the types of lexemes.

OK, how will users get a list of supported lexemes?  Do we need a list
per supported parser?

> >> fulltext-opfunc.sgml:
> >>
> >> All of the following functions that accept a configuration argument can
> >> use either an integer <!-- why an integer --> or a textual configuration
> >> name to select a configuration.
> >>
> >> originally it was integer id, probably better use <type>oid</type>
> >
> > Uh, my question is why are you allowing specification as an integer/oid
> > when the name works just fine.  I don't see the value in allowing
> > numbers here.
> 
> for compatibility reasons. Hmm, indeed, I don't recall where oids could be
> important.

Well, if neither of us sees a reason for it, let's remove it.  We don't
need to support a feature that has no usefulness.

> >> This returns the query used for searching an index. It can be used to test
> >> for an empty query. The <command>SELECT</> below returns <literal>'T'</>,
> >> <!-- lowercase? --> which corresponds to an empty query since GIN indexes
> >> do not support negate queries (a full index scan is inefficient):
> >>
> >>> capital case. This looks cumbersome, probably querytree() should
> >>> just return NULL.
> >
> > Agreed.
> >
> >> The integer option controls several behaviors which is done using bit-wise
> >> fields and <literal>|</literal> (for example, <literal>2|4</literal>):
> >> <!-- why so complex? -->
> >>
> >>> to avoid 2 arguments
> >
> > But I don't see why you would want to set two of those values --- they
> > seem mutually exclusive, e.g.
> >
> >     1 divides the rank by the 1 + logarithm of the document length
> >     2 divides the rank by the length itself
> >
> > I assume you do either one, not both.
> 
> but what about the other variants?

OK, here is the full list:
    0 (the default) ignores document length
    1 divides the rank by the 1 + logarithm of the document length
    2 divides the rank by the length itself
    4 divides the rank by the mean harmonic distance between extents
    8 divides the rank by the number of unique words in document
    16 divides the rank by 1 + logarithm of the number of unique words in
       document

so which ones would be both enabled?
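
To make the question concrete, here is a rough Python sketch of how a bit-wise option such as 2|8 could be decoded. It is not the tsearch implementation: the constant names, the function name, the natural logarithm, and the order in which the divisions are applied are all assumptions made for illustration. Only the flag values and their descriptions come from the list above.

```python
import math

# Flag values taken from the list above; the names are hypothetical.
NORM_LOG_LENGTH = 1         # divide by 1 + log(document length)
NORM_LENGTH = 2             # divide by document length
NORM_HARMONIC_DISTANCE = 4  # divide by mean harmonic distance between extents
NORM_UNIQUE_WORDS = 8       # divide by number of unique words
NORM_LOG_UNIQUE_WORDS = 16  # divide by 1 + log(number of unique words)


def normalize_rank(rank, flags, doc_length, unique_words,
                   mean_harmonic_distance=1.0):
    """Apply every normalization whose bit is set in flags.

    Assumes doc_length, unique_words and mean_harmonic_distance are positive.
    The order of the divisions is an assumption, not taken from tsearch.
    """
    if flags & NORM_LOG_LENGTH:
        rank /= 1 + math.log(doc_length)
    if flags & NORM_LENGTH:
        rank /= doc_length
    if flags & NORM_HARMONIC_DISTANCE:
        rank /= mean_harmonic_distance
    if flags & NORM_UNIQUE_WORDS:
        rank /= unique_words
    if flags & NORM_LOG_UNIQUE_WORDS:
        rank /= 1 + math.log(unique_words)
    return rank


# 0 leaves the rank untouched; 2|8 divides by both length and unique words.
unchanged = normalize_rank(1.0, 0, doc_length=100, unique_words=50)
by_length = normalize_rank(1.0, 2, doc_length=100, unique_words=50)
combined = normalize_rank(1.0, 2 | 8, doc_length=100, unique_words=50)
```

In this sketch nothing prevents 1|2 from being passed, which would divide by the length twice. A combination like 2|8 at least mixes two different measures.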

> 
> What I missed is the definition of extent.
> 
> From http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking
> An extent is the shortest non-nested sequence of words that satisfies a query.

I don't understand how that relates to this.

> >
> >> its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>; <!-- n
> >> if none is specified that the current configuration is used.
> >>
> >>> I don't understand this question
> >
> > Same issue as above --- why allow a number here when the name works just
> > fine.  We don't allow tables to be specified by number, so why
> > configurations?
> >
> >> <para>
> >> <!-- why?  -->
> >> Note that the cascade dropping of the <function>headline</function> function
> >> cause dropping of the <literal>parser</literal> used in fulltext configuration
> >> <replaceable>tsname</replaceable>.
> >> </para>
> >>
> >>> hmm, probably it should be reversed - cascade dropping of the parser cause
> >>> dropping of the headline function.
> >
> > Agreed.
> >
> >>
> >> In example below, <literal>fulltext_idx</literal> is
> >> a GIN index:<!-- why isn't this automatic -->
> >>
> >>> It's explained above. The problem is that current index api doesn't allow
> >>> to say if search was lossy or exact, so to preserve performance of
> >>> GIN index we had to introduce @@@ operator, which is the same as @@, but
> >>> lossy.
> >
> > Well, then we have to fix the API.  Telling users to use a different
> > operator based on what index is defined is just bad style.
> 
> This was raised by Heikki and we discussed it a bit in Ottawa, but it's
> unclear if it's doable for 8.3.  @@@ operator is in rare use, so we could
> say it will be improved in future versions.

Uh, I am wondering if we just have to force heap access in all cases
until it is fixed.
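
As a toy illustration of what "force heap access in all cases" would mean, here is a Python sketch, not PostgreSQL code. The index here is lossy because it drops the weight labels, which is only one assumed way an index can be lossy. All names and the data layout are hypothetical.

```python
def build_index(heap):
    """Build a lexeme -> set of row ids map, dropping the weight labels."""
    index = {}
    for row_id, entries in heap.items():
        for lexeme, _weight in entries:
            index.setdefault(lexeme, set()).add(row_id)
    return index


def search(heap, index, lexeme, weight, recheck=True):
    """Find rows containing lexeme with the given weight.

    The index only knows which rows contain the lexeme, so its answer is a
    superset. With recheck=True every candidate row is verified against the
    heap, which is the "always go to the heap" behavior.
    """
    candidates = set(index.get(lexeme, set()))
    if not recheck:
        return candidates
    return {row_id for row_id in candidates
            if (lexeme, weight) in heap[row_id]}


# Hypothetical table: row id -> list of (lexeme, weight) pairs.
heap = {
    1: [("cat", "A")],
    2: [("cat", "D")],
    3: [("dog", "A")],
}
index = build_index(heap)

exact = search(heap, index, "cat", "A")                 # rechecked on the heap
lossy = search(heap, index, "cat", "A", recheck=False)  # index answer only
```

With the recheck always on, the user writes one operator and gets exact answers. The cost is a heap visit for every candidate row, even when the index answer was already exact.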

> >> nly the <token>lword</token> lexeme, then a <acronym>TZ</acronym>
> >> definition like ' one 1:11' will not work since lexeme type
> >> <token>digit</token> is not assigned to the <acronym>TZ</acronym>.
> >> <!-- what do these numbers mean? -->
> >> </para>
> >
> > OK, I changed it to be clearer.
> >
> >>> nothing special, just numbers for example.
> >>
> >> <function>ts_debug</> displays information about every token of
> >> <replaceable class="PARAMETER">document</replaceable> as produced by the
> >> parser and processed by the configured dictionaries using the configuration
> >> specified by <replaceable class="PARAMETER">cfgname</replaceable> or
> >> <replaceable class="PARAMETER">oid</replaceable>. <!-- no need for oid
> >>
> >>> don't understand this comment. ts_debug accepts cfgname or its oid
> >
> > Again, no need for oid.
> 
> We need to decide if we need oids as a user-visible argument. I don't see
> any value, though Teodor probably thinks otherwise.

This is a good time to clean up the API because there are going to be
user-visible changes anyway.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

