Thread: hunspell and tsearch2 ?

hunspell and tsearch2 ?

From
"Dirk Lutzebäck"
Date:
Hi,

we have issues with compound words in tsearch2 using the german (ispell)
dictionary. This has been discussed before but there is no real solution
using the recommended german dictionary at
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
openoffice dict file to ispell suitable for tsearch):

# select ts_lexize('german_ispell', 'vollklimatisiert');
     ts_lexize
--------------------
 {vollklimatisiert}
(1 row)

This should return at least
 {vollklimatisiert, voll, klimatisiert}


The issue with compound words in ispell has been addressed in hunspell.
But this has not been integrated fully to tsearch2 (according to the
documentation).

Are there any plans to fully integrate hunspell into tsearch2? What is
needed to do this? What is the functional delta which is missing? Maybe
we can help...


Thanks for help

Dirk
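
For readers following the thread, the setup under discussion can be sketched roughly as below. This is only a sketch under stated assumptions: the file names german.dict and german.affix are hypothetical placeholders for the converted dictionary files, which are assumed to be UTF-8 encoded and installed in $SHAREDIR/tsearch_data/. As far as the PostgreSQL documentation describes it, an ispell-format affix file has to declare a compound flag (a "compoundwords controlled" statement) and the dictionary entries have to carry that flag before a word is considered for compound splitting, so the result for 'vollklimatisiert' depends on the dictionary files rather than on the query itself.

```sql
-- Assumption: german.dict and german.affix (hypothetical names) are UTF-8
-- files already copied into $SHAREDIR/tsearch_data/.
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,   -- template used for Ispell, MySpell and Hunspell files
    DictFile  = german,   -- resolves to german.dict
    AffFile   = german,   -- resolves to german.affix
    StopWords = german    -- resolves to german.stop
);

-- Probe how the dictionary handles a compound word. The output is not shown
-- here because it depends on the compound flags in the dictionary files.
SELECT ts_lexize('german_ispell', 'vollklimatisiert');
```

If the query still returns only {vollklimatisiert}, one plausible first check is whether 'voll' and 'klimatisiert' are present in the .dict file and marked with the compound flag declared in the .affix file.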





Re: hunspell and tsearch2 ?

From
Robert Haas
Date:
On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
<dirk.lutzebaeck@thinkproject.com> wrote:
> we have issues with compound words in tsearch2 using the german (ispell)
> dictionary. This has been discussed before but there is no real solution
> using the recommended german dictionary at
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
> openoffice dict file to ispell suitable for tsearch):
>
> # select ts_lexize('german_ispell', 'vollklimatisiert');
>      ts_lexize
> --------------------
>  {vollklimatisiert}
> (1 row)
>
> This should return at least
>
>  {vollklimatisiert, voll, klimatisiert}
>
>
> The issue with compound words in ispell has been addressed in hunspell. But
> this has not been integrated fully to tsearch2 (according to the
> documentation).

Just out of curiosity, which part of the documentation are you looking
at?  The only mention of hunspell I see in the documentation is a
mention that we apparently support their dictionary-file format.

> Are there any plans to fully integrate hunspell into tsearch2? What is
> needed to do this? What is the functional delta which is missing? Maybe we
> can help...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: hunspell and tsearch2 ?

From
"Dirk Lutzebäck"
Date:
Hi Robert,

there is a note in the pg documentation chapter

    12.6.5 Ispell Dictionary

    Note: MySpell does not support compound words. Hunspell has
    sophisticated support for compound words. At present, PostgreSQL
    implements only the basic compound word operations of Hunspell.

Regards
Dirk


On 08/30/2012 05:39 PM, Robert Haas wrote:
> On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
> <dirk.lutzebaeck@thinkproject.com> wrote:
>> we have issues with compound words in tsearch2 using the german (ispell)
>> dictionary. This has been discussed before but there is no real solution
>> using the recommended german dictionary at
>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
>> openoffice dict file to ispell suitable for tsearch):
>>
>> # select ts_lexize('german_ispell', 'vollklimatisiert');
>>      ts_lexize
>> --------------------
>>  {vollklimatisiert}
>> (1 row)
>>
>> This should return at least
>>
>>  {vollklimatisiert, voll, klimatisiert}
>>
>>
>> The issue with compound words in ispell has been addressed in hunspell. But
>> this has not been integrated fully to tsearch2 (according to the
>> documentation).
>
> Just out of curiosity, which part of the documentation are you looking
> at?  The only mention of hunspell I see in the documentation is a
> mention that we apparently support their dictionary-file format.
>
>> Are there any plans to fully integrate hunspell into tsearch2? What is
>> needed to do this? What is the functional delta which is missing? Maybe we
>> can help...

--
Mit freundlichen Grüßen / Best regards,

thinkproject! International GmbH & Co. KG
Dirk Lutzebäck
Geschäftsführer / Managing Director, CTO
Tel +49 30 921 017 90
Fax +49 30 921 017 50
dirk.lutzebaeck@thinkproject.com

Rechtliche Informationen zum Absender (Impressum): www.thinkproject.com/de/info
Legal information (imprint): www.thinkproject.com/en/info