Discussion: KEEPONLYALNUM for pg_trgm is not documented
contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index support for LIKE operators, but only alphabetic and numeric characters are indexed by default. We can, however, modify KEEPONLYALNUM in the source code to keep all characters in n-gram words.

Neither the limitation nor KEEPONLYALNUM is documented on the page:
http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html

Would an additional documentation patch be acceptable? The issue would be a FAQ for non-English users. I heard that pg_trgm will be one of the *killer features* of 9.1 in Japan, where N-gram based text search is preferred.

--
Itagaki Takahiro
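To make the limitation described above concrete, here is a minimal Python sketch of trigram extraction. It is a simplified model, not pg_trgm's C code. The function name `trigrams` and the `keep_only_alnum` flag are hypothetical stand-ins for the compile-time KEEPONLYALNUM symbol. The padding (two spaces before and one after each word) follows the pg_trgm documentation. Using Python's `str.isalnum()` as the character test is an assumption and may differ from what the C code does under a given locale.

```python
def trigrams(text, keep_only_alnum=True):
    """Return the set of padded, lowercased trigrams of text (simplified model)."""

    def is_word_char(ch):
        # Assumption: with the flag on, only alphanumerics belong to a word;
        # with it off, anything that is not whitespace does.
        if keep_only_alnum:
            return ch.isalnum()
        return not ch.isspace()

    # Split the lowercased text into words at every non-word character.
    words, current = [], []
    for ch in text.lower():
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))

    # Pad each word with two leading spaces and one trailing space,
    # then collect every 3-character window.
    result = set()
    for word in words:
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


if __name__ == "__main__":
    print(sorted(trigrams("a+b")))
    print(sorted(trigrams("a+b", keep_only_alnum=False)))
```

With the flag on, "+" acts as a word separator and never appears in any trigram, so a pattern that contains it cannot be matched through those trigrams. With the flag off, "a+b" is treated as a single word.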
On Fri, Mar 11, 2011 at 5:52 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
> support for LIKE operators, but only alphabetic and numeric characters are
> indexed by default. We can, however, modify KEEPONLYALNUM in the source
> code to keep all characters in n-gram words.
>
> Neither the limitation nor KEEPONLYALNUM is documented on the page:
> http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html
>
> Would an additional documentation patch be acceptable? The issue would be
> a FAQ for non-English users. I heard that pg_trgm will be one of the
> *killer features* of 9.1 in Japan, where N-gram based text search is
> preferred.

+10

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Mar 11, 2011 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 5:52 PM, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
>> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
>> support for LIKE operators, but only alphabetic and numeric characters are
>> indexed by default. We can, however, modify KEEPONLYALNUM in the source
>> code to keep all characters in n-gram words.
>>
>> Neither the limitation nor KEEPONLYALNUM is documented on the page:
>> http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html
>>
>> Would an additional documentation patch be acceptable? The issue would be
>> a FAQ for non-English users. I heard that pg_trgm will be one of the
>> *killer features* of 9.1 in Japan, where N-gram based text search is
>> preferred.
>
> +10

It's certainly not too late for doc patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
> support for LIKE operators, but only alphabetic and numeric characters are
> indexed by default. We can, however, modify KEEPONLYALNUM in the source
> code to keep all characters in n-gram words.

> Neither the limitation nor KEEPONLYALNUM is documented on the page:
> http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html

> Would an additional documentation patch be acceptable? The issue would be
> a FAQ for non-English users. I heard that pg_trgm will be one of the
> *killer features* of 9.1 in Japan, where N-gram based text search is
> preferred.

I'm not sure it's really a great idea to encourage people to use custom builds with modified versions of that symbol. And those not using custom builds will just be frustrated. If we think this is an important feature then we ought to work out a better way to expose the functionality.

(Personally I wonder how useful pg_trgm is at all in multibyte encodings. Its idea of a trigram is 3 bytes, not 3 characters...)

regards, tom lane
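The distinction between 3-byte and 3-character trigrams raised above can be illustrated with a short Python sketch. It only shows how the two windowing schemes differ on multibyte text. It makes no claim about how pg_trgm actually processes multibyte encodings, and the helper names `char_trigrams` and `byte_trigrams` are hypothetical.

```python
def char_trigrams(s):
    """Sliding windows of 3 characters over the string."""
    return [s[i:i + 3] for i in range(len(s) - 2)]


def byte_trigrams(s, encoding="utf-8"):
    """Sliding windows of 3 bytes over the encoded string."""
    data = s.encode(encoding)
    return [data[i:i + 3] for i in range(len(data) - 2)]


if __name__ == "__main__":
    for sample in ("abc", "日本語"):
        print(sample, len(char_trigrams(sample)), len(byte_trigrams(sample)))
```

For pure ASCII input the two schemes agree. For text in which each character takes several bytes in the chosen encoding, most byte windows straddle character boundaries and do not correspond to any sequence of whole characters.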
Robert Haas wrote:
> On Fri, Mar 11, 2011 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Fri, Mar 11, 2011 at 5:52 PM, Itagaki Takahiro
> > <itagaki.takahiro@gmail.com> wrote:
> >> contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
> >> support for LIKE operators, but only alphabetic and numeric characters are
> >> indexed by default. We can, however, modify KEEPONLYALNUM in the source
> >> code to keep all characters in n-gram words.
> >>
> >> Neither the limitation nor KEEPONLYALNUM is documented on the page:
> >> http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html
> >>
> >> Would an additional documentation patch be acceptable? The issue would be
> >> a FAQ for non-English users. I heard that pg_trgm will be one of the
> >> *killer features* of 9.1 in Japan, where N-gram based text search is
> >> preferred.
> >
> > +10
>
> It's certainly not too late for doc patches.

I have applied the attached documentation patch to 9.0, 9.1, and current to mention that only ASCII alphanumeric characters are processed by contrib/pg_trgm.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
new file mode 100644
index 18f0f3e..30e5355
*** a/doc/src/sgml/pgtrgm.sgml
--- b/doc/src/sgml/pgtrgm.sgml
***************
*** 9,15 ****
  <para>
   The <filename>pg_trgm</filename> module provides functions and operators
!  for determining the similarity of text based on trigram matching, as
   well as index operator classes that support fast searching for similar
   strings.
  </para>
--- 9,16 ----
  <para>
   The <filename>pg_trgm</filename> module provides functions and operators
!  for determining the similarity of <acronym>ASCII</>
!  alphanumeric text based on trigram matching, as
   well as index operator classes that support fast searching for similar
   strings.
  </para>