Thread: BUG #13964: unexpected result from to_tsvector

BUG #13964: unexpected result from to_tsvector

From
ruxandra.durus@vauban.ro
Date:
The following bug has been logged on the website:

Bug reference:      13964
Logged by:          Ruxandra
Email address:      ruxandra.durus@vauban.ro
PostgreSQL version: 9.5.1
Operating system:   CentOS 6.7
Description:

Hello,

  My version of PostgreSQL is:
"PostgreSQL 9.5beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit"

More details about the operating system:
Linux javatesting 2.6.32-573.7.1.el6.x86_64 #1 SMP Tue Sep 22 22:00:00 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

   I am using pgAdmin version 1.20.0 to query the database.

   I am using your full text search (which works great), but I have a small
problem:
SELECT to_tsvector('simple', 'test@vauban-reg.ro');

returns "'test@vauban-reg.ro':1"

which is exactly what I need.


But when I run:

SELECT to_tsvector('simple', 'test@123-reg.ro');

I get:
"'123':2 'reg.ro':3 'test':1"

instead of "'test@123-reg.ro':1"

From the documentation here
http://www.postgresql.org/docs/current/static/pgtrgm.html , point F.30.4, I
understood that with the "simple" option only whitespace separates the
stems. Is it a bug or am I doing something wrong?

Thank you for your time,
Ruxandra Durus

Re: BUG #13964: unexpected result from to_tsvector

From
Artur Zakirov
Date:
On 17.02.2016 11:00, ruxandra.durus@vauban.ro wrote:
>
>    My version of PostgreSQL is:
> "PostgreSQL 9.5beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-16), 64-bit"
>
> More details about the operating system:
> Linux javatesting 2.6.32-573.7.1.el6.x86_64 #1 SMP Tue Sep 22 22:00:00 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
>
>     I am using pgAdmin version 1.20.0 to query the database.
>
>     I am using your full text search (which works great), but i have a small
> problem:
> SELECT to_tsvector('simple', 'test@vauban-reg.ro');
>
> returns "'test@vauban-reg.ro':1"
>
> which is exactly what I need.
>
>
> But when I run :
>
> SELECT to_tsvector('simple', 'test@123-reg.ro');
>
> I get:
> "'123':2 'reg.ro':3 'test':1"
>
> instead of "'test@123-reg.ro':1"
>
>>From the documentation here
> http://www.postgresql.org/docs/current/static/pgtrgm.html , point F.30.4. I
> understood that with "simple" option only space is a separator for the
> stems. Is it a bug or am I doing something wrong?
>
> Thank you for your time,
> Ruxandra Durus
>

Hi,

It seems that this is a text search parser issue. More informative queries:

=> SELECT * FROM ts_debug('simple', 'test@vauban-reg.ro');
 alias |  description  |       token        | dictionaries | dictionary |       lexemes
-------+---------------+--------------------+--------------+------------+----------------------
 email | Email address | test@vauban-reg.ro | {simple}     | simple     | {test@vauban-reg.ro}
(1 row)

=> SELECT * FROM ts_debug('simple', 'test@123-reg.ro');
   alias   |   description    | token  | dictionaries | dictionary | lexemes
-----------+------------------+--------+--------------+------------+----------
 asciiword | Word, all ASCII  | test   | {simple}     | simple     | {test}
 blank     | Space symbols    | @      | {}           |            |
 uint      | Unsigned integer | 123    | {simple}     | simple     | {123}
 blank     | Space symbols    | -      | {}           |            |
 host      | Host             | reg.ro | {simple}     | simple     | {reg.ro}
(5 rows)
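
The same split can be looked at with the parser alone, with no dictionary
involved. This is only a minimal sketch and not part of the original report.
It assumes a stock installation, where the "simple" configuration is expected
to use the built-in "default" parser, and it relies only on the standard
ts_parse and ts_token_type functions and the pg_ts_config and pg_ts_parser
catalogs.

```sql
-- Which parser does each configuration use? The configuration name selects
-- dictionaries; tokenization is done by the parser listed here.
SELECT c.cfgname, p.prsname
FROM pg_ts_config AS c
JOIN pg_ts_parser AS p ON p.oid = c.cfgparser
WHERE c.cfgname IN ('simple', 'english');

-- Token types the 'default' parser can emit (email, host, uint, ...).
SELECT tokid, alias, description
FROM ts_token_type('default');

-- Raw tokens for the problematic address, labelled with their token type.
SELECT tokid, t.alias, p.token
FROM ts_parse('default', 'test@123-reg.ro') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```

If the last query lists several tokens rather than a single email token, the
splitting happens in the parser, before the "simple" dictionary sees anything.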


The attached patch can fix it. Is this a bug? Should I create an entry in
the commitfest?

This patch also allows the parser to handle the emails '123@123-reg.ro' and
'test@123_reg.ro' correctly.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
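
For readers who cannot apply a parser patch, one possible stopgap is to bypass
the parser for values already known to be email addresses. The sketch below is
an assumption-laden illustration, not something proposed in this thread. It
relies on the documented behaviour that casting text to tsvector splits the
input on whitespace only and performs no normalization, and the sample
strings are made up.

```sql
-- A direct cast skips the text search parser, so the address stays whole.
-- No normalization is applied; lower() is added here by hand.
SELECT lower('Test@123-reg.ro')::tsvector;

-- Combine normally parsed free text with an address kept as one lexeme.
-- 'contact person' is a made-up sample document.
SELECT to_tsvector('simple', 'contact person')
       || lower('test@123-reg.ro')::tsvector AS doc_vector;
```

Characters such as ':', a single quote or a backslash have special meaning in
tsvector input syntax, so values should be validated as plain addresses
before being cast this way.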

Re: BUG #13964: unexpected result from to_tsvector

From
Ruxandra Durus
Date:
Hello,

Thank you for your fast response, but I cannot install patches, and I don't
know how. I hope this fix will be included in a version of PostgreSQL in the
future.

Thank you for your time,
Ruxandra Durus

-----Original Message-----
From: Artur Zakirov [mailto:a.zakirov@postgrespro.ru]
Sent: Thursday, February 18, 2016 12:54 PM
To: Ruxandra Durus; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #13964: unexpected result from to_tsvector

Re: BUG #13964: unexpected result from to_tsvector

From
Jim Nasby
Date:
On 2/18/16 4:53 AM, Artur Zakirov wrote:
> Attached patch can fix it. Is this a bug? Should I create a record in
> the commitfest?

Unless someone commits it in the next day or so, I would absolutely put
it in the commitfest.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com