Discussion: BUG #3433: regexp \m and \M don't work for cyrillic
The following bug has been logged online:
Bug reference: 3433
Logged by: Andriy Rysin
Email address: arysin@gmail.com
PostgreSQL version: 8.2.4
Operating system: Linux
Description: regexp \m and \M don't work for cyrillic
Details:
psql krym
krym=> \encoding
UTF8
krym=> create table test (txt varchar);
CREATE TABLE
krym=> insert into test values ('latin');
INSERT 0 1
krym=> insert into test values ('кирилиця');
INSERT 0 1
krym=> select * from test;
txt
----------
latin
кирилиця
(2 rows)
krym=> select * from test where txt ~* E'\\mla';
txt
-------
latin
(1 row)
krym=> select * from test where txt ~* E'\\mкир';
txt
-----
(0 rows)
The regular expression escapes \m and \M (beginning of word and end of word)
work for Latin characters but don't work for Cyrillic.
"Andriy Rysin" <arysin@gmail.com> writes:
> The regular expression escapes \m and \M (beginning of word and end of word)
> work for Latin characters but don't work for Cyrillic.
Sorry, the locale-specific regex features only work on single-byte
characters at the moment. In any case you'd need to be using a Russian
locale (maybe you are, but you didn't say). I'd expect this feature
to work with Cyrillic letters in ru_RU locale + KOI8 encoding, but not
elsewhere.
regards, tom lane
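To illustrate the single-byte restriction described in the reply above: a Cyrillic letter occupies two bytes in UTF-8 but one byte in KOI8-R. The following is a small Python sketch, not PostgreSQL code; the variable names are chosen for illustration only, and it does not model how the regex engine classifies characters.

```python
# Compare the encoded size of one Cyrillic letter in two encodings.
letter = "к"  # CYRILLIC SMALL LETTER KA, U+043A

utf8_bytes = letter.encode("utf-8")   # multibyte encoding
koi8_bytes = letter.encode("koi8_r")  # single-byte Cyrillic encoding

print(len(utf8_bytes))  # 2
print(len(koi8_bytes))  # 1

# Latin letters stay at one byte each in UTF-8, which matches the
# observation that \m worked for 'latin' in the UTF8 database.
latin_bytes = "latin".encode("utf-8")
print(len(latin_bytes) == len("latin"))  # True
```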
2007/7/7, Tom Lane <tgl@sss.pgh.pa.us>:
>
> "Andriy Rysin" <arysin@gmail.com> writes:
> > The regular expression escapes \m and \M (beginning of word and end of word)
> > work for Latin characters but don't work for Cyrillic.
>
> Sorry, the locale-specific regex features only work on single-byte
> characters at the moment. In any case you'd need to be using a Russian
> locale (maybe you are, but you didn't say). I'd expect this feature
> to work with Cyrillic letters in ru_RU locale + KOI8 encoding, but not
> elsewhere.

Hi Tom,

I was using the en_US.UTF-8 locale, but you're right: even if I create my
cluster with uk_UA.UTF-8, \m still does not work for Cyrillic, though it
continues to work for Latin characters. I can't work with single-byte
encodings, as I have some symbols from Unicode in my project and everything
else is in Unicode, so converting data back and forth would be quite a drag.

So currently my only workaround for \m is to use (^|[^[:alpha:]]). However,
[:alpha:] means a Latin character even in uk_UA.UTF-8, so I have to specify
the symbols directly, e.g. (^|[^а-яієїґ]). That may be good enough if I don't
care to separate Russian and Ukrainian, but if I do, I'd have to be even more
specific for pure Ukrainian: (^|[^а-ьюяієїґ]) (assuming I remember about the
case-sensitivity of my regexp and assuming I know the UTF-8 codes).

I agree I missed the fact that \m is locale-specific (as it has to know the
proper word and non-word characters for the locale) and thus can't work for
all locales even when using Unicode, and that my original test in the en_US
locale was not valid. It still would be nice to have two things:
1) multibyte support for locale-specific regexps like \m and [:alpha:]
2) the ability to tell regexp which LC_CTYPE to use for a specific invocation,
at least at the SQL-statement level. This would be extremely useful for
multilingual projects, e.g. dictionaries (which is the type of my project,
BTW); hopefully they are not too tightly connected to the LC_CTYPE of the
cluster.
I understand, though, that these two are not quite just bug fixes and will
require some effort to implement.

Thanks,
Andriy
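Andriy's reply describes emulating \m with an explicit character class such as (^|[^а-яієїґ]). A minimal sketch of that idea follows, written in Python rather than PostgreSQL so that it can be run standalone. The helper name `starts_word` and the sample rows are hypothetical, and Python's `re` syntax is only assumed to be close enough to illustrate the pattern logic; behaviour in PostgreSQL 8.2 may differ.

```python
import re

# Lowercase Cyrillic letters for Russian and Ukrainian, as listed in the message.
CYRILLIC = "а-яієїґ"

def starts_word(prefix):
    """Emulate a start-of-word test: the prefix must sit at the start of
    the string or directly after a character outside the letter class."""
    return re.compile(f"(^|[^{CYRILLIC}]){re.escape(prefix)}")

# Hypothetical sample data; only 'latin' and 'кирилиця' mirror the report.
rows = ["latin", "кирилиця", "це кирилиця", "закирилиця"]

pattern = starts_word("кир")
matches = [row for row in rows if pattern.search(row)]
print(matches)
```

Unlike \m, which is a zero-width assertion, this pattern consumes the preceding non-letter character, so the matched text may include one extra leading character. The class is also lowercase only, which is the case-sensitivity caveat the message mentions.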