Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation |
| Date | |
| Msg-id | 6387cb3e-aec8-41a0-acef-bacdbfb435db@iki.fi |
| In reply to | Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation (Heikki Linnakangas <hlinnaka@iki.fi>) |
| Replies | Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation |
| List | pgsql-bugs |
On 02/12/2025 18:36, Heikki Linnakangas wrote:
> On 02/12/2025 18:24, Laurenz Albe wrote:
>> On Tue, 2025-12-02 at 10:03 +0000, PG Bug reporting form wrote:
>>> PostgreSQL version: 18.1
>>>
>>> When using a nondeterministic ICU collation, the replace() function
>>> fails to replace a substring when that substring appears at the end
>>> of the input string.
>>>
>>> Occurrences of the same substring earlier in the string are replaced
>>> normally.
>>>
>>> Specific collation used:
>>> create collation test_nondeterministic (
>>>     provider = icu,
>>>     locale = 'und-u-ks-level2',
>>>     deterministic = false
>>> )
>>>
>>> -- Replace final character under nondeterministic collation
>>> SELECT replace(
>>>     'testx' COLLATE "test_nondeterministic",
>>>     'x' COLLATE "test_nondeterministic",
>>>     'y') AS res1;
>>
>> I can reproduce the problem, and the attached patch fixes it for me.
>
> +1, looks good to me. Let's also add a regression test for this.

I added a simple test for this, and I think this is still not quite
right. I added the following to the collate.icu.utf8 test:

CREATE TABLE test4nfd (a int, b text);
INSERT INTO test4nfd VALUES (1, 'cote'), (2, 'côte'), (3, 'coté'), (4, 'côté');
UPDATE test4nfd SET b = normalize(b, nfd);
-- This shows why replace should be greedy. Otherwise, in the NFD
-- case, the match would stop before the decomposed accents, which
-- would leave the accents in the results.
SELECT a, b, replace(b COLLATE ignore_accents, 'co', 'ma') FROM test4;
 a |  b   | replace
---+------+---------
 1 | cote | mate
 2 | côte | mate
 3 | coté | maté
 4 | côté | maté
(4 rows)

SELECT a, b, replace(b COLLATE ignore_accents, 'co', 'ma') FROM test4nfd;
 a |  b   | replace
---+------+---------
 1 | cote | mate
 2 | côte | mate
 3 | coté | maté
 4 | côté | maté
(4 rows)

+-- Test for match at the end of the string. (We had a bug on that
+-- once)
+SELECT a, b, replace(b COLLATE ignore_accents, 'te', 'ma') FROM test4nfd;
+ a |  b   | replace
+---+------+---------
+ 1 | cote | coma
+ 2 | côte | coma
+ 3 | coté | coma
+ 4 | côté | coma
+(4 rows)
+

In the added test query, the accents on the 'o' (rows 2 and 4) are
stripped, which doesn't look correct.

- Heikki
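The effect in the last query can be reproduced with a toy Python model. This is not PostgreSQL's C implementation and not real ICU collation. It treats two strings as equal when they match after dropping combining marks, as a rough stand-in for `ignore_accents` on these inputs, and performs a leftmost, greedy replace. The names `accent_key`, `toy_replace` and `allow_start_on_mark` are hypothetical.

In NFD, 'côte' is c, o, U+0302, t, e. Because U+0302 is ignorable, the substring starting at the combining mark already compares equal to 'te', and a leftmost search finds it before the plain 't'. Forbidding a match from starting on a combining mark is one possible rule that yields 'côma'. It is not necessarily what the eventual fix does.

```python
import unicodedata


def accent_key(s: str) -> str:
    """Toy accent-insensitive key: drop combining marks (canonical class != 0).

    Assumption: this only approximates ignore_accents for simple Latin text;
    real ICU primary-strength comparison is far more involved.
    """
    return "".join(ch for ch in s if not unicodedata.combining(ch))


def toy_replace(s: str, old: str, new: str, *, allow_start_on_mark: bool) -> str:
    """Leftmost, greedy replace of `old` by `new` under accent_key equality.

    allow_start_on_mark=True lets a match begin on a combining mark, which
    reproduces the 'coma' rows in the test output above. False is a
    hypothetical rule that keeps marks attached to the preceding base letter.
    """
    target = accent_key(old)
    out = []
    i = 0
    while i < len(s):
        match_end = None
        if allow_start_on_mark or not unicodedata.combining(s[i]):
            # The slice end is exclusive, so scan up to len(s) inclusive;
            # stopping one short would miss a match at the very end.
            for end in range(i + 1, len(s) + 1):
                if accent_key(s[i:end]) == target:
                    match_end = end  # keep scanning: greedy, prefer longest
        if match_end is None:
            out.append(s[i])
            i += 1
        else:
            out.append(new)
            i = match_end
    return "".join(out)


if __name__ == "__main__":
    for word in ["cote", "côte", "coté", "côté"]:
        nfd = unicodedata.normalize("NFD", word)
        naive = toy_replace(nfd, "te", "ma", allow_start_on_mark=True)
        base_only = toy_replace(nfd, "te", "ma", allow_start_on_mark=False)
        # Normalize to NFC only for display, so 'o' + U+0302 prints as 'ô'.
        print(word, unicodedata.normalize("NFC", naive),
              unicodedata.normalize("NFC", base_only))
    # Prints:
    # cote coma coma
    # côte coma côma
    # coté coma coma
    # côté coma côma
```

Under the same toy rules, the greedy 'co' → 'ma' query still gives 'mate'/'maté' for the NFD rows. The trailing accent on 'é' is still consumed, because greediness extends a match over marks that follow it. Only where a match may start is restricted.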