Re: Better locale-specific-character-class handling for regexps

From	Heikki Linnakangas
Subject	Re: Better locale-specific-character-class handling for regexps
Date	5 September 2016, 10:05:37
Msg-id	e2d076ae-4685-f164-5a4a-05e7a0918793@iki.fi
In reply to	Re: Better locale-specific-character-class handling for regexps (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Better locale-specific-character-class handling for regexps (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On 09/04/2016 08:44 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 08/23/2016 03:54 AM, Tom Lane wrote:
>> +1 for this patch in general. Some regression test cases would be nice.
>
> I'm not sure how to write such tests without introducing insurmountable
> platform dependencies --- particularly on platforms with weak support for
> UTF8 locales, such as OS X.  All the interesting cases require knowing
> what iswalpha() etc will return for some high character codes.
>
> What I did to test it during development was to set MAX_SIMPLE_CHR to
> something in the ASCII range, so that the high-character-code paths could
> be tested without making any assumptions about locale classifications for
> non-ASCII characters.  I'm not sure that's a helpful idea for regression
> testing purposes, though.
>
> I guess I could follow the lead of collate.linux.utf8.sql and produce
> a test that's only promised to pass on one platform with one encoding,
> but I'm not terribly excited by that.  AFAIK that test file does not
> get run at all in the buildfarm or in the wild.

I'm not too worried if the tests don't get run regularly, but I don't
like the idea of a test that only works on one platform. This code is
unlikely to be accidentally broken by unrelated changes in the backend,
as the regexp code is very well isolated. But for someone who hacks on
the regexp library in the future, having a test suite to tickle all
these corner-cases would be very handy.

Another class of regressions would be that something changes in the way 
a locale treats some characters, and that breaks an application. That 
would be very difficult to test for in a platform-independent way. That 
wouldn't really be our bug, though, but the locale's.

Since we're now de facto maintainers of this regexp library, and our
version could be used somewhere other than PostgreSQL too, it would
actually be nice to have a regression suite that's independent of the
pg_regress infrastructure, and wouldn't need a server to run. Perhaps a
stand-alone C program that compiles the regexp code with mock versions
of the pg_wc_is* functions. Or perhaps a magic collation OID that makes
the pg_wc_is* functions return hard-coded values for particular inputs.

- Heikki
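
To make the "mock versions of pg_wc_is* functions" idea concrete, here is a
minimal sketch of what such mocks might look like. It is not code from the
patch or from the PostgreSQL tree. The pg_wchar typedef, the function
signatures, the MOCK_* flags, the table contents and mock_self_check() are
all assumptions made up for illustration. The point is only that the
classification of high code points is fixed by a table, so results do not
depend on the platform's locale data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's wide-character type (assumption for this sketch). */
typedef unsigned int pg_wchar;

/* Hypothetical class flags used only by the mock table below. */
enum { MOCK_ALPHA = 1, MOCK_DIGIT = 2, MOCK_SPACE = 4 };

typedef struct
{
    pg_wchar lo;        /* first code point of the range */
    pg_wchar hi;        /* last code point of the range, inclusive */
    int      classes;   /* OR of MOCK_* flags */
} mock_range;

/*
 * Hard-coded classifications.  The high ranges are fixed by fiat for testing
 * and are not claimed to match any real locale.
 */
static const mock_range mock_table[] = {
    {0x0020, 0x0020, MOCK_SPACE},
    {0x0030, 0x0039, MOCK_DIGIT},
    {0x0041, 0x005A, MOCK_ALPHA},
    {0x0061, 0x007A, MOCK_ALPHA},
    {0x0400, 0x04FF, MOCK_ALPHA},
    {0x0660, 0x0669, MOCK_DIGIT},
    {0x3000, 0x3000, MOCK_SPACE},
};

/* Return 1 if c falls in a table range carrying any of the flags in cls. */
static int
mock_has_class(pg_wchar c, int cls)
{
    size_t i;

    for (i = 0; i < sizeof(mock_table) / sizeof(mock_table[0]); i++)
    {
        if (c >= mock_table[i].lo && c <= mock_table[i].hi)
            return (mock_table[i].classes & cls) != 0;
    }
    return 0;           /* unlisted code points belong to no class */
}

/* Mock replacements; the real functions' signatures are assumed here. */
int pg_wc_isalpha(pg_wchar c) { return mock_has_class(c, MOCK_ALPHA); }
int pg_wc_isdigit(pg_wchar c) { return mock_has_class(c, MOCK_DIGIT); }
int pg_wc_isspace(pg_wchar c) { return mock_has_class(c, MOCK_SPACE); }
int pg_wc_isalnum(pg_wchar c) { return mock_has_class(c, MOCK_ALPHA | MOCK_DIGIT); }

/* Hypothetical self-check: returns the number of failed expectations. */
int
mock_self_check(void)
{
    int failures = 0;

    failures += !pg_wc_isalpha(0x0416);         /* inside the mocked alpha range */
    failures += !pg_wc_isdigit(0x0665);         /* inside the mocked digit range */
    failures += !pg_wc_isspace(0x3000);         /* the single mocked high space */
    failures += !pg_wc_isalnum(0x0665);         /* alnum means alpha or digit */
    failures += pg_wc_isalpha(0x0665) != 0;     /* a mocked digit is not alpha */
    failures += pg_wc_isalpha(0x2603) != 0;     /* unlisted: no class at all */
    return failures;
}

#ifdef MOCK_DEMO_MAIN
int
main(void)
{
    assert(mock_self_check() == 0);
    printf("mock classification checks passed\n");
    return 0;
}
#endif
```

Built on its own with -DMOCK_DEMO_MAIN, this only exercises the mock table.
In a real stand-alone harness the same definitions would presumably be
linked in place of the locale-backed functions, and the checks would compile
regexps such as [[:alpha:]]+ and match them against strings containing the
mocked code points.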
