Re: Better locale-specific-character-class handling for regexps

From: Heikki Linnakangas
Subject: Re: Better locale-specific-character-class handling for regexps
Date:
Msg-id e2d076ae-4685-f164-5a4a-05e7a0918793@iki.fi
In reply to: Re: Better locale-specific-character-class handling for regexps  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Better locale-specific-character-class handling for regexps  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 09/04/2016 08:44 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 08/23/2016 03:54 AM, Tom Lane wrote:
>> +1 for this patch in general. Some regression test cases would be nice.
>
> I'm not sure how to write such tests without introducing insurmountable
> platform dependencies --- particularly on platforms with weak support for
> UTF8 locales, such as OS X.  All the interesting cases require knowing
> what iswalpha() etc will return for some high character codes.
>
> What I did to test it during development was to set MAX_SIMPLE_CHR to
> something in the ASCII range, so that the high-character-code paths could
> be tested without making any assumptions about locale classifications for
> non-ASCII characters.  I'm not sure that's a helpful idea for regression
> testing purposes, though.
>
> I guess I could follow the lead of collate.linux.utf8.sql and produce
> a test that's only promised to pass on one platform with one encoding,
> but I'm not terribly excited by that.  AFAIK that test file does not
> get run at all in the buildfarm or in the wild.

I'm not too worried if the tests don't get run regularly, but I don't 
like the idea of a test that only works on one platform. This code is 
unlikely to be accidentally broken by unrelated changes in the backend, 
as the regexp code is very well isolated. But for someone who hacks on 
the regexp library in the future, having a test suite to tickle all 
these corner cases would be very handy.

Another class of regressions would be that something changes in the way 
a locale treats some characters, and that breaks an application. That 
would be very difficult to test for in a platform-independent way. That 
wouldn't really be our bug, though, but the locale's.

Since we're now de facto maintainers of this regexp library, and our 
version could be used somewhere else than PostgreSQL too, it would 
actually be nice to have a regression suite that's independent from the 
pg_regress infrastructure, and wouldn't need a server to run. Perhaps a 
stand-alone C program that compiles the regexp code with mock versions 
of the pg_wc_is* functions. Or perhaps a magic collation OID that makes 
the pg_wc_is* functions return hard-coded values for particular inputs.
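
To make the mock idea a bit more concrete, here is a rough sketch of what 
hard-coded classifiers could look like. Everything in it is made up for 
illustration: mock_wchar stands in for pg_wchar, the mock_wc_is* names are 
hypothetical, and the character ranges are arbitrary test fixtures rather 
than real locale data. It is not how the regexp library is actually wired 
to the pg_wc_is* functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for pg_wchar, so the sketch needs no PostgreSQL headers. */
typedef unsigned int mock_wchar;

#define MOCK_ALPHA 0x01
#define MOCK_DIGIT 0x02
#define MOCK_SPACE 0x04

/* One inclusive code-point range with its hard-coded classification. */
typedef struct
{
    mock_wchar  lo;
    mock_wchar  hi;
    unsigned    flags;
} mock_range;

/*
 * Fixed classification table.  The high ranges are invented so that the
 * high-character-code paths can be exercised without depending on what
 * any platform's iswalpha() etc. would say.
 */
static const mock_range mock_table[] = {
    {' ', ' ', MOCK_SPACE},
    {'0', '9', MOCK_DIGIT},
    {'A', 'Z', MOCK_ALPHA},
    {'a', 'z', MOCK_ALPHA},
    {0x1000, 0x10FF, MOCK_ALPHA},   /* pretend high "letters" */
    {0x2000, 0x2009, MOCK_DIGIT}    /* pretend high "digits" */
};

/* Look up the flags for a code point; unknown code points have no class. */
static unsigned
mock_classify(mock_wchar c)
{
    size_t      i;

    for (i = 0; i < sizeof(mock_table) / sizeof(mock_table[0]); i++)
    {
        if (c >= mock_table[i].lo && c <= mock_table[i].hi)
            return mock_table[i].flags;
    }
    return 0;
}

/* Mock classifiers returning 1 or 0, the same on every platform. */
int
mock_wc_isalpha(mock_wchar c)
{
    return (mock_classify(c) & MOCK_ALPHA) != 0;
}

int
mock_wc_isdigit(mock_wchar c)
{
    return (mock_classify(c) & MOCK_DIGIT) != 0;
}

int
mock_wc_isalnum(mock_wchar c)
{
    return (mock_classify(c) & (MOCK_ALPHA | MOCK_DIGIT)) != 0;
}

int
mock_wc_isspace(mock_wchar c)
{
    return (mock_classify(c) & MOCK_SPACE) != 0;
}
```

Because the table is fixed, a test could expect, say, code point 0x1000 to 
match [[:alpha:]] everywhere, whatever the platform's locale support is like.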

- Heikki



