Discussion: BUG #11523: Regular expressions work differently on different platforms
From
dmigowski@ikoffice.de
Date:
The following bug has been logged on the website:
Bug reference: 11523
Logged by: Daniel Migowski
Email address: dmigowski@ikoffice.de
PostgreSQL version: 9.1.2
Operating system: Debian Linux 6.0.6 + Windows 7
Description:
I recently found that regular expressions, or specifically the [:space:]
shorthand escape, work differently on Windows and Linux. On Linux the
non-breaking space is not included in the shorthand escape; on Windows it
is. The following statement is therefore true on Windows and false on
Linux:
select convert_from(E'\\xA0'::bytea,'ISO8859-1') ~ '\s'
This breaks email validation here, and the insert of a Linux-created backup
into my Windows machine. Is it possible to fix that? Is there a reason that
UTF-8 on Linux differs from UTF-8 on Windows?
dmigowski@ikoffice.de writes:
> I recently found that regular expressions, or specifically the [:space:]
> shorthand escape, work differently on Windows and Linux. On Linux the
> non-breaking space is not included in the shorthand escape; on Windows it
> is.
That would depend on what locale you're using for LC_CTYPE. We can't do
much about the fact that locale definitions vary across platforms. In
principle you could use C locale, which *is* standardized, but that cure
may be worse than the disease for your purposes.
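As a rough sketch of how one might check this (not part of the original
message): the first query shows the LC_CTYPE recorded for the current
database, and the second applies the C collation to the pattern only, so
the rest of the database keeps its locale. It assumes a server that
supports per-expression COLLATE (9.1 or later) and a database encoding
that can represent U+00A0; the column aliases are made up for
illustration.

```sql
-- Which LC_CTYPE was the current database created with?
SELECT datname, datctype
FROM pg_database
WHERE datname = current_database();

-- Same test as in the report, but with the pattern forced to the C collation.
-- Under C, only ASCII characters are expected to be classified as whitespace,
-- so the result should no longer depend on the platform's locale data.
SELECT convert_from(E'\\xA0'::bytea, 'ISO8859-1') ~ '\s'             AS db_locale_result,
       convert_from(E'\\xA0'::bytea, 'ISO8859-1') ~ ('\s' COLLATE "C") AS c_locale_result;
```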
You could always spell it out with whatever set of characters you consider
whitespace: [ \t\r\n] or something like that. For purposes like email
address validation, the set of whitespace characters allowed by the
relevant RFCs is probably smaller than most locales' [:space:] anyway.
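To illustrate the suggestion above, here is a minimal sketch that runs
the statement from the report next to an explicit character list. It
assumes standard_conforming_strings is on (the default since 9.1), so the
backslashes in the patterns reach the regex engine unchanged; the column
aliases are invented for the example.

```sql
-- '\s' follows the locale's idea of whitespace, so its result for the
-- non-breaking space (0xA0 in ISO 8859-1) can vary between platforms.
-- The explicit list names only space, tab, CR and LF, none of which is
-- the non-breaking space, so it should not match on any platform.
SELECT convert_from(E'\\xA0'::bytea, 'ISO8859-1') ~ '\s'         AS shorthand_escape,
       convert_from(E'\\xA0'::bytea, 'ISO8859-1') ~ '[ \t\r\n]' AS explicit_list;
```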
regards, tom lane