Discussion: Path case sensitivity on windows
Bug #4694
(http://archives.postgresql.org/message-id/200903050848.n258mVgm046178@wwwmaster.postgresql.org)
shows a very strange behaviour on windows when you use a different case PATH.

From what I can tell, this is because dir_strcmp() is case sensitive, but
paths on windows are really case-insensitive.

Attached patch fixes this in my testcase. Can anybody spot something
wrong with it? If not, I'll apply once I've finished my test runs :-)

//Magnus

diff --git a/src/port/path.c b/src/port/path.c
index 708306d..d7bd353 100644
--- a/src/port/path.c
+++ b/src/port/path.c
@@ -427,7 +427,12 @@ dir_strcmp(const char *s1, const char *s2)
 {
 	while (*s1 && *s2)
 	{
+#ifndef WIN32
 		if (*s1 != *s2 &&
+#else
+		/* On windows, paths are case-insensitive */
+		if (tolower(*s1) != tolower(*s2) &&
+#endif
 			!(IS_DIR_SEP(*s1) && IS_DIR_SEP(*s2)))
 			return (int) *s1 - (int) *s2;
 		s1++, s2++;
Magnus Hagander <magnus@hagander.net> writes:
> Attached patch fixes this in my testcase. Can anybody spot something
> wrong with it?
It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?
regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Attached patch fixes this in my testcase. Can anybody spot something
>> wrong with it?
>
> It depends on tolower(), which is going to have LC_CTYPE-dependent
> behavior, which is surely wrong?

Not sure, really :) That's the encoding we'd get the paths in in the
first place, is it not?

Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> It depends on tolower(), which is going to have LC_CTYPE-dependent
>> behavior, which is surely wrong?
> Or are you just saying we should be using pg_tolower()? (which I forgot
> about yet again)
Well, I'd be happier with pg_tolower, because I know what it does.
But the real question here is what does "case insensitivity" on
file names actually mean in Windows --- ie, what happens to non-ASCII
letters?
regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> It depends on tolower(), which is going to have LC_CTYPE-dependent
>>> behavior, which is surely wrong?
>
>> Or are you just saying we should be using pg_tolower()? (which I forgot
>> about yet again)
>
> Well, I'd be happier with pg_tolower, because I know what it does.
> But the real question here is what does "case insensitivity" on
> file names actually mean in Windows --- ie, what happens to non-ASCII
> letters?

The filesystem itself is UTF-16. I would assume the "system default"
locale controls the case insensitivity, but I'm not sure about that.

Reading up some, it seems the collation is actually stored in a hidden
file on the NTFS volume... It seems to differ between different versions
of windows from what I can tell, but since this is written to the fs,
it's ok.

I have not found a way to actually *get* the locale.. Or even to compare
two filenames. There is a function called GetFullPathName(), but I'm not
sure how to use it for this.

However. I don't think it's really critical that we deal with all corner
cases for this. It's not likely that the user would be using any really
weird locale-specific combinations *differently* in the PATH variable vs
the commandline, or something like that...

And this only shows up when the binary is found in the PATH and not
through a fully specified directory. This is, AFAICT, the only case
where they can differ. This is the reason why we haven't had any reports
of this before - nobody using the installer, or doing even a "normal
style" install would ever end up in this situation.

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> And this only shows up when the binary is found in the PATH and not
> through a fully specified directory. This is, AFAICT, the only case
> where they can differ. This is the reason why we haven't had any reports
> of this before - nobody using the installer, or doing even a "normal
> style" install would ever end up in this situation.
Hmm. Well, if we use pg_tolower then it will only do the right thing
for ASCII letters, but it seems like non-ASCII in the path leading to
the postgres binaries would be pretty dang unusual. (And I am not
convinced tolower() would get it right either --- it certainly won't
if the encoding is multibyte.)
On balance I'd suggest just using pg_tolower and figuring it's close
enough.
regards, tom lane
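[Editorial sketch, not part of the thread.] To make the suggestion above concrete, here is roughly what a dir_strcmp() that folds only ASCII letters might look like. This is a minimal sketch under stated assumptions, not the committed fix: ascii_tolower() is a hypothetical stand-in rather than PostgreSQL's actual pg_tolower(), the IS_DIR_SEP definition is assumed to treat both '/' and '\' as separators, the function name dir_strcmp_ci is made up for illustration, and the tail after the loop is an assumption, since the diff above shows only the loop body.

```c
/* Assumed stand-in for the IS_DIR_SEP macro referenced in the patch:
 * on Windows both '/' and '\\' separate directories. */
#define IS_DIR_SEP(ch) ((ch) == '/' || (ch) == '\\')

/* Hypothetical helper: folds ASCII 'A'..'Z' only, so the result does not
 * depend on LC_CTYPE. Bytes outside that range are returned unchanged. */
static unsigned char
ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/* Sketch of a case-insensitive variant of the loop shown in the diff.
 * Two characters match if they are equal after ASCII folding, or if both
 * are directory separators. */
static int
dir_strcmp_ci(const char *s1, const char *s2)
{
	while (*s1 && *s2)
	{
		if (ascii_tolower((unsigned char) *s1) !=
			ascii_tolower((unsigned char) *s2) &&
			!(IS_DIR_SEP(*s1) && IS_DIR_SEP(*s2)))
			return (int) *s1 - (int) *s2;
		s1++, s2++;
	}
	/* Assumed tail: the shorter string sorts first. */
	if (*s1)
		return 1;
	if (*s2)
		return -1;
	return 0;
}
```

With this sketch, "C:\PGSQL\bin" and "c:/pgsql/BIN" compare equal, while a path containing non-ASCII letters in different case would still compare unequal, which is the limitation accepted above as "close enough".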
On Thursday 02 April 2009 18:29:45 Tom Lane wrote:
> Hmm. Well, if we use pg_tolower then it will only do the right thing
> for ASCII letters, but it seems like non-ASCII in the path leading to
> the postgres binaries would be pretty dang unusual.

Well, Windows localizes the directory names like C:\Program Files, so it is
entirely plausible to have non-ASCII path names across the board in certain
locales.