Thread: Windows Application Issues | PostgreSQL | REF # 48475607


Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

I’m sorry to bother you if you are not the right person. In that case, could you please help to point me in the right direction for the right contact? Thanks a lot!

Hi Team,

I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test organization to notify developers when issues have been identified in their applications. We’re reaching out to notify you of a potential issue in one of your applications.

The issue details are below; please review them when you have a moment. Our goal is to work with you to address this issue and to understand your expected timeline for addressing it. If you have any questions about the details below or have already addressed this issue in a forthcoming update, please let me know.

Account: PostgreSQL Global Development Group
Product: PostgreSQL
Reference #: 48475607
Issue: Postgres.exe crash observed while installing the application.

Environment: Desktop
OS: Windows 11
App Version: 16.1

Repro Steps:

  1. Deploy the Windows Server 2019 Turkey build.
  2. Patch the machine through Windows Update (WU) and enable roles & features in Server Manager.
  3. Restart the machine to configure updates and observe winver 17763.5329.
  4. Log in as Administrator.
  5. Download and install the application: https://www.postgresql.org/download/
  6. Click Setup and, in the setup wizard that launches, click Next.
  7. Click Next in the installation directory window.
  8. In the Select Components window, make sure all components are selected and click Next.
  9. In the data directory window, click Next.
  10. In the Password window, provide the password and click Next.
  11. In the port window, click Next.
  12. Click Next to accept the defaults in the advanced settings, summary, and ready-to-install windows.
  13. While the application is installing, observe the behaviour.

 

Observations:

Postgres.exe crash is observed while installing the application.

Expected Results:

Should not observe any crash while installing the application

Resource

For any questions on app development (or) submission on windows, contact Windows Dev Center. https://developer.microsoft.com/en-us/windows/support

 

Thanks!

Haifang

 

 

 

Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Fri, May 3, 2024 at 7:05 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Deploy windows server 2019 Turkey Build.
...
> Postgres.exe crash is observed while installing the application.

Hi,

"Crash" with no details is not a very useful report, but I guess that
this is the Turkey -> Türkiye issue caused by a recent operating
system upgrade, which is apparently now hitting the server edition of
Windows based on the new inflow of bug reports.  I think we know
approximately how to fix it, and there are several possible
workarounds for a system that is already in this state, but interested
parties who know and care about the relevant OS need to get involved
to make progress.  It's too late for Turkish but ideally we'll be able
to stop this from happening the next time a country changes its name.
I have written everything I know about the issue here:

https://www.postgresql.org/message-id/flat/18196-b10f93dfbde3d7db%40postgresql.org

Not being a Windows user, I have not been able to take that proposed
fix all the way to the finish line, so all I can do is post what I
know and hope that open source will happen.




RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

Thanks a lot for your previous reply. I still need to double-check something with you:

You said the Turkey -> Türkiye issue was caused by a recent operating system upgrade. Do you mean this crash is caused by the
changes made on the Windows side? If that is the case, would you prefer to leave the bug there? Is there any plan for the future?
 

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Thursday, May 2, 2024 4:51 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas and team,

Here are some suggestions from our engineers. Please take a look and let me know if you have any questions:

1. Could you please use the _wsetlocale API when the locale parameter is a wide string? This is the recommended way
per the Microsoft documentation as well. (Reference: setlocale, _wsetlocale:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170)
2. Alternatively, if you don’t want to change the API, you can work around the issue by updating your mapping table with
Turkish_Tur in case of Turkish_Türkiye, as you have done in your code for other locales with the same issue
(reference: PostgreSQL source code, src/port/win32setlocale.c:
https://doxygen.postgresql.org/win32setlocale_8c_source.html, lines 66-67).
 

Please let me know if there is any other questions.

Regards!
Haifang

-----Original Message-----
From: Haifang Wang (Centific Technologies Inc)
Sent: Monday, May 13, 2024 11:52 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, May 14, 2024 at 6:51 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> You said the Turkey -> Türkiye issue caused by a recent operating system upgrade, you mean this crash is caused by
> the changes made on Windows side? If that is the case, you prefer to leave the bug there? Any plan for the future?

First, let me restate the problem:

When you create a database cluster (= a PostgreSQL instance) with
"initdb", unless you request a default locale with --locale, initdb
uses setlocale("") to query the system/user default locale.  It then
records that string in postgresql.conf, and also in the pg_database
catalog.  On POSIX systems, that captures something like "tr-TR.UTF-8"
or similar.  On Windows, that captures something like
"Turkish_Turkey.1254".  Later, PostgreSQL uses newlocale() or
setlocale() functions to access that locale again.  In rare cases
where a country changes its name, a Windows update *renames* the
locale, and then those calls fail, because the old name is not
recognised anymore.
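
The failure mode restated above can be reproduced in miniature. The following is only an illustrative sketch in Python (PostgreSQL itself is written in C); `can_open_locale` is a hypothetical helper, not a PostgreSQL function. It asks the C runtime, through Python's `locale.setlocale`, whether a locale name that was stored earlier is still recognised today.

```python
import locale


def can_open_locale(name: str) -> bool:
    """Return True if the C runtime still recognises this locale name."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        # The situation after a rename: the stored name no longer resolves.
        return False
    finally:
        # Always restore the previous setting so the probe has no side effects.
        locale.setlocale(locale.LC_COLLATE, saved)
```

Calling `can_open_locale("Turkish_Turkey.1254")` and `can_open_locale("Turkish_Türkiye.1254")` on a Windows machine before and after the update would presumably show the two answers swapping; the result depends entirely on the platform, so none is shown here.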

I proposed a partial solution that should help avoid the problem in future:

I think that initdb should instead call GetUserDefaultLocaleName() to
discover the user account's default locale, because it returns BCP47
names like "tr-TR".  It is probably less likely for a country or
language to change its ISO code (but not impossible[1]), than for the
English-language names of them to change.  Even if you reject this
idea because technically they can both change, there are other reasons
why we should not be storing these "display"-style names anywhere,
including that PostgreSQL needs to store them in a place where the
encoding must be ASCII, which Türkiye is not.  And finally, the
Windows manual explicitly warns us about this[2]: "We don't recommend
this form for locale strings embedded in code or serialized to
storage: These strings are more likely to be changed by an operating
system update than the locale name form."

(Note: the BCP47 support in Windows did not exist when that PostgreSQL
code was written, so it did what it had to at the time.)
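
The ASCII constraint mentioned above is easy to check mechanically. A minimal sketch in Python, with hypothetical helper names, using only the locale names quoted in this thread:

```python
def is_ascii_locale_name(name: str) -> bool:
    """True if the name can be stored where only ASCII is allowed."""
    return name.isascii()


def non_ascii_chars(name: str) -> list[str]:
    """List the characters that make the name unsafe to store as ASCII."""
    return [ch for ch in name if ord(ch) > 127]
```

With these, "tr-TR" and "Turkish_Turkey.1254" pass the check, while "Turkish_Türkiye.1254" fails because of the "ü".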

There are further problems to resolve:

1.  We don't know if we should put encodings (AKA codepages?) on the
end of those strings or not.  There is some confusion about what
exactly it does, and how it interacts with the "ACP".  I'd be worried
that if you don't put the endings on, perhaps it can change under your
feet.  (I suspect that part of the discussion on that other thread
took some wrong turns, based on citext_utf8 results that were actually
probably misleading.)
2.  If we do decide to put the encoding suffixes on, which encoding
should we be suggesting?  It appears from anecdotal reports that most
PostgreSQL-on-Windows users are stuck in the past, using the old
pre-UTF-8 language-specific encodings.  Should we be encouraging UTF-8
use by default, if we can?  Maybe that is a separate question.

Then there is the practical question of what to do with an
already-broken system.  One idea would be to introduce a "locale
remapping" file, pgdata/pg_locale.map where you can write things
like "Turkish_Turkey.1254"="Turkish_Türkiye.1254".
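
To make the remapping idea concrete, here is a minimal sketch of a parser for such a file, written in Python for brevity. The syntax (one `"old"="new"` pair per line, `#` for comments) is an assumption extrapolated from the single example above, and `parse_locale_map` is a hypothetical name, not anything that exists in PostgreSQL.

```python
import re

# One entry per line: "old name"="new name"  (assumed syntax)
_ENTRY = re.compile(r'^\s*"([^"]+)"\s*=\s*"([^"]+)"\s*$')


def parse_locale_map(text: str) -> dict[str, str]:
    """Parse mapping-file text into {old_name: new_name}."""
    mapping: dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f'line {lineno}: expected "old"="new"')
        mapping[match.group(1)] = match.group(2)
    return mapping
```

Rejecting malformed lines loudly, rather than ignoring them, seems the safer choice for a file that decides whether a database can start.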

I think there might be a GUI tool that allows you to duplicate,
rename, etc locales in Windows, so you can re-create the old name.  I
believe that is how some people have fixed their broken databases.  I
don't know if there is a good reference/blog/article on that, that we
should be pointing people towards if they show up with broken systems.

Patches, testing, research are welcome!  Even though I put forward that
BCP47 idea, it was based on reading the manual, so the "unresolved"
questions may in fact be very easy to resolve by people who actually
use/know Windows.  Even if I had been gung-ho about committing
that in 16 without feedback from Windows users, it would have been too
late to help Turkish users with existing databases.

[1] https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names
[2] https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

Thanks for your reply. I'm not sure whether there has been a miscommunication, so let me clarify.

I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test
organization to notify developers when issues have been identified in their applications. The issue I reported in this
mail is one we found in our testing, and I believe it also impacts a lot of end users. As you mentioned previously,
it is caused by a recent operating system upgrade.

The solution I shared in my last email was suggested by our engineers. It would be great if you could use the
_wsetlocale API; this is the recommended way per the Microsoft documentation as well. (Reference: setlocale, _wsetlocale:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170)

If you don’t want to change the API, you could update your mapping table with Turkish_Tur in case of Turkish_Türkiye, as
you have done in your code for other locales with the same issue (reference: PostgreSQL source code,
src/port/win32setlocale.c: https://doxygen.postgresql.org/win32setlocale_8c_source.html, lines 66-67).

Please let me know if there is any misunderstanding.

Thanks!
Haifang


-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 13, 2024 3:12 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, May 14, 2024 at 10:27 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Thanks for your reply. But I'm not sure if there is any miscommunication. Let me make it clear again.

(Sorry, it seems our emails crossed.)

> I’m a program manager in the Windows App Assure ISV Outreach Team at Microsoft. We work with Microsoft’s test
> organization to notify developers when issues have been identified in their applications. The issue I reported in this
> mail is an issue we found in our testing and I believe it also impact a lot of end users. Like you mentioned previously,
> it is caused by recent operating system upgrade.

Thanks for doing that, and yes, it affects a lot of users, and this is
not the first time.  It is still possible for it to be the last...

> The solution I shared in my last email was suggested our engineers. It would be great is you could use _wsetlocale
> API. this is what the recommended way as par Microsoft documentation as well. (Reference: setlocale wsetlocale:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170

I don't understand.  Can you explain why _wsetlocale() is better than
setlocale()?  They behave identically except one takes wide
characters, which doesn't seem to solve any problem we have.

> If you don’t want to change the API to update their mapping table with Turkish_Tur in case of Turkish_Türkiye as you
> have done for others locale with same issue as well in your code (reference PostgreSQL Source Code:
> src/port/win32setlocale.c Source File: https://doxygen.postgresql.org/win32setlocale_8c_source.html, Line number 66-67).

Yeah, OK we could put more kludges into win32setlocale.c.  I don't
mind committing a patch like that if it addresses the issue.  I am not
in a position to confirm that myself... what we need is someone who
works with Windows to write the patch and test it across that upgrade.
Longer term I'm looking for something better than that though, because
it doesn't address the root cause (need for stable identifiers), and
will only ever allow us to fix problems with the old unstable names
*after* users complain that their database is dead, 3-6 months after
in fact due to release cycles.  I think a dynamic mapping file might
be better?  (Maybe win32locale.c should be able to read that kludge
table from a file that you can give it with an environment variable,
or something like that?)



Thomas Munro <thomas.munro@gmail.com> writes:
> Longer term I'm looking for something better than that though, because
> it doesn't address the root cause (need for stable identifiers), and
> will only ever allow us to fix problems with the old unstable names
> *after* users complain that their database is dead, 3-6 months after
> in fact due to release cycles.  I think a dynamic mapping file might
> be better?  (Maybe win32locale.c should be able to read that kludge
> table from a file that you can give it with an environment variable,
> or something like that?)

+1 for the long-term solution being more-stable locale identifiers.
However, we should try to build something that will let users get
out of these situations with the existing identifiers, so I like
your idea of a plain-text mapping file for Windows locale names.
I don't think an environment variable is necessary; just define
a fixed name "$PGDATA/locale_map.txt" or such.  If that file
exists, just read it and map the pg_database field values with it.
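
A minimal sketch of that lookup, in Python for brevity: if the file exists it is consulted, otherwise names pass through unchanged. The file name follows the suggestion above; the `old=new` line syntax, the UTF-8 file encoding, and the function names are all assumptions made for illustration.

```python
from pathlib import Path


def load_locale_map(pgdata: str, filename: str = "locale_map.txt") -> dict[str, str]:
    """Read PGDATA/locale_map.txt if present; a missing file means no remapping."""
    path = Path(pgdata) / filename
    if not path.is_file():
        return {}
    mapping: dict[str, str] = {}
    # Assumption: the file is UTF-8 text with one old=new pair per line.
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # ignore blanks, comments and malformed lines
        old, new = line.split("=", 1)
        mapping[old.strip()] = new.strip()
    return mapping


def map_locale(name: str, mapping: dict[str, str]) -> str:
    """Translate a locale name recorded in pg_database; unknown names pass through."""
    return mapping.get(name, name)
```

Because the file is optional and unknown names are returned as-is, a cluster without the file would behave exactly as it does today.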

Maybe this shouldn't even be Windows-specific?  Are there any
cases where it'd save people's bacon on other platforms?

            regards, tom lane



Just curious. Why a plain text file rather than a system table? 



John McKown <john.archie.mckown@gmail.com> writes:
> Just curious. Why a plain text file rather than a system table?

Because you'd have no way to update such a table, if you can't
start the database or connect to it.  So that approach isn't
suitable for people whose database has been broken by one of
these system updates.  (This is largely the same reason why,
eg, postgresql.conf isn't a table.)

            regards, tom lane



Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get
> out of these situations with the existing identifiers, so I like
> your idea of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define
> a fixed name "$PGDATA/locale_map.txt" or such.  If that file
> exists, just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that
I don't do Windows, I just know that this passes CI and that the code
works the way I intended on my local Unix system if extracted into a
little harness).  With this, you could in theory create a file
PGDATA/win32setlocale.map containing:

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254

I also absorbed the pre-existing kludge table into the new system by
default (though they got a bit shorter 'cause I invented some
wildcards).  Some problems came up while wondering how to fit Türkiye
into the defaults, and how to back-patch:

1.  In the back-branches, we claim to support ancient Windows releases
as far back as "Windows 2000 SP4" (!), which obviously aren't getting
the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there
and generally before Windows 10.  And even if you exclude the
extremities of our support window somehow (how?), modern systems might
not have applied the update yet (IIUC they *have* to at some point
under the new world order, so there is a defined window of version
skew these days).

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded
in the current ACP ("active codepage", system-wide changeable setting
that controls char↔wchar_t conversion in system APIs), so the encoding
of that file (and the built-in default table) would need to match that
to work, as coded.  Perhaps it would be possible to make the mapping
file UTF-8 and transform that to ACP!  But it feels a bit too loopy
for me, and on the PostgreSQL side it is undefined/illegal whatever
you choose in PostgreSQL due to being accessed from different
databases which are using potentially different encodings that are
only required to be a superset of ASCII.  Avoid.

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the
replacement string.  But what is the oldest Windows release that can
understand a BCP47 code like that?

4.  Conversely, on modern systems, I'm still not entirely sure that
"tr-TR.1254" is exactly the same thing as "Turkish-XXX.1254" and that
it's OK to put ".1254" on the end like that.  Is it, and is it?  I
don't mean just "does it mean Turkish?", I mean "does it give exactly
the same answer for every conceivable pair of strings when compared
with strcoll_l(), and likewise for the ctype-based functions like
towlower() et al".
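
Point 2 can be seen directly by encoding the candidate names. A minimal sketch in Python; it assumes, purely for illustration, that the ACP is Windows-1254, and the function names are hypothetical.

```python
def encoded_forms(name: str) -> dict[str, bytes]:
    """Encode a locale name as it would appear under two different encodings."""
    return {
        "cp1254": name.encode("cp1254"),  # assumed ACP on a Turkish system
        "utf-8": name.encode("utf-8"),
    }


def is_encoding_independent(name: str) -> bool:
    """True if the name is the same byte string under both encodings."""
    forms = encoded_forms(name)
    return forms["cp1254"] == forms["utf-8"]
```

A name such as "tr-TR.1254" is pure ASCII and therefore the same bytes either way, whereas "Turkish_Türkiye.1254" is one byte longer in UTF-8 than in Windows-1254, so a mapping file or built-in table containing it would only match if its encoding agreed with the ACP.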

If the answers are not in our favour, I guess we could leave the
default behaviour unchanged, and let people set up a text file as
shown above to fix their database if they want, but that's also not
very nice and kinda weird (helping hypothetical users of museum-grade
systems by leaving real users' systems broken).

If the answer to 4 is yes, yes then we could also push ahead with the
plan to make initdb pick BCP47 names by default in PG18 (or even 17).
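
Question 4 is, in principle, testable. Below is a minimal spot-check sketch in Python, assuming both locale names are accepted by the local C runtime (otherwise `locale.Error` is raised); `pairwise_signs` and `same_collation` are hypothetical names. A finite word list can only expose a difference, never prove equivalence, and the ctype-based functions would need a separate check.

```python
import locale
from itertools import combinations


def pairwise_signs(locale_name: str, words: list[str]) -> dict[tuple[str, str], int]:
    """Record the sign of strcoll(a, b) for every pair of words under one locale."""
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        signs = {}
        for a, b in combinations(words, 2):
            result = locale.strcoll(a, b)
            signs[(a, b)] = (result > 0) - (result < 0)  # normalise to -1, 0 or 1
        return signs
    finally:
        # Restore the previous collation setting whatever happens.
        locale.setlocale(locale.LC_COLLATE, saved)


def same_collation(name_a: str, name_b: str, words: list[str]) -> bool:
    """True if both locales order every sampled pair of words identically."""
    return pairwise_signs(name_a, words) == pairwise_signs(name_b, words)
```

On a Windows machine one might call `same_collation("tr-TR.1254", "Turkish_Türkiye.1254", sample)` with a sample rich in dotted and dotless i; whether the two names are accepted, and what the comparison returns, is exactly the open question.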

> Maybe this shouldn't even be Windows-specific?  Are there any
> cases where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO codes go away or countries split etc, so
it's not like POSIX locale names are set in stone under all
circumstances.  But on Unixen it's all just files in practice, you can
always just symlink them, move them around, compile them yourself from
sources, etc, if you really have to, so I think I'd rather contain the
crazy in win32*.c.


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions.

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Monday, May 13, 2024 11:38 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 14, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for the long-term solution being more-stable locale identifiers.
> However, we should try to build something that will let users get out 
> of these situations with the existing identifiers, so I like your idea 
> of a plain-text mapping file for Windows locale names.
> I don't think an environment variable is necessary; just define a 
> fixed name "$PGDATA/locale_map.txt" or such.  If that file exists, 
> just read it and map the pg_database field values with it.

OK, I tried that, first draft attached (with my standard proviso that I don't do Windows, I just know that this passes
CIand that the code works the way I intended on my local Unix system if extracted into a little harness).  With this,
youcould in theory create a file PGDATA/win32setlocale.map containing:
 

c Turkish_Turkey.1254=Turkish_Türkiye.1254

... or perhaps more likely:

c Turkish_Turkey.1254=tr-TR.1254

I also absorbed the pre-existing kludge table into the new system by default (though the entries got a bit shorter
'cause I invented some wildcards).  Some problems came up while wondering how to fit Türkiye into the defaults, and how
to back-patch:

1.  In the back-branches, we claim to support ancient Windows releases as far back as "Windows 2000 SP4" (!), which
obviously aren't getting the Windows updates, so I guess "Turkish_Türkiye.1254" will fail there and generally before
Windows 10.  And even if you exclude the extremities of our support window somehow (how?), modern systems might not have
applied the update yet (IIUC they *have* to at some point under the new world order, so there is a defined window of
version skew these days).

2.  It's generally a terrible idea to be using "ü" in a locale name.
FWIW I assume setlocale() actually accepts and returns names encoded in the current ACP ("active codepage", system-wide
changeable setting that controls char↔wchar_t conversion in system APIs), so the encoding of that file (and the built-in
default table) would need to match that to work, as coded.  Perhaps it would be possible to make the mapping file UTF-8
and transform that to ACP!  But it feels a bit too loopy for me, and on the PostgreSQL side it is undefined/illegal
whatever you choose in PostgreSQL due to being accessed from different databases which are using potentially different
encodings that are only required to be a superset of ASCII.  Avoid.

3.  Therefore you'd probably want to prefer "tr-TR.1254" as the replacement string.  But what is the oldest Windows
release that can understand a BCP47 code like that?

4.  Conversely, on modern systems, I'm still not entirely sure that "tr-TR.1254" is exactly the same thing as
"Turkish-XXX.1254" and that it's OK to put ".1254" on the end like that.  Is it, and is it?  I don't mean just "does it
mean Turkish?", I mean "does it give exactly the same answer for every conceivable pair of strings when compared with
strcoll_l(), and likewise for the ctype-based functions like towlower() et al".

If the answers are not in our favour, I guess we could leave the default behaviour unchanged, and let people set up a
text file as shown above to fix their database if they want, but that's also not very nice and kinda weird (helping
hypothetical users of museum-grade systems by leaving real users' systems broken).

If the answer to 4 is yes, yes then we could also push ahead with the plan to make initdb pick BCP47 names by default
in PG18 (or even 17).
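
To make the encoding concern in point 2 concrete, here is a small sketch (not part of the proposed patch) showing that the new name has no single byte representation. The helper name `encoded_forms` and the choice of candidate encodings are assumptions made for illustration; the codec names are Python's names for the Windows code pages.

```python
# Illustrative only: the same locale name yields different bytes depending on
# the encoding, and cannot be written in pure ASCII at all.
NAME = "Turkish_Türkiye.1254"


def encoded_forms(name):
    """Return the byte form of a locale name under a few candidate encodings.

    A value of None means the name is not representable in that encoding.
    """
    forms = {}
    for codec in ("cp1254", "cp1252", "utf-8", "ascii"):
        try:
            forms[codec] = name.encode(codec)
        except UnicodeEncodeError:
            forms[codec] = None
    return forms


if __name__ == "__main__":
    for codec, raw in encoded_forms(NAME).items():
        print(f"{codec:8} {raw!r}")
```

A byte-wise comparison of the stored name against a table entry therefore only succeeds if both sides happen to use the same encoding, whereas an ASCII-only name such as "tr-TR.1254" has the same bytes in all of them.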
 

> Maybe this shouldn't even be Windows-specific?  Are there any cases 
> where it'd save people's bacon on other platforms?

Good question.  Sometimes ISO codes go away or countries split etc, so it's not like POSIX locale names are set in stone
under all circumstances.  But on Unixen it's all just files in practice, you can always just symlink them, move them
around, compile them yourself from sources, etc, if you really have to, so I think I'd rather contain the crazy in
win32*.c.

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas and Tom,

Please let us know your question or concern about the bug we discuss in this mail.

Thanks!
Haifang 

-----Original Message-----
From: Haifang Wang (Centific Technologies Inc) 
Sent: Wednesday, May 15, 2024 4:47 PM
To: Thomas Munro <thomas.munro@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Cc: pgsql-bugs@lists.postgresql.org
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Thanks for your questions, Thomas and Tom. + @Vishwa to help with technical questions.


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Sat, May 18, 2024 at 10:25 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Please let us know your question or concern about the bug we discuss in this mail.

Thanks for your help with this!  Did I miss an email in between?  Here
are some unanswered questions that seem to be stopping progress:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?
2.  If we translate to BCP47 automatically, should we put the ".1452"
on the end?  What does it mean exactly?  What does it mean if you
don't put it there?
3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?

With answers to those questions we might be able to ship some nice
built-in translations to get users out of this jam.  If there are
issues on those points, we might have to face some questions about
what the encoding is of the "Turkish_Türkiye.1254" string itself,
which is tricky for us for technical reasons, among other problems...
not sure.



Thomas Munro <thomas.munro@gmail.com> writes:
> With answers to those questions we might be able to ship some nice
> built-in translations to get users out of this jam.  If there are
> issues on those points, we might have to face some questions about
> what the encoding is of the "Turkish_Türkiye.1254" string itself,
> which is tricky for us for technical reasons, among other problems...
> not sure.

TBH, my idea of how this should go was to *not* ship any built-in
translations, or indeed any translation file at all (so I didn't
like your moving some existing hacks into that file).  If we
approach it like that, then individual users who hit this problem
are responsible for creating their own translation file, which
will automatically use whatever is the locally-appropriate
encoding.  Sure, it's less transparent for affected users, but
it will work for them which other approaches might not; and
evidence so far is that there's not a huge number of affected
users.

            regards, tom lane



Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Sat, May 18, 2024 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > With answers to those questions we might be able to ship some nice
> > built-in translations to get users out of this jam.  If there are
> > issues on those points, we might have to face some questions about
> > what the encoding is of the "Turkish_Türkiye.1254" string itself,
> > which is tricky for us for technical reasons, among other problems...
> > not sure.
>
> TBH, my idea of how this should go was to *not* ship any built-in
> translations, or indeed any translation file at all (so I didn't
> like your moving some existing hacks into that file).  If we
> approach it like that, then individual users who hit this problem
> are responsible for creating their own translation file, which
> will automatically use whatever is the locally-appropriate
> encoding.  Sure, it's less transparent for affected users, but
> it will work for them which other approaches might not; and
> evidence so far is that there's not a huge number of affected
> users.

Hmm, yeah I guess we could just ship a patch like what I posted
already, and let users figure out what works.  I would still like to
know the answer to those questions so we can offer good advice.  If
the answers are all yes then I think we can say "the file is
encoded in ASCII; use wildcards to deal with any legacy non-ASCII
locale names on the left, and use BCP47 names on the right" and begin
expunging the bad old names from our universe.
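
A rough sketch of how such an ASCII-only mapping file with wildcards might be applied is shown below. It is not the posted patch: the simplified `PATTERN=REPLACEMENT` line format (without the leading "c" field of the earlier examples), the `#` comment syntax, and the function names `parse_map` and `map_locale` are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def parse_map(text):
    """Parse lines of the assumed form 'PATTERN=REPLACEMENT'.

    Blank lines and lines starting with '#' are skipped.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, sep, replacement = line.partition("=")
        if not sep:
            raise ValueError(f"malformed mapping line: {line!r}")
        rules.append((pattern.strip(), replacement.strip()))
    return rules


def map_locale(name, rules):
    """Return the replacement of the first matching pattern, else the name unchanged."""
    for pattern, replacement in rules:
        if fnmatchcase(name, pattern):
            return replacement
    return name


if __name__ == "__main__":
    # The wildcard keeps the file itself ASCII while still matching "Türkiye".
    rules = parse_map("Turkish_T*.1254=tr-TR.1254\n")
    for old in ("Turkish_Turkey.1254", "Turkish_Türkiye.1254"):
        print(old, "->", map_locale(old, rules))
```

The wildcard on the left is what lets a pure-ASCII file cover both the old and the renamed spelling, so the question of which encoding "ü" is stored in never arises.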



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Thomas,

What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

NOTE: this email is in plain text format, which is not easy to read. Also, the previous email content is usually not
included in the latest email, so we can only see the latest one.
 

Thanks!
Haifang


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole
thread, so here is this whole thread in our project email archive:


https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com

But let me ask the questions again, with some motivation/reason I want
to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL
versions, for example PostgreSQL 12, are expected to run on old
versions of Windows from long before Windows 10, so we might have to
consider this.  However, if we go with Tom's idea that we do nothing
by default but just allow users to supply their own optional mapping
file, then this question becomes unimportant, users can figure out for
themselves whether it works, and presumably only 10+ got the update
that renamed Turkey to Türkiye.  [And in reality, I hope/expect that
no one really does run old out-of-support OSes, because that's crazy,
but I'm not allowed to assume...])

2.  If we translate to BCP47 locale names like "tr-TR" automatically,
should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if
you don't put it on, the encoding in "char"-based functions is the
"ACP".  What I really want to know is, can it be different from the
"ACP", and if it is, which functions does it affect?  For example if
the ACP is 1521 and I call _tolower_l() giving it a locale_t that I
opened with "en-US.UTF-8", what happens?  I am sure this is a simple
question but we are not Windows programmers, you are the first person
to show up offering to investigate, and I personally found the docs a
bit light on the topic.)

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In
other words, is it *exactly the same code and driving data*, just
using different labels?  Or is it a new locale implementation that
could differ arbitrarily in behaviour?  If the answer is yes, it's
just a new naming scheme, then life will be much much simpler for our
users, but if not, then indexes might be corrupted if we tell people
to switch to the new BCP47 names, and so we'd better know about that,
so we can adjust our advice to users.)



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Vishwa Deepak
Date:

I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey
Adding my inline comment to best of my knowledge.


Thanks & Regards
Vishwa



1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Vishwa: It's difficult for me to answer this question accurately. I can see BCP47-related code in Windows 8, so my assumption is that Windows 8 and above will support it.
Maybe an expert from the feature team can validate this assumption.



2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

Vishwa: It tries to find the best match for the given input locale. The link below explains this in more detail. As far as the code page part is concerned, maybe an expert can add the details.
The previous topic (How the Resource Management System matches and chooses resources) looks at qualifier-matching in general. This topic focuses on language-tag-matching in more detail.
learn.microsoft.com



3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)
Vishwa: Yes, it's exactly the same code and driving data; only the spelling has changed, which included the replacement of u with ü.



Please do proper due diligence at your end before proceeding with any kind of mapping.

Regards     
Vishwa


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Rahul Pandey
Date:
Thanks, Vishwa, for tagging me. 
Adding my two cents,

Hello Thomas,

1.  What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1452"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Vishwa: It's difficult for me to answer this question accurately. I can see BCP47-related code in Windows 8, so my assumption is that Windows 8 and above will support it.
Maybe an expert from the feature team can validate this assumption.

Rahul: Locale names based on BCP 47 were first introduced in the Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless it's XP or earlier).


2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1452" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

Vishwa: It tries to find the best match for the given input locale. The link below explains this in more detail. As far as the code page part is concerned, maybe an expert can add the details.

Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be the same: it would try to use the default ANSI code page for Turkish (which happens to be 1254). Using any other code page (for example "tr-TR.1252") would use that code page (1252: English). For the second part (the example), I am not sure I understand the question completely, but mixing encodings is almost never a good idea: in the worst case it could lead to mojibake strings, and in the best case (if the strings only contain ASCII characters) to no change.

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

Vishwa: Yes, it's exactly the same code and driving data; only the spelling has changed, which included the replacement of u with ü.

Rahul: I agree with Vishwa. The locale is the same; just the name of the country in English has changed. All the rest of the data is the same.
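
The mojibake risk mentioned in the answer to question 2 can be illustrated with a minimal sketch. It only models the byte-level effect of writing text under one code page and reading it under another; it does not call any Windows locale API, and the helper name `reinterpret` is an assumption made for illustration.

```python
def reinterpret(text, written_as, read_as):
    """Encode text in one code page and decode the same bytes as another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Turkish-specific letters occupy the slots that hold Icelandic letters
    # in code page 1252, so they come back as different characters.
    print(reinterpret("şğı", "cp1254", "cp1252"))
    # ASCII-only text is identical in both code pages and survives unchanged.
    print(reinterpret("tr-TR", "cp1254", "cp1252"))
```

This matches the worst-case and best-case scenarios described above: text using the letters that differ between the two code pages is silently altered, while pure ASCII is unaffected.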

Thanks,
Rahul





RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Thanks Vishwa for all the clarification below.

 

Hi @Rahul and everyone,

 

Is there anything else not clear? Is there any solution for the issue?

 

Thanks!
Haifang

 

From: Rahul Pandey <pandeyrah@microsoft.com>
Sent: Wednesday, May 22, 2024 4:37 AM
To: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks, Vishwa, for tagging me. 

Adding my two cents,

 

Hello Thomas,

 

1.  What is the oldest Windows release that can understand the "new" BCP47 locale names, like "tr-TR" or "tr-TR.1254"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

Vishwa: It's difficult for me to answer this question accurately. I can see BCP47-related code in Windows 8, so my assumption is that Windows 8 and above will support it.

Maybe the experts I have added from the feature team can validate this assumption.

 

Rahul: Locale names based on BCP 47 were first introduced in the Windows Vista timeframe, so using them should be pretty safe for most modern and older versions (unless it's XP or earlier).

 

 

2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1254" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

 

Vishwa: It tries to figure out the best match for the given input locale. The link below explains this in more detail. As far as the code page part is concerned, maybe an expert can add the details.

https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags

 

Rahul: For the first part, your assumption is correct. The behaviour for "tr-TR" and "tr-TR.ACP" would be the same, and it would try to use the default ANSI code page for Turkish (which happens to be 1254). Using any other code page (for example "tr-TR.1252") would use that code page (1252: English). For the second part (the example), I am not sure I understand the question completely, but mixing encodings is almost never a good idea: in the worst case it could lead to mojibake strings, and in the best case (if the strings contain only ASCII characters) to no change.
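
To illustrate the distinction Rahul draws between "tr-TR" and "tr-TR.1254", here is a minimal sketch in C that separates a locale string into its name and its optional code page suffix. It is not taken from PostgreSQL or from the Windows CRT; the function name split_locale_name and its return convention are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "tr-TR.1254" into "tr-TR" and "1254".
 *
 * Returns 1 if an explicit code page suffix was present, 0 if there was
 * none (codepage is then set to the empty string), and -1 if either
 * output buffer is too small.
 */
int
split_locale_name(const char *locale,
                  char *name, size_t name_size,
                  char *codepage, size_t codepage_size)
{
    const char *dot;
    const char *suffix;
    size_t      name_len;

    assert(locale != NULL && name != NULL && codepage != NULL);
    assert(name_size > 0 && codepage_size > 0);

    /* The code page, if any, follows the first '.' in the string. */
    dot = strchr(locale, '.');
    name_len = dot ? (size_t) (dot - locale) : strlen(locale);
    suffix = dot ? dot + 1 : "";

    /* Refuse to truncate rather than return a misleading name. */
    if (name_len >= name_size || strlen(suffix) >= codepage_size)
        return -1;

    memcpy(name, locale, name_len);
    name[name_len] = '\0';
    strcpy(codepage, suffix);

    return dot != NULL;
}
```

A caller that gets 0 back knows that no code page was given, which according to the answer above means the default ANSI code page for that locale would be used.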

 

3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

 

Vishwa: Yes, it's exactly the same code and driving data; only the spelling has changed, which included the replacement of u with ü.

 

Rahul: I agree with Vishwa. The locale is the same; only the English name of the country has changed. All the rest of the data is the same.

 

Thanks,

Rahul

 

 


From: Vishwa Deepak <Vishwa.Deepak@microsoft.com>
Sent: Tuesday, May 21, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Rahul Pandey <pandeyrah@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

 

I am adding experts in this thread to address these queries. @Amy Wishnousky @Shawn Steele @Rahul Pandey

Adding my inline comments to the best of my knowledge.

 

 

Thanks & Regards
Vishwa

 


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, May 21, 2024 5:02 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Hi Vishwa,

Could you please help with the questions below?

Thanks!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, May 20, 2024 1:57 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, May 21, 2024 at 8:17 AM Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> wrote:
> What questions do you have? Could you please list them clearly so that Vishwa could help to answer?

I already did, twice, but perhaps Vishwa or others can't see the whole thread, so here is this whole thread in our project email archive:

https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com

But let me ask the questions again, with some motivation/reason I want to know in parentheses:

1.  What is the oldest Windows release that can understand the "new"
BCP47 locale names, like "tr-TR" or "tr-TR.1254"?  (Some PostgreSQL versions, for example PostgreSQL 12, are expected to run on old versions of Windows from long before Windows 10, so we might have to consider this.  However, if we go with Tom's idea that we do nothing by default but just allow users to supply their own optional mapping file, then this question becomes unimportant, users can figure out for themselves whether it works, and presumably only 10+ got the update that renamed Turkey to Türkiye.  [And in reality, I hope/expect that no one really does run old out-of-support OSes, because that's crazy, but I'm not allowed to assume...])

It's difficult for me to answer this question accurately. I can see BCP47-related code in Windows 8, so my assumption is that Windows 8 and above will support it.
Maybe the experts I have added from the feature team can validate this assumption.



2.  If we translate to BCP47 locale names like "tr-TR" automatically, should we put the ".1254" on the end?  What does it mean exactly?
What does it mean if you don't put it there?  (I could guess that if you don't put it on, the encoding in "char"-based functions is the "ACP".  What I really want to know is, can it be different from the "ACP", and if it is, which functions does it affect?  For example, if the ACP is 1521 and I call _tolower_l() giving it a locale_t that I opened with "en-US.UTF-8", what happens?  I am sure this is a simple question but we are not Windows programmers, you are the first person to show up offering to investigate, and I personally found the docs a bit light on the topic.)

 

It tries to figure out the best match for the given input locale. The link below explains this in more detail. As far as the code page part is concerned, maybe an expert can add the details.

https://learn.microsoft.com/en-us/windows/uwp/app-resources/how-rms-matches-lang-tags

 



3.  Do the new BCP47 locale names give *exactly* the same results for
strcoll() and tolower() etc, as the old "Turkish*" style names?  (In other words, is it *exactly the same code and driving data*, just using different labels?  Or is it a new locale implementation that could differ arbitrarily in behaviour?  If the answer is yes, it's just a new naming scheme, then life will be much much simpler for our users, but if not, then indexes might be corrupted if we tell people to switch to the new BCP47 names, and so we'd better know about that, so we can adjust our advice to users.)

Yes, it's exactly the same code and driving data; only the spelling has changed, which included the replacement of u with ü.



Please do proper due diligence at your end before proceeding with any kind of mapping.

Regards     

Vishwa

 

 

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi Thomas and Team,

 

I would just like to follow up with you about this bug. I hope Rahul and Vishwa have answered all your questions below. Did you get a chance to investigate? Do you have any other questions?

 

Thanks!
Haifang

 

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Tuesday, May 28, 2024 11:21 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

 

Thanks Vishwa for all the clarification below.

 

Hi @Rahul and everyone,

 

Is there anything else not clear? Is there any solution for the issue?

 

Thanks!
Haifang

 


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was
travelling and at a conference last week (at which one of the topics
was "what are we going to do about the sorry state of PostgreSQL on
Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I
wanted to clarify.  Vishwa said:

"Yes, it's exactly the same code and driving data; only the spelling has
changed, which included the replacement of u with ü."

That sounds like it is answering the question 'Do
"Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?',
which I had already assumed to be true, but I was actually asking 'Do
"Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all
behave identically?'.  It sounds like the answer is probably yes, but
the mention of "u" vs "ü" implied a narrower answer than I was looking
for...  It's important, because we're proposing to translate to
"tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I
think my next move will be to try to make a build option that can also
do locale name renaming on Unix, so that I have something that I could
test myself and push for the next release of PostgreSQL which will be
in October.

One open question is whether we should ship a translation file
ourselves.  I have heard two opinions: Tom Lane proposed in this
thread that there should be no file initially, we let the user figure
out what to put in there.  Magnus Hagander proposed at the pgconf.dev
conference last week that we should ideally ship a complete
translation file and maintain it over time, and it should be installed
not in pgdata but rather in the install directory.  I am not
personally able or willing to maintain such a file, I'm only offering
to supply the C code to read it and do what it says, but perhaps a
group such as the EDB Installer maintainer group, or at least someone
with a vested interest in PostgreSQL on Windows, might like to own
that job.
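
As an illustration of the kind of locale name translation Thomas describes, the following C sketch looks up a name in a small table of rules and returns the replacement, or the original name when nothing matches. It is not the patch discussed in this thread; the LocaleMapping type, the translate_locale_name function and the example rules are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical translation rule: old locale name -> new locale name. */
typedef struct LocaleMapping
{
    const char *old_name;
    const char *new_name;
} LocaleMapping;

/*
 * Return the replacement for 'name' if a rule matches it exactly,
 * otherwise return 'name' unchanged.  Rules are tried in order and the
 * first match wins.
 */
const char *
translate_locale_name(const LocaleMapping *rules, size_t nrules,
                      const char *name)
{
    size_t      i;

    assert(name != NULL);
    assert(rules != NULL || nrules == 0);

    for (i = 0; i < nrules; i++)
    {
        if (strcmp(rules[i].old_name, name) == 0)
            return rules[i].new_name;
    }

    /* No rule applies: leave the name as the user wrote it. */
    return name;
}
```

With rules mapping both "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" to "tr-TR.1254", either spelling would resolve to the same target, and names that have no rule would pass through untouched.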



Thomas Munro <thomas.munro@gmail.com> writes:
> One open question is whether we should ship a translation file
> ourselves.  I have heard two opinions: Tom Lane proposed in this
> thread that there should be no file initially, we let the user figure
> out what to put in there.  Magnus Hagander proposed at the pgconf.dev
> conference last week that we should ideally ship a complete
> translation file and maintain it over time, and it should be installed
> not in pgdata but rather in the install directory.

FWIW, I'm kind of down on the latter approach, because I don't think
it'll move the needle very far.  Based on track record so far, there's
no chance that we will be aware of a Microsoft locale renaming before
it starts breaking users' databases.  Therefore, "edit the translation
file" is going to have to be a documented process in any case, because
affected users are not going to want to wait around for our next
release for a fix.  Also, if people do have to do that, it doesn't
seem like a great idea to tell them to modify an installed file rather
than a cluster-local configuration file.  What if they do a minor
version update but the minor version doesn't (yet) contain the fix?

Admittedly, the installed-file approach could make it more transparent
for people who'd done a PG minor update before the relevant Windows
update.  I'm not sure how large that set of people will be, though.

            regards, tom lane
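
If users are expected to edit a translation file by hand, as Tom suggests, the file format needs to be simple to read and write. The C sketch below parses one line of a purely hypothetical "old name = new name" format with "#" comments. No file format was agreed in this thread, so the syntax and the names trim_in_place and parse_mapping_line are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *
trim_in_place(char *s)
{
    char       *end;

    while (*s != '\0' && isspace((unsigned char) *s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char) end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Parse one line of the hypothetical mapping file.  The line buffer is
 * modified, and on success *old_name and *new_name point into it.
 *
 * Returns 1 for a rule, 0 for a blank line or comment, -1 if malformed.
 */
int
parse_mapping_line(char *line, char **old_name, char **new_name)
{
    char       *eq;

    assert(line != NULL && old_name != NULL && new_name != NULL);

    line = trim_in_place(line);
    if (*line == '\0' || *line == '#')
        return 0;

    /* Split on the first '=', since locale names may contain spaces. */
    eq = strchr(line, '=');
    if (eq == NULL)
        return -1;
    *eq = '\0';

    *old_name = trim_in_place(line);
    *new_name = trim_in_place(eq + 1);
    if (**old_name == '\0' || **new_name == '\0')
        return -1;

    return 1;
}
```

Splitting on "=" rather than on whitespace is a deliberate choice in this sketch, because old-style Windows locale names such as "English_United States.1252" contain spaces.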



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks, Tom. Is there any suggestion on how to track all the conversations in one thread? It seems like all our previous
discussions are in different mail threads. It is not easy to track. 😊
 

Regards!
Haifang

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us> 
Sent: Monday, June 3, 2024 3:55 PM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>; Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; Magnus Hagander <magnus@hagander.net>
 
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Thomas Munro <thomas.munro@gmail.com> writes:
> One open question is whether we should ship a translation file 
> ourselves.  I have heard two opinions: Tom Lane proposed in this 
> thread that there should be no file initially, we let the user figure 
> out what to put in there.  Magnus Hagander proposed at the pgconf.dev 
> conference last week that we should ideally ship a complete 
> translation file and maintain it over time, and it should be installed 
> not in pgdata but rather in the install directory.

FWIW, I'm kind of down on the latter approach, because I don't think it'll move the needle very far.  Based on track
record so far, there's no chance that we will be aware of a Microsoft locale renaming before it starts breaking users'
databases.  Therefore, "edit the translation file" is going to have to be a documented process in any case, because
affected users are not going to want to wait around for our next release for a fix.  Also, if people do have to do that,
it doesn't seem like a great idea to tell them to modify an installed file rather than a cluster-local configuration
file.  What if they do a minor version update but the minor version doesn't (yet) contain the fix?

Admittedly, the installed-file approach could make it more transparent for people who'd done a PG minor update before
the relevant Windows update.  I'm not sure how large that set of people will be, though.
 

            regards, tom lane

Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, Jun 4, 2024 at 12:13 PM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Thanks, Tom. Is there any suggestion on how to track all the conversations in one thread? It seems like all our previous discussions are in different mail threads. It is not easy to track. 😊

FWIW this shows up as one thread in my email client, and in the
PostgreSQL archives website[1].  I don't know much about email and
which RFCs or conventions are at work here, but apparently different
clients are using different techniques to identify threads.  I assume
it could be done with thread headers (as seen in this thread) or
reply-to chains or fuzzy recognition of subject etc.  Given that
Outlook seems to ignore the "> " inline response quoting convention
used by the rest of the internet, it wouldn't surprise me to hear that
it also doesn't follow the conventions for recognising threads either,
being a sort of related topic.

[1]
https://www.postgresql.org/message-id/flat/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com



Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Rahul Pandey
Date:
Hi,

"Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
You are correct, they all behave in exactly the same way. 
I too would recommend using "tr-TR.1254", since BCP47 tags have a defined structure and this would make the implementation more future-proof.

As for translation files, my apologies: I have neither a preference nor the expertise to comment on this.
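
Rahul's remark that BCP47 tags have a defined structure can be illustrated with a small C check for the simplest shape, a language subtag followed by a region subtag, as in "tr-TR". This is a sketch under stated assumptions: it accepts only the conventional "ll-CC" or "lll-CC" spelling, which is a narrow subset of what BCP 47 allows (real tags are case-insensitive and may carry script, variant and extension subtags), and the function name looks_like_language_region is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only tests, so the result does not depend on the current locale. */
static bool
is_ascii_lower(char c)
{
    return c >= 'a' && c <= 'z';
}

static bool
is_ascii_upper(char c)
{
    return c >= 'A' && c <= 'Z';
}

/*
 * Return true if 'tag' has the simple shape "ll-CC" or "lll-CC":
 * two or three lowercase letters, a hyphen, two uppercase letters,
 * and nothing else.  Any code page suffix must be removed first.
 */
bool
looks_like_language_region(const char *tag)
{
    size_t      i = 0;

    assert(tag != NULL);

    /* Language subtag: 2 or 3 lowercase ASCII letters. */
    while (is_ascii_lower(tag[i]))
        i++;
    if (i < 2 || i > 3)
        return false;

    if (tag[i] != '-')
        return false;
    i++;

    /* Region subtag: exactly 2 uppercase ASCII letters, then the end. */
    if (!is_ascii_upper(tag[i]) || !is_ascii_upper(tag[i + 1]))
        return false;

    return tag[i + 2] == '\0';
}
```

An old-style name such as "Turkish_Turkey" fails this check at the first character, which is the sense in which the new names are easier to validate mechanically.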


From: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Sent: Tuesday, June 4, 2024 5:39 AM
To: Thomas Munro <thomas.munro@gmail.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
 
Thanks, Thomas. I would leave the questions to Vishwa and Rahul to answer.

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, June 3, 2024 2:22 PM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

On Tue, Jun 4, 2024 at 1:00 AM Haifang Wang (Centific Technologies
Inc) <v-haiwang@microsoft.com> wrote:
> Just would like to follow up with you about this bug. Hope Rahul and Vishwa have answered all your questions below. Did you get chance to do investigations? Any other questions?

Hi,

Thanks for all your feedback!  Sorry for my late replies, I was travelling and at a conference last week (at which one of the topics was "what are we going to do about the sorry state of PostgreSQL on Windows", but that's a wider topic than this thread...)

The answers are all useful, thank you.  There was just one thing I wanted to clarify.  Vishwa said:

"Yes, its exactly the same code and driving data , only spelling is changed which included replcement of u with ü"

That sounds like it is answering the question 'Do "Turkish_Turkey.1254" and "Turkish_Türkiye.1254" behave identically?', which I had already assumed to be true, but I was actually asking 'Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?'.  It sounds like the answer is probably yes, but the mention of "u" vs "ü" implied a narrower answer than I was looking for...  It's important, because we're proposing to translate to "tr-TR.1254".

Since no one has come forward to test the patch I wrote on Windows, I think my next move will be to try to make a build option that can also do locale name renaming on Unix, so that I have something that I could test myself and push for the next release of PostgreSQL which will be in October.

One open question is whether we should ship a translation file ourselves.  I have heard two opinions: Tom Lane proposed in this thread that there should be no file initially, we let the user figure out what to put in there.  Magnus Hagander proposed at the pgconf.dev conference last week that we should ideally ship a complete translation file and maintain it over time, and it should be installed not in pgdata but rather in the install directory.  I am not personally able or willing to maintain such a file, I'm only offering to supply the C code to read it and do what it says, but perhaps a group such as the EDB Installer maintainer group, or at least someone with a vested interest in PostgreSQL on Windows, might like to own that job.

Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Wed, Jun 5, 2024 at 9:10 PM Rahul Pandey <pandeyrah@microsoft.com> wrote:
> "Do "Turkish_Turkey.1254", "Turkish_Türkiye.1254" and "tr-TR.1254" all behave identically?"
> You are correct, they all behave in exactly the same way.
> I too would recommend using "tr-TR.1254" since BCP47 tags have a defined structure and this would make the
> implementation more future proof.

Thanks for confirming.  More soon.



RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi Thomas and Tom,

Just would like to follow up with you about the bug. Is there any update? Any other information needed?

Thanks!
Haifang

 


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

Hi all,

Still need to double check with you about this bug. How is it going? Any other information needed?

Thanks!
Haifang

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Thursday, June 13, 2024 11:47 AM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Hi Thomas and Tom,

Just would like to follow up with you about the bug. Is there any update? Any other information needed?

Thanks!
Haifang

 


RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:

 

Hi all,

Still need to double check with you about this bug. How is it going? Any other information needed?

Thanks!
Haifang

 

 

From: Haifang Wang (Centific Technologies Inc)
Sent: Monday, June 17, 2024 2:06 PM
To: Rahul Pandey <pandeyrah@microsoft.com>; Thomas Munro <thomas.munro@gmail.com>
Cc: Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>; 🎯dev <targetdev@microsoft.com>
Subject: RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

Hi all,

Still need to double check with you about this bug. How is it going? Any other information needed?

Thanks!
Haifang

 


Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
Thomas Munro
Date:
On Tue, Jun 4, 2024 at 9:22 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Since no one has come forward to test the patch I wrote on Windows, I
> think my next move will be to try to make a build option that can also
> do locale name renaming on Unix, so that I have something that I could
> test myself and push for the next release of PostgreSQL which will be
> in October.

Here is such a patch.  If you go into pg_config_manual.h and
uncomment this line:

/* #define DEBUG_SETLOCALE_MAP */

... then Unix systems will also be able to rename locales passed to
setlocale().  A map file can be provided either by putting its
absolute path into the environment variable PG_SETLOCALE_MAP, or by
installing it as $PREFIX/share/postgresql/setlocale.map.  I couldn't
immediately think of a good way to find it in the data directory.
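
As a rough illustration of the two lookup locations just described (not code from the patch, which is C), the following Python sketch picks a map file path. The function name `find_setlocale_map` is hypothetical, and the precedence shown (environment variable first, then the installed file) is an assumption rather than something stated in this thread.

```python
import os


def find_setlocale_map(environ, prefix):
    """Return the map file path to use, or None if no candidate applies.

    Assumption: PG_SETLOCALE_MAP, when set to an absolute path, takes
    precedence over the file installed under the prefix.
    """
    env_path = environ.get("PG_SETLOCALE_MAP")
    if env_path and os.path.isabs(env_path):
        return env_path
    # Installed location mentioned above: $PREFIX/share/postgresql/setlocale.map
    default_path = os.path.join(prefix, "share", "postgresql", "setlocale.map")
    return default_path if os.path.exists(default_path) else None
```

Calling `find_setlocale_map(os.environ, "/usr/local/pgsql")` would then return whichever candidate applies; the prefix shown is only an example value.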

Here's an example of a line that should fix the Turkish problem
(though I haven't tested that, I am not a Windows user):

Turkish_T*.1254=tr-TR.1254

I added some documentation and showed that example.

If you wanted to check it's working on a Unix system, you might try
some lines like *.UTF-8=does_not_exist or en_US.UTF-8=fr_FR.UTF-8 and
then somehow verify that it's using French.
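
To make the renaming idea concrete, here is a minimal Python sketch of how lines such as the ones above could be applied to a locale name. It is an illustration under stated assumptions, not the patch's C implementation: it assumes each line has the form `pattern=replacement`, that `*` is a shell-style wildcard, that the first matching line wins, and that names with no match pass through unchanged. The names `parse_map` and `rename_locale` are hypothetical, and `fnmatchcase` also treats `?` and `[...]` as wildcards, which the real map format may not.

```python
from fnmatch import fnmatchcase


def parse_map(text):
    """Parse 'pattern=replacement' lines; blank lines and '#' comments are skipped (an assumption)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, sep, replacement = line.partition("=")
        if sep:  # ignore lines that contain no '='
            rules.append((pattern.strip(), replacement.strip()))
    return rules


def rename_locale(name, rules):
    """Return the replacement for the first matching pattern, else the name unchanged."""
    for pattern, replacement in rules:
        if fnmatchcase(name, pattern):
            return replacement
    return name


rules = parse_map("Turkish_T*.1254=tr-TR.1254\nen_US.UTF-8=fr_FR.UTF-8\n")
for requested in ("Turkish_Türkiye.1254", "Turkish_Turkey.1254", "en_US.UTF-8", "C"):
    print(requested, "->", rename_locale(requested, rules))
```

With these two rules, both Turkish spellings map to the same BCP47-style name, which is the behaviour the example line is meant to achieve.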

I considered adding win32setlocale.c to the list of files to build for
the port libraries even on Unix, and then wrapping the contents in
#ifdef, but IIUC macOS squawks if you have an empty .c after
preprocessing, so I'd have to add a dummy symbol in there.  Or maybe
that'd be better than what I did here, namely including
win32setlocale.c in chklocale.c in this case.  Better ideas welcome.
Adding a meson/configure switch to enable it and make the whole .c
file optional seemed excessive.

Attachments

RE: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607

From
"Haifang Wang (Centific Technologies Inc)"
Date:
Thanks, Thomas. Will wait for the build option to do further testing. 

Regards!
Haifang

-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Tuesday, June 25, 2024 4:52 AM
To: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com>
Cc: Rahul Pandey <pandeyrah@microsoft.com>; Vishwa Deepak <Vishwa.Deepak@microsoft.com>; Shawn Steele <Shawn.Steele@microsoft.com>; Amy Wishnousky <amyw@microsoft.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org; Shweta Gulati <gulatishweta@microsoft.com>; Ashish Nawal <nawalashish@microsoft.com>
Subject: Re: [EXTERNAL] Re: Windows Application Issues | PostgreSQL | REF # 48475607
