Обсуждение: [HACKERS] ICU locales and text/char(n) SortSupport on Windows
varstr_sortsupport() only allows Windows to use SortSupport with a non-C-locale (when the server encoding happens to be UTF-8, which I assume is the common case). This is because we (quite reasonably) don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion hack from varstr_cmp() for the SortSupport authoritative comparator (varstrfastcmp_locale() shouldn't get its own copy of this kludge, because it's supposed to be "fast"). This broad restriction made sense when Windows + UTF-8 + non-C-locale necessarily required the aforementioned UTF-16 conversion kludge. However, iff an ICU locale is in use on Windows (or any other platform), then we can always use SortSupport, regardless of anything else (we should not have the core code install a fmgr comparison shim that just calls varstr_cmp(), though we still do). We don't actually need the UTF-16 kludge at all, so we can use SortSupport without any special care. The current state of affairs doesn't make any sense, AFAICT, and so the restriction should be removed on general principle: we *already* expect ICU to have no restrictions that are peculiar to Windows, as we see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on to this restriction. This restriction also seems worth fixing because Windows users are generally more likely to want to use ICU locales; most of them would otherwise end up actually paying the overhead for the UTF-16 kludge. (Presumably the UTF-16 conversion makes text sorting *even slower* than it would be if we merely didn't do SortSupport, which is to say: very slow indeed.) In summary, we're currently attaching the use of SortSupport to the wrong thing. We're treating this UTF-16 business as something that implies a broad OS/platform restriction, when in fact it should be treated as implying a restriction for one particular collation provider only (a collation provider that happens to be built into Windows, but isn't really special to us). Attached patch shows what I'm getting at. This is untested, since I don't use Windows. Proceed with caution. On a related note, am I the only one that finds it questionable that str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself contains an "#ifdef USE_ICU" block? It seems like those two things might get conflated on some platforms. We don't want lower() to ever not use the ICU infrastructure when an ICU collation is used, and yet it's not obvious that that's impossible. I understand that the code in regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER facilities, and that that was always accepted as legacy that ICU had to live with. Maybe a static assertion is all that we need here (ICU builds must also be USE_WIDE_UPPER_LOWER builds). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Sat, Sep 16, 2017 at 03:33:53PM -0700, Peter Geoghegan wrote: > On a related note, am I the only one that finds it questionable that > str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself > contains an "#ifdef USE_ICU" block? It seems like those two things > might get conflated on some platforms. We don't want lower() to ever > not use the ICU infrastructure when an ICU collation is used, and yet > it's not obvious that that's impossible. I understand that the code in > regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER > facilities, and that that was always accepted as legacy that ICU had > to live with. Maybe a static assertion is all that we need here (ICU > builds must also be USE_WIDE_UPPER_LOWER builds). I checked !USE_WIDE_UPPER_LOWER by configuring v10 as follows: ./configure -C --prefix=$HOME/sw/nopath/pg10 --enable-debug \ --enable-cassert --enable-depend --enable-tap-tests --with-libxml\ --with-gssapi --with-openssl ac_cv_func_towlower=no The result fails to compile: $ make formatting.c: In function ‘str_tolower’: formatting.c:1623:10: error: ‘mylocale’ undeclared (first use in this function) if (mylocale) ^ ... snipped other errors ... A --with-icu build fails similarly. PG9.6 builds and passes "make check". Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing USE_WIDE_UPPER_LOWER? Every buildfarm fossil has both. Solaris 2.5.1 (out of support for 12 years) might be the most interesting OS affected: https://www.gnu.org/software/gnulib/manual/html_node/towlower.html https://www.gnu.org/software/gnulib/manual/html_node/wcstombs.html The above-described topic is currently a PostgreSQL 10 open item. Peter (Eisentraut), since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 16, 2017 at 03:33:53PM -0700, Peter Geoghegan wrote: > In summary, we're currently attaching the use of SortSupport to the > wrong thing. We're treating this UTF-16 business as something that > implies a broad OS/platform restriction, when in fact it should be > treated as implying a restriction for one particular collation > provider only (a collation provider that happens to be built into > Windows, but isn't really special to us). > > Attached patch shows what I'm getting at. This is untested, since I > don't use Windows. Proceed with caution. This is currently a v10 open item, but I think it doesn't qualify for that treatment. It's merely an opportunity for optimization, albeit an attractively-simple one. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch <noah@leadboat.com> writes: > Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing > USE_WIDE_UPPER_LOWER? Every buildfarm fossil has both. +1 ... if nothing else, there's the problem that untested code is likely to be broken. You just proved it *is* broken, of course, but my point is that even if we repaired the immediate damage we could have little confidence in it staying fixed. I think the USE_WIDE_UPPER_LOWER split was originally my code, so I'm willing to take care of removing it if there's consensus that that's what to do. I'm not sure that we need to treat this as a v10 open item, though. The premise of removing !USE_WIDE_UPPER_LOWER is that nobody cares anymore, therefore it shouldn't matter to users whether we remove it in v10. There's an argument that having only two states of the relevant code, not three, in the live back branches is worth something for maintenance --- but should that outweigh the risk of breaking something post-rc1? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch <noah@leadboat.com> wrote: > This is currently a v10 open item, but I think it doesn't qualify for that > treatment. It's merely an opportunity for optimization, albeit an > attractively-simple one. Fair enough. This is clearly an omission that was made in 41839b7ab, the commit that added/fixed Windows support for ICU. Hopefully the omission can be corrected for v10. I see this as a surprising behavior for users, since ICU locales on all other platforms *will* have much faster sorting than libc locales (often more than 3x faster). This is mostly because ICU brings back abbreviated keys. 3x - 5x faster is more of a qualitative difference than a quantitative one, IMHO. That having been said, I'm certainly not going to insist on a v10 fix for this issue. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 9/21/17 01:29, Noah Misch wrote: > I checked !USE_WIDE_UPPER_LOWER by configuring v10 as follows: > > ./configure -C --prefix=$HOME/sw/nopath/pg10 --enable-debug \ > --enable-cassert --enable-depend --enable-tap-tests --with-libxml \ > --with-gssapi --with-openssl ac_cv_func_towlower=no > > The result fails to compile: Yeah, the placement of the ifdef blocks was pretty bogus. This patch fixes it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 9/16/17 18:33, Peter Geoghegan wrote: > In summary, we're currently attaching the use of SortSupport to the > wrong thing. We're treating this UTF-16 business as something that > implies a broad OS/platform restriction, when in fact it should be > treated as implying a restriction for one particular collation > provider only (a collation provider that happens to be built into > Windows, but isn't really special to us). > > Attached patch shows what I'm getting at. This is untested, since I > don't use Windows. Proceed with caution. Your analysis makes sense, but the patch doesn't work, because "locale" is never set before reading it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >> Attached patch shows what I'm getting at. This is untested, since I >> don't use Windows. Proceed with caution. > > Your analysis makes sense, but the patch doesn't work, because "locale" > is never set before reading it. It was just for illustrative purposes. Seems fine to actually move the WIN32 block down to just before the existing TRUST_STRXFRM test, since the locale is going to be cached and then immediately reused where we return immediately (Windows libc) anyway. This would also be a win for clarity, IMV. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch <noah@leadboat.com> wrote: > This is currently a v10 open item, but I think it doesn't qualify for that > treatment. It's merely an opportunity for optimization, albeit an > attractively-simple one. I have withdrawn this as an open item. I'm still hopeful that it can be worked through for v10 at Peter's discretion. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I wrote: > Noah Misch <noah@leadboat.com> writes: >> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing >> USE_WIDE_UPPER_LOWER? Every buildfarm fossil has both. > +1 ... if nothing else, there's the problem that untested code is likely > to be broken. You just proved it *is* broken, of course, but my point > is that even if we repaired the immediate damage we could have little > confidence in it staying fixed. Further notes about that: * The Single Unix Spec v2 (a/k/a POSIX 1997), which has been our minimum portability spec for quite awhile, requires wcstombs() and towlower(), and further requires the latter to be declared in <wctype.h>. * Surveying the buildfarm, I agree with your conclusion that every active member has wcstombs() and towlower(). gaur/pademelon is the lone member that lacks <wctype.h>; it declares towlower() in <wchar.h> instead. It's not so surprising that that system adheres to a pre-1997 idea of where to put that, because its /usr/include files mostly date from 1996 ... Meanwhile, I see that Peter has posted a fix for the immediate problem. I propose that Peter should apply his fix in HEAD and v10, and then I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 21, 2017 at 05:38:13PM -0400, Tom Lane wrote: > I wrote: > > Noah Misch <noah@leadboat.com> writes: > >> Perhaps it is time to require HAVE_WCSTOMBS and HAVE_TOWLOWER, removing > >> USE_WIDE_UPPER_LOWER? Every buildfarm fossil has both. > > > +1 ... if nothing else, there's the problem that untested code is likely > > to be broken. You just proved it *is* broken, of course, but my point > > is that even if we repaired the immediate damage we could have little > > confidence in it staying fixed. That would be easy enough to ensure by adding ac_cv_func_towlower=no to some buildfarm animal. But the real-world need for said code is clearly dead and gone. Good riddance. > Meanwhile, I see that Peter has posted a fix for the immediate problem. > I propose that Peter should apply his fix in HEAD and v10, and then > I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only. That works for me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 9/21/17 17:38, Tom Lane wrote: > Meanwhile, I see that Peter has posted a fix for the immediate problem. > I propose that Peter should apply his fix in HEAD and v10, and then done > I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 9/21/17 15:21, Peter Geoghegan wrote: > On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >>> Attached patch shows what I'm getting at. This is untested, since I >>> don't use Windows. Proceed with caution. >> >> Your analysis makes sense, but the patch doesn't work, because "locale" >> is never set before reading it. > > It was just for illustrative purposes. Seems fine to actually move the > WIN32 block down to just before the existing TRUST_STRXFRM test, since > the locale is going to be cached and then immediately reused where we > return immediately (Windows libc) anyway. This would also be a win for > clarity, IMV. I agree. The attached patch should do it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 9/21/17 17:38, Tom Lane wrote: >> Meanwhile, I see that Peter has posted a fix for the immediate problem. >> I propose that Peter should apply his fix in HEAD and v10, and then > done >> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only. And done. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > I agree. The attached patch should do it. I see one small issue here: You'll now need to set ssup->comparator back to NULL before you return early in the Windows' libc case. That way, a shim comparator (that goes through bttextcmp(), in the case of text) will be installed within FinishSortSupportFunction(). Other than that, looks good to me. Thanks -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 9/22/17 12:25, Peter Geoghegan wrote: > On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> I agree. The attached patch should do it. > > I see one small issue here: You'll now need to set ssup->comparator > back to NULL before you return early in the Windows' libc case. That > way, a shim comparator (that goes through bttextcmp(), in the case of > text) will be installed within FinishSortSupportFunction(). Other than > that, looks good to me. committed accordingly -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 24, 2017 at 5:04 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 9/22/17 12:25, Peter Geoghegan wrote: >> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> I agree. The attached patch should do it. >> >> I see one small issue here: You'll now need to set ssup->comparator >> back to NULL before you return early in the Windows' libc case. That >> way, a shim comparator (that goes through bttextcmp(), in the case of >> text) will be installed within FinishSortSupportFunction(). Other than >> that, looks good to me. > > committed accordingly Thanks. I am currently working on a patch for the issues with ICU colcollate stability as I see them. I should be able to post something later today or tomorrow. I would appreciate it if you held off on committing anything there until you've considered what I'll propose. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers