Обсуждение: [HACKERS] Getting server crash on Windows when using ICU collation
Hi All,
1) psql -d postgres
2) CREATE DATABASE icu_win_test
TEMPLATE template0
ENCODING 'UTF8'
LC_CTYPE 'C'
LC_COLLATE 'C';
3) \c icu_win_test;
4) icu_win_test=# select 'B' > 'a' COLLATE "de-u-co-standard-x-icu";
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
The backtrace observed in the core dump is,
> postgres.exe!varstr_cmp(char * arg1, int len1, char * arg2, int len2, unsigned int collid) Line 1494 C
postgres.exe!text_cmp(varlena * arg1, varlena * arg2, unsigned int collid) Line 1627 C
postgres.exe!text_gt(FunctionCallInfoData * fcinfo) Line 1738 + 0x12 bytes C
postgres.exe!ExecInterpExpr(ExprState * state, ExprContext * econtext, char * isnull) Line 650 + 0xa bytes C
postgres.exe!evaluate_expr(Expr * expr, unsigned int result_type, int result_typmod, unsigned int result_collation) Line 4719 C
postgres.exe!evaluate_function(unsigned int funcid, unsigned int result_type, int result_typmod, unsigned int result_collid, unsigned int input_collid, List * args, char funcvariadic, HeapTupleData * func_tuple, eval_const_expressions_context * context) Line 4272 + 0x50 bytes C
postgres.exe!simplify_function(unsigned int funcid, unsigned int result_type, int result_typmod, unsigned int result_collid, unsigned int input_collid, List * * args_p, char funcvariadic, char process_args, char allow_non_const, eval_const_expressions_context * context) Line 3914 + 0x44 bytes C
postgres.exe!eval_const_expressions_mutator(Node * node, eval_const_expressions_context * context) Line 2627 C
postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator, void * context) Line 2735 + 0x37 bytes C
postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator, void * context) Line 2932 + 0x8 bytes C
postgres.exe!eval_const_expressions(PlannerInfo * root, Node * node) Line 2413 C
postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, char hasRecursion, double tuple_fraction) Line 623 + 0x2d bytes C
RCA:
====
As seen in the backtrace, the crash is basically happening in varstr_cmp() function. AFAICU, this crash is only happening on Windows because in varstr_cmp(), if the encoding type is UTF8, we don't even think of calling ICU functions to compare the string. Infact, we directly attempt to call the OS function wsccoll*().
The point is, if collation provider is ICU, then we should tryto call ICU functions for collation support instead of calling OS functions. This thing is being taken care inside varstr_cmp() for Linux but surprisingly it's not handled for Windows. Please have a look at below code snippet in varstr_cmp() to know on how it is being taken care for linux,
#endif /* WIN32 */
........
........
#ifdef USE_ICU
#ifdef HAVE_UCOL_STRCOLLUTF8
if (GetDatabaseEncoding() == PG_UTF8)
{
UErrorCode status;
status = U_ZERO_ERROR;
result = ucol_strcollUTF8(mylocale->info.icu.ucol,
arg1, len1,
arg2, len2,
&status);
if (U_FAILURE(status))
ereport(ERROR,
(errmsg("collation failed: %s", u_errorName(status))));
}
else
#endif
{
int32_t ulen1,
ulen2;
UChar *uchar1, *uchar2;
ulen1 = icu_to_uchar(&uchar1, arg1, len1);
ulen2 = icu_to_uchar(&uchar2, arg2, len2);
result = ucol_strcoll(mylocale->info.icu.ucol,
uchar1, ulen1,
uchar2, ulen2);
}
#else /* not USE_ICU */
/* shouldn't happen */
elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
#endif /* not USE_ICU */
}
else
{
#ifdef HAVE_LOCALE_T
result = strcoll_l(a1p, a2p, mylocale->info.lt);
#else
/* shouldn't happen */
elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
#endif
}
Fix:
====
I am currently working on this and will try to come up with the fix asap.
[1] - https://www.postgresql.org/message-id/CAE9k0P%3DQRjtS1a0rgTyKag_%2Bs6XCs7vovV%2BgSkUfYVASog0P8w%40mail.gmail.com
PFA patch that fixes the issue described in above thread. As mentioned in the above thread, the crash is basically happening in varstr_cmp() function and it's only happening on Windows because in varstr_cmp(), if the collation provider is ICU, we don't even think of calling ICU functions to compare the string. Infact, we directly attempt to call the OS function wsccoll*() which is not expected. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com On Sat, Jun 10, 2017 at 3:25 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi All, > > I am seeing a server crash when running queries using ICU collations on > Windows. Following are the steps to reproduce the crash with the help of > patch to enable icu feature on Windows - [1], > > 1) psql -d postgres > > 2) CREATE DATABASE icu_win_test > TEMPLATE template0 > ENCODING 'UTF8' > LC_CTYPE 'C' > LC_COLLATE 'C'; > > 3) \c icu_win_test; > > 4) icu_win_test=# select 'B' > 'a' COLLATE "de-u-co-standard-x-icu"; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > !> > > The backtrace observed in the core dump is, > >> postgres.exe!varstr_cmp(char * arg1, int len1, char * arg2, int len2, >> unsigned int collid) Line 1494 C > postgres.exe!text_cmp(varlena * arg1, varlena * arg2, unsigned int collid) > Line 1627 C > postgres.exe!text_gt(FunctionCallInfoData * fcinfo) Line 1738 + 0x12 > bytes C > postgres.exe!ExecInterpExpr(ExprState * state, ExprContext * econtext, > char * isnull) Line 650 + 0xa bytes C > postgres.exe!evaluate_expr(Expr * expr, unsigned int result_type, int > result_typmod, unsigned int result_collation) Line 4719 C > postgres.exe!evaluate_function(unsigned int funcid, unsigned int > result_type, int result_typmod, unsigned int result_collid, unsigned int > input_collid, List * args, char funcvariadic, HeapTupleData * func_tuple, > eval_const_expressions_context * context) Line 4272 + 0x50 bytes C > postgres.exe!simplify_function(unsigned int funcid, unsigned int > result_type, int result_typmod, unsigned int result_collid, unsigned int > input_collid, List * * args_p, char funcvariadic, char process_args, char > allow_non_const, eval_const_expressions_context * context) Line 3914 + 0x44 > bytes C > postgres.exe!eval_const_expressions_mutator(Node * node, > eval_const_expressions_context * context) Line 2627 C > postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator, > void * context) Line 2735 + 0x37 bytes C > postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator, > void * context) Line 2932 + 0x8 bytes C > postgres.exe!eval_const_expressions(PlannerInfo * root, Node * node) Line > 2413 C > postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse, > PlannerInfo * parent_root, char hasRecursion, double tuple_fraction) Line > 623 + 0x2d bytes C > > RCA: > ==== > As seen in the backtrace, the crash is basically happening in varstr_cmp() > function. AFAICU, this crash is only happening on Windows because in > varstr_cmp(), if the encoding type is UTF8, we don't even think of calling > ICU functions to compare the string. Infact, we directly attempt to call the > OS function wsccoll*(). > > The point is, if collation provider is ICU, then we should tryto call ICU > functions for collation support instead of calling OS functions. This thing > is being taken care inside varstr_cmp() for Linux but surprisingly it's not > handled for Windows. Please have a look at below code snippet in > varstr_cmp() to know on how it is being taken care for linux, > > #endif /* WIN32 */ > ........ > ........ > > #ifdef USE_ICU > #ifdef HAVE_UCOL_STRCOLLUTF8 > if (GetDatabaseEncoding() == PG_UTF8) > { > UErrorCode status; > > status = U_ZERO_ERROR; > result = ucol_strcollUTF8(mylocale->info.icu.ucol, > arg1, len1, > arg2, len2, > &status); > if (U_FAILURE(status)) > ereport(ERROR, > (errmsg("collation failed: %s", u_errorName(status)))); > } > else > #endif > { > int32_t ulen1, > ulen2; > UChar *uchar1, *uchar2; > > ulen1 = icu_to_uchar(&uchar1, arg1, len1); > ulen2 = icu_to_uchar(&uchar2, arg2, len2); > > result = ucol_strcoll(mylocale->info.icu.ucol, > uchar1, ulen1, > uchar2, ulen2); > } > #else /* not USE_ICU */ > /* shouldn't happen */ > elog(ERROR, "unsupported collprovider: %c", mylocale->provider); > #endif /* not USE_ICU */ > } > else > { > #ifdef HAVE_LOCALE_T > result = strcoll_l(a1p, a2p, mylocale->info.lt); > #else > /* shouldn't happen */ > elog(ERROR, "unsupported collprovider: %c", mylocale->provider); > #endif > } > > Fix: > ==== > I am currently working on this and will try to come up with the fix asap. > > [1] - > https://www.postgresql.org/message-id/CAE9k0P%3DQRjtS1a0rgTyKag_%2Bs6XCs7vovV%2BgSkUfYVASog0P8w%40mail.gmail.com > > > -- > With Regards, > Ashutosh Sharma > EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Mon, Jun 12, 2017 at 10:08 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > PFA patch that fixes the issue described in above thread. As mentioned > in the above thread, the crash is basically happening in varstr_cmp() > function and it's only happening on Windows because in varstr_cmp(), > if the collation provider is ICU, we don't even think of calling ICU > functions to compare the string. Infact, we directly attempt to call > the OS function wsccoll*() which is not expected. Thanks. > Your analysis is right, basically, when ICU is enabled we need to use ICU related functions and use corresponding information in the pg_locale structure. However, I see few problems with your patch. + if (mylocale) + { + if (mylocale->provider == COLLPROVIDER_ICU) + { +#ifdef USE_ICU +#ifdef HAVE_UCOL_STRCOLLUTF8 + if (GetDatabaseEncoding() == PG_UTF8) + { + UErrorCode status; + + status = U_ZERO_ERROR; + result = ucol_strcollUTF8(mylocale->info.icu.ucol, + arg1, len1, + arg2, len2, + &status); On Windows, we need to map UTF-8 strings to UTF-16 as we are doing in this function for other Windows specific comparisons for UTF-8 strings. Also, we don't want to screw memcmp optimization we have in this function, so do this ICU specific checking after memcmp check. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 6/12/17 00:38, Ashutosh Sharma wrote: > PFA patch that fixes the issue described in above thread. As mentioned > in the above thread, the crash is basically happening in varstr_cmp() > function and it's only happening on Windows because in varstr_cmp(), > if the collation provider is ICU, we don't even think of calling ICU > functions to compare the string. Infact, we directly attempt to call > the OS function wsccoll*() which is not expected. Thanks. Maybe just diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index a0dd391f09..2506f4eeb8 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -1433,7 +1433,7 @@ varstr_cmp(char *arg1, int len1, char *arg2, int len2, Oid collid)#ifdef WIN32 /* Win32 doesnot have UTF-8, so we need to map to UTF-16 */ - if (GetDatabaseEncoding() == PG_UTF8) + if (GetDatabaseEncoding() == PG_UTF8 && (!mylocale || mylocale->provider == COLLPROVIDER_LIBC)) { int a1len; int a2len; -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On Thu, Jun 15, 2017 at 7:43 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Jun 12, 2017 at 10:08 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> PFA patch that fixes the issue described in above thread. As mentioned >> in the above thread, the crash is basically happening in varstr_cmp() >> function and it's only happening on Windows because in varstr_cmp(), >> if the collation provider is ICU, we don't even think of calling ICU >> functions to compare the string. Infact, we directly attempt to call >> the OS function wsccoll*() which is not expected. Thanks. >> > > Your analysis is right, basically, when ICU is enabled we need to use > ICU related functions and use corresponding information in the > pg_locale structure. However, I see few problems with your patch. > > + if (mylocale) > + { > + if (mylocale->provider == COLLPROVIDER_ICU) > + { > +#ifdef USE_ICU > +#ifdef HAVE_UCOL_STRCOLLUTF8 > + if (GetDatabaseEncoding() == PG_UTF8) > + { > + UErrorCode status; > + > + status = U_ZERO_ERROR; > + result = ucol_strcollUTF8(mylocale->info.icu.ucol, > + arg1, len1, > + arg2, len2, > + &status); > > On Windows, we need to map UTF-8 strings to UTF-16 as we are doing in > this function for other Windows specific comparisons for UTF-8 > strings. Also, we don't want to screw memcmp optimization we have in > this function, so do this ICU specific checking after memcmp check. Firstly, thanks for reviewing the patch. Okay, we do map UTF-8 strings to UTF-16 on windows. But, I think, we only do that when calling OS (wcscoll*) functions for string comparison. Please correct me if i am wrong here. In my case, i am using ICU functions for comparing strings and i am not sure if we need the convert UTF-8 strings to UTF-16. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
Hi, On Thu, Jun 15, 2017 at 8:36 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/12/17 00:38, Ashutosh Sharma wrote: >> PFA patch that fixes the issue described in above thread. As mentioned >> in the above thread, the crash is basically happening in varstr_cmp() >> function and it's only happening on Windows because in varstr_cmp(), >> if the collation provider is ICU, we don't even think of calling ICU >> functions to compare the string. Infact, we directly attempt to call >> the OS function wsccoll*() which is not expected. Thanks. > > Maybe just > > diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c > index a0dd391f09..2506f4eeb8 100644 > --- a/src/backend/utils/adt/varlena.c > +++ b/src/backend/utils/adt/varlena.c > @@ -1433,7 +1433,7 @@ varstr_cmp(char *arg1, int len1, char *arg2, int len2, Oid collid) > > #ifdef WIN32 > /* Win32 does not have UTF-8, so we need to map to UTF-16 */ > - if (GetDatabaseEncoding() == PG_UTF8) > + if (GetDatabaseEncoding() == PG_UTF8 && (!mylocale || mylocale->provider == COLLPROVIDER_LIBC)) > { > int a1len; > int a2len; Oh, yes, this looks like the simplest and possibly the ideal way to fix the issue. Attached is the patch. Thanks for the inputs. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Thu, Jun 15, 2017 at 11:18 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > On Thu, Jun 15, 2017 at 8:36 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/12/17 00:38, Ashutosh Sharma wrote: >>> PFA patch that fixes the issue described in above thread. As mentioned >>> in the above thread, the crash is basically happening in varstr_cmp() >>> function and it's only happening on Windows because in varstr_cmp(), >>> if the collation provider is ICU, we don't even think of calling ICU >>> functions to compare the string. Infact, we directly attempt to call >>> the OS function wsccoll*() which is not expected. Thanks. >> >> Maybe just >> >> diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c >> index a0dd391f09..2506f4eeb8 100644 >> --- a/src/backend/utils/adt/varlena.c >> +++ b/src/backend/utils/adt/varlena.c >> @@ -1433,7 +1433,7 @@ varstr_cmp(char *arg1, int len1, char *arg2, int len2, Oid collid) >> >> #ifdef WIN32 >> /* Win32 does not have UTF-8, so we need to map to UTF-16 */ >> - if (GetDatabaseEncoding() == PG_UTF8) >> + if (GetDatabaseEncoding() == PG_UTF8 && (!mylocale || mylocale->provider == COLLPROVIDER_LIBC)) >> { >> int a1len; >> int a2len; > > Oh, yes, this looks like the simplest and possibly the ideal way to > fix the issue. Attached is the patch. Thanks for the inputs. > How will this compare UTF-8 strings in UTF-8 encoding? It seems to me that ideally, it should use ucol_strcollUTF8 to compare the same, however, with patch, it will always ucol_strcoll as we never define HAVE_UCOL_STRCOLLUTF8 flag on Windows. We have some multi-byte tests in src/test/mb directory, see if we can use those to verify these changes. I admit that I have not tried to execute those on Windows, so I have no idea if those even work. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 6/15/17 13:48, Ashutosh Sharma wrote: >> Maybe just >> >> diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c >> index a0dd391f09..2506f4eeb8 100644 >> --- a/src/backend/utils/adt/varlena.c >> +++ b/src/backend/utils/adt/varlena.c >> @@ -1433,7 +1433,7 @@ varstr_cmp(char *arg1, int len1, char *arg2, int len2, Oid collid) >> >> #ifdef WIN32 >> /* Win32 does not have UTF-8, so we need to map to UTF-16 */ >> - if (GetDatabaseEncoding() == PG_UTF8) >> + if (GetDatabaseEncoding() == PG_UTF8 && (!mylocale || mylocale->provider == COLLPROVIDER_LIBC)) >> { >> int a1len; >> int a2len; > > Oh, yes, this looks like the simplest and possibly the ideal way to > fix the issue. Attached is the patch. Thanks for the inputs. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/16/17 06:30, Amit Kapila wrote: > How will this compare UTF-8 strings in UTF-8 encoding? It seems to me > that ideally, it should use ucol_strcollUTF8 to compare the same, > however, with patch, it will always ucol_strcoll as we never define > HAVE_UCOL_STRCOLLUTF8 flag on Windows. We have a configure check for that, but I don't know how to replicate that on Windows. If ucol_strcollUTF8 is not available, we have code to convert to UTF-16.This is the same code that is used for non-Windows. > We have some multi-byte tests > in src/test/mb directory, see if we can use those to verify these > changes. I admit that I have not tried to execute those on Windows, > so I have no idea if those even work. There is a test specifically for ICU, which you can run with make check EXTRA_TESTS=collate.icu.utf8 -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/16/17 10:12, Peter Eisentraut wrote: > On 6/16/17 06:30, Amit Kapila wrote: >> How will this compare UTF-8 strings in UTF-8 encoding? It seems to me >> that ideally, it should use ucol_strcollUTF8 to compare the same, >> however, with patch, it will always ucol_strcoll as we never define >> HAVE_UCOL_STRCOLLUTF8 flag on Windows. > > We have a configure check for that, but I don't know how to replicate > that on Windows. > > If ucol_strcollUTF8 is not available, we have code to convert to UTF-16. > This is the same code that is used for non-Windows. After thinking about this some more, I have committed a change to define HAVE_UCOL_STRCOLLUTF8 on Windows unconditionally. Until someone figures out a different way, I think it's better that users of newish versions of ICU get the newer/better behavior, and users of older versions can file a bug. The alternative is that we forget about this and we keep using the old code path indefinitely. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On Sat, Jun 17, 2017 at 6:57 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/16/17 10:12, Peter Eisentraut wrote: >> On 6/16/17 06:30, Amit Kapila wrote: >>> How will this compare UTF-8 strings in UTF-8 encoding? It seems to me >>> that ideally, it should use ucol_strcollUTF8 to compare the same, >>> however, with patch, it will always ucol_strcoll as we never define >>> HAVE_UCOL_STRCOLLUTF8 flag on Windows. >> >> We have a configure check for that, but I don't know how to replicate >> that on Windows. >> >> If ucol_strcollUTF8 is not available, we have code to convert to UTF-16. >> This is the same code that is used for non-Windows. > > After thinking about this some more, I have committed a change to define > HAVE_UCOL_STRCOLLUTF8 on Windows unconditionally. Surprisingly, I am not able to find this commit on a master branch. Until someone figures > out a different way, I think it's better that users of newish versions > of ICU get the newer/better behavior, and users of older versions can > file a bug. The alternative is that we forget about this and we keep > using the old code path indefinitely. Well, it will work for the users who are using ICU version >= 50 but not for the older ICU versions So, we will have to figure out a way, to either detect the availability of ucoll_strcollutf8() on Windows or may be ICU version itself and set HAVE_UCOL_STRCOLLUTF8 flag accordingly. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 16, 2017 at 7:42 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/16/17 06:30, Amit Kapila wrote: >> How will this compare UTF-8 strings in UTF-8 encoding? It seems to me >> that ideally, it should use ucol_strcollUTF8 to compare the same, >> however, with patch, it will always ucol_strcoll as we never define >> HAVE_UCOL_STRCOLLUTF8 flag on Windows. > > We have a configure check for that, but I don't know how to replicate > that on Windows. > I think we can find out in which version of ICU this function has been released and then determine the version. We already determine the MSVC version in Windows (refer DetermineVisualStudioVersion in src/tools/msvc/VSObjectFactory.pm). I think we can determine ICU version with something like "unconv -V", it gives results as "uconv v2.1 ICU 53.1". -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Jun 17, 2017 at 6:57 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/16/17 10:12, Peter Eisentraut wrote: >> On 6/16/17 06:30, Amit Kapila wrote: >>> How will this compare UTF-8 strings in UTF-8 encoding? It seems to me >>> that ideally, it should use ucol_strcollUTF8 to compare the same, >>> however, with patch, it will always ucol_strcoll as we never define >>> HAVE_UCOL_STRCOLLUTF8 flag on Windows. >> >> We have a configure check for that, but I don't know how to replicate >> that on Windows. >> >> If ucol_strcollUTF8 is not available, we have code to convert to UTF-16. >> This is the same code that is used for non-Windows. > > After thinking about this some more, I have committed a change to define > HAVE_UCOL_STRCOLLUTF8 on Windows unconditionally. Until someone figures > out a different way, I think it's better that users of newish versions > of ICU get the newer/better behavior, and users of older versions can > file a bug. > Yeah, we can take that stand or maybe document that as well but not sure that is the best way to deal with it. I have just posted one way to determine if icu library has support for ucol_strcollUTF8, see if that sounds like a way forward to you. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, Attached is the patch that detects the ICU version on Windows and based on that it decides whether 'HAVE_UCOL_STRCOLLUTF8' flag needs to be set or not. If this patch gets consisdered then, we may have to revert the changes following git commit. Thanks. commit e42645ad92687a2250ad17e1a045da73e54a5064 Author: Peter Eisentraut <peter_e@gmx.net> Date: Fri Jun 16 21:23:22 2017 -0400 Define HAVE_UCOL_STRCOLLUTF8 on Windows This should normally be determined by a configure check, but until someone figures out how to do that on Windows, it's better that the code uses the new function by default. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com On Sat, Jun 17, 2017 at 9:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Jun 17, 2017 at 6:57 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/16/17 10:12, Peter Eisentraut wrote: >>> On 6/16/17 06:30, Amit Kapila wrote: >>>> How will this compare UTF-8 strings in UTF-8 encoding? It seems to me >>>> that ideally, it should use ucol_strcollUTF8 to compare the same, >>>> however, with patch, it will always ucol_strcoll as we never define >>>> HAVE_UCOL_STRCOLLUTF8 flag on Windows. >>> >>> We have a configure check for that, but I don't know how to replicate >>> that on Windows. >>> >>> If ucol_strcollUTF8 is not available, we have code to convert to UTF-16. >>> This is the same code that is used for non-Windows. >> >> After thinking about this some more, I have committed a change to define >> HAVE_UCOL_STRCOLLUTF8 on Windows unconditionally. Until someone figures >> out a different way, I think it's better that users of newish versions >> of ICU get the newer/better behavior, and users of older versions can >> file a bug. >> > > Yeah, we can take that stand or maybe document that as well but not > sure that is the best way to deal with it. I have just posted one way > to determine if icu library has support for ucol_strcollUTF8, see if > that sounds like a way forward to you. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 6/16/17 23:46, Amit Kapila wrote: > I have just posted one way > to determine if icu library has support for ucol_strcollUTF8, see if > that sounds like a way forward to you. I'm not in a position to test such patches, so someone else will have to take that on. It might not be worth bothering. ICU 50 is already about 5 years old. If you're packaging for Windows, I suspect you have the option of bundling a version of your choice. The support for older versions is mainly to support building on "stable" Linux distributions, and even there the window of usefulness is closing. (CentOS 7 has 50, CentOS 6 has 4.2 which is too old for other reasons, Debian stable has 52 (and it will become oldstable after today)). Then again, it might be worth putting things into place in case we have to do similar detections in the future. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On Sat, Jun 17, 2017 at 6:38 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/16/17 23:46, Amit Kapila wrote: >> I have just posted one way >> to determine if icu library has support for ucol_strcollUTF8, see if >> that sounds like a way forward to you. > > I'm not in a position to test such patches, so someone else will have to > take that on. Well, I have tested it from my end. I've basically tried to test it by running multi-byte tests as Amit suggested and by verifing the things manually through debugger. And, i had mistakenly attached wrong patch in my earlier email. Here, i attach the correct patch. Sorry about that. > > It might not be worth bothering. ICU 50 is already about 5 years old. > If you're packaging for Windows, I suspect you have the option of > bundling a version of your choice. The support for older versions is > mainly to support building on "stable" Linux distributions, and even > there the window of usefulness is closing. (CentOS 7 has 50, CentOS 6 > has 4.2 which is too old for other reasons, Debian stable has 52 (and it > will become oldstable after today)). Yes, it's hard to find any user's having ICU version < 50 installed on their system. But, having said that, it's always good to have such detective checks basically considering that we already have such configure check for Linux platform. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Sat, Jun 17, 2017 at 8:27 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > On Sat, Jun 17, 2017 at 6:38 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/16/17 23:46, Amit Kapila wrote: >>> I have just posted one way >>> to determine if icu library has support for ucol_strcollUTF8, see if >>> that sounds like a way forward to you. >> >> I'm not in a position to test such patches, so someone else will have to >> take that on. > I can verify the patch on Win7 setup to which I have access if that helps. > Well, I have tested it from my end. I've basically tried to test it by > running multi-byte tests as Amit suggested and by verifing the things > manually through debugger . And, i had mistakenly attached wrong patch > in my earlier email. Here, i attach the correct patch. > Is it possible for you to once verify this patch with icu library version where ucol_strcollUTF8 is not defined? + # get the icu version. + my $output = `uconv -V /? 2>&1`; + $? >> 8 == 0 or die "uconv command not found"; If we don't find unconv, isn't it better to fall back to non-UTF8 version rather than saying command not found? + print $o "#define HAVE_UCOL_STRCOLLUTF8 1\n" + if ($self->{ICUVersion} >= 50); Indentation looks slightly odd, see the other usage in the same file. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, On Sun, Jun 18, 2017 at 6:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Jun 17, 2017 at 8:27 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> Hi, >> >> On Sat, Jun 17, 2017 at 6:38 PM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> On 6/16/17 23:46, Amit Kapila wrote: >>>> I have just posted one way >>>> to determine if icu library has support for ucol_strcollUTF8, see if >>>> that sounds like a way forward to you. >>> >>> I'm not in a position to test such patches, so someone else will have to >>> take that on. >> > > I can verify the patch on Win7 setup to which I have access if that helps. > >> Well, I have tested it from my end. I've basically tried to test it by >> running multi-byte tests as Amit suggested and by verifing the things >> manually through debugger . And, i had mistakenly attached wrong patch >> in my earlier email. Here, i attach the correct patch. >> > > Is it possible for you to once verify this patch with icu library > version where ucol_strcollUTF8 is not defined? Okay, I have verified the attached patch with following ICU versions and didn't find any problem 1) ICU 4.8.1.1 2) ICU 49.1.2 3) ICU 50.1.2 4) ICU 53.1 The first two ICU versions i.e. 'ICU 4.8.1.1' and 'ICU 49.1.2' doesn't have 'ucol_strcollUTF8' defined in it whereas the last two versions i.e. 'ICU 50.1.2' and 'ICU 53.1', includes 'ucol_strcollUTF8'. > > + # get the icu version. > + my $output = `uconv -V /? 2>&1`; > + $? >> 8 == 0 or die "uconv command not found"; > > If we don't find unconv, isn't it better to fall back to non-UTF8 > version rather than saying command not found? Well, if any of the ICU package is installed on our system then we will certainly find uconv command. The only case where we can see such error is when user doesn't have any of the ICU packages installed on their system and are somehow trying to perform icu enabled build and in such case the build configuration has to fail which i think is the expected behaviour. Anyways, it is not uconv that decides whether we should fallback to non-UTF8 or not. It's the ICU version that decides whether to stick to UTF8 or fallback to nonUTF8 version. Thanks. > > + print $o "#define HAVE_UCOL_STRCOLLUTF8 1\n" > + if ($self->{ICUVersion} >= 50); > > Indentation looks slightly odd, see the other usage in the same file. I have corrected it. Please have a look into the attached patch. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On 6/19/17 00:42, Ashutosh Sharma wrote: >> If we don't find unconv, isn't it better to fall back to non-UTF8 >> version rather than saying command not found? > Well, if any of the ICU package is installed on our system then we > will certainly find uconv command. The only case where we can see such > error is when user doesn't have any of the ICU packages installed on > their system and are somehow trying to perform icu enabled build and > in such case the build configuration has to fail which i think is the > expected behaviour. Anyways, it is not uconv that decides whether we > should fallback to non-UTF8 or not. It's the ICU version that decides > whether to stick to UTF8 or fallback to nonUTF8 version. Thanks. How do we know we're running the uconv command from the installation that we will compile against? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On Tue, Jun 20, 2017 at 2:36 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/19/17 00:42, Ashutosh Sharma wrote: >>> If we don't find unconv, isn't it better to fall back to non-UTF8 >>> version rather than saying command not found? >> Well, if any of the ICU package is installed on our system then we >> will certainly find uconv command. The only case where we can see such >> error is when user doesn't have any of the ICU packages installed on >> their system and are somehow trying to perform icu enabled build and >> in such case the build configuration has to fail which i think is the >> expected behaviour. Anyways, it is not uconv that decides whether we >> should fallback to non-UTF8 or not. It's the ICU version that decides >> whether to stick to UTF8 or fallback to nonUTF8 version. Thanks. > > How do we know we're running the uconv command from the installation > that we will compile against? Okay, I think, you are talking about the case where we may have multiple ICU versions installed on our system and there might be a possibility that the uconv command may get executed from the ICU version that we are not trying to link with postgres. This can only happen when user has set the path for icu binaries in the system PATH variable incorrectly. For e.g., if i have installed ICU versions 49 and 53 on my system and the library and include path i am trying to add is from ICU version 53 whereas the system PATH points to dll's from ICU 49 then that will be considered as a incorrect build setup. In this case, even if i am trying to use ICU 53 libraries but as dll's i am using is from ICU 49 (as my system PATH is pointing to ICU 49 bin path) the uconv -V output would be '49.1' instead of '53.1'. In such case, the postgres installation would fail because during installation it would search for libicu53.dll not libicu49.dll. Therefore, even if we are running wrong uconv command, we would not at all succeed with the installation process. Hence, i think, we shouldn't be worrying about any such things. Please let me know if i could answer your question or not. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com
On Tue, Jun 20, 2017 at 9:06 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Hi, > > On Tue, Jun 20, 2017 at 2:36 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/19/17 00:42, Ashutosh Sharma wrote: >>>> If we don't find unconv, isn't it better to fall back to non-UTF8 >>>> version rather than saying command not found? >>> Well, if any of the ICU package is installed on our system then we >>> will certainly find uconv command. The only case where we can see such >>> error is when user doesn't have any of the ICU packages installed on >>> their system and are somehow trying to perform icu enabled build and >>> in such case the build configuration has to fail which i think is the >>> expected behaviour. Anyways, it is not uconv that decides whether we >>> should fallback to non-UTF8 or not. It's the ICU version that decides >>> whether to stick to UTF8 or fallback to nonUTF8 version. Thanks. >> >> How do we know we're running the uconv command from the installation >> that we will compile against? > > Okay, I think, you are talking about the case where we may have > multiple ICU versions installed on our system and there might be a > possibility that the uconv command may get executed from the ICU > version that we are not trying to link with postgres. > To avoid that why can't we use the same icu path for executing uconv as we are using for linking? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 6/19/17 23:36, Ashutosh Sharma wrote: > In this case, even if i am trying to use ICU 53 libraries but as > dll's i am using is from ICU 49 (as my system PATH is pointing to ICU > 49 bin path) the uconv -V output would be '49.1' instead of '53.1'. In > such case, the postgres installation would fail because during > installation it would search for libicu53.dll not libicu49.dll. I don't think this is how it works. The result would be that uconv claims the function is available but it's not in the library -- then you get a compilation failure. Or uconv claims the function is not there but it is -- then you just build without using it. The .dll files are only used at run time. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6/20/17 09:23, Amit Kapila wrote: > To avoid that why can't we use the same icu path for executing uconv > as we are using for linking? Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or something like that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 21, 2017 at 12:15 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 6/20/17 09:23, Amit Kapila wrote: >> To avoid that why can't we use the same icu path for executing uconv >> as we are using for linking? > > Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or > something like that. > That's what I had in mind as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, On Wed, Jun 21, 2017 at 9:50 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Jun 21, 2017 at 12:15 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 6/20/17 09:23, Amit Kapila wrote: >>> To avoid that why can't we use the same icu path for executing uconv >>> as we are using for linking? >> >> Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or >> something like that. >> > > That's what I had in mind as well. > Okay, attached is the patch which first detects the platform type and runs the uconv commands accordingly from the corresponding icu bin path. Please have a look and let me know for any issues. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Fri, Jun 23, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Okay, attached is the patch which first detects the platform type and > runs the uconv commands accordingly from the corresponding icu bin > path. Please have a look and let me know for any issues. Thanks. Should this be on the open items list? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 10, 2017 at 11:02 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 23, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >> Okay, attached is the patch which first detects the platform type and >> runs the uconv commands accordingly from the corresponding icu bin >> path. Please have a look and let me know for any issues. Thanks. > > Should this be on the open items list? Maybe. Peter E has a patch to fix the other issue (ICU 52 has a buggy ucol_strcollUTF8()) by never using it on ICU versions prior to 53. He posted this just yesterday. This removes the existing configure test, replacing it with something that will work just the same on Windows, presuming Peter also reverts the commit that had ICU never use ucol_strcollUTF8() on Windows. -- Peter Geoghegan
On Thu, Aug 10, 2017 at 3:57 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Thu, Aug 10, 2017 at 11:02 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jun 23, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>> Okay, attached is the patch which first detects the platform type and >>> runs the uconv commands accordingly from the corresponding icu bin >>> path. Please have a look and let me know for any issues. Thanks. >> >> Should this be on the open items list? > > Maybe. Peter E has a patch to fix the other issue (ICU 52 has a buggy > ucol_strcollUTF8()) by never using it on ICU versions prior to 53. He > posted this just yesterday. He committed it just now, so that's the end of this Windows issue, I suppose. -- Peter Geoghegan
On 8/10/17 22:21, Peter Geoghegan wrote: > On Thu, Aug 10, 2017 at 3:57 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> On Thu, Aug 10, 2017 at 11:02 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Fri, Jun 23, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: >>>> Okay, attached is the patch which first detects the platform type and >>>> runs the uconv commands accordingly from the corresponding icu bin >>>> path. Please have a look and let me know for any issues. Thanks. >>> >>> Should this be on the open items list? >> >> Maybe. Peter E has a patch to fix the other issue (ICU 52 has a buggy >> ucol_strcollUTF8()) by never using it on ICU versions prior to 53. He >> posted this just yesterday. > > He committed it just now, so that's the end of this Windows issue, I suppose. Yes, I think so. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services