Обсуждение: Possibly a bug

Поиск
Список
Период
Сортировка

Possibly a bug

От
Анатолий
Дата:
Hello.
If the user parameter is passed to a function, the other parameter is not converted to uppercase (if it is cirillic). The same with procedures and if you swap the params.


create or replace function to_upper_first(param1 text, param2 text)
returns text as
$$
    select upper(param1) || ' ' || upper(param2) || ' ' || param2;
$$ language sql;

select version();
select  to_upper_first('тест', user::text);

Output:134 ms

CREATE FUNCTION                                                      version                                                       
-------------------------------------------------------------------------------------------------------------------- PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
(1 row)
                     to_upper_first                     
-------------------------------------------------------- тест USER_44D38K4ZB_44D4GQVSV user_44d38k4zb_44d4gqvsv
(1 row)

https://onecompiler.com/postgresql/44d4fxa3x

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

select  to_upper_first('test', user::text);

to_upper_first                     
-------------------------------------------------------- TEST USER_44D38K6QJ_44D4GTCEM user_44d38k6qj_44d4gtcem
(1 row)


Re: Possibly a bug

От
Tom Lane
Дата:
=?UTF-8?B?0JDQvdCw0YLQvtC70LjQuQ==?= <anatoly@email.su> writes:
> Hello. If the *user* parameter is passed to a function, the other 
> parameter is not converted to uppercase (if it is cirillic). The same 
> with procedures and if you swap the params.

Works for me.  The most obvious explanation for your result is that
you're using a locale that doesn't know anything about Cyrillic,
such as C locale.

            regards, tom lane



Re: Possibly a bug

От
surya poondla
Дата:
Hi Анатолий, Tom

Thank you for reporting the issue.
I did some testing and see a similar issue. I'm seeing a behavior that suggests this isn't simply a C locale configuration issue, but rather an inconsistency in how collations are resolved within SQL functions.

My environment is configured with UTF-8 locale:

  postgres=# show lc_collate;
   lc_collate
  -------------
   en_US.UTF-8
  (1 row)

  postgres=# show lc_ctype;
    lc_ctype
  -------------
   en_US.UTF-8
  (1 row)

Test 1: SQL function with user parameter. Cyrillic is not uppercased
  postgres=# create or replace function to_upper_first(param1 text, param2 text)
  returns text as $
      select upper(param1) || ' ' || upper(param2) || ' ' || param2;
  $ language sql;
CREATE FUNCTION
postgres=# select to_upper_first('тест', user::text);
  to_upper_first
------------------
 тест SURYA surya
(1 row)

Test 2: PL/pgSQL function. Cyrillic is not uppercased
postgres=# create or replace function to_upper_first(param1 text, param2 text)
postgres-# returns text language plpgsql as $$
postgres$# begin
postgres$#     return upper(param1) || ' ' || upper(param2) || ' ' || param2;
postgres$# end;
postgres$# $$;
CREATE FUNCTION
postgres=#
postgres=# select to_upper_first('тест', user::text);
  to_upper_first
------------------
 тест SURYA surya
(1 row)


Test 3: Explicit collation, uppercase works correctly
postgres=# create or replace function to_upper_first(param1 text, param2 text)
postgres-# returns text as $$
postgres$#     select upper(param1 collate "en_US.UTF-8")
postgres$#          || ' '
postgres$#          || upper(param2 collate "en_US.UTF-8")
postgres$#          || ' '
postgres$#          || param2;
postgres$# $$ language sql;
CREATE FUNCTION
postgres=#
postgres=# select to_upper_first('тест', user::text);
  to_upper_first
------------------
 ТЕСТ SURYA surya
(1 row)

Test 4: Direct upper() calls - Work correctly
postgres=# select upper('тест');
 upper
-------
 ТЕСТ
(1 row)

postgres=# select upper('тест'::text);
 upper
-------
 ТЕСТ
(1 row)

postgres=# select upper(('тест')::text);
 upper
-------
 ТЕСТ
(1 row)

postgres=# select upper(('тест') collate "en_US.UTF-8");
 upper
-------
 ТЕСТ
(1 row)

postgres=# select upper(('тест') collate "C");
 upper
-------
 тест
(1 row)


My observations:
  1. Direct calls to upper('тест') correctly uppercases the Cyrillic using the database's en_US.UTF-8 locale
  2. Inside functions, upper(param1) fails to uppercase Cyrillic when user::text is passed as param2
       i. The issue occurs in both SQL and PL/pgSQL functions
  3. Explicitly specifying collate "en_US.UTF-8" in the function resolves the issue.

If select upper('тест'::text) works correctly (i.e it converts to uppercase) outside the function, why does upper(param1) appear to use C collation inside the function when user::text is involved as another parameter?
It seems that the user session variable as a parameter is somehow affecting collation resolution for the independent param1 parameter, which seems unexpected.

I will investigate more into this issue.

Regards,
Surya Poondla

Re: Possibly a bug

От
jian he
Дата:
hi.

I further simplified your test case.

create or replace function to_upper_first(param1 text, param2 text)
returns text as $$
select upper(param1) || '_'
$$ language sql;

select to_upper_first('тест', 'a'::text);
select to_upper_first('тест', 'a'::text collate "default");
select to_upper_first('тест', 'a'::text collate "C");
select to_upper_first('тест', current_user);
select to_upper_first('тест', user::text);
select to_upper_first('тест', current_database()::text);
select to_upper_first('тест', current_schema()::text);
select to_upper_first('тест', current_schemas(true)::text);

These current.* functions always use the "C" collation for their
output, so in this case, they are not related to session variables, in
my view.

essentially, i think, your question is about the difference of
select to_upper_first('тест', 'a'::text);
and
select to_upper_first('тест', 'a'::text collate "C");

you may also see.
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/collate.icu.utf8.out#n896



--
jian
https://www.enterprisedb.com/



Re: Possibly a bug

От
surya poondla
Дата:
Hi Jian,

These current.* functions always use the "C" collation for their
output, so in this case, they are not related to session variables, in
my view.

Thank you, I did some more testing and see the below.

postgres=# SELECT collation for (user);
 pg_collation_for
------------------
 "C"
(1 row)

postgres=# SELECT collation for (current_user);
 pg_collation_for
------------------
 "C"
(1 row)

postgres=#   SELECT to_upper_varchar('тест'::varchar);
 to_upper_varchar
------------------
 ТЕСТ
(1 row)

postgres=#   SELECT collation for ('тест');
 pg_collation_for
------------------

(1 row)

postgres=#   CREATE OR REPLACE FUNCTION debug_func(param1 text, param2 text) RETURNS text AS $$ SELECT 'param1 collation: ' || pg_collation_for(param1) || ', param2 collation: ' || pg_collation_for(param2); $$ LANGUAGE sql;
CREATE FUNCTION
postgres=#   SELECT debug_func('тест', user::text);
                 debug_func
----------------------------------------------
 param1 collation: "C", param2 collation: "C"
(1 row)

postgres=#   CREATE OR REPLACE FUNCTION debug_func2(param1 text, param2 text) RETURNS text AS $$ SELECT 'param1 collation: ' || pg_collation_for(param1) || ', param2 collation: ' || pg_collation_for(param2); $$ LANGUAGE sql;
CREATE FUNCTION
postgres=# SELECT debug_func2('тест'::text, user::text);
                  debug_func2
----------------------------------------------
 param1 collation: "C", param2 collation: "C"
(1 row)

postgres=#

My observation is the user, current* functions collation is "C" collation and when a "C" collation is passed as a parameter, it converts all the parameters to use the "C" collation.

I was looking at sql_fn_make_param() function and the comments over their say
"/*
* If we have a function input collation, allow it to override the
* type-derived collation for parameter symbols.  (XXX perhaps this should
* not happen if the type collation is not default?)
*/
"
From the above code comment it looks like the "C" collation is overriding the type-driven collation, and this might be causing the issue.

I am currently using gdb to debug more and will work on a patch based on my findings.

Regards,
Surya Poondla

Re: Possibly a bug

От
Tom Lane
Дата:
surya poondla <suryapoondla4@gmail.com> writes:
> My observation is the user, current* functions collation is "C" collation
> and when a "C" collation is passed as a parameter, it converts all the
> parameters to use the "C" collation.

Yeah, Jian diagnosed this correctly: the "user" pseudo-function has
output type "name" and therefore inherits the "C" collation attached
to that type, which then propagates through the cast to text.  Then
when we come to resolve the collation for to_upper_first() to use,
we have one input that is unknown and hence has no collation to offer,
and one that has "C" collation, so "C" collation wins.  You'd need an
explicit COLLATE clause to override that.

> I am currently using gdb to debug more and will work on a patch based on my
> findings.

AFAIK all this behavior is mandated by the SQL standard.  Well, the
fact that the non-standard type "name" is given "C" collation isn't
to be found in the spec, but the rules for deducing the collation of
an expression definitely are.  If you want to propose a patch about
this, it had better come with a well-reasoned argument why our current
behavior violates the spec and the patch fixes that.

            regards, tom lane