Обсуждение: BUG #15069: group by after regexp_replace

Поиск
Список
Период
Сортировка

BUG #15069: group by after regexp_replace

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15069
Logged by:          Ilhwan Ko
Email address:      koglep@gmail.com
PostgreSQL version: 9.6.7
Operating system:   macOS 10.12
Description:

create table test_t (a varchar(10), b varchar (20), c int);
 
insert into test_t (a,b,c) values ('a', 'a', 1);
insert into test_t (a,b,c) values (' a', 'a', 2); 
insert into test_t (a,b,c) values ('    a', 'a', 4);
insert into test_t (a,b,c) values ('     a', 'a', 8);
insert into test_t (a,b,c) values ('b', 'b', 1);
insert into test_t (a,b,c) values (' b', 'b', 2); 
insert into test_t (a,b,c) values ('    b', 'b', 4);
insert into test_t (a,b,c) values ('     b', 'b', 8);
 
 
select keyword, sum(t.c) as cc
from
(
    select upper(regexp_replace(a, '\\s+', '')) as keyword,  c
    from test_t
) t
group by keyword;

select upper(regexp_replace(a, '\\s+', '')) as keyword,  sum(c)
from test_t
group by upper(regexp_replace(a, '\\s+', ''));
    
    

select keyword, sum(t.c) as cc
from
(
    select upper(regexp_replace(b, '\\s+', '')) as keyword,  c
    from test_t
) t
group by keyword;

select upper(regexp_replace(b, '\\s+', '')) as keyword,  sum(c)
from test_t
group by upper(regexp_replace(b, '\\s+', ''));
    

I expected to get the same results regarding to above  four queries.
However, they were different.


Re: BUG #15069: group by after regexp_replace

От
"David G. Johnston"
Дата:
On Thu, Feb 15, 2018 at 8:14 AM, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      15069
Logged by:          Ilhwan Ko
Email address:      koglep@gmail.com
PostgreSQL version: 9.6.7
Operating system:   macOS 10.12
Description:

select upper(regexp_replace(b, '\\s+', '')) as keyword,  sum(c)
from test_t
group by upper(regexp_replace(b, '\\s+', ''));

I expected to get the same results regarding to above  four queries.
However, they were different.


​Not a bug - you mis-understand string literal syntax and escaping.  What you are asking to replace is "a backslash followed by one or more "s"es.


If you want to leave the literal as-is you need to write:

E'\\s+'

The way I prefer is to keep the "E" ​omitted and write:

'\s+'

Without the "E" the backslash is not an escape character in a PostgreSQL literal and so the backslash in the regex doesn't need to be protected.  By protecting it you are actually protecting the backslash in front of the "s" thus causing it to become two separate "symbols", "\" and "s" - and the + then applies to the literal "s".

David J.


Re: BUG #15069: group by after regexp_replace

От
Andrew Gierth
Дата:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG>     select upper(regexp_replace(a, '\\s+', '')) as keyword

This regexp doesn't do what you seem to be expecting. With
standard_conforming_strings enabled, which is the default, \ is not a
special character to the string literal parser, so the regexp engine is
seeing a regexp of \\s+ which means "match the literal character \
followed by one or more lowercase "s" characters". Accordingly the
regexp never matches on your input.

To remove whitespace use '\s+' or E'\\s+' or '[[:space:]]+'

-- 
Andrew (irc:RhodiumToad)