Обсуждение: Regular expression for lower case to upper case.

Поиск
Список
Период
Сортировка

Regular expression for lower case to upper case.

От
Eagna
Дата:


Hi all,


I want a regex to change the case of a field from UPPER to lower.

I know about the UPPER() and LOWER() functions and they are not what I want.

I would have thought this should be very simple, but I've searched a lot and can't seem to get an answer.

Here's a fiddle with a couple of things that I've tried - I'm obviously missing something that should be starting me in the face.

Any input appreciated.


E.



Re: Regular expression for lower case to upper case.

От
"Peter J. Holzer"
Дата:
On 2022-12-10 09:40:22 +0000, Eagna wrote:
> I want a regex to change the case of a field from UPPER to lower.

First a note about terminology: A regular expression matches a string,
it doesn't replace anything.

The regexp_replace function uses a regular expression to match parts of
a string and then uses to replacement string to replace them - but the
replacement string is not itself a regular expresssion.

Tha said, the replacement string in some editors (like Vim) and some
programming languages (like Perl) provides syntax for changing case
(both vi(m) and Perl use \u and \U...\E for uppercasing).

https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP
doesn't mention any special escapes except \1 through \9 and \&. So it
is extremely likely that no such escapes exist.


> I know about the UPPER() and LOWER() functions and they are not what I want.

Can you elaborate why you can't use those?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Regular expression for lower case to upper case.

От
"David G. Johnston"
Дата:
On Saturday, December 10, 2022, Eagna <eagna@protonmail.com> wrote:


Hi all,


I want a regex to change the case of a field from UPPER to lower.

I know about the UPPER() and LOWER() functions and they are not what I want.

I would have thought this should be very simple, but I've searched a lot and can't seem to get an answer.

Here's a fiddle with a couple of things that I've tried - I'm obviously missing something that should be starting me in the face.

Any input appreciated.


E.



RegExp by itself cannot do this.  You have to match all parts of the input into different capturing groups, then use lower() combined with format() to build a new string.  Putting the capturing groups into an array is the most useful option.

David J.
 

Re: Regular expression for lower case to upper case.

От
Eagna
Дата:
Hi, and thanks for your input,

> Tha said, the replacement string in some editors (like Vim) and some
> programming languages (like Perl) provides syntax for changing case
> (both vi(m) and Perl use \u and \U...\E for uppercasing).


This is probably why I was so frustrated - I thought that there should be an equivalent in PostgreSQL.

I was going mad trying all of these from my searches.

> > I know about the UPPER() and LOWER() functions and they are not what I want.

> Can you elaborate why you can't use those?


I want to index on a REGEXP_REPLACE() - I thought using lower -> upper would be a good test.

I could always have used another REGEXP_REPLACE() for my testing, but I then became "obsessed"  with the idea of using
REGEXP_REPLACE()as a substitute for UPPER() - kind of an obfuscated code competition with myself! :-) 

Again, thanks for your input.

E.

> hp





Re: Regular expression for lower case to upper case.

От
Eagna
Дата:
Hi, and thanks for your input.


> RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use
lower()combined with format() to build a new string. Putting the capturing groups into an array is the most useful
option.


OK - I *_kind_* of see what you're saying.

There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to give an outline of the solution that you
propose.

Thanks again,

E.

> David J.





Re: Regular expression for lower case to upper case.

От
"Peter J. Holzer"
Дата:
On 2022-12-10 11:00:48 +0000, Eagna wrote:
> > RegExp by itself cannot do this. You have to match all parts of the
> > input into different capturing groups, then use lower() combined
> > with format() to build a new string. Putting the capturing groups
> > into an array is the most useful option.
>
> OK - I *_kind_* of see what you're saying.
>
> There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd
> care to give an outline of the solution that you propose.

For example like this:

INSERT INTO test VALUES
('abc_def_ghi');

Let's say I want to uppercase the part between the two underscores.

First use regexp_replace to split the string into three parts: One
before the match, the match and one after the match:

SELECT
  regexp_replace(x, '(.*_)(.*)(_.*)', '\1'),
  regexp_replace(x, '(.*_)(.*)(_.*)', '\2'),
  regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;

╔════════════════╤════════════════╤════════════════╗
║ regexp_replace │ regexp_replace │ regexp_replace ║
╟────────────────┼────────────────┼────────────────╢
║ abc_           │ def            │ _ghi           ║
╚════════════════╧════════════════╧════════════════╝
(1 row)


Once that works, uppercase the part you want and concatenate everything
together again:

SELECT
  regexp_replace(x, '(.*_)(.*)(_.*)', '\1') ||
  upper(regexp_replace(x, '(.*_)(.*)(_.*)', '\2')) ||
  regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;

╔═════════════╗
║  ?column?   ║
╟─────────────╢
║ abc_DEF_ghi ║
╚═════════════╝
(1 row)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Regular expression for lower case to upper case.

От
"David G. Johnston"
Дата:
On Sat, Dec 10, 2022 at 6:32 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2022-12-10 11:00:48 +0000, Eagna wrote:
> > RegExp by itself cannot do this. You have to match all parts of the
> > input into different capturing groups, then use lower() combined
> > with format() to build a new string. Putting the capturing groups
> > into an array is the most useful option.
>
> OK - I *_kind_* of see what you're saying.
>
> There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd
> care to give an outline of the solution that you propose.

For example like this:

INSERT INTO test VALUES
('abc_def_ghi');

Let's say I want to uppercase the part between the two underscores.

First use regexp_replace to split the string into three parts: One
before the match, the match and one after the match:

SELECT
  regexp_replace(x, '(.*_)(.*)(_.*)', '\1'),
  regexp_replace(x, '(.*_)(.*)(_.*)', '\2'),
  regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;

A bit too inefficient for my taste.
I was describing the following:

with parts as materialized (
    select regexp_match(
        'abc_def_ghi',
        '^([^_]*_)([^_]*_)([^_]*)$') as part_array
)
select format(
    '%s%s%s',
    part_array[1],
    upper(part_array[2]),
    part_array[3])
from parts;

David J.

Re: Regular expression for lower case to upper case.

От
Alban Hertroys
Дата:
> On 10 Dec 2022, at 12:00, Eagna <eagna@protonmail.com> wrote:
>
>
> Hi, and thanks for your input.
>
>
>> RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use
lower()combined with format() to build a new string. Putting the capturing groups into an array is the most useful
option.
>
>
> OK - I *_kind_* of see what you're saying.
>
> There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to give an outline of the solution that you
propose.

If you put all the regexes and their replacements into a table[1], you could use an aggregate over them to combine all
thereplacements into the final string. It would need some aggregate like regex_replace_agg, which would probably be a
customaggregate. 

[1]: If you stick to ASCII, you could just calculate them and even omit storing them in a physical table.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.