Discussion: can i use to_ascii function ?

can i use to_ascii function ?

From
"j n"
Date:
Hi

I need to strip acute accents and other diacritical marks from letters. Is there an easy way to do this in Postgres?

I need conversions like these for Slovak text:

é, ě -> e
á, ä -> a
š -> s

I tried:

SELECT to_ascii('ščďť')

I get an error: ERROR:  encoding conversion from UTF8 to ASCII not supported

Then I found something like this:

SELECT convert('ščďť', 'UTF8', 'LATIN1')

I get an error: ERROR:  character 0xc5a1 of encoding "UTF8" has no equivalent in "LATIN1"

Is there any way to do this?
If there is no built-in support for this conversion, can I create my own?

Please help.

Re: can i use to_ascii function ?

From
Achilleas Mantzios
Date:
On Friday 23 February 2007 14:12, j n wrote:
> Hi
>
> i need to get rid of acute, and other special symbols from letters. Is any
> way how to do it easy in postgres ?
>
> i need conversion like this from slovak language ... :
>
> é, ě -> e
> á, ä -> a
> š -> s
>
> i tried :
>
> SELECT to_ascii('ščďť')
>
> i get an error : ERROR:  encoding conversion from UTF8 to ASCII not
> supported
>
> that i found something like this
>
> SELECT convert('ščďť', 'UTF8', 'LATIN1')
>
> i get an error : ERROR:  character 0xc5a1 of encoding "UTF8" has no
> equivalent in "LATIN1"
>
> Is any way how to do it ?
> If there is not build in support for this conversion can i create my own ?

Resorting to ISO-8859-2 (Latin-2, the Central European character set)
will not help: the conversion simply re-encodes those UTF-8 characters
as their Latin-2 counterparts, so the accents are still there.
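
To illustrate the point about recoding, here is a small sketch in Python
(chosen only because it is easy to run outside the database; the codec
names "iso8859_2" and "latin_1" are Python's, not PostgreSQL's, and the
helper names are made up for this example). It suggests that converting
to Latin-2 changes the bytes but not the characters, and that Latin-1
has no slot for š at all, which matches the second error quoted above.

```python
text = "ščďť"


def roundtrip_latin2(s: str) -> str:
    """Encode to ISO-8859-2 and decode again; the accented letters survive."""
    return s.encode("iso8859_2").decode("iso8859_2")


def fits_latin1(s: str) -> bool:
    """Return True if every character of s exists in ISO-8859-1."""
    try:
        s.encode("latin_1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print("š".encode("utf-8").hex())       # the two UTF-8 bytes of š
    print("š".encode("iso8859_2").hex())   # the single Latin-2 byte of š
    print(roundtrip_latin2(text) == text)  # recoding keeps the accents
    print(fits_latin1(text))               # Latin-1 cannot hold these letters
```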

How about
translate('your slovak text','éěáäš','eeaas');
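
For comparison, the same one-for-one mapping idea can be sketched
outside the database. The Python below is only an illustration under
stated assumptions: the function names are hypothetical, and the second
function swaps in a different technique (Unicode NFD decomposition)
that avoids listing every accented letter by hand.

```python
import unicodedata

# Explicit one-to-one table, mirroring translate(text, 'éěáäš', 'eeaas').
EXPLICIT = str.maketrans("éěáäš", "eeaas")


def unaccent_table(s: str) -> str:
    """Replace only the characters listed in the explicit table."""
    return s.translate(EXPLICIT)


def unaccent_nfd(s: str) -> str:
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(unaccent_table("éěáäš"))
    print(unaccent_nfd("ščďť"))
```

The explicit table leaves any unlisted letter untouched, so it has to
be extended for č, ď, ť and so on. The decomposition approach handles
those, but letters with no canonical decomposition (ł, for instance)
pass through unchanged.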

>
> pls help

--
Achilleas Mantzios

Re: can i use to_ascii function ?

From
Michael Fuhr
Date:
On Fri, Feb 23, 2007 at 01:12:53PM +0100, j n wrote:
> i need to get rid of acute, and other special symbols from letters. Is any
> way how to do it easy in postgres ?

This has come up before; here's a recent response with suggestions:

http://archives.postgresql.org/pgsql-general/2007-01/msg00702.php

--
Michael Fuhr

Re: can i use to_ascii function ?

From
Michael Fuhr
Date:
[Please reply to the mailing list so others can participate in and
 learn from the discussion.]

On Sat, Feb 24, 2007 at 10:41:51AM +0100, j n wrote:
> maybe one more suggestion ...
> this type
>
> to_ascii(convert(lastname, 'LATIN2'), 'LATIN2')
>
> was not working fine unless i use ./configure --with-perl option i don't
> know if this is real reason ..., but i have not change anything else ...

The above expression has nothing to do with Perl unless perhaps you
have like-named PL/Perl functions in your database and you've set
search_path to find them ahead of the ones in pg_catalog, which is
unlikely.  I suspect that the relevant difference between your test
environments is something else.

What exactly do you mean by "not working fine"?  What did you do,
what were you expecting to happen, and what did happen?  Can you
post a standalone test case?

What version of PostgreSQL are you running?  What platform?  What
are the server and client encodings?  What are the server's locale
settings?

--
Michael Fuhr

Re: can i use to_ascii function ?

From
Michael Fuhr
Date:
[Once again, please copy the mailing list on replies so others can
 participate in and learn from the discussion.  Also, pgsql-general
 might be a more appropriate list than pgsql-admin.]

On Mon, Feb 26, 2007 at 10:53:48AM +0100, j n wrote:
> 1. At first i tried use to_ascii ... convert
>  it works well on some letter 'á' converted to 'a' 'é' to 'e' but some of
> them like č or š convert as empty string it means 'not working fine'

We can't explain why this doesn't work unless you show exactly what
you did.  Please post a set of SQL statements that somebody could
run in their own database to reproduce the problem you're seeing.

Did any of this data originate on Windows?  If so then it's possible
that some accented characters aren't represented by the proper
Unicode code points.  This can happen, for example, if you load
Windows-1250 data into the database with client_encoding set to
LATIN2.  Depending on how you're viewing the data the "wrong"
characters might still display correctly.  To give a specific
example, š is 0x9a in Windows-1250 but if you load this character
with client_encoding set to LATIN2 then the database converts it
to U+009A, a control character, instead of to U+0161 latin small
letter s with caron (háček).  An application that reads the control
character might render it as š, assuming that was the character
meant, but functions that operate on the data won't work as
expected.  This wouldn't fully explain the problems you're seeing
but it's something I've seen cause similar problems.
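
The mismatch described above can be reproduced outside the database.
This is a minimal sketch in Python, assuming its "cp1250" and
"iso8859_2" codecs stand in for Windows-1250 and LATIN2; load_as is a
hypothetical helper, not anything PostgreSQL provides.

```python
import unicodedata


def load_as(raw: bytes, assumed_encoding: str) -> str:
    """Decode raw bytes the way a loader would under the assumed encoding."""
    return raw.decode(assumed_encoding)


# The byte that Windows-1250 uses for š.
raw = "š".encode("cp1250")

right = load_as(raw, "cp1250")     # decoded with the encoding it was written in
wrong = load_as(raw, "iso8859_2")  # decoded as if it were LATIN2

if __name__ == "__main__":
    print(raw.hex())
    print(f"U+{ord(right):04X}", unicodedata.category(right))
    print(f"U+{ord(wrong):04X}", unicodedata.category(wrong))
```

The wrongly decoded character is a control character with no
decomposition, so accent-stripping code has nothing to work with even
if a terminal happens to display it as š.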

> 2. than i tried to use perl func but i didn't have configured postgres to
> use perl so i have to :
>
> ./configure --enable-multibyte=UTF8 prefix=/usr/local/pgsqlProd
> --exec-prefix=/usr/local/pgsqlProd --with-perl

The --enable-multibyte option was removed in 7.3 so you don't need
it; at the end of the configure you should have seen a warning that
this option was ignored.  And did you mean --prefix instead of
prefix?  Also, there's no need to set --exec-prefix if it gets the
same value as --prefix.

> gmake
> gmake install
>
> 3. After this everythink have started to work fine also unaccent perl
> function ...

Configuring with --with-perl is necessary if you want to use
server-side Perl functions, but you didn't say anything about Perl functions
in your previous message.  You said that to_ascii() and convert()
didn't work unless you used --with-perl, which doesn't make sense
because those functions have nothing to do with Perl.  I tried it
both ways and got the same behavior so I'm still skeptical that
--with-perl is the relevant difference for to_ascii() and convert()
behavior.

--
Michael Fuhr