Thread: New database: SQL_ASCII vs UTF-8 trade-offs

New database: SQL_ASCII vs UTF-8 trade-offs

From
ow
Date:
"PostgreSQL 8.1.0 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.3 20051111
(prerelease) (Debian 4.0.2-4)"

Hi,

I'm having some doubts about whether a new db should use the SQL_ASCII or UTF-8
encoding. We expect ALL of our data to be ASCII. At the same time, I guess,
it's possible that some user may decide to get creative and enter, for example,
his own name with non-ASCII chars.

So, it seems that UTF-8 would be a better choice even if we plan to store only
ASCII data (a lot of ASCII data though).

Are there any negative effects related to the selection of UTF-8 over SQL_ASCII
(e.g. size of the database, sort/like/group issues, etc)?

Thanks in advance

---------------------------------------------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: New database: SQL_ASCII vs UTF-8 trade-offs

From
Peter Eisentraut
Date:
On Tuesday, 7 March 2006 15:08, ow wrote:
> Are there any negative effects related to the selection of UTF-8 over
> SQL_ASCII (e.g. size of the database, sort/like/group issues, etc)?

If you're only planning to store ASCII data, choosing UTF-8 will not cause any
additional problems.  But obviously you're more future-proof that way.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
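
As a side note on the size question: ASCII characters occupy one byte each in UTF-8, so ASCII-only text is byte-for-byte identical under either encoding. The following is a minimal Python sketch of that property, not anything from the thread; the helper name `utf8_overhead` and the sample names are made up for illustration.

```python
def utf8_overhead(text: str) -> int:
    """Return how many bytes UTF-8 needs beyond one byte per character."""
    return len(text.encode("utf-8")) - len(text)


# Hypothetical sample values, chosen only for illustration.
ascii_name = "John Smith"
accented_name = "Jos\u00e9 M\u00fcller"  # two non-ASCII characters

# ASCII-only text costs nothing extra and encodes to the same bytes.
print(utf8_overhead(ascii_name))                                  # 0
print(ascii_name.encode("utf-8") == ascii_name.encode("ascii"))   # True

# Each of these two accented characters takes two bytes in UTF-8.
print(utf8_overhead(accented_name))                               # 2
```

This only speaks to storage size; it says nothing about the cost of multibyte-aware string processing inside the server.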

Re: New database: SQL_ASCII vs UTF-8 trade-offs

From
Tom Lane
Date:
ow <oneway_111@yahoo.com> writes:
> Are there any negative effects related to the selection of UTF-8 over SQL_ASCII

There will be a speed penalty; whether it's significant in your
application is something you can only determine by experiment.

            regards, tom lane

Re: New database: SQL_ASCII vs UTF-8 trade-offs

From
ow
Date:

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> ow <oneway_111@yahoo.com> writes:
> > Are there any negative effects related to the selection of UTF-8 over
> SQL_ASCII
>
> There will be a speed penalty; whether it's significant in your
> application is something you can only determine by experiment.

I see... If *ALL* data is in ASCII, is it possible to just update
"pg_database.encoding" to UTF-8 or will I need to recreate the db?

Thanks

Re: New database: SQL_ASCII vs UTF-8 trade-offs

From
Tom Lane
Date:
ow <oneway_111@yahoo.com> writes:
> I see... If *ALL* data is in ASCII, is it possible to just update
> "pg_database.encoding" to UTF-8 or will I need to recreate the db?

It seems risky, but you could probably get away with that as long
as the database locale (LC_COLLATE/LC_CTYPE) is "C" ... which is really
the only one that's safe with SQL_ASCII anyway ... note that
already-started backends will probably fail to notice such a change.

            regards, tom lane
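
The suggestion above depends on the premise that *all* stored data really is ASCII. One way to check that premise before touching the catalog might be to scan a plain-text dump of the database for any byte above 0x7F. This is a rough Python sketch under that assumption; the function names `find_non_ascii` and `dump_is_pure_ascii` are hypothetical, and the dump file is assumed to have been produced separately.

```python
from typing import List, Tuple


def find_non_ascii(data: bytes, limit: int = 10) -> List[Tuple[int, int]]:
    """Return up to `limit` (offset, byte value) pairs for bytes outside 0x00-0x7F."""
    hits: List[Tuple[int, int]] = []
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            hits.append((offset, byte))
            if len(hits) >= limit:
                break
    return hits


def dump_is_pure_ascii(path: str) -> bool:
    """Read a dump file (path supplied by the caller) and report whether it is all ASCII."""
    with open(path, "rb") as fh:
        return not find_non_ascii(fh.read(), limit=1)


# Illustrative in-memory sample: one Latin-1 encoded 'e with acute' (0xE9).
sample = b"id,name\n1,Jos\xe9\n"
print(find_non_ascii(sample))            # [(13, 233)]
print(find_non_ascii(b"id,name\n1,Joe\n"))  # []
```

A clean result only supports the "all ASCII" assumption; it does not address the locale condition or the caveat about already-started backends.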

Re: New database: SQL_ASCII vs UTF-8 trade-offs

From
ow
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It seems risky, but you could probably get away with that as long
> as the database locale (LC_COLLATE/LC_CTYPE) is "C" ... which is really
> the only one that's safe with SQL_ASCII anyway ...

I actually created the cluster with:
test1:~# /usr/lib/postgresql/8.1/bin/initdb --pwprompt -D /var/lib/postgresql/8.1/main/ --lc-collate=POSIX


test1:~# locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Not sure if it's going to make a difference. Thanks