Discussion: unicode questions

unicode questions

From:
- -
Date:
Dear PG hackers,

I have two questions regarding Unicode support in PG:

1) If I set my database and connection encoding to UTF-8, does PG (and
future versions of it) guarantee that Unicode code points are stored
unmodified? Or could PG apply some Unicode
normalization/manipulation to them before storing a string, or when
retrieving a string?

The reason I'm asking is that I've built a little program that reads
in and stores text and explicitly analyzes it at a later point
in time, including whether the text is in NFC, NFD, or
neither. Since I want to store the text in the database, it is very
important that PG not fiddle with the normalization unless my
program explicitly tells it to.

2) How far along is normalization support in PG? When I checked a long time
ago, there was no such support. Now that the SQL standard mandates a
NORMALIZE function, that may have changed. Any updates?


Re: unicode questions

From:
Andrew Dunstan
Date:

- - wrote:
> Dear PG hackers,
>
> I have two questions regarding Unicode support in PG:
>
> 1) If I set my database and connection encoding to UTF-8, does PG (and
> future versions of it) guarantee that Unicode code points are stored
> unmodified? Or could PG apply some Unicode
> normalization/manipulation to them before storing a string, or when
> retrieving a string?
>
> The reason I'm asking is that I've built a little program that reads
> in and stores text and explicitly analyzes it at a later point
> in time, including whether the text is in NFC, NFD, or
> neither. Since I want to store the text in the database, it is very
> important that PG not fiddle with the normalization unless my
> program explicitly tells it to.
>
> 2) How far along is normalization support in PG? When I checked a long time
> ago, there was no such support. Now that the SQL standard mandates a
> NORMALIZE function, that may have changed. Any updates?
>   

We don't do any normalization. If the client gives us UTF8 then we store 
exactly what it gives us, and return exactly that.
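Storing strings unmodified matters for this use case because the NFC and NFD spellings of the same character are different code point sequences. They are therefore also different UTF-8 byte sequences, so a store that returns exactly the bytes it was given keeps them distinct. The following is a minimal Python sketch. `round_trip_ok` is a hypothetical client-side helper, not a PG facility.

```python
import unicodedata

# Two canonically equivalent spellings of "é".
composed = "\u00e9"                                  # NFC: one code point
decomposed = unicodedata.normalize("NFD", composed)  # NFD: 'e' + U+0301

# Different code point sequences, hence different UTF-8 bytes.
print([hex(ord(c)) for c in composed])    # ['0xe9']
print([hex(ord(c)) for c in decomposed])  # ['0x65', '0x301']
print(composed.encode("utf-8"))           # b'\xc3\xa9'
print(decomposed.encode("utf-8"))         # b'e\xcc\x81'


def round_trip_ok(sent: str, received: str) -> bool:
    """Hypothetical check that text came back byte-for-byte as it was sent."""
    # Compare raw UTF-8 bytes, not normalized forms, so any silent
    # normalization on the way in or out would be detected.
    return sent.encode("utf-8") == received.encode("utf-8")


print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```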

(This question is not really a -hackers question. The correct forum is 
pgsql-general. Please make sure you use the correct forum in future.)

cheers

andrew


Re: unicode questions

From:
Andrew Dunstan
Date:

>>> 2) How far along is normalization support in PG? When I checked a long time
>>> ago, there was no such support. Now that the SQL standard mandates a
>>> NORMALIZE function, that may have changed. Any updates?
>>>
>>>       

Creating such a function shouldn't be terribly hard AIUI, if someone 
wants to submit a patch. It was raised about three months ago but nobody 
actually volunteered unless I missed that.
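The following Python sketch gives a rough idea of what such a function would compute. It assumes the standard's four normal forms (NFC, NFD, NFKC, NFKD), with NFC as the default. The name `sql_normalize` and its NULL handling are illustrative assumptions, not a PG signature. An in-server patch would presumably need its own Unicode data tables rather than Python's `unicodedata`.

```python
import unicodedata
from typing import Optional

# Normal forms accepted by this sketch (assumed to mirror the SQL standard's).
NORMAL_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def sql_normalize(text: Optional[str], form: str = "NFC") -> Optional[str]:
    """Hypothetical NORMALIZE(text [, form]) analogue: NULL in, NULL out."""
    if text is None:
        return None
    form = form.upper()
    if form not in NORMAL_FORMS:
        raise ValueError(f"unsupported normal form: {form!r}")
    return unicodedata.normalize(form, text)
```

For example, `sql_normalize("cafe\u0301")` returns the composed `"caf\u00e9"`. The ligature `"\ufb01"` is left alone by NFC and becomes `"fi"` only under NFKC, because its decomposition is a compatibility mapping rather than a canonical one.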

cheers

andrew


Re: unicode questions

From:
"David E. Wheeler"
Date:
On Dec 24, 2009, at 4:14 PM, Andrew Dunstan wrote:

>>>> 2) How far along is normalization support in PG? When I checked a long time
>>>> ago, there was no such support. Now that the SQL standard mandates a
>>>> NORMALIZE function, that may have changed. Any updates?
>
> Creating such a function shouldn't be terribly hard AIUI, if someone wants to submit a patch. It was raised about three months ago but nobody actually volunteered unless I missed that.

I wrote a similar function in PL/Perl:
 http://justatheory.com/computers/databases/postgresql/unicode-normalization.html

Best,

David

Re: unicode questions

From:
- -
Date:
On Thu, Dec 24, 2009 at 5:40 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> 1) If I set my database and connection encoding to UTF-8, does PG (and
>> future versions of it) guarantee that Unicode code points are stored
>> unmodified? Or could PG apply some Unicode
>> normalization/manipulation to them before storing a string, or when
>> retrieving a string?
>>
>> The reason I'm asking is that I've built a little program that reads
>> in and stores text and explicitly analyzes it at a later point
>> in time, including whether the text is in NFC, NFD, or
>> neither. Since I want to store the text in the database, it is very
>> important that PG not fiddle with the normalization unless my
>> program explicitly tells it to.
>
> We don't do any normalization. If the client gives us UTF8 then we store
> exactly what it gives us, and return exactly that.

OK.

>
> (This question is not really a -hackers question. The correct forum is
> pgsql-general. Please make sure you use the correct forum in future.)

Are you sure? The description for -hackers says: "Discussion of
current development issues, problems and bugs, and proposed new
features.", which seems to be exactly where you'd ask my second question,
which is still unanswered.

>>
>> 2) How far along is normalization support in PG? When I checked a long time
>> ago, there was no such support. Now that the SQL standard mandates a
>> NORMALIZE function, that may have changed. Any updates?
>>

Kind regards.