Thread: Re: [GENERAL] Language support of postgresql


Re: [GENERAL] Language support of postgresql

From
"Martel, Hong"
Date:

Hi,

As I understand, currently Postgres doesn’t support the Chinese encodings GBK and BIG5 on both server and client side, only UNICODE.  Is it true?  Are there any plans for the PostgreSQL team to implement GBK and BIG5 encoding anytime soon?

Are there any alternative solutions for this besides switching our database to Oracle or others that support the encodings?  One of our customers insists that we need to support all three encodings (BIG5, GB2312 and UNICODE).  We would love to stick to Postgres if there is any alternative way to solve the problem without incurring a big cost.

Thank you very much for your time and attention.

Sincerely,

Hong Martel

Software Developer

 



Re: [GENERAL] Language support of postgresql

From
John R Pierce
Date:
On 4/28/2017 7:45 AM, Martel, Hong wrote:

> As I understand, currently Postgres doesn’t support Chinese encoding GBK and BIG5 on both server and client side, only UNICODE.  Is it true?  Are there any plans for postgresql team to implement GBK and BIG5 encoding anytime soon?
>
> Are there any alternative solutions for this besides switching our database to Oracle or others that support the encodings?  One of our customers insists that we need to support all three encoding (BIG5, GB2312 and UNICODE).  We would love to stick to Postgres if there is any alternative way to solve the problem without incurring big cost.

 


I thought Postgres supported client_encodings of BIG5, GB18030, and GBK, all of which can be stored in the server using either UTF8 or MULE_INTERNAL (MultiLingual EMACS) encodings for internal storage?



-- 
john r pierce, recycling bits in santa cruz

Re: [GENERAL] Language support of postgresql

From
Tom Lane
Date:
"Martel, Hong" <Hong.Martel@saabsensis.com> writes:
> As I understand, currently Postgres doesn’t support Chinese encoding GBK and BIG5 on both server and client
> side, only UNICODE.  Is it true?  Are there any plans for postgresql team to implement GBK and BIG5 encoding anytime
> soon?

Yes, and no.  There's basically zero chance that we'll ever allow these
encodings as server-side encodings, because they aren't strict ASCII
supersets (that is, not all bytes of a multibyte character are
individually distinguishable from an ASCII character).  The amount of
work involved, and the ongoing hazard of security bugs that would ensue,
is just prohibitive.
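
The ASCII-superset point can be illustrated with a minimal Python sketch that uses only the standard library codecs. The helper name `multibyte_chars_with_ascii_bytes` and the sample string are assumptions made for this illustration, not anything from PostgreSQL. In Big5 the character 許 encodes as 0xB3 0x5C, and 0x5C on its own is an ASCII backslash; UTF-8 never places a byte below 0x80 inside a multibyte character.

```python
def multibyte_chars_with_ascii_bytes(text, encoding):
    """Return (char, encoded) pairs whose multibyte form contains a byte < 0x80.

    Hypothetical helper for illustration only.
    """
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        # A multibyte character is "ASCII-unsafe" if any of its bytes,
        # taken alone, could be mistaken for an ASCII character.
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            hits.append((ch, encoded))
    return hits


sample = "許功蓋"  # sample text chosen for this sketch
big5_hits = multibyte_chars_with_ascii_bytes(sample, "big5")
utf8_hits = multibyte_chars_with_ascii_bytes(sample, "utf-8")

print("big5 :", big5_hits)
print("utf-8:", utf8_hits)
```

Code that scans stored bytes for a backslash or quote would misfire on the Big5 form of such characters, which is the kind of hazard described above.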

We do however support them as client-side encodings with automatic
translation to and from Unicode on the server.
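
On a real connection this is requested with `SET client_encoding TO 'GBK'` (or `'BIG5'`), and the server performs the conversion itself. The Python sketch below only simulates that round trip with the standard codecs, assuming a UTF8 database; the function names `client_to_server` and `server_to_client` are hypothetical and do not correspond to any PostgreSQL API.

```python
def client_to_server(raw, client_encoding):
    """Simulate incoming conversion: client bytes -> UTF-8 bytes for storage."""
    return raw.decode(client_encoding).encode("utf-8")


def server_to_client(stored, client_encoding):
    """Simulate outgoing conversion: stored UTF-8 bytes -> client bytes."""
    return stored.decode("utf-8").encode(client_encoding)


text = "中文"  # sample text chosen for this sketch
round_trips = {}
for enc in ("gbk", "big5"):
    sent = text.encode(enc)                  # what the client puts on the wire
    stored = client_to_server(sent, enc)     # what the server keeps (UTF-8)
    received = server_to_client(stored, enc) # what the client reads back
    round_trips[enc] = (received == sent)

print(round_trips)
```

The sketch says nothing about characters that exist only in vendor extensions of these encodings; those may not survive a round trip through Unicode.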

            regards, tom lane


Re: [GENERAL] Language support of postgresql

From
Tom Lane
Date:
John R Pierce <pierce@hogranch.com> writes:
> I thought Postgres supported client_encodings of BIG5, GB18030, and GBK,
> all of which can be stored in the server using either UTF8 or
> MULE_INTERNAL (MultiLingual EMACS) encodings for internal storage ?

Hm, there's MULE<=>BIG5 converters but I don't see any for GBK or
GB18030.  Also, it looks like the MULE<=>BIG5 converters do some
re-encoding, so it's not clear to me whether they're lossless,
which I assume is the concern driving this request.

Still, you're right, there's more than one way to skin this cat.
Somebody could write an encoding converter that translates one
of these ASCII-unsafe representations into an ASCII-safe format
to be used internally in the backend, and then the reverse on
the way out.
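
One way such a converter could look is sketched below in Python, under loud assumptions: the three-byte scheme, the flag values `LOW_TRAIL` and `HIGH_TRAIL`, and both function names are invented for this illustration and are not an existing PostgreSQL encoding or the MULE_INTERNAL format. It handles only one-byte ASCII and two-byte characters (as in GBK and Big5), not the four-byte forms of GB18030.

```python
LOW_TRAIL = 0x81   # hypothetical flag: trail byte was in the ASCII range
HIGH_TRAIL = 0x82  # hypothetical flag: trail byte already had its high bit set


def to_ascii_safe(raw):
    """Re-encode two-byte characters so every non-ASCII byte is >= 0x80."""
    out = bytearray()
    i = 0
    while i < len(raw):
        lead = raw[i]
        if lead < 0x80:          # plain ASCII passes through unchanged
            out.append(lead)
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated two-byte character")
        trail = raw[i + 1]
        if trail < 0x80:         # move the unsafe trail byte out of ASCII range
            out += bytes((lead, LOW_TRAIL, trail | 0x80))
        else:
            out += bytes((lead, HIGH_TRAIL, trail))
        i += 2
    return bytes(out)


def from_ascii_safe(safe):
    """Reverse to_ascii_safe, restoring the original byte sequence."""
    out = bytearray()
    i = 0
    while i < len(safe):
        lead = safe[i]
        if lead < 0x80:
            out.append(lead)
            i += 1
            continue
        if i + 2 >= len(safe):
            raise ValueError("truncated three-byte group")
        flag, stored = safe[i + 1], safe[i + 2]
        trail = (stored & 0x7F) if flag == LOW_TRAIL else stored
        out += bytes((lead, trail))
        i += 3
    return bytes(out)


raw = "許a\\".encode("big5")   # Big5 text plus a genuine ASCII backslash
safe = to_ascii_safe(raw)
print(safe, from_ascii_safe(safe) == raw)
```

Because the mapping is byte-for-byte reversible, it sidesteps the lossiness question, at the cost of storing three bytes per character.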

            regards, tom lane


Re: [GENERAL] Language support of postgresql

From
John R Pierce
Date:
On 5/2/2017 11:41 AM, Tom Lane wrote:
> John R Pierce <pierce@hogranch.com> writes:
>> I thought Postgres supported client_encodings of BIG5, GB18030, and GBK,
>> all of which can be stored in the server using either UTF8 or
>> MULE_INTERNAL (MultiLingual EMACS) encodings for internal storage ?
> Hm, there's MULE<=>BIG5 converters but I don't see any for GBK or
> GB18030.  Also, it looks like the MULE<=>BIG5 converters do some
> re-encoding, so it's not clear to me whether they're lossless,
> which I assume is the concern driving this request.

I based my statement on a misreading of the tables at https://www.postgresql.org/docs/current/static/multibyte.html, but now I see that MULE only supports BIG5 and EUC_CN.

My limited reading earlier about BIG5 suggested it's a mess of conflicting extensions, E-TEN and others, and the GB* stuff wasn't much better.

Anyway, it seems to me that UTF8 is the correct server encoding for almost all uses.

-- 
john r pierce, recycling bits in santa cruz