Discussion: Re: [GENERAL] Language support of postgresql
Hi,
As I understand it, Postgres currently doesn’t support the Chinese encodings GBK and BIG5 on either the server or the client side, only UNICODE. Is that true? Are there any plans for the PostgreSQL team to implement the GBK and BIG5 encodings anytime soon?
Are there any alternative solutions besides switching our database to Oracle or another product that supports these encodings? One of our customers insists that we support all three encodings (BIG5, GB2312, and UNICODE). We would love to stick with Postgres if there is an alternative way to solve the problem without incurring a big cost.
Thank you very much for your time and attention.
Sincerely,
Hong Martel
Software Developer
Martel, Hong wrote:
> As I understand it, Postgres currently doesn’t support the Chinese encodings GBK and BIG5 on either the server or the client side, only UNICODE. Is that true? Are there any plans for the PostgreSQL team to implement the GBK and BIG5 encodings anytime soon?
> Are there any alternative solutions besides switching our database to Oracle or another product that supports these encodings? One of our customers insists that we support all three encodings (BIG5, GB2312, and UNICODE). We would love to stick with Postgres if there is an alternative way to solve the problem without incurring a big cost.
I thought Postgres supported client_encodings of BIG5, GB18030, and GBK, all of which can be stored in the server using either UTF8 or MULE_INTERNAL (MultiLingual EMACS) as the internal storage encoding?
-- john r pierce, recycling bits in santa cruz
"Martel, Hong" <Hong.Martel@saabsensis.com> writes:
> As I understand, currently Postgres doesn’t support Chinese encoding GBK and BIG5 on both server and client side, only UNICODE. Is it true? Are there any plans for postgresql team to implement GBK and BIG5 encoding anytime soon?
Yes, and no. There's basically zero chance that we'll ever allow these encodings as server-side encodings, because they aren't strict ASCII supersets (that is, not all bytes of a multibyte character are individually distinguishable from an ASCII character). The amount of work involved, and the ongoing hazard of security bugs that would ensue, are just prohibitive.
We do however support them as client-side encodings with automatic translation to and from Unicode on the server.
regards, tom lane
John R Pierce <pierce@hogranch.com> writes:
> I thought Postgres supported client_encodings of BIG5, GB18030, and GBK,
> all of which can be stored in the server using either UTF8 or
> MULE_INTERNAL (MultiLingual EMACS) encodings for internal storage ?
Hm, there's MULE<=>BIG5 converters but I don't see any for GBK or GB18030. Also, it looks like the MULE<=>BIG5 converters do some re-encoding, so it's not clear to me whether they're lossless, which I assume is the concern driving this request.
Still, you're right, there's more than one way to skin this cat. Somebody could write an encoding converter that translates one of these ASCII-unsafe representations into an ASCII-safe format to be used internally in the backend, and then the reverse on the way out.
regards, tom lane
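To make the "ASCII-unsafe" point concrete, here is a minimal Python sketch. It is only an illustration using Python's built-in codecs, not anything from the PostgreSQL source, and the helper name `ascii_bytes_in_multibyte_chars` is made up for this example. It reports every byte below 0x80 that occurs inside a multibyte character. In BIG5 the character 許 is encoded as 0xB3 0x5C, and 0x5C is also the ASCII backslash, whereas UTF-8 never reuses ASCII byte values inside a multibyte sequence.

```python
from typing import List


def ascii_bytes_in_multibyte_chars(text: str, encoding: str) -> List[int]:
    """Return every byte below 0x80 that appears inside a multibyte character.

    Hypothetical helper for illustration; it relies on Python's codec tables,
    which are not the same thing as PostgreSQL's conversion tables.
    """
    found: List[int] = []
    for ch in text:
        encoded = ch.encode(encoding)
        # Single-byte characters are plain ASCII here and are not a problem.
        if len(encoded) > 1:
            found.extend(b for b in encoded if b < 0x80)
    return found


if __name__ == "__main__":
    sample = "許"  # U+8A31, a well-known BIG5 trouble case
    for enc in ("big5", "utf-8"):
        hits = ascii_bytes_in_multibyte_chars(sample, enc)
        print(enc, sample.encode(enc).hex(), [hex(b) for b in hits])
```

A non-empty result means that code scanning the raw bytes for ASCII characters such as a backslash or a quote could be fooled by the second byte of a multibyte character, which is the kind of hazard described above for server-side use.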
> John R Pierce <pierce@hogranch.com> writes:
>> I thought Postgres supported client_encodings of BIG5, GB18030, and GBK, all of which can be stored in the server using either UTF8 or MULE_INTERNAL (MultiLingual EMACS) encodings for internal storage ?
> Hm, there's MULE<=>BIG5 converters but I don't see any for GBK or GB18030. Also, it looks like the MULE<=>BIG5 converters do some re-encoding, so it's not clear to me whether they're lossless, which I assume is the concern driving this request.
I based my statement on a misreading of the tables at https://www.postgresql.org/docs/current/static/multibyte.html but now I see that, of the encodings discussed here, MULE_INTERNAL only supports BIG5 and EUC_CN.
My limited reading earlier about BIG5 suggested it's a mess of conflicting extensions (E-TEN and others), and the GB* stuff wasn't much better.
Anyway, it seems to me that UTF8 is the correct server encoding for almost all uses.
-- john r pierce, recycling bits in santa cruz
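For anyone who wants to probe the losslessness concern raised in this thread before settling on UTF8 as the server encoding, a rough check is to round-trip sample data through UTF-8 and compare the bytes. The sketch below is an assumption-laden stand-in: it uses Python's built-in codecs rather than PostgreSQL's own conversion tables, the function name `survives_utf8_round_trip` is hypothetical, and vendor extensions of BIG5 may behave differently from what Python implements.

```python
def survives_utf8_round_trip(client_bytes: bytes, client_encoding: str) -> bool:
    """Check whether bytes in a client encoding come back unchanged via UTF-8.

    Mimics the shape of the client -> server (UTF-8) -> client path using
    Python's codecs only; it does not talk to a PostgreSQL server.
    """
    text = client_bytes.decode(client_encoding)        # client bytes to Unicode
    stored = text.encode("utf-8")                      # what a UTF8 database would hold
    returned = stored.decode("utf-8").encode(client_encoding)  # back to the client
    return returned == client_bytes


if __name__ == "__main__":
    sample = "中文"
    for enc in ("big5", "gbk", "gb18030"):
        raw = sample.encode(enc)
        print(enc, raw.hex(), survives_utf8_round_trip(raw, enc))
```

Running the same comparison against real customer data through an actual server connection would be the more meaningful test, since that exercises the conversion tables PostgreSQL really uses.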