Обсуждение: Regarding bytea column in Posgresql

Поиск
Список
Период
Сортировка

Regarding bytea column in Posgresql

От
"Deole, Pushkar (Pushkar)"
Дата:

Hi,

 

I have been assigned to a product that uses Postgresql 9.3 as backend database. I am new to postgresql.

The product provides chat functionality between the uses and the completed chats are stored in the database table in a ‘bytea’ column in the form of xml. When I query the data from this column I see xml file with text data. I have couple of queries:

1.       Is ‘bytea’ column intended for storing text data?

2.       Typically a chat can have text data with several special characters (which can be represented in multi bytes), how these characters can be stored in a bytea column and retrieved back properly?

 

Thanks,

Pushkar

Re: Regarding bytea column in Posgresql

От
Bill Moran
Дата:
On Thu, 9 Apr 2015 11:03:30 +0000
"Deole, Pushkar (Pushkar)" <pdeole@avaya.com> wrote:
>
> I have been assigned to a product that uses Postgresql 9.3 as backend database. I am new to postgresql.
> The product provides chat functionality between the uses and the completed chats are stored in the database table in
a'bytea' column in the form of xml. When I query the data from this column I see xml file with text data. I have couple
ofqueries: 
>
> 1.       Is 'bytea' column intended for storing text data?

No, it's intended for storing binary data.

> 2.       Typically a chat can have text data with several special characters (which can be represented in multi
bytes),how these characters can be stored in a bytea column and retrieved back properly? 

bytea won't help you here. You'll have to manage the special
characters entirely in your code. bytea gives you back the
exact same types you put in, with no changes or interpretation.

A better choice would be to use a text field with a proper
text encoding (such as utf-8).

Probably an even better choice would be to use the XML datatype
in PostgreSQL, since you say that you're storing XML anyway.

The place where people tend to get tripped up with TEXT and
XML datatypes is that they're strict. If you try to store
text in a TEXT data type that isn't valid (i.e., multi-byte
characters that aren't correct) you'll get an error and the
data won't be accepted. The same thing happens if you try to
store invalid XML in an XML field (such as XML without proper
closing tags, etc). It seems that this strictness causes a lot
of people to avoid those data types, as there seem to be a lot
of people who would rather have garbage data in their database
than actually go to the work of fixing their application.

--
Bill Moran


Re: Regarding bytea column in Posgresql

От
John R Pierce
Дата:
On 4/9/2015 4:10 AM, Bill Moran wrote:
>> 1.       Is 'bytea' column intended for storing text data?
> No, it's intended for storing binary data.
>
>> >2.       Typically a chat can have text data with several special characters (which can be represented in multi
bytes),how these characters can be stored in a bytea column and retrieved back properly? 
> bytea won't help you here. You'll have to manage the special
> characters entirely in your code. bytea gives you back the
> exact same types you put in, with no changes or interpretation.
>
> A better choice would be to use a text field with a proper
> text encoding (such as utf-8).

one possible rationale for using BYTEA is that the data could be in
various encodings, which the application wishes to preserve, and keeps
track of somewhere else (perhaps in a field within the XML?).
PostgreSQL text types would insist that all text be stored in the same
encoding, and anything in a different encoding would have to be
converted to the database encoding.   Me, I'd be inclined to convert
everything to UTF8 and store it as such, and convert it back to the
user's encoding on display, but the feasibility of this really depends
on the use cases.

--
john r pierce, recycling bits in santa cruz



Re: Regarding bytea column in Posgresql

От
Craig Ringer
Дата:


On 10 April 2015 at 03:27, John R Pierce <pierce@hogranch.com> wrote:
 
one possible rationale for using BYTEA is that the data could be in various encodings, which the application wishes to preserve, and keeps track of somewhere else (perhaps in a field within the XML?).

Thanks for bringing this up, as it's a good reason to use bytea for XML.

XML actually has an encoding field in the DTD declaration, e.g.

    <?xml version="1.0" encoding="UTF-8"?>

It is common - and of dubious correctness - for applications to store XML in a 'text' or 'xml' field without changing the 'encoding' field in the doctype to reflect the encoding at rest.

Personally I wish the 'xml' type in Pg knew how to change the encoding declaration dynamically, but I know it's a hairy problem; e.g. if the client_encoding is iso-8859-1, but the client then converts the XML document to utf-8 internally, the encoding will be wrong if the client doesn't change it back.

I've also run into XML documents that shove data in different encodings into CDATA sections. This is wrong, of course, but apps sometimes do it anyway.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Regarding bytea column in Posgresql

От
"Deole, Pushkar (Pushkar)"
Дата:
John,

Please find my response inline.

Thanks..

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, April 10, 2015 12:57 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Regarding bytea column in Posgresql

On 4/9/2015 4:10 AM, Bill Moran wrote:
>> 1.       Is 'bytea' column intended for storing text data?
> No, it's intended for storing binary data.
>
>> >2.       Typically a chat can have text data with several special characters (which can be represented in multi
bytes),how these characters can be stored in a bytea column and retrieved back properly?
 
> bytea won't help you here. You'll have to manage the special 
> characters entirely in your code. bytea gives you back the exact same 
> types you put in, with no changes or interpretation.
>
> A better choice would be to use a text field with a proper text 
> encoding (such as utf-8).

one possible rationale for using BYTEA is that the data could be in various encodings, which the application wishes to
preserve,and keeps 
 
track of somewhere else (perhaps in a field within the XML?).      
PostgreSQL text types would insist that all text be stored in the same encoding, and anything in a different encoding
wouldhave to be 
 
converted to the database encoding.   Me, I'd be inclined to convert 
everything to UTF8 and store it as such, and convert it back to the user's encoding on display, but the feasibility of
thisreally depends on the use cases.
 

-- 
john r pierce, recycling bits in santa cruz

--> I confirmed with the developers who had originally developed this product and as per them, the xml is in UTF-8
encoding.So, ideally it would have been better to have a text or XML column type.
 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.postgresql.org_mailpref_pgsql-2Dgeneral&d=AwICaQ&c=BFpWQw8bsuKpl1SgiZH64Q&r=9JxdEvGq7qy3NnL3UxN47w&m=c2bN1y5Oj6SbZjRfape8xKIQXIPlu4MRCwHN76YWi1I&s=BTWMlID5cymS2KBU5tS__D7oiNHtLnjEsuiZ8TswAvA&e=