Re: OCTET_LENGTH is wrong

From: Hannu Krosing
Subject: Re: OCTET_LENGTH is wrong
Date:
Msg-id: 3BF96E43.20907@tm.ee
In reply to: Re: OCTET_LENGTH is wrong  (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers

Tom Lane wrote:

>Barry Lind <barry@xythos.com> writes:
>
>>While the text datatypes have additional issues with encodings, that is 
>>not true for the bytea type.  I think it does make sense that a client 
>>be able to get the size in bytes that the bytea type value will return 
>>to the client.
>>
>
>bytea does that already.  It's only text that has (or had, till a few
>minutes ago) the funny behavior.
>
>I'm not set on the notion that octet_length should return on-disk size;
>that's clearly not what's contemplated by SQL92, so I'm happy to agree
>that if we want that we should add a new function to get it.
>("storage_length", maybe.)  What's bothering me right now is the
>difference between client and server encodings.  It seems that the only
>plausible use for octet_length is to do memory allocation on the client
>side,
>
Allocating memory seems to me to be the driver's (libpq, JDBC, ODBC, ...)
problem, not something to be done by client code beforehand. At least
for libpq (AFAIK), we have no means of giving it a pre-allocated storage
area for one field.

There is enough information in the wire protocol to allocate right-sized
chunks at the time the query result is read. An additional call of
"SELECT OCTET_LENGTH(someCol)" seems orders of magnitude slower than
doing it at the right time in the driver.
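
A minimal sketch of the driver-side approach, in Python. The framing is an
assumption made for illustration only (each field is a 4-byte big-endian
length followed by that many bytes); it is not the actual PostgreSQL wire
format, and read_fields and pack_fields are hypothetical helpers, not libpq
calls.

```python
import struct


def read_fields(buf):
    """Split a buffer of length-prefixed fields into exactly-sized chunks.

    Assumed (hypothetical) framing: 4-byte big-endian length, then the data.
    """
    fields = []
    offset = 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated length header")
        (size,) = struct.unpack_from("!I", buf, offset)
        offset += 4
        if offset + size > len(buf):
            raise ValueError("truncated field")
        # The length arrived together with the data, so the chunk is
        # right-sized without any separate OCTET_LENGTH round trip.
        fields.append(buf[offset:offset + size])
        offset += size
    return fields


def pack_fields(values):
    """Build a buffer in the same assumed framing (for the demo only)."""
    return b"".join(struct.pack("!I", len(v)) + v for v in values)


wire = pack_fields([b"abc", b"", b"\x00\x01"])
print(read_fields(wire))
```

The point of the sketch is only that the reader learns each field's size in
the same pass that delivers the data, so no extra query is needed.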

>and for that purpose the length ought to be measured in the client
>encoding.  People seem to be happy with letting octet_length take the
>easy way out (measure in the server encoding), and I'm trying to get
>someone to explain to me why that's the right behavior.  I don't see it.
>
Perhaps we need another function, "OCTET_LENGTH(someCol, encoding)", for
getting what we want, and also client_encoding() and server_encoding()
for supplying it with some universal defaults?
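
A rough Python sketch of what such a two-argument function could look like.
Everything here is hypothetical: octet_length, server_encoding and
client_encoding are stand-ins written for this example, and the two
encodings they return are assumed values, not anything the backend reports.

```python
def server_encoding():
    # Assumed value for the demo; a real backend would report its own.
    return "latin-1"


def client_encoding():
    # Assumed value for the demo.
    return "utf-8"


def octet_length(value, encoding=None):
    """Number of bytes `value` occupies in `encoding`.

    Falls back to the (assumed) server encoding when none is given.
    """
    if encoding is None:
        encoding = server_encoding()
    return len(value.encode(encoding))


text = "caf\u00e9"  # "cafe" with a precomposed e-acute, 4 characters
print(octet_length(text))                     # measured in server encoding
print(octet_length(text, client_encoding()))  # measured in client encoding
```

With these assumed encodings the same four-character string measures 4
bytes on the server side and 5 bytes on the client side, which is exactly
the discrepancy being discussed.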

OTOH, from reading about Unicode I've come to the conclusion that there
are often several ways of expressing the same string in Unicode. So when
the server encoding is not Unicode and the client requests Unicode (say
UTF-8), the same string can be expressed in several different ways. Thus
there is no absolute OCTET_LENGTH for client_encoding that holds in all
cases, and giving the actual uncompressed length seems most reasonable.
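
One concrete instance of this, sketched in Python with the standard
unicodedata module: a precomposed character and its decomposed spelling are
canonically equivalent, yet their UTF-8 lengths differ. The helper name
canonically_equal is made up for this example.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """True if the two strings are the same after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(canonically_equal(composed, decomposed))
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

Both spellings render as the same letter, but the first takes 2 bytes in
UTF-8 and the second takes 3, so a converter from a non-Unicode server
encoding has more than one valid output length to choose from.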

For Unicode in both backend and frontend, we could also make OCTET_LENGTH
return not an int but an integer interval of the shortest and longest
possible encodings ;)
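
Taking the interval idea literally, a small Python sketch might look like
this. It is a simplification under a stated assumption: only the two
canonical normalization forms (NFC and NFD) are considered as candidate
spellings, and octet_length_interval is a hypothetical name.

```python
import unicodedata


def octet_length_interval(text, encoding="utf-8"):
    """Return (shortest, longest) byte length over NFC and NFD spellings.

    Assumption: NFC and NFD stand in for "all possible encodings" of the
    string; mixed or compatibility forms are deliberately ignored.
    """
    sizes = [
        len(unicodedata.normalize(form, text).encode(encoding))
        for form in ("NFC", "NFD")
    ]
    return min(sizes), max(sizes)


print(octet_length_interval("caf\u00e9"))
print(octet_length_interval("plain ascii"))
```

For pure ASCII input the interval collapses to a single value; it widens
only when the string contains characters that have a decomposition.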

------------------
Hannu





