Re: OCTET_LENGTH is wrong

From: Bruce Momjian
Subject: Re: OCTET_LENGTH is wrong
Msg-id: 200111182123.fAILNGW07403@candle.pha.pa.us
In reply to: Re: OCTET_LENGTH is wrong  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: OCTET_LENGTH is wrong  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > On Sun, 18 Nov 2001, Tom Lane wrote:
> >> I presume that where you want to come out is OCTET_LENGTH = uncompressed
> >> length in the server's encoding ... but so far no one has really made
> >> a convincing argument why that answer is better or more spec-compliant
> >> than any other answer.  In particular, it's not obvious to me why
> >> "number of bytes we're actually using on disk" is wrong.
> 
> > I'm not sure, but if we say that the on disk representation is the
> > value of the character value expression whose size is being checked,
> > wouldn't that be inconsistent with the other uses of the character value
> 
> Yeah, it would be and is.  In fact, the present code has some
> interesting behaviors: if foo.x is a text value long enough to be
> toasted, then you get different results from
> 
>     SELECT OCTET_LENGTH(x) FROM foo;
> 
>     SELECT OCTET_LENGTH(x || '') FROM foo;
> 
> since the result of the concatenation expression won't be compressed.
> 
> I'm not actually here to defend the existing code; in fact I believe the
> XXX comment on textoctetlen questioning its correctness is mine.  What
> I am trying to point out is that the spec is so vague that it's not
> clear what the correct answer is.

Well, if the standard is unclear, we should return the most
reasonable answer, which has to be the uncompressed length.

In multibyte encodings, when we started returning length() in
_characters_ instead of bytes, I assumed the major use for octet_length
was to return the number of bytes needed to hold the value on the client
side.
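
To make the character/byte distinction concrete, here is a small sketch
in Python rather than in the backend. The sample string and the two
encodings are my own choices for illustration, and Python's len() is
only standing in for length() and octet_length():

```python
# Illustration only: a Python string standing in for a text column value.
value = "日本語テキスト"  # seven characters, chosen arbitrarily

# What length() reports once it counts characters instead of bytes.
char_length = len(value)

# What a client needs to allocate depends on the encoding it receives.
utf8_octets = len(value.encode("utf-8"))     # three bytes per character here
eucjp_octets = len(value.encode("euc_jp"))   # two bytes per character here

print(char_length, utf8_octets, eucjp_octets)
```

The character count stays the same across encodings while the byte
count does not, which is the number a client-side buffer cares about.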

In single byte encodings, octet_length is the same as length() so
returning a compressed length may make sense, but I don't think we want
different meanings for the function for single and multi-byte encodings.

I guess the issue is that for single-byte encodings, octet_length is
pretty useless because it is the same as length, but for multi-byte
encodings, octet_length is invaluable and almost has to return the
uncompressed byte count, because uncompressed data is what the client sees.
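
A rough sketch of why the stored size and the client-visible size can
disagree. This is not the backend's code: zlib is only a stand-in for
whatever compression the server applies to long values, and the sample
value is made up:

```python
import zlib

# Illustration only: a long, repetitive value of the kind that compresses well.
value = "abc" * 2000
raw = value.encode("latin-1")        # single-byte encoding: one byte per character

# Stand-in for the compressed on-disk form (assumption: zlib, not the real method).
stored = zlib.compress(raw)

uncompressed_octets = len(raw)       # what the client receives and must hold
stored_octets = len(stored)          # what "bytes actually used on disk" would report

# The client always sees the decompressed value, never the stored form.
client_view = zlib.decompress(stored)

print(uncompressed_octets, stored_octets, len(client_view))
```

Reporting the stored size would give an answer that depends on how
compressible the value happens to be, not on what the client gets back.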

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

