Re: OCTET_LENGTH is wrong

Поиск
Список
Период
Сортировка
От Hannu Krosing
Тема Re: OCTET_LENGTH is wrong
Дата
Msg-id 3BF75391.9090209@sid.tm.ee
обсуждение исходный текст
Ответ на OCTET_LENGTH is wrong  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-hackers

Tom Lane wrote:

>Peter Eisentraut <peter_e@gmx.net> writes:
>
>>... Moreover, it eliminates the standard useful behaviour of
>>OCTET_LENGTH, which is to show the length in bytes of a multibyte string.
>>
>
>While I don't necessarily dispute this, I do kinda wonder where you
>derive the statement.  AFAICS, SQL92 defines OCTET_LENGTH in terms
>of BIT_LENGTH:
>
>6.6 General Rule 5:
>
>            a) Let S be the <string value expression>. If the value of S is
>              not the null value, then the result is the smallest integer
>              not less than the quotient of the division (BIT_LENGTH(S)/8).
>            b) Otherwise, the result is the null value.
>
>and BIT_LENGTH is defined in the next GR:
>
>            a) Let S be the <string value expression>. If the value of S is
>              not the null value, then the result is the number of bits in
>              the value of S.
>            b) Otherwise, the result is the null value.
>
>While SQL92 is pretty clear about <bit string>, I'm damned if I can see
>anywhere that they define how many bits are in a character string value
>So who's to say what representation is to be used to count the bits?
>If, say, UTF-16 and UTF-8 are equally reasonable choices, then why
>shouldn't a compressed representation be reasonable too?
>
One objection I have to this, is the fact that nobody uses the compressed
representation in client libraries whrereas they do use both UTF-16 and 
UTF-8.
At least UTF-8 is available as client encoding.

And probably it is possible that the length of the "possibly compressed" 
representation
can change without the underlying data changing (for example when you 
set a bit
somewhere that disables compression and UPDATE some other field in the 
tuple)
making the result of OCTET_LENGTH dependent on other things than the 
argument
string.

I also like the propery of _uncompressed_ OCTET_LENGTH that
OCTET_LENGTH(s||s) == 2 * OCTET_LENGTH(s)
which is almost never true for compressed length

----------------
Hannu




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hannu Krosing
Дата:
Сообщение: Re: Multilingual application, ORDER BY w/ different
Следующее
От: Tom Lane
Дата:
Сообщение: Re: OCTET_LENGTH is wrong