Tom Lane wrote:
>Peter Eisentraut <peter_e@gmx.net> writes:
>
>>... Moreover, it eliminates the standard useful behaviour of
>>OCTET_LENGTH, which is to show the length in bytes of a multibyte string.
>>
>
>While I don't necessarily dispute this, I do kinda wonder where you
>derive the statement. AFAICS, SQL92 defines OCTET_LENGTH in terms
>of BIT_LENGTH:
>
>6.6 General Rule 5:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the smallest integer
> not less than the quotient of the division (BIT_LENGTH(S)/8).
> b) Otherwise, the result is the null value.
>
>and BIT_LENGTH is defined in the next GR:
>
> a) Let S be the <string value expression>. If the value of S is
> not the null value, then the result is the number of bits in
> the value of S.
> b) Otherwise, the result is the null value.
>
>While SQL92 is pretty clear about <bit string>, I'm damned if I can see
>anywhere that they define how many bits are in a character string value
>So who's to say what representation is to be used to count the bits?
>If, say, UTF-16 and UTF-8 are equally reasonable choices, then why
>shouldn't a compressed representation be reasonable too?
>
One objection I have to this, is the fact that nobody uses the compressed
representation in client libraries whrereas they do use both UTF-16 and
UTF-8.
At least UTF-8 is available as client encoding.
And probably it is possible that the length of the "possibly compressed"
representation
can change without the underlying data changing (for example when you
set a bit
somewhere that disables compression and UPDATE some other field in the
tuple)
making the result of OCTET_LENGTH dependent on other things than the
argument
string.
I also like the propery of _uncompressed_ OCTET_LENGTH that
OCTET_LENGTH(s||s) == 2 * OCTET_LENGTH(s)
which is almost never true for compressed length
----------------
Hannu