Re: OCTET_LENGTH is wrong
From | Bruce Momjian |
---|---|
Subject | Re: OCTET_LENGTH is wrong |
Date | |
Msg-id | 200111182123.fAILNGW07403@candle.pha.pa.us |
In response to | Re: OCTET_LENGTH is wrong (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: OCTET_LENGTH is wrong (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > On Sun, 18 Nov 2001, Tom Lane wrote:
> >> I presume that where you want to come out is OCTET_LENGTH = uncompressed
> >> length in the server's encoding ... but so far no one has really made
> >> a convincing argument why that answer is better or more spec-compliant
> >> than any other answer. In particular, it's not obvious to me why
> >> "number of bytes we're actually using on disk" is wrong.
>
> > I'm not sure, but if we say that the on disk representation is the
> > value of the character value expression whose size is being checked,
> > wouldn't that be inconsistent with the other uses of the character value
>
> Yeah, it would be and is. In fact, the present code has some
> interesting behaviors: if foo.x is a text value long enough to be
> toasted, then you get different results from
>
>     SELECT OCTET_LENGTH(x) FROM foo;
>
>     SELECT OCTET_LENGTH(x || '') FROM foo;
>
> since the result of the concatenation expression won't be compressed.
>
> I'm not actually here to defend the existing code; in fact I believe the
> XXX comment on textoctetlen questioning its correctness is mine. What
> I am trying to point out is that the spec is so vague that it's not
> clear what the correct answer is.

Well, if the standard is unclear, we should return the most reasonable
answer, which has to be the non-compressed length. In multibyte
encodings, when we started returning length() in _characters_ instead of
bytes, I assumed the major use for octet_length was to return the number
of bytes needed to hold the value on the client side.

In single-byte encodings, octet_length is the same as length(), so
returning a compressed length may make sense, but I don't think we want
different meanings for the function for single-byte and multi-byte
encodings.
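To make the three candidate answers concrete, here is a minimal sketch in Python rather than in the server itself. It compares the character count, the uncompressed byte count, and a compressed byte count for the same value. The helper names, the choice of UTF-8, and the use of zlib are all assumptions made for illustration; zlib only stands in for whatever compression the server applies to toasted values and is not the actual on-disk format.

```python
import zlib


def char_length(value: str) -> int:
    """Number of characters, analogous to length() on a text value."""
    return len(value)


def octet_length(value: str, encoding: str = "utf-8") -> int:
    """Uncompressed bytes in the given encoding: what a client would receive."""
    return len(value.encode(encoding))


def stored_length(value: str, encoding: str = "utf-8") -> int:
    """Bytes after compression. zlib is a stand-in (assumption), not TOAST."""
    return len(zlib.compress(value.encode(encoding)))


# Long, repetitive values so that compression has a visible effect.
ascii_text = "abc" * 1000          # one byte per character in UTF-8
multibyte_text = "日本語" * 1000    # three bytes per character in UTF-8

for label, text in (("single-byte", ascii_text), ("multi-byte", multibyte_text)):
    print(label, char_length(text), octet_length(text), stored_length(text))
```

For the ASCII value the character count and the uncompressed byte count coincide, so only the compressed figure differs. For the multibyte value all three numbers differ, and only the uncompressed byte count tells a client how much space it needs to hold the value.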
I guess the issue is that for single-byte encodings, octet_length is
pretty useless because it is the same as length, but for multi-byte
encodings, octet_length is invaluable and almost has to return the
uncompressed byte count, because uncompressed is what the client sees.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026