Re: char(n) to varchar or text conversion should strip trailing spaces
От | Tom Lane |
---|---|
Тема | Re: char(n) to varchar or text conversion should strip trailing spaces |
Дата | |
Msg-id | 839.1037644958@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: char(n) to varchar or text conversion should strip trailing spaces (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
I said: > "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: >> One alternate possible approach would maybe be to change the on-disk >> representation to really be binary compatible and change the input >> output and operator functions ? > Seems like a great idea to me. On further thought I've got some reservations about it. The main problem is that the physical contents of a char(n) Datum wouldn't provide the complete semantic meaning: you must know the typmod as well to know the number of padding spaces that are supposed to be there. We have special provisions to allow input and output functions to know the typmod, but in more general cases functions will not receive a typmod, and in any case they cannot deliver a typmod. So for any operation that wanted to behave as though the padding spaces are real, there'd be a problem. We could maybe play some games with having char(n) values be expanded (space-padded) during expression evaluation and then trimmed for storage, but this strikes me as awfully messy, not to mention redundant with the compression done by TOAST. Also: I've been reading through the spec in more detail, and what I now realize is that they expect trailing blanks to be ignored by default in both char(n) and varchar(n) comparisons! In fact, it's not really the datatype that determines this, but the collation attribute, and they specify that the default collation must have the PAD SPACE attribute (which essentially means that trailing spaces are not significant). What's more, AFAICT padding spaces in char(n) are treated as real data by every operation except comparison --- for example, they are real data in concatenation. I don't think we really want to meet the letter of the spec here :-( It seems quite schizoid to treat pad spaces as real data for everything except comparison. Certainly I do not want to do that for varchar or text datatypes. The idea of having char(n)-to-text conversion strip trailing blanks still appeals to me, but I have to withdraw the claim that it'd improve our spec compliance; it wouldn't. I'm now wondering whether it wouldn't be better to leave the data representation as-is (padding spaces are stored), and still allow binary compatibility both ways, but add a collection of duplicate pg_proc and pg_operator entries so that char-ness is preserved where appropriate. For example, we'd need both these pg_proc entries for UPPER():upper(text) returns textupper(character) returns character They could point at the same C routine, but the parser would select the first when the input is text or varchar, and the second when the input is character. This would solve the original complaint aboutupper('foo '::char(6)) = 'FOO' needing to yield TRUE. The extra pg_proc entries would be a tad tedious, but I think we'd only need a couple dozen to satisfy the spec's requirements. Some (perhaps not all) of these functions would need to be duplicated: btrim(text) btrim(text,text) convert(text,name) convert(text,name,name) convert_using(text,text) initcap(text) lower(text) lpad(text,integer) lpad(text,integer,text) ltrim(text) ltrim(text,text) max(text) min(text) overlay(text,text,integer) overlay(text,text,integer,integer) repeat(text,integer) replace(text,text,text) rpad(text,integer) rpad(text,integer,text) rtrim(text) rtrim(text,text) split_part(text,text,integer) substr(text,integer) substr(text,integer,integer) substring(text,integer) substring(text,integer,integer) text_larger(text,text) text_smaller(text,text) textcat(text,text) translate(text,text,text) upper(text) regards, tom lane
В списке pgsql-hackers по дате отправления: