Обсуждение: BUG #17258: Unexpected results in CHAR(1) data type
The following bug has been logged on the website: Bug reference: 17258 Logged by: David Calascibetta Email address: david@calascibetta.com PostgreSQL version: 12.8 Operating system: AWS RDS Description: My understanding is that a CHAR(1) column can never be 0 (zero) length. I expect the results of these two SELECT statements to produce the same result: select length(substr(' '::varchar,1,1)), ascii(substr(' '::varchar,1,1)); produces==> 1 32 (as expected) select length(substr(' '::char,1,1)), ascii(substr(' '::char,1,1)); produces==> 0 0 (not as expected)
The following bug has been logged on the website:
Bug reference: 17258
Logged by: David Calascibetta
Email address: david@calascibetta.com
PostgreSQL version: 12.8
Operating system: AWS RDS
Description:
My understanding is that a CHAR(1) column can never be 0 (zero) length.
> On Oct 29, 2021, at 11:14 AM, PG Bug reporting form <noreply@postgresql.org> wrote: > > select length(substr(' '::varchar,1,1)), ascii(substr(' > '::varchar,1,1)); > produces==> 1 32 (as expected) > > select length(substr(' '::char,1,1)), ascii(substr(' '::char,1,1)); > produces==> 0 0 (not as expected) There are four substr() functions defined in the standard distribution. They are: substr(text, int4, int4) substr(text, int4) substr(bytea, int4, int4) substr(bytea, int4) There is none defined directly for char(n). Since "text" is the preferred type (not "bytea"), I expect a cast of your ''::char to text before entry to the function. So the question is why casting ' '::char to text is different than ' '::text. The answer is that cast is implemented using rtrim1, which trims trailing space. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
-----Original Message----- From: David M. Calascibetta <david@calascibetta.com> Sent: Friday, October 29, 2021 2:04 PM To: 'Mark Dilger' <mark.dilger@enterprisedb.com> Subject: RE: BUG #17258: Unexpected results in CHAR(1) data type Ok, but my example was just a simplified version of what is going on. The actual problem stems from a CHAR(1) column data type that is behaving the same way. I was just trying to create a super-simple example of the problem. It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented. -----Original Message----- From: Mark Dilger <mark.dilger@enterprisedb.com> Sent: Friday, October 29, 2021 1:58 PM To: david@calascibetta.com; PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org> Subject: Re: BUG #17258: Unexpected results in CHAR(1) data type > On Oct 29, 2021, at 11:14 AM, PG Bug reporting form <noreply@postgresql.org> wrote: > > select length(substr(' '::varchar,1,1)), ascii(substr(' > '::varchar,1,1)); > produces==> 1 32 (as expected) > > select length(substr(' '::char,1,1)), ascii(substr(' '::char,1,1)); > produces==> 0 0 (not as expected) There are four substr() functions defined in the standard distribution. They are: substr(text, int4, int4) substr(text, int4) substr(bytea, int4, int4) substr(bytea, int4) There is none defined directly for char(n). Since "text" is the preferred type (not "bytea"), I expect a cast of your ''::char to text before entry to the function. So the question is why casting ' '::char to text is different than ' '::text. The answer is that cast is implemented using rtrim1, which trims trailing space. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Subject: RE: BUG #17258: Unexpected results in CHAR(1) data type
Ok, but my example was just a simplified version of what is going on.
The actual problem stems from a CHAR(1) column data type that is behaving the same way.
I was just trying to create a super-simple example of the problem.
It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.
I disagree about it being a feature request.
I was only using substr to demonstrate the problem.
Here is the problem without using any functions:
create table x (a char(1));
insert into x values ('x');
insert into x values (' ');
select length(a) from x;
1
0
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Friday, October 29, 2021 3:40 PM
To: David@calascibetta.com
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: FW: BUG #17258: Unexpected results in CHAR(1) data type
On Fri, Oct 29, 2021 at 1:17 PM David M. Calascibetta <david@calascibetta.com> wrote:
Subject: RE: BUG #17258: Unexpected results in CHAR(1) data type
Ok, but my example was just a simplified version of what is going on.
The actual problem stems from a CHAR(1) column data type that is behaving the same way.
I was just trying to create a super-simple example of the problem.
It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented.
This qualifies as a feature request, not a bug. One could write a version of substr that does what you expect (it probably wouldn't be named substr though) and takes in a character data type. It's just no one has, nor is likely to. Thus you are stuck using versions that take in text and you get the char-to-text casting side effects.
If you do octet_length(' ':: character(1)) it will return 1, not zero. So it indeed has a length one.
David J.
I disagree about it being a feature request.
I was only using substr to demonstrate the problem.
Here is the problem without using any functions:
create table x (a char(1));
insert into x values ('x');
insert into x values (' ');
select length(a) from x;
1
0
> On Oct 29, 2021, at 12:32 PM, David M. Calascibetta <david@calascibetta.com> wrote: > > Ok, but my example was just a simplified version of what is going on. > The actual problem stems from a CHAR(1) column data type that is behaving the same way. > I was just trying to create a super-simple example of the problem. > It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented. Please consider: mark.dilger=# select ''::char(1) = ' '::char(50); ?column? ---------- t (1 row) I infer that you expect a single byte of space to be compared against 50 bytes of space, and to be found unequal. Postgresdoesn't treat trailing spaces in char(n) the way I infer that you expect. Even without casting to another type (andthe equality operator for char(n) does not cast to another type) the comparison logic intentionally ignores the trailingspaces. Consider also: mark.dilger=# select length(' '::char(50)); length -------- 0 (1 row) Perhaps this behavior is nuts, but I say it is not a bug, just a peculiarity in how char(n) is defined to behave. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
OK. I have a work-around so I'm alright. I agree, the behavior is nuts, and is inconsistent with every other RDBMS out there. I was only reporting it to improve the product, but if you think this is appropriate behavior, I'm good with it. You can close this issue. David -----Original Message----- From: Mark Dilger <mark.dilger@enterprisedb.com> Sent: Friday, October 29, 2021 4:16 PM To: <David@calascibetta.com> <David@Calascibetta.com> Cc: pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17258: Unexpected results in CHAR(1) data type > On Oct 29, 2021, at 12:32 PM, David M. Calascibetta <david@calascibetta.com> wrote: > > Ok, but my example was just a simplified version of what is going on. > The actual problem stems from a CHAR(1) column data type that is behaving the same way. > I was just trying to create a super-simple example of the problem. > It still seems to me that a CHAR(1) should never be zero length, regardless of how it's implemented. Please consider: mark.dilger=# select ''::char(1) = ' '::char(50); ?column? ---------- t (1 row) I infer that you expect a single byte of space to be compared against 50 bytes of space, and to be found unequal. Postgresdoesn't treat trailing spaces in char(n) the way I infer that you expect. Even without casting to another type (andthe equality operator for char(n) does not cast to another type) the comparison logic intentionally ignores the trailingspaces. Consider also: mark.dilger=# select length(' '::char(50)); length -------- 0 (1 row) Perhaps this behavior is nuts, but I say it is not a bug, just a peculiarity in how char(n) is defined to behave. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Oct 29, 2021, at 2:26 PM, David M. Calascibetta <david@calascibetta.com> wrote: > > OK. I have a work-around so I'm alright. Glad to hear it. > I agree, the behavior is nuts, and is inconsistent with every other RDBMS out there. I haven't studied the behavior of char(n) on other RDBMS products. I'd be curious if the SQL spec says anything that we'reviolating in this regard. If so, we should at least have a warning in the docs about that. (We already have a warningabout how char(n) behaves, but nothing I see about the behavior being non-compliant.) But I'm wondering if otherRDBMS products really differ in this regard? Are you perhaps thinking about how varchar(n) works? > I was only reporting it to improve the product, but if you think this is appropriate behavior, > I'm good with it. I tend to think of char(n) as a misfeature and avoid using it. Based on your experience with other RDBMSs, would you expectchar(n) and varchar(n) to behave the same or to behave differently? In postgres, they are different, and varchar(n)would behave more like you seem to want. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> I agree, the behavior is nuts, and is inconsistent with every other RDBMS out there. I haven't studied the behavior of char(n) on other RDBMS products. I'd be curious if the SQL spec says anything that we'reviolating in this regard. If so, we should at least have a warning in the docs about that. (We already have a warningabout how char(n) behaves, but nothing I see about the behavior being non-compliant.) But I'm wondering if otherRDBMS products really differ in this regard? Are you perhaps thinking about how varchar(n) works? In every other rdbms, if it's fixed length character, values will always be that fixed length. PG behaves that way unless the value is blank. I believe this is in violation of the spec, which is why I reported it. Even the PG doc says it will pad with blanks. It does not give an exception for blank values. > I was only reporting it to improve the product, but if you think this > is appropriate behavior, I'm good with it. I tend to think of char(n) as a misfeature and avoid using it. Based on your experience with other RDBMSs, would you expectchar(n) and varchar(n) to behave the same or to behave differently? In postgres, they are different, and varchar(n)would behave more like you seem to want. They should behave the same, in that if I insert a blank into a char(1) column, it should store a blank value. If I insert a blank into a varchar(1) column, it should store a blank value. Fixed length characters would pad on the right, but it should not distort the value I entered in any other way. In this case, I enter a blank value and PG removes it. That is not appropriate behavior.
They should behave the same, in that if I insert a blank into a char(1) column, it should store a blank value.
If I insert a blank into a varchar(1) column, it should store a blank value. Fixed length characters would pad
on the right, but it should not distort the value I entered in any other way. In this case, I enter a blank value and
PG removes it. That is not appropriate behavior.
> On Oct 29, 2021, at 2:52 PM, David M. Calascibetta <david@calascibetta.com> wrote: > > In every other rdbms, if it's fixed length character, values will always be that fixed length. > PG behaves that way unless the value is blank. I believe this is in violation of the spec, > which is why I reported it. Even the PG doc says it will pad with blanks. It does not give > an exception for blank values. Taken from the postgres docs: > Values of type character are physically padded with spaces to the specified width n, and are stored and displayed thatway. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values oftype character. In collations where whitespace is significant, this behavior can produce unexpected results; for exampleSELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though C locale would consider a space to begreater than a newline. Trailing spaces are removed when converting a character value to one of the other string types.Note that trailing spaces are semantically significant in character varying and text values, and when using patternmatching, that is LIKE and regular expressions. The two parts of that which at least hint at the behavior are "However, trailing spaces are treated as semantically insignificantand disregarded when comparing two values of type character." and "Trailing spaces are removed when convertinga character value to one of the other string types." There aren't that many functions which directly accept a bpchar type (aka, a char(n)), so most of the functionality involvingchar(n) will involve first casting to text before handing the resultant text off to a function, with the cast triggeringthe "spaces are removed when converting" bit, and those functions that do directly take a bpchar type for comparisonpurposes trigger the "semantically insignificant and disregarded" bit. For practical purposes you will frequentlyhit the one bit or the other. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
and those functions that do directly take a bpchar type for comparison purposes trigger the "semantically insignificant and disregarded" bit. For practical purposes you will frequently hit the one bit or the other.
> On Oct 29, 2021, at 3:13 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: > > For practical purposes you will frequently hit the one bit or the other. > > > As I noted in a prior reply, octet_length(char) does the length computation with the padding spaces. So it is possiblefor char input functions to do the expected thing. Yes, I saw that. But there aren't that many functions like octet_length that do so. If users coming from other RDBMSs expectCHAR(1) to behave as David expects them to behave, it's cold comfort to say, "hang in there, you can still use octet_length()on them!" Better to say that they are going to be bitten by this expectation again and again, and insteadchoose a different datatype (which you also said.) — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes: > I haven't studied the behavior of char(n) on other RDBMS products. I'd be curious if the SQL spec says anything that we'reviolating in this regard. It's not a great approximation to the spec. Postgres views trailing spaces in a char(n) value as always being semantically insignificant, where I think the spec treats them as insignificant only for purposes of comparisons. Even more to the point, the spec considers that PAD SPACE is an attribute of *collations* not data types. Back in the day we didn't have collations, so the only way to even approximate that behavior was to make it a data type property. Now that we do have collations, it'd be conceivable to reimplement all this in something closer to the way the spec describes it. But it'd be a lot of work, and I'm not sure we'd accept such a patch even if somebody wrote it. It'd almost inevitably break applications that are relying on the existing behavior. > I tend to think of char(n) as a misfeature and avoid using it. Yeah, that. I haven't seen any reason to use char(n) rather than varchar(n) since punched cards stopped being a thing. So it's hard to summon the motivation to do a lot of work on that data type. Perhaps somebody else will feel more motivated, but nobody's stepped forward, and I wouldn't counsel holding your breath for it. regards, tom lane
> On Oct 29, 2021, at 3:09 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > most of the functionality involving char(n) ... > and those functions that do directly take a bpchar type for comparison purposes trigger the "semantically insignificantand disregarded" bit I meant the "most of the functionality" qualifier to apply to the second part, meaning "most of the functions that do..."but on re-reading, my grammar didn't accomplish that. I now understand why David Johnston corrected me here: > On Oct 29, 2021, at 3:13 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: > > As I noted in a prior reply, octet_length(char) does the length computation with the padding spaces. So it is possiblefor char input functions to do the expected thing. You are correct, sir. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I understand your position. I've been there myself. We will convert the CHAR(1) columns to VARCHAR and keep going. I just thought it was strange and you should know about it. Not a problem. Thanks for your attention. David -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Friday, October 29, 2021 5:23 PM To: Mark Dilger <mark.dilger@enterprisedb.com> Cc: <David@calascibetta.com> <David@Calascibetta.com>; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17258: Unexpected results in CHAR(1) data type Mark Dilger <mark.dilger@enterprisedb.com> writes: > I haven't studied the behavior of char(n) on other RDBMS products. I'd be curious if the SQL spec says anything that we're violating in this regard. It's not a great approximation to the spec. Postgres views trailing spaces in a char(n) value as always being semantically insignificant, where I think the spec treats them as insignificant only for purposes of comparisons. Even more to the point, the spec considers that PAD SPACE is an attribute of *collations* not data types. Back in the day we didn't have collations, so the only way to even approximate that behavior was to make it a data type property. Now that we do have collations, it'd be conceivable to reimplement all this in something closer to the way the spec describes it. But it'd be a lot of work, and I'm not sure we'd accept such a patch even if somebody wrote it. It'd almost inevitably break applications that are relying on the existing behavior. > I tend to think of char(n) as a misfeature and avoid using it. Yeah, that. I haven't seen any reason to use char(n) rather than varchar(n) since punched cards stopped being a thing. So it's hard to summon the motivation to do a lot of work on that data type. Perhaps somebody else will feel more motivated, but nobody's stepped forward, and I wouldn't counsel holding your breath for it. regards, tom lane
CHAR( size) is a fixed length string of length "size" characters (not bytes). Whatever data you enter, space is added at the end.
id NUMBER
, c2 char(2 char)
, c10 char(10 char)
);
Table TEST created.
SQL> insert into test values (1, ' ', ' ');
1 row inserted.
SQL> insert into test values (2, 'ä', 'ä');
1 row inserted.
SQL> select id, length(c2), length(c10) from test;
ID LENGTH(C2) LENGTH(C10)
-- ---------- -----------
1 2 10
2 2 10
2 rows selected.
-- LENGTH IN BYTES
SQL> select id, lengthb(c2), lengthb(c10) from test;
ID LENGTHB(C2) LENGTHB(C10)
-- ----------- ------------
1 2 10
2 3 11
2 rows selected.
-- LENGTH IN CHARS
SQL> select id, lengthc(c2), lengthc(c10) from test;
ID LENGTHC(C2) LENGTHC(C10)
-- ----------- ------------
1 2 10
2 2 10
2 rows selected.
-- DUMP OF REAL VALUE ON STORAGE
SQL> select id, dump(c10, 1015) from test;
ID DUMP(C10,1015)
-- -----------------------------------------------------------------------
1 Typ=96 Len=10 CharacterSet=AL32UTF8: 32,32,32,32,32,32,32,32,32,32
2 Typ=96 Len=11 CharacterSet=AL32UTF8: 195,164,32,32,32,32,32,32,32,32,32
2 rows selected.
-- ARE THEY EQUAL? YES
SQL> select * from test where c2 = c10;
ID C2 C10
-- -- ----------
1
2 ä ä
2 rows selected.
ID C2 C10
-- -- ----------
2 ä ä
1 rows selected.
SQL> select banner_full from v$version;
BANNER_FULL
-----------------------------------------------------------------------------------------
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.2.0.0.0