Обсуждение: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] Oracle buysInnobase)
> -----Original Message----- > From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com] > Sent: Wednesday, October 19, 2005 2:34 PM > To: Dann Corbit > Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier; > Richard_D_Levine@raytheon.com; pgsql-hackers@postgresql.org; pgsql- > general@postgresql.org > Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] > Oracle buysInnobase) > > On Wed, 19 Oct 2005, Dann Corbit wrote: > > > > -----Original Message----- > > > From: Terry Fielder [mailto:terry@ashtonwoodshomes.com] > > > Sent: Wednesday, October 19, 2005 2:05 PM > > > To: Dann Corbit > > > Cc: Tino Wildenhain; Marc G. Fournier; Richard_D_Levine@raytheon.com; > > > pgsql-hackers@postgresql.org; pgsql-general@postgresql.org > > > Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle > > > buysInnobase) > > > > > > > > > > > > Dann Corbit wrote: > > > > Try this query in Oracle, SQL*Server, DB/2, Informix, etc.: > > > > > > > > connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' > > as > > > > varchar(30)); > > > > ?column? > > > > ---------- > > > > (0 rows) > > > > > > > > I see how you can interpret the SQL Standard to make the above > > response > > > > a correct one. But is it the response that you would like? > > > > > > When the compared datatypes are VARCHAR: YES > > > > What is the value of doing that? > > > > I can see plenty of harm and absolutely no return. We are talking about > > blank padding before comparison. Do you really want 'Danniel ' > > considered distinct from 'Danniel ' in a comparison? In real life, > > what does that buy you? > > It buys you the ability to store things where trailing spaces are > signficant (for example passwords) within the existing limitations of not > having a full set of the collation behavior. I suppose that there may be some value in having 'J&^% ' be different from 'J&^% '. I would expect to insert a password with trailing blanks to be the same as inserting a password with no trailing blanks. I think that whatever is done ought to be whatever the standard says. If I misinterpret the standard and PostgreSQL is doing it right, then that is fine. It is just that PostgreSQL is very counter-intuitive compared to other database systems that I have used in this one particular area. When I read the standard, it looked to me like PostgreSQL was not performing correctly. It is not unlikely that I read it wrong.
Dann, > I think that whatever is done ought to be whatever the standard says. > If I misinterpret the standard and PostgreSQL is doing it right, then > that is fine. It is just that PostgreSQL is very counter-intuitive > compared to other database systems that I have used in this one > particular area. When I read the standard, it looked to me like > PostgreSQL was not performing correctly. It is not unlikely that I read > it wrong. AFAIT, the standard says "implementation-specific". So we're standard. The main cost for comparing trimmed values is performance; factoring an rtrim into every comparison will add significant overhead to the already CPU-locked process of, for example, creating indexes. We're looking for ways to make the comparison operators lighter-weight, not heavier. My general perspective on this is that if trailing blanks are a significant hazard for your application, then trim them on data input. That requires a *lot* less peformance overhead than doing it every time you compare something. Changing the behaviour would break backwards compatibility for some users. For that matter, I've been subscribed to 8 PostgreSQL mailing lists since 1999, and this is the first time I can recall someone complaining about this comparison behavior. So it's obviously not a widespread issue. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: >Dann, > > > >>I think that whatever is done ought to be whatever the standard says. >>If I misinterpret the standard and PostgreSQL is doing it right, then >>that is fine. It is just that PostgreSQL is very counter-intuitive >>compared to other database systems that I have used in this one >>particular area. When I read the standard, it looked to me like >>PostgreSQL was not performing correctly. It is not unlikely that I read >>it wrong. >> >> > >AFAIT, the standard says "implementation-specific". So we're standard. > >The main cost for comparing trimmed values is performance; factoring an >rtrim into every comparison will add significant overhead to the already >CPU-locked process of, for example, creating indexes. We're looking for >ways to make the comparison operators lighter-weight, not heavier. > > If I understand the spec correctly, it seems to indicate that this is specific to the locale/character set. Assuming that the standard doesn't have anything to do with any character sets, it should be possible to make this available for those who want it as an initdb option. Whether or not this is important enough to offer or not is another matter. Personally my questions are: 1) How many people have been bitten by this badly? 2) How many people have been bitten by joins that depend on padding? Personally, unlike case folding, this seems to be an area where a bit of documentation (i.e. all collation sets have are assumed to have the NO PAD option in the SQL standard) would be sufficient to answer to questions of standards-compliance. >My general perspective on this is that if trailing blanks are a significant >hazard for your application, then trim them on data input. That requires >a *lot* less peformance overhead than doing it every time you compare >something. > > In general I agree. But I am not willing to jump to the conclusion that it will never be warranted to add this as an initdb option. I am more interested in what cases people see where this would be required. But I agree that the bar is much higher than it is in many other cases. Best Wishes, Chris Travers Metatron Technology Consulting
Chris Travers <chris@travelamericas.com> writes: > If I understand the spec correctly, it seems to indicate that this is > specific to the locale/character set. The spec associates padding behavior with collations, which per spec are separate from the datatypes --- that is, you should be able to able to specify a collation for each string-type table column (whether char(N) or varchar(N)) and even for each literal string constant. We do not currently have that capability, and accordingly fall back to binding PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N). AFAICS this choice is allowed by the spec since the default collation is implementation-defined. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Chris Travers <chris@travelamericas.com> writes: > > If I understand the spec correctly, it seems to indicate that this is > > specific to the locale/character set. > > The spec associates padding behavior with collations, which per spec are > separate from the datatypes --- that is, you should be able to able to > specify a collation for each string-type table column (whether char(N) > or varchar(N)) and even for each literal string constant. We do not > currently have that capability, and accordingly fall back to binding > PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N). > > AFAICS this choice is allowed by the spec since the default collation is > implementation-defined. Does it even make sense for char(N) to not be space padded? I had the impression char(N) was always N characters long, not more or less. I can't picture any other character being used for padding, then you would need a more flexible rtrim function. And I can understand the collation order determining whether 'a' and 'a ' compare equal. But surely if you store 'a' in a varchar(N) you have to get 'a' back out, not some other string! Does the spec really allow varchar to actually be padded and not just compare ignoring trailing space? (I can't believe anyone really wants varchar to be space padded. Space padding always seemed like a legacy feature for databases with fixed record length data types. Why would anyone want a string data type that can't represent all strings?) -- greg
On 10/20/2005 2:17 AM, Greg Stark wrote: > (I can't believe anyone really wants varchar to be space padded. Space padding > always seemed like a legacy feature for databases with fixed record length > data types. Why would anyone want a string data type that can't represent all > strings?) They must have buried that "bow to COBOL" so deep in the code that they had no choice but to abuse their power and stuff this cruft into the standard. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Thu, 2005-10-20 at 08:14 -0400, Jan Wieck wrote: > On 10/20/2005 2:17 AM, Greg Stark wrote: > > > (I can't believe anyone really wants varchar to be space padded. Space padding > > always seemed like a legacy feature for databases with fixed record length > > data types. Why would anyone want a string data type that can't represent all > > strings?) > > They must have buried that "bow to COBOL" so deep in the code that they > had no choice but to abuse their power and stuff this cruft into the > standard. > MOVE SPACES TO 0101 You just gotta love it Andy