Обсуждение: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

Поиск
Список
Период
Сортировка

Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
> Sent: Wednesday, October 19, 2005 2:34 PM
> To: Dann Corbit
> Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
> Richard_D_Levine@raytheon.com; pgsql-hackers@postgresql.org; pgsql-
> general@postgresql.org
> Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy]
[GENERAL]
> Oracle buysInnobase)
>
> On Wed, 19 Oct 2005, Dann Corbit wrote:
>
> > > -----Original Message-----
> > > From: Terry Fielder [mailto:terry@ashtonwoodshomes.com]
> > > Sent: Wednesday, October 19, 2005 2:05 PM
> > > To: Dann Corbit
> > > Cc: Tino Wildenhain; Marc G. Fournier;
Richard_D_Levine@raytheon.com;
> > > pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
> > > Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL]
Oracle
> > > buysInnobase)
> > >
> > >
> > >
> > > Dann Corbit wrote:
> > > > Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
> > > >
> > > > connxdatasync=# select 1 where cast('a' as varchar(30)) =
cast('a '
> > as
> > > > varchar(30));
> > > >  ?column?
> > > > ----------
> > > > (0 rows)
> > > >
> > > > I see how you can interpret the SQL Standard to make the above
> > response
> > > > a correct one.  But is it the response that you would like?
> > >
> > > When the compared datatypes are VARCHAR: YES
> >
> > What is the value of doing that?
> >
> > I can see plenty of harm and absolutely no return.  We are talking
about
> > blank padding before comparison.  Do you really want 'Danniel '
> > considered distinct from 'Danniel  ' in a comparison?  In real life,
> > what does that buy you?
>
> It buys you the ability to store things where trailing spaces are
> signficant (for example passwords) within the existing limitations of
not
> having a full set of the collation behavior.

I suppose that there may be some value in having 'J&^% ' be different
from 'J&^%   '.

I would expect to insert a password with trailing blanks to be the same
as inserting a password with no trailing blanks.

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.


Re: 'a' == 'a '

От
Josh Berkus
Дата:
Dann,

> I think that whatever is done ought to be whatever the standard says.
> If I misinterpret the standard and PostgreSQL is doing it right, then
> that is fine.  It is just that PostgreSQL is very counter-intuitive
> compared to other database systems that I have used in this one
> particular area.  When I read the standard, it looked to me like
> PostgreSQL was not performing correctly.  It is not unlikely that I read
> it wrong.

AFAIT, the standard says "implementation-specific".   So we're standard.

The main cost for comparing trimmed values is performance; factoring an
rtrim into every comparison will add significant overhead to the already
CPU-locked process of, for example, creating indexes.  We're looking for
ways to make the comparison operators lighter-weight, not heavier.

My general perspective on this is that if trailing blanks are a significant
hazard for your application, then trim them on data input.  That requires
a *lot* less peformance overhead than doing it every time you compare
something.

Changing the behaviour would break backwards compatibility for some users.
For that matter, I've been subscribed to 8 PostgreSQL mailing lists since
1999, and this is the first time I can recall someone complaining about
this comparison behavior.  So it's obviously not a widespread issue.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: [GENERAL] 'a' == 'a '

От
Chris Travers
Дата:
Josh Berkus wrote:

>Dann,
>
>
>
>>I think that whatever is done ought to be whatever the standard says.
>>If I misinterpret the standard and PostgreSQL is doing it right, then
>>that is fine.  It is just that PostgreSQL is very counter-intuitive
>>compared to other database systems that I have used in this one
>>particular area.  When I read the standard, it looked to me like
>>PostgreSQL was not performing correctly.  It is not unlikely that I read
>>it wrong.
>>
>>
>
>AFAIT, the standard says "implementation-specific".   So we're standard.
>
>The main cost for comparing trimmed values is performance; factoring an
>rtrim into every comparison will add significant overhead to the already
>CPU-locked process of, for example, creating indexes.  We're looking for
>ways to make the comparison operators lighter-weight, not heavier.
>
>
If I understand the spec correctly, it seems to indicate that this is
specific to the locale/character set.  Assuming that the standard
doesn't have anything to do with any character sets, it should be
possible to make this available for those who want it as an initdb
option.  Whether or not this is important enough to offer or not is
another matter.

Personally my questions are:

1)  How many people have been bitten by this badly?
2)  How many people have been bitten by joins that depend on padding?

Personally, unlike case folding, this seems to be an area where a bit of
documentation (i.e. all collation sets have are assumed to have the NO
PAD option in the SQL standard) would be sufficient to answer to
questions of standards-compliance.

>My general perspective on this is that if trailing blanks are a significant
>hazard for your application, then trim them on data input.  That requires
>a *lot* less peformance overhead than doing it every time you compare
>something.
>
>
In general I agree.  But I am not willing to jump to the conclusion that
it will never be warranted to add this as an initdb option.  I am more
interested in what cases people see where this would be required.  But I
agree that the bar is much higher than it is in many other cases.

Best Wishes,
Chris Travers
Metatron Technology Consulting

Re: [GENERAL] 'a' == 'a '

От
Tom Lane
Дата:
Chris Travers <chris@travelamericas.com> writes:
> If I understand the spec correctly, it seems to indicate that this is
> specific to the locale/character set.

The spec associates padding behavior with collations, which per spec are
separate from the datatypes --- that is, you should be able to able to
specify a collation for each string-type table column (whether char(N)
or varchar(N)) and even for each literal string constant.  We do not
currently have that capability, and accordingly fall back to binding
PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).

AFAICS this choice is allowed by the spec since the default collation is
implementation-defined.

            regards, tom lane

Re: [GENERAL] 'a' == 'a '

От
Greg Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Chris Travers <chris@travelamericas.com> writes:
> > If I understand the spec correctly, it seems to indicate that this is
> > specific to the locale/character set.
>
> The spec associates padding behavior with collations, which per spec are
> separate from the datatypes --- that is, you should be able to able to
> specify a collation for each string-type table column (whether char(N)
> or varchar(N)) and even for each literal string constant.  We do not
> currently have that capability, and accordingly fall back to binding
> PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).
>
> AFAICS this choice is allowed by the spec since the default collation is
> implementation-defined.

Does it even make sense for char(N) to not be space padded? I had the
impression char(N) was always N characters long, not more or less. I can't
picture any other character being used for padding, then you would need a more
flexible rtrim function.

And I can understand the collation order determining whether 'a' and 'a '
compare equal. But surely if you store 'a' in a varchar(N) you have to get 'a'
back out, not some other string! Does the spec really allow varchar to
actually be padded and not just compare ignoring trailing space?


(I can't believe anyone really wants varchar to be space padded. Space padding
always seemed like a legacy feature for databases with fixed record length
data types. Why would anyone want a string data type that can't represent all
strings?)

--
greg

Re: [GENERAL] 'a' == 'a '

От
Jan Wieck
Дата:
On 10/20/2005 2:17 AM, Greg Stark wrote:

> (I can't believe anyone really wants varchar to be space padded. Space padding
> always seemed like a legacy feature for databases with fixed record length
> data types. Why would anyone want a string data type that can't represent all
> strings?)

They must have buried that "bow to COBOL" so deep in the code that they
had no choice but to abuse their power and stuff this cruft into the
standard.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #