Обсуждение: Re: [GENERAL] 'a' == 'a '

Поиск
Список
Период
Сортировка

Re: [GENERAL] 'a' == 'a '

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Greg Stark
> Sent: Wednesday, October 19, 2005 11:17 PM
> To: Tom Lane
> Cc: Chris Travers; josh@agliodbs.com; pgsql-hackers@postgresql.org;
Dann
> Corbit; Stephan Szabo; Terry Fielder; Tino Wildenhain; Marc G.
Fournier;
> Richard_D_Levine@raytheon.com; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] [HACKERS] 'a' == 'a '
>
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > Chris Travers <chris@travelamericas.com> writes:
> > > If I understand the spec correctly, it seems to indicate that this
is
> > > specific to the locale/character set.
> >
> > The spec associates padding behavior with collations, which per spec
are
> > separate from the datatypes --- that is, you should be able to able
to
> > specify a collation for each string-type table column (whether
char(N)
> > or varchar(N)) and even for each literal string constant.  We do not
> > currently have that capability, and accordingly fall back to binding
> > PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).
> >
> > AFAICS this choice is allowed by the spec since the default
collation is
> > implementation-defined.
>
> Does it even make sense for char(N) to not be space padded? I had the
> impression char(N) was always N characters long, not more or less. I
can't
> picture any other character being used for padding, then you would
need a
> more
> flexible rtrim function.
>
> And I can understand the collation order determining whether 'a' and
'a '
> compare equal. But surely if you store 'a' in a varchar(N) you have to
get
> 'a'
> back out, not some other string! Does the spec really allow varchar to
> actually be padded and not just compare ignoring trailing space?
>
>
> (I can't believe anyone really wants varchar to be space padded. Space
> padding
> always seemed like a legacy feature for databases with fixed record
length
> data types. Why would anyone want a string data type that can't
represent
> all
> strings?)

Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.

Given two strings of different in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the same
length before performing the comparison.

Hence, you will see that 'Danniel' = 'Danniel ' is true in most cases.

Now, this really does not have any connection with storage or varchar or
bpchar or char or text or anything like that.

It is only the action to be taken when a comparison operation is
performed.

Re: [GENERAL] 'a' == 'a '

От
Chris Travers
Дата:
Dann Corbit wrote:

>Let me make something clear:
>When we are talking about padding here it is only in the context of a
>comparison operator and NOT having anything to do with storage.
>
>
IIrc, varchar and bpchar are stored in a similar way, but are presented
differently when retrieved.  I.e. storage is separate from presentation
in this case.  I.e. the padding in bpchar occurs when it is presented
and stripped when it is stored.

Again, I am happy "solving" this simply by documenting it since any
questions of interpretation and implimentation of the standard would be
answered.  So far what I (and I am sure others) have not heard is a
strong case for changing the behavior, given that it is in line with a
reasonable interpretation of the standards.

>Given two strings of different in a comparison, most database systems
>(by default) will blank pad the shorter string so that they are the same
>length before performing the comparison.
>
>
Understood, but what gain do you have in a case like this that might
justify the effort that would go into making it, say, an initdb option?
How often does this behavior cause problems?

Best Wishes,
Chris Travers
Metatron Technology Consulting

Re: [GENERAL] 'a' == 'a '

От
Tom Lane
Дата:
Chris Travers <chris@travelamericas.com> writes:
> IIrc, varchar and bpchar are stored in a similar way, but are presented
> differently when retrieved.  I.e. storage is separate from presentation
> in this case.  I.e. the padding in bpchar occurs when it is presented
> and stripped when it is stored.

This is not so, although I've occasionally wondered whether we shouldn't
try to make it so.  Certainly we don't rely on char(N) to be physically
fixed-width (and can't, at least not with variable-width character
encodings) so there can be no performance advantage to actually storing
the insignificant spaces.  The hard part would be in figuring out how
the output routine could know how many spaces to add back.

            regards, tom lane

Re: [GENERAL] 'a' == 'a '

От
Richard_D_Levine@raytheon.com
Дата:

Tom Lane <tgl@sss.pgh.pa.us> wrote on 10/20/2005 03:11:23 PM:
<snip>
> The hard part would be in figuring out how
> the output routine could know how many spaces to add back.

The length is in the metadata for the column, or am I being dense?

>
>          regards, tom lane


Re: [GENERAL] 'a' == 'a '

От
Richard_D_Levine@raytheon.com
Дата:
I will happily reiterate that I am the troll who started this mess by
whining about how *Oracle* handles this.  Tom's explanation that CHAR is
has a PAD collation and VARCHAR has a NO PAD collation have restored my
faith that there is goodness in the world.  My whining was out of
ignorance.  I wouldn't change the proper way PostgreSQL works.  Documenting
it is good.  I will use this new found knowledge from now on in my database
designs.

Cheers,

Rick

Chris Travers <chris@travelamericas.com> wrote on 10/20/2005 01:52:36 PM:

> Dann Corbit wrote:
>
> >Let me make something clear:
> >When we are talking about padding here it is only in the context of a
> >comparison operator and NOT having anything to do with storage.
> >
> >
> IIrc, varchar and bpchar are stored in a similar way, but are presented
> differently when retrieved.  I.e. storage is separate from presentation
> in this case.  I.e. the padding in bpchar occurs when it is presented
> and stripped when it is stored.
>
> Again, I am happy "solving" this simply by documenting it since any
> questions of interpretation and implimentation of the standard would be
> answered.  So far what I (and I am sure others) have not heard is a
> strong case for changing the behavior, given that it is in line with a
> reasonable interpretation of the standards.
>
> >Given two strings of different in a comparison, most database systems
> >(by default) will blank pad the shorter string so that they are the same
> >length before performing the comparison.
> >
> >
> Understood, but what gain do you have in a case like this that might
> justify the effort that would go into making it, say, an initdb option?
> How often does this behavior cause problems?
>
> Best Wishes,
> Chris Travers
> Metatron Technology Consulting


Re: [GENERAL] 'a' == 'a '

От
"John D. Burger"
Дата:
[Removed all the non-list addresses]

Dann Corbit wrote:

> Let me make something clear:
> When we are talking about padding here it is only in the context of a
> comparison operator and NOT having anything to do with storage.
>
> Given two strings of different in a comparison, most database systems
> (by default) will blank pad the shorter string so that they are the
> same
> length before performing the comparison.
>
> Hence, you will see that 'Danniel' = 'Danniel ' is true in most cases.
>
> Now, this really does not have any connection with storage or varchar
> or
> bpchar or char or text or anything like that.

Is this really true??? My understanding of the spec was that this was
=exactly= the difference between char(N) and varchar(N) - the former is
padded to length N when you store it, or at least the DB has to act as
if this is the case.  Can someone quote the appropriate chapter and
verse?

Thanks.

- John D. Burger
   MITRE