Re: record identical operator

From Kevin Grittner
Subject Re: record identical operator
Date
Msg-id 1379519692.59106.YahooMailNeo@web162902.mail.bf1.yahoo.com
In reply to Re: record identical operator  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: record identical operator  (Stephen Frost <sfrost@snowman.net>)
Re: record identical operator  (Hannu Krosing <hannu@2ndQuadrant.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> To have clean semantics, I think the operator should mean that the
>> stored format of the row is the same.  Regarding the array null
>> bitmap example, I think it would be truly weird if the operator
>> said that the stored format was the same, but this happened:
>>
>> test=# select pg_column_size(ARRAY[1,2,3]);
>>   pg_column_size
>> ----------------
>>               36
>> (1 row)
>>
>> test=# select pg_column_size((ARRAY[1,2,3,NULL])::int4[3]);
>>   pg_column_size
>> ----------------
>>               44
>> (1 row)
>>
>> They have the same stored format, but a different number of
>> bytes?!?
>
> Hmm.  For most of this thread, I was leaning toward the view that
> comparing the binary representations was the wrong concept, and that
> we actually needed to have type-specific operators that understand the
> semantics of the data type.
>
> But I think this example convinces me otherwise.  What we really want
> to do here is test whether two values are the same, and if you can
> feed two values that are supposedly the same to some function and get
> two different answers, well then they're not really the same, are
> they?

Right.  Not only would the per-type solution leave materialized view
maintenance broken by default, requiring per-type work to make it
behave reasonably, with silent failures for any type you didn't know
about, but "no user-visible differences" is a pretty slippery
concept.  Did you really think of all the functions someone might
use to look at a value?  Might there be performance differences we
care about that should be handled, even if the user has no way to
dig out the difference?  Will that change in a future release?
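
The "equal, yet distinguishable by some function" idea can be sketched
outside PostgreSQL.  The following is only a Python analogy, not
PostgreSQL code: Decimal and float stand in for SQL types, and
same_image is a hypothetical helper that compares encoded bytes rather
than values.

```python
import struct
from decimal import Decimal


def same_image(a: float, b: float) -> bool:
    """Compare the 8-byte IEEE 754 encodings instead of the numeric values."""
    return struct.pack("<d", a) == struct.pack("<d", b)


# Equal as values, but str() tells them apart.
one_a, one_b = Decimal("1.0"), Decimal("1.00")
print(one_a == one_b, str(one_a) == str(one_b))  # True False

# Equal as floats, but the stored bytes differ in the sign bit.
print(0.0 == -0.0, same_image(0.0, -0.0))  # True False
```

In both cases ordinary equality says "same" while another operation
says "different", which is the gap a byte-image comparison is meant to
close.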

> Now, I grant that the array case is pretty weird.  An array with an
> all-zeroes null bitmap is basically semantically identical to one with
> no null bitmap at all.  But there are other such cases as well.  You
> can have two floats that print the same way except when
> extra_float_digits=3, for example, and I think that's probably a
> difference that we *wouldn't* want to paper over.  You can have a
> long-form numeric that represents a value that could have been
> represented as a short-form numeric, which is similar to the array
> case.  There are probably other examples as well.  But in each of
> those cases, the point is that there *is* some operation which will
> distinguish between the two supposedly-identical values, and therefore
> they are not identical for all purposes.  Therefore, I see no harm in
> having an operator that tests for
> are-these-values-identical-for-all-purposes.  If that's useful for
> RMVC, then there may be other legitimate uses for it as well.
>
> And once we decide that's OK, I think we ought to document it.

That seems to be the consensus.  I don't think we can really
document this form of record comparison without also documenting
how equality works.  I'll work something up for the next version of
the patch.

> Sure, it's a little confusing, but we can explain it, I think.  It's a good
> opportunity to point out to people that, most of the time, they really
> want something else, like the equality operator for the default btree
> opclass.

I think the hardest part will be documenting the difference between
the row value constructor semantics (which are all that is
currently documented) and the record equality semantics (used for
sorting and building indexes).  In a green field I think I would
have argued for having just the standard semantics we have
documented, and modifying our sort execution nodes and index builds
to deal with that.  This is one of those cases where the breakage
from such a change is hard to justify purely on the basis of
conceptually cleaner semantics.

There also seems to be universal agreement that the operator names
should be something other than what I put in the v1 patch, but we
don't have agreement on what should be used instead.  We need six
operators to support the btree access method's requirements.  Currently the
patch has:

=== !== >== >>> <== <<<
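
As a rough illustration of why the six operators come as a set, here
is a Python sketch in which all of them are derived from a single
three-way comparison of byte images.  This is not the patch's C code.
image_cmp and OPERATORS are hypothetical names, and the mapping of the
v1 spellings onto =, <>, >=, >, <=, < is an assumption, since the
message does not spell it out.

```python
def image_cmp(a: bytes, b: bytes) -> int:
    """Three-way comparison of two byte images: returns -1, 0, or 1."""
    return (a > b) - (a < b)


# Assumed meaning of each v1 spelling, all built on image_cmp.
OPERATORS = {
    "===": lambda a, b: image_cmp(a, b) == 0,
    "!==": lambda a, b: image_cmp(a, b) != 0,
    ">==": lambda a, b: image_cmp(a, b) >= 0,
    ">>>": lambda a, b: image_cmp(a, b) > 0,
    "<==": lambda a, b: image_cmp(a, b) <= 0,
    "<<<": lambda a, b: image_cmp(a, b) < 0,
}

# Two images that differ only in trailing bytes are not "the same".
short, padded = b"\x01\x02", b"\x01\x02\x00"
print(OPERATORS["==="](short, padded))  # False
print(OPERATORS["<<<"](short, padded))  # True
```

Because every operator is defined through the same comparison, the
resulting ordering is total and consistent, which is what an index
ordering needs.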

Suggested "same as" operators so far are:

====
=====
<<=>>
=><=

Anyone want to champion one of those, or something else?  How about
the other five operators to go with your favorite?

Keep in mind that this thread has also turned up strong support for
an operator to express IS NOT DISTINCT FROM -- so that it can be
used with ANY/ALL, among other things.  Long term, having an
opfamily for that might help us clean up the semantics of record
comparison when there are NULLs involved.  Currently we use the =
operator but act as though IS NOT DISTINCT FROM was specified
(except for some cases involving a row value constructor).  Any
serious discussion of that should probably move to a new thread,
but I mention it here because some people wanted to reserve
operator space for that, which could conflict with "same as"
operators.
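
To make the ANY/ALL point concrete, here is a small Python model of
SQL's three-valued logic, with None standing in for NULL.  It is a
sketch of the semantics only, not PostgreSQL code, and the function
names sql_eq, is_not_distinct_from, and sql_any are hypothetical.

```python
from typing import Any, Callable, Iterable, Optional


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """Model of SQL '=': the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Any, b: Any) -> bool:
    """Model of IS NOT DISTINCT FROM: NULLs compare as equal, never unknown."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_any(op: Callable[[Any, Any], Optional[bool]],
            value: Any, items: Iterable[Any]) -> Optional[bool]:
    """Model of 'value op ANY (items)' under three-valued logic."""
    saw_unknown = False
    for item in items:
        result = op(value, item)
        if result is True:
            return True
        if result is None:
            saw_unknown = True
    return None if saw_unknown else False


print(sql_any(sql_eq, None, [1, None]))                # None
print(sql_any(is_not_distinct_from, None, [1, None]))  # True
```

With plain equality a NULL can never be found in the list, whereas an
operator with IS NOT DISTINCT FROM semantics gives a definite answer.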

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


