Обсуждение: Range type adaptation implemented

Поиск
Список
Период
Сортировка

Range type adaptation implemented

От
Daniele Varrazzo
Дата:
I have implemented adaptation between Postgres range types and Python.

Usage is documented at:
<http://initd.org/psycopg/docs/extras.html#range-data-types>

There is a basic Range class, completely working (it is not an
abstract class) but with not enough information for adaptation. A few
subclasses allow mapping to and from the db, for instance NumericRange
map to int4range, int8range and numrange. Adaptation of builtin range
types is supported out-of-the-box. Adaptation of user-defined ranges
is supported via the function register_range() that queries the
database, if required creates a new range type and registers all the
adapters. The usual chain of fallbacks is provided if the user doesn't
want to query the database etc.

The Range class doesn't re-implements the database range operator and
doesn't support canonicalization because we cannot emulate fully the
Postgres model. The funniest surprise I've had during the development
was an error creating the object TextRange('#', '$') in the database.
It should be possible as

    >>> ord('#'), ord('$')
    (35, 36)

but PG insisted that "range lower bound must be less than or equal to
range upper bound". Well, turns out that:

    postgres=# select ascii('#') < ascii('$'), '#' < '$';
     ?column? | ?column?
    ----------+----------
     t        | f
    (1 row)

Morale: it's between [hard and impossible) to canonicalize ranges the
way postgres does, hence it's better to leave only the database
responsible for it, otherwise we may end up with ranges valid for
postgres and not for python and viceversa.

On adaptation: numeric ranges are adapted using a syntax '[10,20)':
this allows to have a single range type for all numerical types. Other
ranges, such as the date range and in general user-defined types, are
adapted using a syntax such as daterange('2012-01-01', '2012-12-31',
'(]'), with the values adapted recursively. I would have preferred to
implement all the ranges in the second way, because it relies on
already defined adapters for the representation and returns typed
values instead of string literals. I've had to adapt the number ranges
as literals because there is no cast defined across postgres range
types:

    postgres=# select '[10,20)'::int4range::numrange;
    ERROR:  cannot cast type int4range to numrange

nor AFAIK  there is a constructor for an "untyped range" such as
range(lower, upper, bounds). So if what we have is Python numbers,
e.g. 10, 20, we cannot choose whether we want an int4range or an
int8range, and Postgres won't forgive the error with a cast. The
shortcoming of using a literal is that in case of ambiguity postgres
will refuse to guess, but this can be solved with a ::cast on the
placeholder.

OTOH the lack of a generic postgres range constructor function forces
us to have different Range subclasses for date, timestamp, timestamptz
and all the custom ranges, because in the sql representation we have
to pass the type name. I've thought about using a single Range with a
single adapter looking at the type of the bounds contained by the
python objects and making and using the proper PG type name, but this
wouldn't work for ranges containing no bound (unbound or empty) and in
that case it should resort to a literal. This is bad as a query would
end up having sometimes a daterange(to, from), sometimes just a
literal '(,)' or 'empty' so it may work all the time and fail
unexpectedly on specific values for cast problems in the database.

The above docs include the other changes implemented in these days:
customizable composite adaptation and json support. My devel has still
the features in separate branches to be merged together. Comments,
tests and docs proofreading are well accepted as usual. Thank you very
much.


-- Daniele


Re: Range type adaptation implemented

От
Karsten Hilbert
Дата:
On Mon, Sep 24, 2012 at 09:56:14AM +0100, Daniele Varrazzo wrote:

> The funniest surprise I've had during the development
> was an error creating the object TextRange('#', '$') in the database.
> It should be possible as
>
>     >>> ord('#'), ord('$')
>     (35, 36)
>
> but PG insisted that "range lower bound must be less than or equal to
> range upper bound". Well, turns out that:
>
>     postgres=# select ascii('#') < ascii('$'), '#' < '$';
>      ?column? | ?column?
>     ----------+----------
>      t        | f
>     (1 row)

This does not seem intuitive but is probably quite logical:

While

    ascii('#') < ascii('$')

eventually compares numerical values (35 and 36) the comparison

    '#' < '$'

operates on characters the sorting order (<) of which
depends on the lcoale in use.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Range type adaptation implemented

От
Karsten Hilbert
Дата:
On Mon, Sep 24, 2012 at 11:45:42AM +0200, Karsten Hilbert wrote:

> > The funniest surprise I've had during the development
> > was an error creating the object TextRange('#', '$') in the database.
> > It should be possible as
> >
> >     >>> ord('#'), ord('$')
> >     (35, 36)
> >
> > but PG insisted that "range lower bound must be less than or equal to
> > range upper bound". Well, turns out that:
> >
> >     postgres=# select ascii('#') < ascii('$'), '#' < '$';
> >      ?column? | ?column?
> >     ----------+----------
> >      t        | f
> >     (1 row)
>
> This does not seem intuitive but is probably quite logical:
>
> While
>
>     ascii('#') < ascii('$')
>
> eventually compares numerical values (35 and 36) the comparison
>
>     '#' < '$'
>
> operates on characters the sorting order (<) of which
> depends on the lcoale in use.

And can thus invalidate range bounds when collation rules
are changed per-query !?!

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Range type adaptation implemented

От
Daniele Varrazzo
Дата:
On Mon, Sep 24, 2012 at 10:45 AM, Karsten Hilbert
<Karsten.Hilbert@gmx.net> wrote:
> On Mon, Sep 24, 2012 at 09:56:14AM +0100, Daniele Varrazzo wrote:
>
>> The funniest surprise I've had during the development
>> was an error creating the object TextRange('#', '$') in the database.
>> It should be possible as
>>
>>     >>> ord('#'), ord('$')
>>     (35, 36)
>>
>> but PG insisted that "range lower bound must be less than or equal to
>> range upper bound". Well, turns out that:
>>
>>     postgres=# select ascii('#') < ascii('$'), '#' < '$';
>>      ?column? | ?column?
>>     ----------+----------
>>      t        | f
>>     (1 row)
>
> This does not seem intuitive but is probably quite logical:
>
> While
>
>         ascii('#') < ascii('$')
>
> eventually compares numerical values (35 and 36) the comparison
>
>         '#' < '$'
>
> operates on characters the sorting order (<) of which
> depends on the lcoale in use.

That's right: the reason is the locale. select '#' collate "C" < '$'
collate "C" returns true as naively expected.

-- Daniele


Re: Range type adaptation implemented

От
"Jonathan S. Katz"
Дата:

On Sep 24, 2012, at 5:57 AM, Daniele Varrazzo wrote:

On Mon, Sep 24, 2012 at 10:45 AM, Karsten Hilbert
<Karsten.Hilbert@gmx.net> wrote:
On Mon, Sep 24, 2012 at 09:56:14AM +0100, Daniele Varrazzo wrote:

The funniest surprise I've had during the development
was an error creating the object TextRange('#', '$') in the database.
It should be possible as

ord('#'), ord('$')
   (35, 36)

but PG insisted that "range lower bound must be less than or equal to
range upper bound". Well, turns out that:

   postgres=# select ascii('#') < ascii('$'), '#' < '$';
    ?column? | ?column?
   ----------+----------
    t        | f
   (1 row)

This does not seem intuitive but is probably quite logical:

While

       ascii('#') < ascii('$')

eventually compares numerical values (35 and 36) the comparison

       '#' < '$'

operates on characters the sorting order (<) of which
depends on the lcoale in use.

That's right: the reason is the locale. select '#' collate "C" < '$'
collate "C" returns true as naively expected.

-- Daniele

With that said, referencing this line:


Because those sets of operators are supposed on the Postgres side, perhaps we could provide an error message indicating that?  That way if someone needs to look at the comparisons between ranges they know the database operators.  So maybe:

"Range objects cannot be ordered - please refer to the PostgreSQL documentation to perform this operation in the database"

Jonathan

Re: Range type adaptation implemented

От
Daniele Varrazzo
Дата:
On Mon, Sep 24, 2012 at 5:29 PM, Jonathan S. Katz
<jonathan.katz@excoventures.com> wrote:

> With that said, referencing this line:
>
> https://github.com/dvarrazzo/psycopg/blob/range-type/lib/_range.py#L135
>
> Because those sets of operators are supposed on the Postgres side, perhaps
> we could provide an error message indicating that?  That way if someone
> needs to look at the comparisons between ranges they know the database
> operators.  So maybe:
>
> "Range objects cannot be ordered - please refer to the PostgreSQL
> documentation to perform this operation in the database"

Yeah, we could improve that message. The important thing was to
disallow r1 < r2 at all, because the Python "object" type by default
implements these operators as pointer comparison: if an user gets a
result True or False out of ranges comparison he may think the
operation is allowed and returns something meaningful...

I've actually not found any object that *doesn't compare* in Python,
so I don't know exactly how to deal with it. There are objects with
partial ordering around (e.g. sets) but they return False if not
comparable.

    >>> set('a') < set('b')
    False
    >>> set('a') > set('b')
    False

We could return False to any comparison but still it's not like "you
cannot do that, mate". Anybody knows an example of unorderable object
in Python? Is there anything more appropriate than throwing TypeError?

-- Daniele


Re: Range type adaptation implemented

От
Jacob Kaplan-Moss
Дата:
On Mon, Sep 24, 2012 at 11:44 AM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:
> We could return False to any comparison but still it's not like "you
> cannot do that, mate". Anybody knows an example of unorderable object
> in Python? Is there anything more appropriate than throwing TypeError?

I'd say take a cue from how Python 3 handles trying to compare disparate types:

    $ python3
    Python 3.2.2 (default, Feb 23 2012, 12:57:05)
    [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.45)] on darwin
    >>> {} < []
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: dict() < list()

So I'd say TypeError is correct, perhaps with a message like
"unorderable type: range()" or something.

Jacob


Re: Range type adaptation implemented

От
Daniele Varrazzo
Дата:
On Mon, Sep 24, 2012 at 6:36 PM, Jacob Kaplan-Moss <jacob@jacobian.org> wrote:
> On Mon, Sep 24, 2012 at 11:44 AM, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>> We could return False to any comparison but still it's not like "you
>> cannot do that, mate". Anybody knows an example of unorderable object
>> in Python? Is there anything more appropriate than throwing TypeError?
>
> I'd say take a cue from how Python 3 handles trying to compare disparate types:
>
>     $ python3
>     Python 3.2.2 (default, Feb 23 2012, 12:57:05)
>     [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.45)] on darwin
>     >>> {} < []
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: unorderable types: dict() < list()
>
> So I'd say TypeError is correct, perhaps with a message like
> "unorderable type: range()" or something.

Good, with an hint about using postgres operators I think it would fit the bill.

Thank you.

-- Daniele