Обсуждение: Range type adaptation implemented
I have implemented adaptation between Postgres range types and Python. Usage is documented at: <http://initd.org/psycopg/docs/extras.html#range-data-types> There is a basic Range class, completely working (it is not an abstract class) but with not enough information for adaptation. A few subclasses allow mapping to and from the db, for instance NumericRange map to int4range, int8range and numrange. Adaptation of builtin range types is supported out-of-the-box. Adaptation of user-defined ranges is supported via the function register_range() that queries the database, if required creates a new range type and registers all the adapters. The usual chain of fallbacks is provided if the user doesn't want to query the database etc. The Range class doesn't re-implements the database range operator and doesn't support canonicalization because we cannot emulate fully the Postgres model. The funniest surprise I've had during the development was an error creating the object TextRange('#', '$') in the database. It should be possible as >>> ord('#'), ord('$') (35, 36) but PG insisted that "range lower bound must be less than or equal to range upper bound". Well, turns out that: postgres=# select ascii('#') < ascii('$'), '#' < '$'; ?column? | ?column? ----------+---------- t | f (1 row) Morale: it's between [hard and impossible) to canonicalize ranges the way postgres does, hence it's better to leave only the database responsible for it, otherwise we may end up with ranges valid for postgres and not for python and viceversa. On adaptation: numeric ranges are adapted using a syntax '[10,20)': this allows to have a single range type for all numerical types. Other ranges, such as the date range and in general user-defined types, are adapted using a syntax such as daterange('2012-01-01', '2012-12-31', '(]'), with the values adapted recursively. I would have preferred to implement all the ranges in the second way, because it relies on already defined adapters for the representation and returns typed values instead of string literals. I've had to adapt the number ranges as literals because there is no cast defined across postgres range types: postgres=# select '[10,20)'::int4range::numrange; ERROR: cannot cast type int4range to numrange nor AFAIK there is a constructor for an "untyped range" such as range(lower, upper, bounds). So if what we have is Python numbers, e.g. 10, 20, we cannot choose whether we want an int4range or an int8range, and Postgres won't forgive the error with a cast. The shortcoming of using a literal is that in case of ambiguity postgres will refuse to guess, but this can be solved with a ::cast on the placeholder. OTOH the lack of a generic postgres range constructor function forces us to have different Range subclasses for date, timestamp, timestamptz and all the custom ranges, because in the sql representation we have to pass the type name. I've thought about using a single Range with a single adapter looking at the type of the bounds contained by the python objects and making and using the proper PG type name, but this wouldn't work for ranges containing no bound (unbound or empty) and in that case it should resort to a literal. This is bad as a query would end up having sometimes a daterange(to, from), sometimes just a literal '(,)' or 'empty' so it may work all the time and fail unexpectedly on specific values for cast problems in the database. The above docs include the other changes implemented in these days: customizable composite adaptation and json support. My devel has still the features in separate branches to be merged together. Comments, tests and docs proofreading are well accepted as usual. Thank you very much. -- Daniele
On Mon, Sep 24, 2012 at 09:56:14AM +0100, Daniele Varrazzo wrote: > The funniest surprise I've had during the development > was an error creating the object TextRange('#', '$') in the database. > It should be possible as > > >>> ord('#'), ord('$') > (35, 36) > > but PG insisted that "range lower bound must be less than or equal to > range upper bound". Well, turns out that: > > postgres=# select ascii('#') < ascii('$'), '#' < '$'; > ?column? | ?column? > ----------+---------- > t | f > (1 row) This does not seem intuitive but is probably quite logical: While ascii('#') < ascii('$') eventually compares numerical values (35 and 36) the comparison '#' < '$' operates on characters the sorting order (<) of which depends on the lcoale in use. Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, Sep 24, 2012 at 11:45:42AM +0200, Karsten Hilbert wrote: > > The funniest surprise I've had during the development > > was an error creating the object TextRange('#', '$') in the database. > > It should be possible as > > > > >>> ord('#'), ord('$') > > (35, 36) > > > > but PG insisted that "range lower bound must be less than or equal to > > range upper bound". Well, turns out that: > > > > postgres=# select ascii('#') < ascii('$'), '#' < '$'; > > ?column? | ?column? > > ----------+---------- > > t | f > > (1 row) > > This does not seem intuitive but is probably quite logical: > > While > > ascii('#') < ascii('$') > > eventually compares numerical values (35 and 36) the comparison > > '#' < '$' > > operates on characters the sorting order (<) of which > depends on the lcoale in use. And can thus invalidate range bounds when collation rules are changed per-query !?! Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, Sep 24, 2012 at 10:45 AM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > On Mon, Sep 24, 2012 at 09:56:14AM +0100, Daniele Varrazzo wrote: > >> The funniest surprise I've had during the development >> was an error creating the object TextRange('#', '$') in the database. >> It should be possible as >> >> >>> ord('#'), ord('$') >> (35, 36) >> >> but PG insisted that "range lower bound must be less than or equal to >> range upper bound". Well, turns out that: >> >> postgres=# select ascii('#') < ascii('$'), '#' < '$'; >> ?column? | ?column? >> ----------+---------- >> t | f >> (1 row) > > This does not seem intuitive but is probably quite logical: > > While > > ascii('#') < ascii('$') > > eventually compares numerical values (35 and 36) the comparison > > '#' < '$' > > operates on characters the sorting order (<) of which > depends on the lcoale in use. That's right: the reason is the locale. select '#' collate "C" < '$' collate "C" returns true as naively expected. -- Daniele
On Sep 24, 2012, at 5:57 AM, Daniele Varrazzo wrote:
On Mon, Sep 24, 2012 at 10:45 AM, Karsten Hilbert
<Karsten.Hilbert@gmx.net> wrote:On Mon, Sep 24, 2012 at 09:56:14AM +0100, Daniele Varrazzo wrote:The funniest surprise I've had during the developmentwas an error creating the object TextRange('#', '$') in the database.It should be possible asord('#'), ord('$')(35, 36)but PG insisted that "range lower bound must be less than or equal torange upper bound". Well, turns out that:postgres=# select ascii('#') < ascii('$'), '#' < '$';?column? | ?column?----------+----------t | f(1 row)This does not seem intuitive but is probably quite logical:Whileascii('#') < ascii('$')eventually compares numerical values (35 and 36) the comparison'#' < '$'operates on characters the sorting order (<) of whichdepends on the lcoale in use.
That's right: the reason is the locale. select '#' collate "C" < '$'
collate "C" returns true as naively expected.
-- Daniele
With that said, referencing this line:
Because those sets of operators are supposed on the Postgres side, perhaps we could provide an error message indicating that? That way if someone needs to look at the comparisons between ranges they know the database operators. So maybe:
"Range objects cannot be ordered - please refer to the PostgreSQL documentation to perform this operation in the database"
Jonathan
On Mon, Sep 24, 2012 at 5:29 PM, Jonathan S. Katz <jonathan.katz@excoventures.com> wrote: > With that said, referencing this line: > > https://github.com/dvarrazzo/psycopg/blob/range-type/lib/_range.py#L135 > > Because those sets of operators are supposed on the Postgres side, perhaps > we could provide an error message indicating that? That way if someone > needs to look at the comparisons between ranges they know the database > operators. So maybe: > > "Range objects cannot be ordered - please refer to the PostgreSQL > documentation to perform this operation in the database" Yeah, we could improve that message. The important thing was to disallow r1 < r2 at all, because the Python "object" type by default implements these operators as pointer comparison: if an user gets a result True or False out of ranges comparison he may think the operation is allowed and returns something meaningful... I've actually not found any object that *doesn't compare* in Python, so I don't know exactly how to deal with it. There are objects with partial ordering around (e.g. sets) but they return False if not comparable. >>> set('a') < set('b') False >>> set('a') > set('b') False We could return False to any comparison but still it's not like "you cannot do that, mate". Anybody knows an example of unorderable object in Python? Is there anything more appropriate than throwing TypeError? -- Daniele
On Mon, Sep 24, 2012 at 11:44 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: > We could return False to any comparison but still it's not like "you > cannot do that, mate". Anybody knows an example of unorderable object > in Python? Is there anything more appropriate than throwing TypeError? I'd say take a cue from how Python 3 handles trying to compare disparate types: $ python3 Python 3.2.2 (default, Feb 23 2012, 12:57:05) [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.45)] on darwin >>> {} < [] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unorderable types: dict() < list() So I'd say TypeError is correct, perhaps with a message like "unorderable type: range()" or something. Jacob
On Mon, Sep 24, 2012 at 6:36 PM, Jacob Kaplan-Moss <jacob@jacobian.org> wrote: > On Mon, Sep 24, 2012 at 11:44 AM, Daniele Varrazzo > <daniele.varrazzo@gmail.com> wrote: >> We could return False to any comparison but still it's not like "you >> cannot do that, mate". Anybody knows an example of unorderable object >> in Python? Is there anything more appropriate than throwing TypeError? > > I'd say take a cue from how Python 3 handles trying to compare disparate types: > > $ python3 > Python 3.2.2 (default, Feb 23 2012, 12:57:05) > [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.45)] on darwin > >>> {} < [] > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: unorderable types: dict() < list() > > So I'd say TypeError is correct, perhaps with a message like > "unorderable type: range()" or something. Good, with an hint about using postgres operators I think it would fit the bill. Thank you. -- Daniele