Обсуждение: Hashable custom types

Поиск
Список
Период
Сортировка

Hashable custom types

От
Paul Ramsey
Дата:
When trying to write a recursive CTE using the PostGIS geometry type,
I was told this:

ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.

Is it possible to make custom types hashable? There's no hook in the
CREATE TYPE call for a hash function, but can one be hooked up
somewhere else? In an operator?

Thanks,

P

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net



Re: Hashable custom types

От
Peter Geoghegan
Дата:
On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Is it possible to make custom types hashable? There's no hook in the
> CREATE TYPE call for a hash function, but can one be hooked up
> somewhere else? In an operator?

See 35.14.6., System Dependencies on Operator Classes


-- 
Peter Geoghegan



Re: Hashable custom types

От
David Fetter
Дата:
On Fri, Apr 25, 2014 at 04:47:49PM -0700, Paul Ramsey wrote:
> When trying to write a recursive CTE using the PostGIS geometry type,
> I was told this:
> 
> ERROR:  could not implement recursive UNION
> DETAIL:  All column datatypes must be hashable.

This leads to an interesting question, which is why does our
implementation require this.  I'm guessing it's a performance
optimization.

Quoth src/backend/executor/nodeRecursiveunion.c:

/** To implement UNION (without ALL), we need a hashtable that stores tuples* already seen.  The hash key is computed
fromthe grouping columns.*/
 

As hashing can only approximately guarantee uniqueness (pigeonhole
principle, blah, blah), is there some other similarly performant
mechanism for tracking seen tuples that might work at least in cases
where we don't have a hash function for the data type?  Some kind of
tree, perhaps, or does that require too many other things (total
ordering, e.g.)?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Hashable custom types

От
Tom Lane
Дата:
David Fetter <david@fetter.org> writes:
> On Fri, Apr 25, 2014 at 04:47:49PM -0700, Paul Ramsey wrote:
>> ERROR:  could not implement recursive UNION
>> DETAIL:  All column datatypes must be hashable.

> This leads to an interesting question, which is why does our
> implementation require this.  I'm guessing it's a performance
> optimization.

Well, you clearly need to have a notion of equality for each column
datatype, or else UNION doesn't mean anything.

In general we consider that a datatype's notion of equality can
be defined either by its default btree opclass (which supports
sort-based query algorithms) or by its default hash opclass
(which supports hash-based query algorithms).

The plain UNION code supports either sorting or hashing, but
we've not gotten around to supporting a sort-based approach
to recursive UNION.  I'm not convinced that it's worth doing ...
        regards, tom lane



Re: Hashable custom types

От
Atri Sharma
Дата:




The plain UNION code supports either sorting or hashing, but
we've not gotten around to supporting a sort-based approach
to recursive UNION.  I'm not convinced that it's worth doing ...

                        regards, tom lane



Without sorting, isnt the scope of a recursive UNION with custom datatypes pretty restrictive?

As is, even the sorting shall be a bit restrictive due to the costs associated. I feel what David has suggested upthread should be good. Maybe an experimental patch with a workload that should give a load factor 1 for the hash table should prove some performance points.

Even if thats not the case, we should really do something to improve the scope of usability of recursive UNION with custom types.

Regards,

Atri
 

--
Regards,
 
Atri
l'apprenant

Re: Hashable custom types

От
Greg Stark
Дата:
On Sat, Apr 26, 2014 at 6:39 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
> Without sorting, isnt the scope of a recursive UNION with custom datatypes
> pretty restrictive?

All the default data types are hashable. It's not hard to add a hash
operator class. In a clean slate design it would probably have been
simpler to just make it a requirement that any data type provide a
default hash operator (and probably a default btree comparator).
Postgres provides a lot of degrees of freedom but it should probably
be considered best practice to just provide both even if you don't
envision one or the other being used directly by users for indexes.


-- 
greg



Re: Hashable custom types

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> On Sat, Apr 26, 2014 at 6:39 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>> Without sorting, isnt the scope of a recursive UNION with custom datatypes
>> pretty restrictive?

> All the default data types are hashable. It's not hard to add a hash
> operator class. In a clean slate design it would probably have been
> simpler to just make it a requirement that any data type provide a
> default hash operator (and probably a default btree comparator).
> Postgres provides a lot of degrees of freedom but it should probably
> be considered best practice to just provide both even if you don't
> envision one or the other being used directly by users for indexes.

A btree opclass requires that you invent some one-dimensional sort order
for the datatype, which might be a difficult thing; so I think it's fully
reasonable not to require datatypes to have btree support.  Hashing
doesn't require any semantic assumptions beyond having an equality rule,
which is clearly *necessary* if you want to do stuff like UNION or
DISTINCT.  So from that standpoint it's perfectly reasonable for recursive
UNION to require a hashable equality operator, whereas the other case of
requiring a sortable operator would be a lot harder to defend.

Having said that, I can also believe that there might be datatypes for
which implementing a hash function would be a lot harder than implementing
sorting; this could be true if your equality rule allows for a lot of
different physical representations of "equal" values.  But I'm not so
excited about such cases that I want to do the work of figuring out a
way to implement recursive UNION by sorting.
        regards, tom lane



Re: Hashable custom types

От
Paul Ramsey
Дата:
On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>> Is it possible to make custom types hashable? There's no hook in the
>> CREATE TYPE call for a hash function, but can one be hooked up
>> somewhere else? In an operator?
>
> See 35.14.6., System Dependencies on Operator Classes

Coming back to this, I created an appropriate opclass...

CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
RETURNS boolean
AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
LANGUAGE 'c' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION geometry_hash(geom1 geometry)
RETURNS integer
AS '$libdir/postgis-2.2', 'lwgeom_hash'
LANGUAGE 'c' IMMUTABLE STRICT;

-- Availability: 0.9.0
CREATE OPERATOR == (
LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = geometry_hash_eq,
COMMUTATOR = '==',
RESTRICT = contsel, JOIN = contjoinsel
);

CREATE OPERATOR CLASS hash_geometry_ops DEFAULT FOR TYPE geometry USING hash AS OPERATOR 1 == (geometry, geometry),
FUNCTION1 geometry_hash(geometry);
 

I even tested that it as an index!
 > create index hashidx on points using hash ( the_geom_webmercator); CREATE INDEX

But when I run my recursive query...

WITH RECURSIVE find_cluster(cartodb_id, cluster_id, geom) AS (
   (SELECT     points.cartodb_id, points.cartodb_id as cluster_id,
points.the_geom_webmercator as geom   FROM points   WHERE points.cartodb_id in (select cartodb_id from points))   UNION
 (SELECT     pts.cartodb_id, n.cluster_id, pts.the_geom_webmercator as geom   FROM points pts   JOIN find_cluster n
ONST_DWithin(n.geom, pts.the_geom_webmercator, 2)   WHERE n.cartodb_id <> pts.cartodb_id)
 
)
SELECT * FROM find_cluster;

It still says I lack the secret sauce...

ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.

What's the sauce?

Thanks!

P




>
>
> --
> Peter Geoghegan



Re: Hashable custom types

От
Tom Lane
Дата:
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>>> Is it possible to make custom types hashable? There's no hook in the
>>> CREATE TYPE call for a hash function, but can one be hooked up
>>> somewhere else? In an operator?

>> See 35.14.6., System Dependencies on Operator Classes

> Coming back to this, I created an appropriate opclass...

> CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
> RETURNS boolean
> AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
> LANGUAGE 'c' IMMUTABLE STRICT;

Why are you creating a separate equality operator, rather than just using
the type's existing equality operator (I assume it's got one) in the
hash opclass?

> It still says I lack the secret sauce...

> ERROR:  could not implement recursive UNION
> DETAIL:  All column datatypes must be hashable.

UNION will preferentially glom onto the btree equality operator, if memory
serves.  If that isn't also the hash equality operator, things won't work
pleasantly.
        regards, tom lane



Re: Hashable custom types

От
Paul Ramsey
Дата:

> It still says I lack the secret sauce...

> ERROR: could not implement recursive UNION
> DETAIL: All column datatypes must be hashable.

UNION will preferentially glom onto the btree equality operator, if memory
serves. If that isn't also the hash equality operator, things won't work
pleasantly.

So… what does that mean for types that have both btree and hash equality operators? Don’t all the built-ins also have this problem? 

Re: Hashable custom types

От
Tom Lane
Дата:
Paul Ramsey <pramsey@cleverelephant.ca> writes:
>> UNION will preferentially glom onto the btree equality operator, if memory  
>> serves. If that isn't also the hash equality operator, things won't work  
>> pleasantly.  

> So… what does that mean for types that have both btree and hash equality operators? Don’t all the built-ins also have
thisproblem? 
 

What I'm asking is why it would possibly be sensible to have different
notions of equality for hash and btree.  In every existing type that has
both btree and hash opclasses, the equality members of those opclasses
are *the same operator*.  You don't really want UNION giving different
answers depending on which implementation the planner happens to pick,
do you?
        regards, tom lane



Re: Hashable custom types

От
Paul Ramsey
Дата:
On July 8, 2015 at 1:36:49 PM, Tom Lane (tgl@sss.pgh.pa.us) wrote:
Paul Ramsey <pramsey@cleverelephant.ca> writes: 
>> UNION will preferentially glom onto the btree equality operator, if memory  
>> serves. If that isn't also the hash equality operator, things won't work  
>> pleasantly.  

> So… what does that mean for types that have both btree and hash equality operators? Don’t all the built-ins also have this problem?  

What I'm asking is why it would possibly be sensible to have different 
notions of equality for hash and btree. In every existing type that has 
both btree and hash opclasses, the equality members of those opclasses 
are *the same operator*. You don't really want UNION giving different 
answers depending on which implementation the planner happens to pick, 
do you? 

Well, that’s where things get pretty darn ugly. For reasons of practicality at the time, our equality btree operator ‘=‘ got tied to ‘bounding box equals’ way back in the mists of pre-history. (Basically the reasoning was “all our index work is about bounding boxes!”. I must admit, given my understanding of the zen of PostgreSQL now (and the features that PostgreSQL now has) that would not happen now.)

So… yeah. Probably ‘=‘ should be something else, probably something quite expensive but logically defensible  like a geometric equality test, but it’s not, and we have lots of third part stuff layered on top of us that expects existing semantics.

If getting the objects to UNION means that the btree and hash ops have to be the same, then that just means I’m SOL and will revert to my hack of just casting the objects to bytea to force them to UNION in the recursive CTE.

P.