Re: knngist - 0.8

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: knngist - 0.8
Дата
Msg-id 9909.1298009274@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: knngist - 0.8  (Teodor Sigaev <teodor@sigaev.ru>)
Ответы Re: knngist - 0.8  (Robert Haas <robertmhaas@gmail.com>)
Re: knngist - 0.8  (Teodor Sigaev <teodor@sigaev.ru>)
Список pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
>> I've applied all of this, and written documentation for all of it,
> Thank you a lot
>> except for the contrib/btree_gist additions which still need to be
>> redone for the revised API (and then documented!).  My patience ran out

> Done, btree_gist is reworked for a new API.

I did a quick look at this patch.  The major problem with it is of
course that it needs to be fixed for the recent extension-related
changes.  I transposed the .sql.in changes into additions to
btree_gist--1.0.sql (attached), but haven't really sanity-checked
them beyond checking that the regression tests pass.  The same mods
would need to be made in btree_gist--unpackaged--1.0.sql.

However, I feel that this is not ready to apply even given those fixes.
Problems yet to solve:

1. oid_dist() returns oid ... really?  Oid is unsigned.  I'd be inclined
to argue though that distance between Oids is a meaningless concept, so
you should remove this not just mess with the result type.  Anybody who
actually wants to form a distance between Oids should have to cast them
to an arithmetic type first.  Let the user figure out how wraparound
cases should be handled.

2. Beyond that, none of the distance routines have given any thought to
avoiding overflow.  For instance, dist_int2 had better return something
wider than int2, and so on up.  It looks to me like the internal gist
distance functions also suffer overflow risks, in that they tend to form
the difference first (in the source datatype) and only afterwards cast
to float8.

3. I was surprised that there wasn't a distance implementation for
numeric.  I suppose that this might be difficult to do without risking
overflow in conversion to float8, though.

4. I didn't much care for changing the result type of gbt_num_consistent
from bool to float8; that's just messy, and I don't see any compensating
advantage.  I suggest you leave gbt_num_consistent and its callers
alone, and add a separate gbt_num_distance routine that only handles the
KNNDistance case.

There might be more issues, I haven't read the patch in detail.
But anyway, I'm going to set it to Waiting on Author.  I think it
needs at least a day or so's work, and I can't put in that kind of
time on it now.

            regards, tom lane


diff --git a/contrib/btree_gist/btree_gist--1.0.sql b/contrib/btree_gist/btree_gist--1.0.sql
index 1ea5fa2db418d06f991f362a956fec07ccfba9e9..dd428995c185a09519ed65433081752b83b71bb7 100644
*** a/contrib/btree_gist/btree_gist--1.0.sql
--- b/contrib/btree_gist/btree_gist--1.0.sql
*************** CREATE TYPE gbtreekey_var (
*** 81,86 ****
--- 81,231 ----
      STORAGE = EXTENDED
  );

+ --distance operators
+
+ CREATE FUNCTION cash_dist(money, money)
+ RETURNS money
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = money,
+     RIGHTARG = money,
+     PROCEDURE = cash_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION date_dist(date, date)
+ RETURNS int4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = date,
+     RIGHTARG = date,
+     PROCEDURE = date_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION float4_dist(float4, float4)
+ RETURNS float4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = float4,
+     RIGHTARG = float4,
+     PROCEDURE = float4_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION float8_dist(float8, float8)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = float8,
+     RIGHTARG = float8,
+     PROCEDURE = float8_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION int2_dist(int2, int2)
+ RETURNS int2
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = int2,
+     RIGHTARG = int2,
+     PROCEDURE = int2_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION int4_dist(int4, int4)
+ RETURNS int4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = int4,
+     RIGHTARG = int4,
+     PROCEDURE = int4_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION int8_dist(int8, int8)
+ RETURNS int8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = int8,
+     RIGHTARG = int8,
+     PROCEDURE = int8_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION interval_dist(interval, interval)
+ RETURNS interval
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = interval,
+     RIGHTARG = interval,
+     PROCEDURE = interval_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION oid_dist(oid, oid)
+ RETURNS oid
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = oid,
+     RIGHTARG = oid,
+     PROCEDURE = oid_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION time_dist(time, time)
+ RETURNS interval
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = time,
+     RIGHTARG = time,
+     PROCEDURE = time_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION ts_dist(timestamp, timestamp)
+ RETURNS interval
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = timestamp,
+     RIGHTARG = timestamp,
+     PROCEDURE = ts_dist,
+     COMMUTATOR = '<->'
+ );
+
+ CREATE FUNCTION tstz_dist(timestamptz, timestamptz)
+ RETURNS interval
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
+ CREATE OPERATOR <-> (
+     LEFTARG = timestamptz,
+     RIGHTARG = timestamptz,
+     PROCEDURE = tstz_dist,
+     COMMUTATOR = '<->'
+ );


  --
*************** RETURNS bool
*** 96,101 ****
--- 241,251 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_oid_distance(internal,oid,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_oid_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 154,160 ****
  -- that's the only state that can be reproduced during an upgrade from 9.0.

  ALTER OPERATOR FAMILY gist_oid_ops USING gist ADD
!     OPERATOR    6    <> (oid, oid) ;


  --
--- 304,312 ----
  -- that's the only state that can be reproduced during an upgrade from 9.0.

  ALTER OPERATOR FAMILY gist_oid_ops USING gist ADD
!     OPERATOR    6    <> (oid, oid) ,
!     OPERATOR    15    <-> (oid, oid) FOR ORDER BY pg_catalog.oid_ops ,
!     FUNCTION    8 (oid, oid) gbt_oid_distance (internal, oid, int2, oid) ;


  --
*************** RETURNS bool
*** 170,175 ****
--- 322,332 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_int2_distance(internal,int2,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_int2_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 214,220 ****
      STORAGE        gbtreekey4;

  ALTER OPERATOR FAMILY gist_int2_ops USING gist ADD
!     OPERATOR    6    <> (int2, int2) ;


  --
--- 371,379 ----
      STORAGE        gbtreekey4;

  ALTER OPERATOR FAMILY gist_int2_ops USING gist ADD
!     OPERATOR    6    <> (int2, int2) ,
!     OPERATOR    15    <-> (int2, int2) FOR ORDER BY pg_catalog.integer_ops ,
!     FUNCTION    8 (int2, int2) gbt_int2_distance (internal, int2, int2, oid) ;


  --
*************** RETURNS bool
*** 230,235 ****
--- 389,399 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_int4_distance(internal,int4,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_int4_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 274,280 ****
      STORAGE        gbtreekey8;

  ALTER OPERATOR FAMILY gist_int4_ops USING gist ADD
!     OPERATOR    6    <> (int4, int4) ;


  --
--- 438,446 ----
      STORAGE        gbtreekey8;

  ALTER OPERATOR FAMILY gist_int4_ops USING gist ADD
!     OPERATOR    6    <> (int4, int4) ,
!     OPERATOR    15    <-> (int4, int4) FOR ORDER BY pg_catalog.integer_ops ,
!     FUNCTION    8 (int4, int4) gbt_int4_distance (internal, int4, int2, oid) ;


  --
*************** RETURNS bool
*** 290,295 ****
--- 456,466 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_int8_distance(internal,int8,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_int8_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 334,340 ****
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_int8_ops USING gist ADD
!     OPERATOR    6    <> (int8, int8) ;


  --
--- 505,513 ----
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_int8_ops USING gist ADD
!     OPERATOR    6    <> (int8, int8) ,
!     OPERATOR    15    <-> (int8, int8) FOR ORDER BY pg_catalog.integer_ops ,
!     FUNCTION    8 (int8, int8) gbt_int8_distance (internal, int8, int2, oid) ;


  --
*************** RETURNS bool
*** 350,355 ****
--- 523,533 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_float4_distance(internal,float4,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_float4_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 394,400 ****
      STORAGE        gbtreekey8;

  ALTER OPERATOR FAMILY gist_float4_ops USING gist ADD
!     OPERATOR    6    <> (float4, float4) ;


  --
--- 572,580 ----
      STORAGE        gbtreekey8;

  ALTER OPERATOR FAMILY gist_float4_ops USING gist ADD
!     OPERATOR    6    <> (float4, float4) ,
!     OPERATOR    15    <-> (float4, float4) FOR ORDER BY pg_catalog.float_ops ,
!     FUNCTION    8 (float4, float4) gbt_float4_distance (internal, float4, int2, oid) ;


  --
*************** RETURNS bool
*** 410,415 ****
--- 590,600 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_float8_distance(internal,float8,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_float8_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 454,460 ****
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_float8_ops USING gist ADD
!     OPERATOR    6    <> (float8, float8) ;


  --
--- 639,647 ----
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_float8_ops USING gist ADD
!     OPERATOR    6    <> (float8, float8) ,
!     OPERATOR    15    <-> (float8, float8) FOR ORDER BY pg_catalog.float_ops ,
!     FUNCTION    8 (float8, float8) gbt_float8_distance (internal, float8, int2, oid) ;


  --
*************** RETURNS bool
*** 470,480 ****
--- 657,677 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_ts_distance(internal,timestamp,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_tstz_consistent(internal,timestamptz,int2,oid,internal)
  RETURNS bool
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_tstz_distance(internal,timestamptz,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_ts_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 524,530 ****
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_timestamp_ops USING gist ADD
!     OPERATOR    6    <> (timestamp, timestamp) ;


  -- Create the operator class
--- 721,729 ----
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_timestamp_ops USING gist ADD
!     OPERATOR    6    <> (timestamp, timestamp) ,
!     OPERATOR    15    <-> (timestamp, timestamp) FOR ORDER BY pg_catalog.interval_ops ,
!     FUNCTION    8 (timestamp, timestamp) gbt_ts_distance (internal, timestamp, int2, oid) ;


  -- Create the operator class
*************** AS
*** 546,552 ****
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_timestamptz_ops USING gist ADD
!     OPERATOR    6    <> (timestamptz, timestamptz) ;


  --
--- 745,753 ----
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_timestamptz_ops USING gist ADD
!     OPERATOR    6    <> (timestamptz, timestamptz) ,
!     OPERATOR    15    <-> (timestamptz, timestamptz) FOR ORDER BY pg_catalog.interval_ops ,
!     FUNCTION    8 (timestamptz, timestamptz) gbt_tstz_distance (internal, timestamptz, int2, oid) ;


  --
*************** RETURNS bool
*** 562,567 ****
--- 763,773 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_time_distance(internal,time,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_timetz_consistent(internal,timetz,int2,oid,internal)
  RETURNS bool
  AS 'MODULE_PATHNAME'
*************** AS
*** 616,622 ****
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_time_ops USING gist ADD
!     OPERATOR    6    <> (time, time) ;


  CREATE OPERATOR CLASS gist_timetz_ops
--- 822,830 ----
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_time_ops USING gist ADD
!     OPERATOR    6    <> (time, time) ,
!     OPERATOR    15    <-> (time, time) FOR ORDER BY pg_catalog.interval_ops ,
!     FUNCTION    8 (time, time) gbt_time_distance (internal, time, int2, oid) ;


  CREATE OPERATOR CLASS gist_timetz_ops
*************** RETURNS bool
*** 653,658 ****
--- 861,871 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_date_distance(internal,date,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_date_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 697,703 ****
      STORAGE        gbtreekey8;

  ALTER OPERATOR FAMILY gist_date_ops USING gist ADD
!     OPERATOR    6    <> (date, date) ;


  --
--- 910,918 ----
      STORAGE        gbtreekey8;

  ALTER OPERATOR FAMILY gist_date_ops USING gist ADD
!     OPERATOR    6    <> (date, date) ,
!     OPERATOR    15    <-> (date, date) FOR ORDER BY pg_catalog.integer_ops ,
!     FUNCTION    8 (date, date) gbt_date_distance (internal, date, int2, oid) ;


  --
*************** RETURNS bool
*** 713,718 ****
--- 928,938 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_intv_distance(internal,interval,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_intv_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 762,768 ****
      STORAGE        gbtreekey32;

  ALTER OPERATOR FAMILY gist_interval_ops USING gist ADD
!     OPERATOR    6    <> (interval, interval) ;


  --
--- 982,990 ----
      STORAGE        gbtreekey32;

  ALTER OPERATOR FAMILY gist_interval_ops USING gist ADD
!     OPERATOR    6    <> (interval, interval) ,
!     OPERATOR    15    <-> (interval, interval) FOR ORDER BY pg_catalog.interval_ops ,
!     FUNCTION    8 (interval, interval) gbt_intv_distance (internal, interval, int2, oid) ;


  --
*************** RETURNS bool
*** 778,783 ****
--- 1000,1010 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C IMMUTABLE STRICT;

+ CREATE FUNCTION gbt_cash_distance(internal,money,int2,oid)
+ RETURNS float8
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C IMMUTABLE STRICT;
+
  CREATE FUNCTION gbt_cash_compress(internal)
  RETURNS internal
  AS 'MODULE_PATHNAME'
*************** AS
*** 822,828 ****
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_cash_ops USING gist ADD
!     OPERATOR    6    <> (money, money) ;


  --
--- 1049,1057 ----
      STORAGE        gbtreekey16;

  ALTER OPERATOR FAMILY gist_cash_ops USING gist ADD
!     OPERATOR    6    <> (money, money) ,
!     OPERATOR    15    <-> (money, money) FOR ORDER BY pg_catalog.money_ops ,
!     FUNCTION    8 (money, money) gbt_cash_distance (internal, money, int2, oid) ;


  --

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Itagaki Takahiro
Дата:
Сообщение: Re: CommitFest 2011-01 as of 2011-02-04
Следующее
От: Gan Jiadong
Дата:
Сообщение: Re: About the performance of startup after dropping many tables