Обсуждение: Calculating Minkowski distance between two rows

Поиск
Список
Период
Сортировка

Calculating Minkowski distance between two rows

От
Babak Alipour
Дата:
Greetings everyone,

I'm a novice plpgsql user.
For an application, I'm trying to write a user-defined function that takes a row of some table (let's say with k fields) and takes another row from another table (again with k fields); then calculate the Euclidean, Manhattan or generally Minkowski distance (with some p) and then return an integer. 
I've written this:

CREATE FUNCTION euclidean_distance(row1 table_train, row2 table_test, OUT distance DOUBLE PRECISION) AS $$
DECLARE
tmp DOUBLE PRECISION;
BEGIN
FOR col IN SELECT column_name FROM information_schema.columns WHERE table_name=table_train LOOP
   tmp := (row1.col - row2.col);
   distance += tmp*tmp;
END LOOP;
distance := sqrt(distance);
END;
$$ LANGUAGE plpgsql;

Could anyone please help me fix this function so that I can pass any two rows of two tables (with same number of columns) and have their distance returned.

Best regards,
Babak Alipour

--
Babak Alipour ,
University of Florida

Re: Calculating Minkowski distance between two rows

От
Adrian Klaver
Дата:
On 04/25/2016 07:07 AM, Babak Alipour wrote:
> Greetings everyone,
>
> I'm a novice plpgsql user.
> For an application, I'm trying to write a user-defined function that
> takes a row of some table (let's say with k fields) and takes another
> row from another table (again with k fields); then calculate the
> Euclidean, Manhattan or generally Minkowski distance (with some p) and
> then return an integer.
> I've written this:
>
> CREATE FUNCTION euclidean_distance(row1 table_train, row2 table_test,
> OUT distance DOUBLE PRECISION) AS $$
> DECLARE
> tmp DOUBLE PRECISION;
> BEGIN
> FOR col IN SELECT column_name FROM information_schema.columns WHERE
> table_name=table_train LOOP
>    tmp := (row1.col - row2.col);
>    distance += tmp*tmp;
> END LOOP;
> distance := sqrt(distance);
> END;
> $$ LANGUAGE plpgsql;
>
> Could anyone please help me fix this function so that I can pass any two
> rows of two tables (with same number of columns) and have their distance
> returned.

You are already doing that, so do you mean any two rows of any two tables?

>
> Best regards,
> Babak Alipour
>
> --
> */Babak Alipour ,/*
> */University of Florida/*


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Calculating Minkowski distance between two rows

От
Babak Alipour
Дата:
That is correct. The function I've written only works when the two tables are named table_train and table_test; is it possible to generalize that to take in any two tables?

Thanks in advance.

>Babak

On Mon, Apr 25, 2016 at 10:24 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 04/25/2016 07:07 AM, Babak Alipour wrote:
Greetings everyone,

I'm a novice plpgsql user.
For an application, I'm trying to write a user-defined function that
takes a row of some table (let's say with k fields) and takes another
row from another table (again with k fields); then calculate the
Euclidean, Manhattan or generally Minkowski distance (with some p) and
then return an integer.
I've written this:

CREATE FUNCTION euclidean_distance(row1 table_train, row2 table_test,
OUT distance DOUBLE PRECISION) AS $$
DECLARE
tmp DOUBLE PRECISION;
BEGIN
FOR col IN SELECT column_name FROM information_schema.columns WHERE
table_name=table_train LOOP
   tmp := (row1.col - row2.col);
   distance += tmp*tmp;
END LOOP;
distance := sqrt(distance);
END;
$$ LANGUAGE plpgsql;

Could anyone please help me fix this function so that I can pass any two
rows of two tables (with same number of columns) and have their distance
returned.

You are already doing that, so do you mean any two rows of any two tables?


Best regards,
Babak Alipour

--
*/Babak Alipour ,/*
*/University of Florida/*


--
Adrian Klaver
adrian.klaver@aklaver.com



--
Babak Alipour ,
University of Florida

Re: Calculating Minkowski distance between two rows

От
Adrian Klaver
Дата:
On 04/25/2016 07:26 AM, Babak Alipour wrote:
> That is correct. The function I've written only works when the two
> tables are named table_train and table_test; is it possible to
> generalize that to take in any two tables?

I'm heading out the door and off the top of my head:

1) Get tables names as text.

2) Get the row conditions as text.

3) Use EXECUTE to build a query string:
http://www.postgresql.org/docs/9.5/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN

In particular:
"A cleaner approach is to use format()'s %I specification for table or
column names (strings separated by a newline are concatenated):"

http://www.postgresql.org/docs/9.5/static/functions-string.html#FUNCTIONS-STRING-FORMAT

4) SELECT the result INTO a record variable:

http://www.postgresql.org/docs/9.5/interactive/plpgsql-declarations.html#PLPGSQL-DECLARATION-RECORDS

5) Do your calculations

6) Not considered, validating that number of table columns are the same
and the types are compatible.

>
> Thanks in advance.
>
>>Babak
>
> On Mon, Apr 25, 2016 at 10:24 AM, Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 04/25/2016 07:07 AM, Babak Alipour wrote:
>
>         Greetings everyone,
>
>         I'm a novice plpgsql user.
>         For an application, I'm trying to write a user-defined function that
>         takes a row of some table (let's say with k fields) and takes
>         another
>         row from another table (again with k fields); then calculate the
>         Euclidean, Manhattan or generally Minkowski distance (with some
>         p) and
>         then return an integer.
>         I've written this:
>
>         CREATE FUNCTION euclidean_distance(row1 table_train, row2
>         table_test,
>         OUT distance DOUBLE PRECISION) AS $$
>         DECLARE
>         tmp DOUBLE PRECISION;
>         BEGIN
>         FOR col IN SELECT column_name FROM information_schema.columns WHERE
>         table_name=table_train LOOP
>            tmp := (row1.col - row2.col);
>            distance += tmp*tmp;
>         END LOOP;
>         distance := sqrt(distance);
>         END;
>         $$ LANGUAGE plpgsql;
>
>         Could anyone please help me fix this function so that I can pass
>         any two
>         rows of two tables (with same number of columns) and have their
>         distance
>         returned.
>
>
>     You are already doing that, so do you mean any two rows of any two
>     tables?
>
>
>         Best regards,
>         Babak Alipour
>
>         --
>         */Babak Alipour ,/*
>         */University of Florida/*
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
>
>
> --
> */Babak Alipour ,/*
> */University of Florida/*


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Calculating Minkowski distance between two rows

От
Francisco Olarte
Дата:
Hi:

On Mon, Apr 25, 2016 at 4:26 PM, Babak Alipour <babak.alipour@gmail.com> wrote:
> That is correct. The function I've written only works when the two tables
> are named table_train and table_test; is it possible to generalize that to
> take in any two tables?

And only when all table_train columns are numbers AND table_test
contains at least all of them AND they are numbers too.

Wouldn't it be easier to use numeric arrays to represent coordinate vectors?

Anyway, I'm not versed in all of this, wbut I would try to make a
function to turn a table record to a numeric array and then write the
numeric array version of the func and call them, divide and conquer.

Francisco Olarte.


Re: Calculating Minkowski distance between two rows

От
Begin Daniel
Дата:
I am inclined to go with Francisco's solution
Daniel

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Francisco Olarte
Sent: April-25-16 10:46
To: Babak Alipour
Cc: Adrian Klaver; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Calculating Minkowski distance between two rows

Hi:

On Mon, Apr 25, 2016 at 4:26 PM, Babak Alipour <babak.alipour@gmail.com> wrote:
> That is correct. The function I've written only works when the two 
> tables are named table_train and table_test; is it possible to 
> generalize that to take in any two tables?

And only when all table_train columns are numbers AND table_test contains at least all of them AND they are numbers
too.

Wouldn't it be easier to use numeric arrays to represent coordinate vectors?

Anyway, I'm not versed in all of this, wbut I would try to make a function to turn a table record to a numeric array
andthen write the numeric array version of the func and call them, divide and conquer.
 

Francisco Olarte.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general