Обсуждение: array/function question

Поиск
Список
Период
Сортировка

array/function question

От
Joshua Berry
Дата:
Hello All,

I'm trying to optimize a few slow queries and helper functions, and
have found a poor performing function. To improve performance, I'd
like to create a function that does the following:


Inputs:
A: an array of integers. for example: { 1, 2, 3, 4, 7 }
B: an array of integers. for example: { 1, 4, 8, 9 }

Returns
C: an array of bools the same dimensions as Array A. In this example:
{ true, false, false, false, true, false }

Effectively, this function would use Array A as a set of boolean tests
to exercise on Array B. The result array will have the save number of
elements as array A.

What I lack is the knowledge of how to
1. index and compare arrays when their input size is not known. (I
only know how to use hardcoded indexes like A[1], B[2], etc.
2. To use control structures for recursion/looping. I've read
http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html 
  but still not sure how to apply the grammar to arrays data types.

If there is a builtin array function that achieves this, that would be
good to know as well.

Cheers,

-Joshua

Joshua Berry

Re: array/function question

От
Alvaro Herrera
Дата:
Joshua Berry escribió:

> Inputs:
> A: an array of integers. for example: { 1, 2, 3, 4, 7 }
> B: an array of integers. for example: { 1, 4, 8, 9 }
>
> Returns
> C: an array of bools the same dimensions as Array A. In this example: {
> true, false, false, false, true, false }
>
> Effectively, this function would use Array A as a set of boolean tests
> to exercise on Array B. The result array will have the save number of
> elements as array A.

I think this is much easier to write in PL/Perl than PL/pgSQL.  Trivial
in fact.  Your example is flawed though (three falses instead of two) ...
I think it looks like this:

create or replace function is_element_present(int[], int[]) returns bool[] language plperl as $$
  $a = shift;
  $b = shift;
  if ($a =~ /{(.*)}/) {
     @a = split /,/, $1
  }
  if ($b =~ /{(.*)}/) {
     @b = split /,/, $1
  }
  for my $k (@b) {
    $h{$k} = 1;
  }
  @c = map { if (defined $h{$_}) { 1 } else { 0 }  } @a;
  return \@c;
$$;

Hmm, well, the fact that PL/Perl passes arrays as string kinda sucks --
fixing that takes half the code of the function!

alvherre=# select is_element_present('{1,2,3,4,7}', '{1,4,8,9}');
 is_element_present
--------------------
 {t,f,f,t,f}
(1 fila)


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: array/function question

От
Nagy Zoltan
Дата:
hi,


you should use something similar to 'merge sort'
 but only if your input is sorted (m_bx expects this)

if your subjects (numbers) are not going beyond a certain limit eg(65535)
take up an array and filter

you can generate a poly for array B's roots, and calculate A's points
-where it's 0, then the B array have the value ;)))

writing the function in C is not so easy but it will be fast ;)


create or replace function m_bx(a integer[],b integer[])
    returns    boolean[]
as
$BODY$
    declare    res    boolean[];
    declare    i    integer;
    declare    j    integer;
    declare    la    integer;
    declare    lb    integer;
begin
    i=1;
    j=1;
    la=array_upper(a,1);
    lb=array_upper(b,1);
    loop
        if i>la then
            exit;
        end if;
        if (j<=lb and a[i] = b[j]) then
            res[i]=true;
        else
            res[i]=false;
        end if;
        if(b[j]<a[i]) then
            j=j+1;
        else
            i=i+1;
        end if;
    end loop;

    return    res;
end;
$BODY$
    LANGUAGE 'plpgsql' IMMUTABLE
    COST 100;

select m_bx('{1,2,4,5}','{1,5,6}');


Joshua Berry wrote:
> Hello All,
>
> I'm trying to optimize a few slow queries and helper functions, and have
> found a poor performing function. To improve performance, I'd like to
> create a function that does the following:
>
>
> Inputs:
> A: an array of integers. for example: { 1, 2, 3, 4, 7 }
> B: an array of integers. for example: { 1, 4, 8, 9 }
>
> Returns
> C: an array of bools the same dimensions as Array A. In this example: {
> true, false, false, false, true, false }
>
> Effectively, this function would use Array A as a set of boolean tests
> to exercise on Array B. The result array will have the save number of
> elements as array A.
>
> What I lack is the knowledge of how to
> 1. index and compare arrays when their input size is not known. (I only
> know how to use hardcoded indexes like A[1], B[2], etc.
> 2. To use control structures for recursion/looping. I've read
> http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html but
> still not sure how to apply the grammar to arrays data types.
>
> If there is a builtin array function that achieves this, that would be
> good to know as well.
>
> Cheers,
>
> -Joshua
>
> Joshua Berry
>


Re: array/function question

От
Pavel Stehule
Дата:
2009/5/18 Joshua Berry <yoberi@gmail.com>:
> Hello All,
>
> I'm trying to optimize a few slow queries and helper functions, and have
> found a poor performing function. To improve performance, I'd like to create
> a function that does the following:
>
>
> Inputs:
> A: an array of integers. for example: { 1, 2, 3, 4, 7 }
> B: an array of integers. for example: { 1, 4, 8, 9 }
>

hello

try to SQL language

postgres=# create or replace function xx(anyarray, anyarray) returns
bool[] as $$
select array(select (select x = any(select y from unnest($2) g2(y)))
from unnest($1) g(x))
$$ language sql immutable;
CREATE FUNCTION
Time: 1,846 ms
postgres=# select xx(array[1,2,3,4,7], array[1,4,8,9]);
    xx
-------------
 {t,f,f,t,f}
(1 row)

if you know, so input are distinct and sorted, then you could to use function:


postgres=# create or replace function xy(anyarray, anyarray) returns
bool[] as $$
  select array(select y is not null from unnest($1) g1(x) left join
unnest($2) g2(y) on x = y order by x);
$$ language sql immutable;
CREATE FUNCTION
Time: 2,666 ms
postgres=#  select xx(array[1,2,3,4,7], array[1,4,8,9]);     xx
-------------
 {t,f,f,t,f}
(1 row)

regards
Pavel Stehule

regards
Pavel Stehule

> Returns
> C: an array of bools the same dimensions as Array A. In this example: {
> true, false, false, false, true, false }
>
> Effectively, this function would use Array A as a set of boolean tests to
> exercise on Array B. The result array will have the save number of elements
> as array A.
>
> What I lack is the knowledge of how to
> 1. index and compare arrays when their input size is not known. (I only know
> how to use hardcoded indexes like A[1], B[2], etc.
> 2. To use control structures for recursion/looping. I've read
> http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html but
> still not sure how to apply the grammar to arrays data types.
>
> If there is a builtin array function that achieves this, that would be good
> to know as well.
>
> Cheers,
>
> -Joshua
>
> Joshua Berry
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: array/function question

От
Joshua Berry
Дата:

you should use something similar to 'merge sort'
but only if your input is sorted (m_bx expects this)

In my case, order is not guaranteed, and as the result needs to match the order of the input, it seems that using some exhaustive tail recursive method is the way to go. (By that I mean a loop within a loop, testing up to m*n times where m and n are the length of the arrays passed in.

if your subjects (numbers) are not going beyond a certain limit eg(65535)
take up an array and filter

For my application, there will likely be no more than 20 elements in the array, so practical limits are not a problem.

you can generate a poly for array B's roots, and calculate A's points
-where it's 0, then the B array have the value ;)))

writing the function in C is not so easy but it will be fast ;)

Can anyone point me to documentation on the performance differences between plpgsql/plc/plperl/etc? I googled but only found a few offhanded comments from mailing list archives and online message boards. Are there any general guidelines on when it's a good idea to switch to a language other than plsql or plpsql? 

Here's my modified version of Nagy's function. This one allows unsorted array elements, ordering the tests by the order of the elements in the first array parameter.
Please forgive the lack of grace. I'd love tips on how to improve this! In particular, is there a better way to find the length of an array without having to piecewise handle the empty array case?

create or replace function m_bx(a integer[],b integer[])
  returns  boolean[]
AS
$BODY$
  declare res boolean[];
  declare i integer;
  declare j integer;
  declare la integer;
  declare lb integer;
begin
  i=1;
  j=1;
  -- array_upper returns NULL if the length of the array is 0, the following hacks provided the desired result for empty array cases
  --  la=array_upper(a,1);
  la = (select CASE WHEN count is null THEN 0 ELSE count END from (select array_upper(a::int[], 1) as count) as foo);
  --  lb=array_upper(b,1);
  lb = (select CASE WHEN count is null THEN 0 ELSE count END from (select array_upper(b::int[], 1) as count) as foo);
  loop
    if i>la then
      exit;
    end if;
    if (j>lb) then
      res[i]=false;
      j=1;
      i=i+1;
    else
      if (a[i] = b[j]) then
        --b contains this element, move to the next 
        res[i]=true;
        j=1;
        i=i+1;
      else
        j=j+1;
      end if;
    end if;
  end loop;
  return res;
end;
$BODY$
  LANGUAGE 'plpgsql' IMMUTABLE
  COST 100;

--Test cases to handle:
select m_bx('{1,2,5,4}','{5, 1, 4}'); --{t,f,t,t}
select m_bx('{1,2,5,4}','{5}'); --{f,f,t,f}
select m_bx('{1,2,5,4}','{}'); --{f,f,f,f}
select m_bx('{}'::int[],'{}'); --{}::bool

Regards,

Joshua Berry


On May 18, 2009, at 10:00 PM, Nagy Zoltan wrote:



create or replace function m_bx(a integer[],b integer[])
returns boolean[]
as
$BODY$
declare res boolean[];
declare i integer;
declare j integer;
declare la integer;
declare lb integer;
begin
i=1;
j=1;
la=array_upper(a,1);
lb=array_upper(b,1);
loop
if i>la then
exit;
end if;
if (j<=lb and a[i] = b[j]) then
res[i]=true;
else
res[i]=false;
end if;
if(b[j]<a[i]) then
j=j+1;
else
i=i+1;
end if;
end loop;

return res;
end;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE
COST 100;

select m_bx('{1,2,4,5}','{1,5,6}');


Joshua Berry wrote:
Hello All,

I'm trying to optimize a few slow queries and helper functions, and have
found a poor performing function. To improve performance, I'd like to
create a function that does the following:


Inputs:
A: an array of integers. for example: { 1, 2, 3, 4, 7 }
B: an array of integers. for example: { 1, 4, 8, 9 }

Returns
C: an array of bools the same dimensions as Array A. In this example: {
true, false, false, false, true, false }

Effectively, this function would use Array A as a set of boolean tests
to exercise on Array B. The result array will have the save number of
elements as array A.

What I lack is the knowledge of how to
1. index and compare arrays when their input size is not known. (I only
know how to use hardcoded indexes like A[1], B[2], etc.
2. To use control structures for recursion/looping. I've read
http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html but
still not sure how to apply the grammar to arrays data types.

If there is a builtin array function that achieves this, that would be
good to know as well.

Cheers,

-Joshua

Joshua Berry



Re: array/function question

От
Alvaro Herrera
Дата:
Joshua Berry escribió:

> Please forgive the lack of grace. I'd love tips on how to improve this!

Tip: follow Pavel's suggestion.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: array/function question

От
Alvaro Herrera
Дата:
Pavel Stehule escribió:

> postgres=# create or replace function xx(anyarray, anyarray) returns
> bool[] as $$
> select array(select (select x = any(select y from unnest($2) g2(y)))
> from unnest($1) g(x))
> $$ language sql immutable;
> CREATE FUNCTION

There ain't no unnest() function in 8.3 ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: array/function question

От
Pavel Stehule
Дата:
2009/5/19 Alvaro Herrera <alvherre@commandprompt.com>:
> Pavel Stehule escribió:
>
>> postgres=# create or replace function xx(anyarray, anyarray) returns
>> bool[] as $$
>> select array(select (select x = any(select y from unnest($2) g2(y)))
>> from unnest($1) g(x))
>> $$ language sql immutable;
>> CREATE FUNCTION
>
> There ain't no unnest() function in 8.3 ...

I am sorry

create or replace function unnest(anyarray) returns setof anyelement as $$
select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) g(i)
$$ language sql immutable;

when I looked on my code, it could be simplified

>> postgres=# create or replace function xx(anyarray, anyarray) returns
>> bool[] as $$
>> select array(select (select x = any($2)))
>> from unnest($1) g(x))
>> $$ language sql immutable;

regards
Pavel Stehule

>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>