Обсуждение: how to sort an input array before processing in pl/pgsql function

Поиск
Список
Период
Сортировка

how to sort an input array before processing in pl/pgsql function

От
Kenneth Marshall
Дата:
Dear PostgreSQL community,

I have the following function used to select tokens individually
from a table and avoid a sequential scan of the table:

create function lookup_tokens(integer,bigint[])
  returns setof dspam_token_data
  language plpgsql stable
  as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
    where uid=$1
      and token in (select $2[i]
        from generate_series(array_lower($2,1),array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';

I would like to execute the same lookup but with the array sorted. I
used the following definition of sort:

CREATE OR REPLACE FUNCTION sort(anyarray)
RETURNS anyarray AS $$
  SELECT array(SELECT * FROM unnest($1) ORDER BY 1);
$$ language sql;

and added a call to it to the function:

create function lookup_tokens_sort(integer,bigint[])
  returns setof dspam_token_data
  language plpgsql stable
  as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
    where uid=$1
      and token in (select (sort($2))[i]
        from generate_series(array_lower($2,1),array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';

The problem with this is that the sort() appears to be called once
per token because the run without the sort takes < 1 millisecond and
with the sort take about 4 seconds. Is there a way to call the sort()
only once and use the result in the loops instead of sorting each
time. Any help would be appreciated. The actual sorted loop will
be needed for the update functions to prevent deadlocks.

Regards,
Ken

Re: how to sort an input array before processing in pl/pgsql function

От
Kenneth Marshall
Дата:
On Fri, Jul 23, 2010 at 11:36:14AM -0500, Kenneth Marshall wrote:
> Dear PostgreSQL community,
>
> I have the following function used to select tokens individually
> from a table and avoid a sequential scan of the table:
>
> create function lookup_tokens(integer,bigint[])
>   returns setof dspam_token_data
>   language plpgsql stable
>   as '
> declare
>   v_rec record;
> begin
>   for v_rec in select * from dspam_token_data
>     where uid=$1
>       and token in (select $2[i]
>         from generate_series(array_lower($2,1),array_upper($2,1)) s(i))
>   loop
>     return next v_rec;
>   end loop;
>   return;
> end;';
>
> I would like to execute the same lookup but with the array sorted. I
> used the following definition of sort:
>
> CREATE OR REPLACE FUNCTION sort(anyarray)
> RETURNS anyarray AS $$
>   SELECT array(SELECT * FROM unnest($1) ORDER BY 1);
> $$ language sql;
>
> and added a call to it to the function:
>
> create function lookup_tokens_sort(integer,bigint[])
>   returns setof dspam_token_data
>   language plpgsql stable
>   as '
> declare
>   v_rec record;
> begin
>   for v_rec in select * from dspam_token_data
>     where uid=$1
>       and token in (select (sort($2))[i]
>         from generate_series(array_lower($2,1),array_upper($2,1)) s(i))
>   loop
>     return next v_rec;
>   end loop;
>   return;
> end;';
>
> The problem with this is that the sort() appears to be called once
> per token because the run without the sort takes < 1 millisecond and
> with the sort take about 4 seconds. Is there a way to call the sort()
> only once and use the result in the loops instead of sorting each
> time. Any help would be appreciated. The actual sorted loop will
> be needed for the update functions to prevent deadlocks.
>
> Regards,
> Ken
>

As a follow-up, I can get the desired result by placing the
sort() call in the original SQL query:

SELECT * FROM lookup_tokens( xxx, sort(yyy));

Does anyone have any ideas on how to embed the sort()
call into the pl/pgsql function without a performance impact.

Cheers,
Ken