Обсуждение: array_except -- Find elements that are not common to both arrays

Поиск
Список
Период
Сортировка

array_except -- Find elements that are not common to both arrays

От
bricklen
Дата:
I recently had need of an "array_except" function but couldn't find
any good/existing examples. Based off the neat "array_intersect"
function at http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Intersection_of_arrays,
I put together an "array_except" version to return the array elements
that are not found in both arrays.
Can anyone think of a faster version of this function? Maybe in C?
The generate_series example takes about 3.5s on the dev db I'm testing
on, which isn't too bad (for my needs at least).

create or replace function array_except(anyarray,anyarray) returns
anyarray as $$
select array_agg(elements)
from    (
        (select unnest($1) except select unnest($2))
        union
        (select unnest($2) except select unnest($1))
        ) as r (elements)
$$ language sql strict immutable;

select array_except('{this,is,a,test}'::text[],'{also,part,of,a,test,run}'::text[]);

select array_to_relation(arr)
from array_except(      (select array_agg(n) from
generate_series(1,1000000,1) as n),
                        (select array_agg(n) from
generate_series(5,1000005,1) as n)
                ) as arr;

I'm testing on 9.0.4

Re: array_except -- Find elements that are not common to both arrays

От
Merlin Moncure
Дата:


On Thursday, September 29, 2011, bricklen <bricklen@gmail.com> wrote:
> I recently had need of an "array_except" function but couldn't find
> any good/existing examples. Based off the neat "array_intersect"
> function at http://www.postgres.cz/index.php/PostgreSQL_SQL_Tricks#Intersection_of_arrays,
> I put together an "array_except" version to return the array elements
> that are not found in both arrays.
> Can anyone think of a faster version of this function? Maybe in C?
> The generate_series example takes about 3.5s on the dev db I'm testing
> on, which isn't too bad (for my needs at least).
>
> create or replace function array_except(anyarray,anyarray) returns
> anyarray as $$
> select array_agg(elements)
> from    (
>        (select unnest($1) except select unnest($2))
>        union
>        (select unnest($2) except select unnest($1))
>        ) as r (elements)
> $$ language sql strict immutable;
>
> select array_except('{this,is,a,test}'::text[],'{also,part,of,a,test,run}'::text[]);
>
> select array_to_relation(arr)
> from array_except(      (select array_agg(n) fro> generate_series(1,1000000,1) as n),
>                        (select array_agg(n) from
> generate_series(5,1000005,1) as n)
>                ) as arr;
>
> I'm testing on 9.0.4
>r
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

*) Prefer union all to union
*) prefer array constructor to array_agg when not grouping.
*) perhaps consider not reusing 'except' name with different semantic meaning

Well done
merlin (on phone & in bed)

Re: array_except -- Find elements that are not common to both arrays

От
bricklen
Дата:
On Thu, Sep 29, 2011 at 8:08 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> *) Prefer union all to union
> *) prefer array constructor to array_agg when not grouping.
> *) perhaps consider not reusing 'except' name with different semantic
> meaning
>
> Well done
> merlin (on phone & in bed)

Hi Merlin,

Thanks for the tips. I have implemented suggestion 1 & 2, and that has
shaved about 1/2 of a second off of the generate_series example below
(3.52s -> 3.48s)

Do you have a suggestion for a better name? I considered array_unique,
array_distinct etc, but those don't really describe what is being
returned IMO. Something that conjures up the "return elements that are
not common to both arrays" would be nice.

create or replace function array_except(anyarray,anyarray) returns
anyarray as $$
select ARRAY(
(
select r.*
from    (
        (select unnest($1) except select unnest($2))
        union all
        (select unnest($2) except select unnest($1))
        ) as r (elements)
))
$$ language sql strict immutable;


select array_except('{this,is,a,test}'::text[],'{also,part,of,a,test}'::text[]);

select array_to_relation(arr)
from array_except( (select array_agg(n) from
generate_series(1,1000000,1) as n) , (select array_agg(n) from
generate_series(5,1000005,1) as n) ) as arr;


More improvement suggestions gladly accepted!

Re: array_except -- Find elements that are not common to both arrays

От
Vitalii Tymchyshyn
Дата:
Since you are using except and not except all, you are not looking at
arrays with duplicates.
For this case next function what the fastest for me:

create or replace function array_except2(anyarray,anyarray) returns
anyarray as $$
select ARRAY(
(
select r.elements
from    (
         (select 1,unnest($1))
         union all
         (select 2,unnest($2))
         ) as r (arr, elements)
     group by 1
     having min(arr)=max(arr)
))
$$ language sql strict immutable;

Best regards, Vitalii Tymchyshyn

Re: array_except -- Find elements that are not common to both arrays

От
bricklen
Дата:
On Fri, Sep 30, 2011 at 5:23 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> Since you are using except and not except all, you are not looking at arrays
> with duplicates.
> For this case next function what the fastest for me:
>
> create or replace function array_except2(anyarray,anyarray) returns
> anyarray as $$
> select ARRAY(
> (
> select r.elements
> from    (
>        (select 1,unnest($1))
>        union all
>        (select 2,unnest($2))
>        ) as r (arr, elements)
>    group by 1
>    having min(arr)=max(arr)
> ))
> $$ language sql strict immutable;

Nice! Your version shaved almost a full second off, now 2.5s from 3.4s
it was earlier.

Re: array_except -- Find elements that are not common to both arrays

От
bricklen
Дата:
On Fri, Sep 30, 2011 at 5:23 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> Since you are using except and not except all, you are not looking at arrays
> with duplicates.
> For this case next function what the fastest for me:
>
> create or replace function array_except2(anyarray,anyarray) returns
> anyarray as $$
> select ARRAY(
> (
> select r.elements
> from    (
>        (select 1,unnest($1))
>        union all
>        (select 2,unnest($2))
>        ) as r (arr, elements)
>    group by 1
>    having min(arr)=max(arr)
> ))
> $$ language sql strict immutable;
>

I've been informed that this type of operation is called "symmetric
difference"[1], and can be represented by A ∆ B.  A couple of
alternative names were proposed, "array_symmetric_difference" and
"array_xor".
Does anyone have a preference for the name? I assume that this
function might potentially be used by others now that it is in the pg
lists, so might as well give it an appropriate name now.
Is this something that could be written in C to make it faster (I don't know C)

[1] http://en.wikipedia.org/wiki/Symmetric_difference

Re: array_except -- Find elements that are not common to both arrays

От
Ben Chobot
Дата:
On Sep 30, 2011, at 12:07 PM, bricklen wrote:

> I've been informed that this type of operation is called "symmetric
> difference"[1], and can be represented by A ∆ B.  A couple of
> alternative names were proposed, "array_symmetric_difference" and
> "array_xor".
> Does anyone have a preference for the name? I assume that this
> function might potentially be used by others now that it is in the pg
> lists, so might as well give it an appropriate name now.
> Is this something that could be written in C to make it faster (I don't know C)

FWIW, speaking as somebody who has no need of this function, "array_xor" is a pretty clear name that indicates what's
goingto happen. 

Re: array_except -- Find elements that are not common to both arrays

От
Merlin Moncure
Дата:
On Fri, Sep 30, 2011 at 3:15 PM, Ben Chobot <bench@silentmedia.com> wrote:
>
> On Sep 30, 2011, at 12:07 PM, bricklen wrote:
>
>> I've been informed that this type of operation is called "symmetric
>> difference"[1], and can be represented by A ∆ B.  A couple of
>> alternative names were proposed, "array_symmetric_difference" and
>> "array_xor".
>> Does anyone have a preference for the name? I assume that this
>> function might potentially be used by others now that it is in the pg
>> lists, so might as well give it an appropriate name now.
>> Is this something that could be written in C to make it faster (I don't know C)
>
> FWIW, speaking as somebody who has no need of this function, "array_xor" is a pretty clear name that indicates what's
goingto happen. 

+1 on this -- was going to suggest until you beat me to it.  I also
for the record really think the array_ prefix on array handling
functions is pretty silly since we support overloading -- greatly
prefer unnest() to array_unnest() etc.  So, how about xor()?

merlin

Re: array_except -- Find elements that are not common to both arrays

От
bricklen
Дата:
On Fri, Sep 30, 2011 at 2:12 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> FWIW, speaking as somebody who has no need of this function, "array_xor" is a pretty clear name that indicates
what'sgoing to happen. 
>
> +1 on this -- was going to suggest until you beat me to it.  I also
> for the record really think the array_ prefix on array handling
> functions is pretty silly since we support overloading -- greatly
> prefer unnest() to array_unnest() etc.  So, how about xor()?

Makes sense, in light of your comment about overloading.

Re: array_except -- Find elements that are not common to both arrays

От
Gavin Flower
Дата:
On 01/10/11 01:23, Vitalii Tymchyshyn wrote:
> Since you are using except and not except all, you are not looking at
> arrays with duplicates.
> For this case next function what the fastest for me:
>
> create or replace function array_except2(anyarray,anyarray) returns
> anyarray as $$
> select ARRAY(
> (
> select r.elements
> from    (
>         (select 1,unnest($1))
>         union all
>         (select 2,unnest($2))
>         ) as r (arr, elements)
>     group by 1
>     having min(arr)=max(arr)
> ))
> $$ language sql strict immutable;
>
> Best regards, Vitalii Tymchyshyn
>
Very neat!

I could see that this function could trivially be modified to handle 3
arrays.

QUESTION: Could this be modified to take an arbitrary number of arrays?

Re: array_except -- Find elements that are not common to both arrays

От
Merlin Moncure
Дата:
On Tue, Oct 4, 2011 at 2:16 AM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> On 01/10/11 01:23, Vitalii Tymchyshyn wrote:
>>
>> Since you are using except and not except all, you are not looking at
>> arrays with duplicates.
>> For this case next function what the fastest for me:
>>
>> create or replace function array_except2(anyarray,anyarray) returns
>> anyarray as $$
>> select ARRAY(
>> (
>> select r.elements
>> from    (
>>        (select 1,unnest($1))
>>        union all
>>        (select 2,unnest($2))
>>        ) as r (arr, elements)
>>    group by 1
>>    having min(arr)=max(arr)
>> ))
>> $$ language sql strict immutable;
>>
>> Best regards, Vitalii Tymchyshyn
>>
> Very neat!
>
> I could see that this function could trivially be modified to handle 3
> arrays.
>
> QUESTION: Could this be modified to take an arbitrary number of arrays?

hm good question.  not in sql aiui, because variadic arguments are
pushed through as arrays, and there is no such thing in postgres as a
'anyarray[]' (or any array of array for that matter).

in c, you get to do more detail processing of variadic arguments, so
you could probably rig something that way -- but the implementation
would be completely different.

alternate way to avoid the variadic problem would be to make an xor()
aggregate which chains the arrays down using the 'all sql' method
posted above -- not as fast maybe, but pretty darn cool if you ask me.

merlin