Обсуждение: [SQL] death of array?

Поиск
Список
Период
Сортировка

[SQL] death of array?

От
Rob Sargent
Дата:
I believe I have an appropriate use[1] for an array column, but I’m having a hard time using that array in a join
clause.
The SQL question is how to use an array value in the join clause?  I’m using postgres 9.6 on ubuntu 16.04


create table probandset (id UUID, probands UUID[])
create table segment(id uuid, chr int, sbp int, epb int, probandset_id)
create table people_member(people_id uuid, person_id uuid)
create table people(id uuid, name text)
create table person(id uuid, name text)

probandset.probands is a set of person.id

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING: join (select id, unnest(probands) as proband from probandset
asl) as pu on s.probandset_id = pu.id join people_member pm on pu.proband = pm.person_id join people pl on pm.people_id
=pl.id 
where pl.name = ‘target population name’

The query I have works (showing only half of it here) and I’m not pushing the performance of it too much right now as
I’mmore interested in the SQL problem. However, I am getting a seq scan on people_member, not surprisingly.
People_memberwill be blocks of people, 50 to 1000 per block and each block loaded in a single transaction, no editing:
shouldthis be clustered (and reclustered)? Over time would the seq scan go away as people_member.people_id becomes more
discriminating?(There is an index on people_member.people_id). A people has a know set of probands (and we need all
subsetsof those probands) 

Current discussions at our end on whether or not a probandset may be filled with members of more than one people.  If
notI might be able to add people_id to probandset and I am home free. 
But that still doesn’t answer the SQL question.

Thanks for reading, sorry it’s a tad wordy.
rjs

[1] I’m dealing with power sets. A given set of N element has 2**N subsets and the number of subset grows exponentially
(obviously). But to model this ‘normally’ would require an even larger number of subset member records. (That summation
leftto the student[2]).  My choice was to list each subset once with and array of elements.  Still an exponential
problem,but only one exponential problem. 
It all breaks down somewhere between 20 and 30 elements but we'll burn that bridge when we get there.
[2] Something like the sum over i=0..N of ((N choose i) * i maybe?


Re: [SQL] death of array?

От
"David G. Johnston"
Дата:
On Thu, Apr 6, 2017 at 7:32 PM, Rob Sargent <robjsargent@gmail.com> wrote:

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
  join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:

​SELECT s.* implies semi-joins - so lets see how that would work.

SELECT vals.* 
FROM ( VALUES (2),(4) ) vals (v)
WHERE EXISTS (
    SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)
        WHERE v = ANY(i)
);
// 2

​HTH

David J.

Re: [SQL] death of array?

От
Achilleas Mantzios
Дата:
On 07/04/2017 06:02, David G. Johnston wrote:
On Thu, Apr 6, 2017 at 7:32 PM, Rob Sargent <robjsargent@gmail.com> wrote:

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
  join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:

​SELECT s.* implies semi-joins - so lets see how that would work.

SELECT vals.* 
FROM ( VALUES (2),(4) ) vals (v)
WHERE EXISTS (
    SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)
        WHERE v = ANY(i)
);
// 2

I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)


​HTH

David J.



-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: [SQL] death of array?

От
Rob Sargent
Дата:



On 04/07/2017 12:28 AM, Achilleas Mantzios wrote:
On 07/04/2017 06:02, David G. Johnston wrote:
On Thu, Apr 6, 2017 at 7:32 PM, Rob Sargent <robjsargent@gmail.com> wrote:

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
  join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:

​SELECT s.* implies semi-joins - so lets see how that would work.

SELECT vals.* 
FROM ( VALUES (2),(4) ) vals (v)
WHERE EXISTS (
    SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)
        WHERE v = ANY(i)
);
// 2

I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)


​HTH

David J.



-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations?  Or must I do the work on the array in the where clause?

I do have a gin index on probandset(probands).

rjs

We can discuss my love of UUID in a separate thread ;) but the short form is that I'm awash in separate id domains starting from 1 (or maybe 750000000) and am not about to add another.
rj.

Re: [SQL] death of array?

От
"David G. Johnston"
Дата:
On Fri, Apr 7, 2017 at 8:22 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations?  Or must I do the work on the array in the where clause?

​field = any(array_col) works anywhere ...​

​David J.

Re: [SQL] death of array?

От
Rob Sargent
Дата:
On 04/07/2017 09:26 AM, David G. Johnston wrote:
On Fri, Apr 7, 2017 at 8:22 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations?  Or must I do the work on the array in the where clause?

​field = any(array_col) works anywhere ...​

​David J.

Understood but  true if any match is found, I need every array member to match as I want them all to be of a specific people as input into the query (so field = all(array_col))  and "field" here would be people_member.person_id and using that yields zero results in full query (very quickly though).

But my understanding of the documentation of ALL is debatable.  What does "the left-hand expression is evaluated" yield for people_member.person_id? The where clause would restrict it to one people_id but thats still more that what's in probandset.probands.

Re: [SQL] death of array?

От
"David G. Johnston"
Дата:
On Fri, Apr 7, 2017 at 8:57 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Understood but  true if any match is found, I need every array member to match as I want them all to be of a specific people as input into the query (so field = all(array_col))  and "field" here would be people_member.person_id and using that yields zero results in full query (very quickly though).

​I didn't actually attempt to comprehend your original email.  If you want to supply a self-contained, functioning, query and expected output (ideally something simpler but that covers your main question) I'd be inclined to dig further.  Even a broken one with what you think should work would be

person = ALL(persons) doesn't really seem like it will typically work.

Here are the various array operators supplied by PostgreSQL.  You might find one of them helpful.  In particular "contains".


David J.

Re: [SQL] death of array?

От
Rob Sargent
Дата:

On 04/07/2017 10:02 AM, David G. Johnston wrote:
On Fri, Apr 7, 2017 at 8:57 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Understood but  true if any match is found, I need every array member to match as I want them all to be of a specific people as input into the query (so field = all(array_col))  and "field" here would be people_member.person_id and using that yields zero results in full query (very quickly though).

​I didn't actually attempt to comprehend your original email.  If you want to supply a self-contained, functioning, query and expected output (ideally something simpler but that covers your main question) I'd be inclined to dig further.  Even a broken one with what you think should work would be

person = ALL(persons) doesn't really seem like it will typically work.

Here are the various array operators supplied by PostgreSQL.  You might find one of them helpful.  In particular "contains".


David J.


Well you prodding got me a seriously reduced execution time.  Here's the actual query. The tables are all in the 'seg' schema and not substantively different than described in first post.

create temp table opt10aut
as
with optset as (
    select s.id
           , s.markerset_id
           , s.startbase
           , s.endbase
           , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval
    from seg.segment s
         join seg.probandset i on s.probandset_id = i.id
     join (select people_id, array_agg(person_id) as persons from seg.people_member group by people_id) as pa on i.probands <@ pa.persons
     join seg.people o on pa.people_id = o.id
   where
         s.markerset_id = '61801888-9a81-4187-922c-4d42c0471bea'
         and o.name = '709'
)
select m.name
       , min(pval)
from optset op
     join seg.markerset_member mm on op.markerset_id = mm.markerset_id
     join seg.marker m on mm.member_id = m.id
where
     m.basepos between op.startbase and op.endbase
group by m.name

It is true that this particular people has only 60 members.  Our upper end would be ~1000.

rjs


Re: [SQL] death of array?

От
Rob Sargent
Дата:
> On Apr 7, 2017, at 10:17 AM, Rob Sargent <robjsargent@gmail.com> wrote:
>
>
> On 04/07/2017 10:02 AM, David G. Johnston wrote:
>> On Fri, Apr 7, 2017 at 8:57 AM, Rob Sargent <robjsargent@gmail.com> wrote:
>> Understood but  true if any match is found, I need every array member to match as I want them all to be of a
specificpeople as input into the query (so field = all(array_col))  and "field" here would be people_member.person_id
andusing that yields zero results in full query (very quickly though). 
>>
>> ​I didn't actually attempt to comprehend your original email.  If you want to supply a self-contained, functioning,
queryand expected output (ideally something simpler but that covers your main question) I'd be inclined to dig further.
Even a broken one with what you think should work would be 
>>
>> person = ALL(persons) doesn't really seem like it will typically work.
>>
>> Here are the various array operators supplied by PostgreSQL.  You might find one of them helpful.  In particular
"contains".
>>
>> https://www.postgresql.org/docs/9.6/static/functions-array.html
>>
>> David J.
>>
>> ​
>
> Well you prodding got me a seriously reduced execution time.  Here's the actual query. The tables are all in the
'seg'schema and not substantively different than described in first post. 
>
> create temp table opt10aut
> as
> with optset as (
>     select s.id
>            , s.markerset_id
>            , s.startbase
>            , s.endbase
>            , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as
pval
>     from seg.segment s
>          join seg.probandset i on s.probandset_id = i.id
>      join (select people_id, array_agg(person_id) as persons from seg.people_member group by people_id) as pa on
i.probands<@ pa.persons 
>      join seg.people o on pa.people_id = o.id
>    where
>          s.markerset_id = '61801888-9a81-4187-922c-4d42c0471bea'
>          and o.name = '709'
> )
> select m.name
>        , min(pval)
> from optset op
>      join seg.markerset_member mm on op.markerset_id = mm.markerset_id
>      join seg.marker m on mm.member_id = m.id
> where
>      m.basepos between op.startbase and op.endbase
> group by m.name
>
> It is true that this particular people has only 60 members.  Our upper end would be ~1000.
>
> rjs
>
>
>

What I was hoping for, I guess, is a construct like   create table t(id int /*for Achilles*/, idset int[] references
v.field) 
and it would be understood that each element can be found in (preferably the index of) v.field
Oh, and an operator to make use of the knowledge something like   select * from t join v on t.idset “are_all_in”
v.field;

Thanks to all,
rjs




Re: [SQL] death of array?

От
Achilleas Mantzios
Дата:
On 07/04/2017 18:22, Rob Sargent wrote:



On 04/07/2017 12:28 AM, Achilleas Mantzios wrote:
On 07/04/2017 06:02, David G. Johnston wrote:
On Thu, Apr 6, 2017 at 7:32 PM, Rob Sargent <robjsargent@gmail.com> wrote:

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
  join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:

​SELECT s.* implies semi-joins - so lets see how that would work.

SELECT vals.* 
FROM ( VALUES (2),(4) ) vals (v)
WHERE EXISTS (
    SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)
        WHERE v = ANY(i)
);
// 2

I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)


​HTH

David J.



-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations?  Or must I do the work on the array in the where clause?

I do have a gin index on probandset(probands).

Can you give the definition of this index? Does it get used ? Did you verify with EXPLAIN ANALYZE ?
At least in 9.3, AFAIK uuid[] has no operator class for access method "gin", unless I am missing smth.


rjs

We can discuss my love of UUID in a separate thread ;) but the short form is that I'm awash in separate id domains starting from 1 (or maybe 750000000) and am not about to add another.
rj.


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: [SQL] death of array?

От
Rob Sargent
Дата:
Actually that index is not expected, by me at least, to be involved in this join. (I added the uuid gin as described in the archives. I'm using Postgres 9.6)

On Apr 10, 2017, at 12:50 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

On 07/04/2017 18:22, Rob Sargent wrote:



On 04/07/2017 12:28 AM, Achilleas Mantzios wrote:
On 07/04/2017 06:02, David G. Johnston wrote:
On Thu, Apr 6, 2017 at 7:32 PM, Rob Sargent <robjsargent@gmail.com> wrote:

I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
  join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:

​SELECT s.* implies semi-joins - so lets see how that would work.

SELECT vals.* 
FROM ( VALUES (2),(4) ) vals (v)
WHERE EXISTS (
    SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)
        WHERE v = ANY(i)
);
// 2

I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)


​HTH

David J.



-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations?  Or must I do the work on the array in the where clause?

I do have a gin index on probandset(probands).

Can you give the definition of this index? Does it get used ? Did you verify with EXPLAIN ANALYZE ?
At least in 9.3, AFAIK uuid[] has no operator class for access method "gin", unless I am missing smth.


rjs

We can discuss my love of UUID in a separate thread ;) but the short form is that I'm awash in separate id domains starting from 1 (or maybe 750000000) and am not about to add another.
rj.


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt