Обсуждение: [SQL] death of array?
I believe I have an appropriate use[1] for an array column, but I’m having a hard time using that array in a join clause. The SQL question is how to use an array value in the join clause? I’m using postgres 9.6 on ubuntu 16.04 create table probandset (id UUID, probands UUID[]) create table segment(id uuid, chr int, sbp int, epb int, probandset_id) create table people_member(people_id uuid, person_id uuid) create table people(id uuid, name text) create table person(id uuid, name text) probandset.probands is a set of person.id I need to gather all segments whose probandset is within in a specified people. select s.* from segment s join probandset ps on s.probandset_id = ps.id --PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING: join (select id, unnest(probands) as proband from probandset asl) as pu on s.probandset_id = pu.id join people_member pm on pu.proband = pm.person_id join people pl on pm.people_id =pl.id where pl.name = ‘target population name’ The query I have works (showing only half of it here) and I’m not pushing the performance of it too much right now as I’mmore interested in the SQL problem. However, I am getting a seq scan on people_member, not surprisingly. People_memberwill be blocks of people, 50 to 1000 per block and each block loaded in a single transaction, no editing: shouldthis be clustered (and reclustered)? Over time would the seq scan go away as people_member.people_id becomes more discriminating?(There is an index on people_member.people_id). A people has a know set of probands (and we need all subsetsof those probands) Current discussions at our end on whether or not a probandset may be filled with members of more than one people. If notI might be able to add people_id to probandset and I am home free. But that still doesn’t answer the SQL question. Thanks for reading, sorry it’s a tad wordy. rjs [1] I’m dealing with power sets. A given set of N element has 2**N subsets and the number of subset grows exponentially (obviously). But to model this ‘normally’ would require an even larger number of subset member records. (That summation leftto the student[2]). My choice was to list each subset once with and array of elements. Still an exponential problem,but only one exponential problem. It all breaks down somewhere between 20 and 30 elements but we'll burn that bridge when we get there. [2] Something like the sum over i=0..N of ((N choose i) * i maybe?
I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:
I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:SELECT s.* implies semi-joins - so lets see how that would work.SELECT vals.*FROM ( VALUES (2),(4) ) vals (v)WHERE EXISTS (SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)WHERE v = ANY(i));// 2
I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)
HTHDavid J.
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations? Or must I do the work on the array in the where clause?On 07/04/2017 06:02, David G. Johnston wrote:
I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:SELECT s.* implies semi-joins - so lets see how that would work.SELECT vals.*FROM ( VALUES (2),(4) ) vals (v)WHERE EXISTS (SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)WHERE v = ANY(i));// 2
I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)HTHDavid J.-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
I do have a gin index on probandset(probands).
rjs
We can discuss my love of UUID in a separate thread ;) but the short form is that I'm awash in separate id domains starting from 1 (or maybe 750000000) and am not about to add another.
rj.
Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations? Or must I do the work on the array in the where clause?
Understood but true if any match is found, I need every array member to match as I want them all to be of a specific people as input into the query (so field = all(array_col)) and "field" here would be people_member.person_id and using that yields zero results in full query (very quickly though).Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations? Or must I do the work on the array in the where clause?field = any(array_col) works anywhere ...David J.
But my understanding of the documentation of ALL is debatable. What does "the left-hand expression is evaluated" yield for people_member.person_id? The where clause would restrict it to one people_id but thats still more that what's in probandset.probands.
Understood but true if any match is found, I need every array member to match as I want them all to be of a specific people as input into the query (so field = all(array_col)) and "field" here would be people_member.person_id and using that yields zero results in full query (very quickly though).
Understood but true if any match is found, I need every array member to match as I want them all to be of a specific people as input into the query (so field = all(array_col)) and "field" here would be people_member.person_id and using that yields zero results in full query (very quickly though).I didn't actually attempt to comprehend your original email. If you want to supply a self-contained, functioning, query and expected output (ideally something simpler but that covers your main question) I'd be inclined to dig further. Even a broken one with what you think should work would beperson = ALL(persons) doesn't really seem like it will typically work.Here are the various array operators supplied by PostgreSQL. You might find one of them helpful. In particular "contains".David J.
Well you prodding got me a seriously reduced execution time. Here's the actual query. The tables are all in the 'seg' schema and not substantively different than described in first post.
create temp table opt10aut
as
with optset as (
select s.id
, s.markerset_id
, s.startbase
, s.endbase
, ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval
from seg.segment s
join seg.probandset i on s.probandset_id = i.id
join (select people_id, array_agg(person_id) as persons from seg.people_member group by people_id) as pa on i.probands <@ pa.persons
join seg.people o on pa.people_id = o.id
where
s.markerset_id = '61801888-9a81-4187-922c-4d42c0471bea'
and o.name = '709'
)
select m.name
, min(pval)
from optset op
join seg.markerset_member mm on op.markerset_id = mm.markerset_id
join seg.marker m on mm.member_id = m.id
where
m.basepos between op.startbase and op.endbase
group by m.name
It is true that this particular people has only 60 members. Our upper end would be ~1000.
rjs
> On Apr 7, 2017, at 10:17 AM, Rob Sargent <robjsargent@gmail.com> wrote: > > > On 04/07/2017 10:02 AM, David G. Johnston wrote: >> On Fri, Apr 7, 2017 at 8:57 AM, Rob Sargent <robjsargent@gmail.com> wrote: >> Understood but true if any match is found, I need every array member to match as I want them all to be of a specificpeople as input into the query (so field = all(array_col)) and "field" here would be people_member.person_id andusing that yields zero results in full query (very quickly though). >> >> I didn't actually attempt to comprehend your original email. If you want to supply a self-contained, functioning, queryand expected output (ideally something simpler but that covers your main question) I'd be inclined to dig further. Even a broken one with what you think should work would be >> >> person = ALL(persons) doesn't really seem like it will typically work. >> >> Here are the various array operators supplied by PostgreSQL. You might find one of them helpful. In particular "contains". >> >> https://www.postgresql.org/docs/9.6/static/functions-array.html >> >> David J. >> >> > > Well you prodding got me a seriously reduced execution time. Here's the actual query. The tables are all in the 'seg'schema and not substantively different than described in first post. > > create temp table opt10aut > as > with optset as ( > select s.id > , s.markerset_id > , s.startbase > , s.endbase > , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval > from seg.segment s > join seg.probandset i on s.probandset_id = i.id > join (select people_id, array_agg(person_id) as persons from seg.people_member group by people_id) as pa on i.probands<@ pa.persons > join seg.people o on pa.people_id = o.id > where > s.markerset_id = '61801888-9a81-4187-922c-4d42c0471bea' > and o.name = '709' > ) > select m.name > , min(pval) > from optset op > join seg.markerset_member mm on op.markerset_id = mm.markerset_id > join seg.marker m on mm.member_id = m.id > where > m.basepos between op.startbase and op.endbase > group by m.name > > It is true that this particular people has only 60 members. Our upper end would be ~1000. > > rjs > > > What I was hoping for, I guess, is a construct like create table t(id int /*for Achilles*/, idset int[] references v.field) and it would be understood that each element can be found in (preferably the index of) v.field Oh, and an operator to make use of the knowledge something like select * from t join v on t.idset “are_all_in” v.field; Thanks to all, rjs
On 04/07/2017 12:28 AM, Achilleas Mantzios wrote:Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations? Or must I do the work on the array in the where clause?On 07/04/2017 06:02, David G. Johnston wrote:
I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:SELECT s.* implies semi-joins - so lets see how that would work.SELECT vals.*FROM ( VALUES (2),(4) ) vals (v)WHERE EXISTS (SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)WHERE v = ANY(i));// 2
I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)HTHDavid J.-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
I do have a gin index on probandset(probands).
Can you give the definition of this index? Does it get used ? Did you verify with EXPLAIN ANALYZE ?
At least in 9.3, AFAIK uuid[] has no operator class for access method "gin", unless I am missing smth.
rjs
We can discuss my love of UUID in a separate thread ;) but the short form is that I'm awash in separate id domains starting from 1 (or maybe 750000000) and am not about to add another.
rj.
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
On 07/04/2017 18:22, Rob Sargent wrote:
On 04/07/2017 12:28 AM, Achilleas Mantzios wrote:Thank you both for your suggestions, but does either apply to joining through the array in a flow of join operations? Or must I do the work on the array in the where clause?On 07/04/2017 06:02, David G. Johnston wrote:
I need to gather all segments whose probandset is within in a specified people.
select s.* from segment s
join probandset ps on s.probandset_id = ps.id
--PROBLEM: WOULD LIKE SOMETHING BETTER THAN THE FOLLOWING:SELECT s.* implies semi-joins - so lets see how that would work.SELECT vals.*FROM ( VALUES (2),(4) ) vals (v)WHERE EXISTS (SELECT 1 FROM ( VALUES (ARRAY[1,2,3]::integer[]) ) eyes (i)WHERE v = ANY(i));// 2
I never understood the love for UUID keys, If he changes UUID for int, install intarray and create this index :
CREATE INDEX probandset_probands_gistsmall ON probandset USING gin (probands gin__int_ops);
then he'll be able to do
.... WHERE .... intset(people_member.personid) ~ probandset.probands ...
That would boost performance quite a lot. (in my tests 100-fold)HTHDavid J.-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
I do have a gin index on probandset(probands).
Can you give the definition of this index? Does it get used ? Did you verify with EXPLAIN ANALYZE ?
At least in 9.3, AFAIK uuid[] has no operator class for access method "gin", unless I am missing smth.
rjs
We can discuss my love of UUID in a separate thread ;) but the short form is that I'm awash in separate id domains starting from 1 (or maybe 750000000) and am not about to add another.
rj.-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt