Обсуждение: [GENERAL] Tricky SQL query (tried [SQL])
Dear All,
I'm having a lot of trouble figuring out a good SQL query for the situation
below.
Consider the following table:
dev_brecard=> select * from test order by person;
person|fruit
- ------+---------
lucy |mandarins
lucy |tomatoes
lucy |pears
lucy |oranges
lucy |apples
peter |pears
peter |apples
peter |oranges
peter |prunes
robert|figs
robert|dates
stuart|apples
stuart|pears
stuart|prunes
stuart|bananas
stuart|kumquats
(16 rows)
(code for creating and populating table is in a PS at the end of this posting)
You can assume that the table is appropriately normalised and that there is
a composite primary key for it (i.e. each COMBINATION of person and fruit
will appear only once and neither of the fields can be NULL)
How do I select from all person who like 'pears' and 'apples' (in this
case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in this
case, lucy and peter)?
I re-read my SQL books but I am still somewhat stumped. Things I could
think of for that sort of query:
1) Select all persons who like 'pears'; Select all persons who like
'apples'; Select all persons who like 'oranges'; Calculate the INTERSECTION
of these sets. But does postgreSQL have a INTERSECTION operator?
2) Use nested subselects:
Select person from test where fruit='pears' and person in (
Select person from test where fruit='apples' and person in (
Select person from test where fruit='oranges'
)
)
But how efficient will this be if I start looking for 6 or seven fruits in
a table with hundreds of entries?
3) Am I storing this sort of data in to wrong kind of form (should I
somehow denormalise? if so, how?)?
Any suggestions????
thanks for any help out there!
Stuart.
PS. Code to cut and paste for table:
create table test (person varchar(25), fruit varchar(25));
insert into test values ('stuart','apples');
insert into test values ('stuart','pears');
insert into test values ('stuart','bananas');
insert into test values ('stuart','kumquats');
insert into test values ('peter','oranges');
insert into test values ('peter','prunes');
insert into test values ('lucy','mandarins');
insert into test values ('lucy','tomatoes');
insert into test values ('peter','apples');
insert into test values ('lucy','apples');
insert into test values ('peter','pears');
insert into test values ('lucy','pears');
insert into test values ('lucy','oranges');
insert into test values ('stuart','prunes');
insert into test values ('robert','figs');
insert into test values ('robert','dates');
+-------------------------+--------------------------------------+
| Stuart Rison | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street |
| Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. |
| Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk |
+-------------------------+--------------------------------------+
On Wed, 13 Jan 1999, Stuart Rison wrote:
# dev_brecard=> select * from test order by person;
# person|fruit
# - ------+---------
# lucy |mandarins
# lucy |tomatoes
# lucy |pears
# lucy |oranges
# lucy |apples
# peter |pears
# peter |apples
# peter |oranges
# peter |prunes
# robert|figs
# robert|dates
# stuart|apples
# stuart|pears
# stuart|prunes
# stuart|bananas
# stuart|kumquats
# (16 rows)
# You can assume that the table is appropriately normalised and that there is
# a composite primary key for it (i.e. each COMBINATION of person and fruit
# will appear only once and neither of the fields can be NULL)
Actually, it would be normalized a little better if you weren't
replicating person names and fruit names for every row.
# How do I select from all person who like 'pears' and 'apples' (in this
# case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in
# this case, lucy and peter)?
I got this to work:
select distinct person from test
where likes(person, 'oranges')
and likes(person, 'apples')
and likes(person, 'pears')
Where likes is defined as follows:
create function likes(text, text) returns bool as
'
declare
ret bool;
cnt integer;
begin
select count(*) into cnt from test where person = $1 and fruit = $2;
if cnt = 0 then
ret=0;
else
ret=1;
end if;
return(ret);
end;
' language 'plpgsql';
# I re-read my SQL books but I am still somewhat stumped. Things I could
# think of for that sort of query:
#
# 1) Select all persons who like 'pears'; Select all persons who like
# 'apples'; Select all persons who like 'oranges'; Calculate the INTERSECTION
# of these sets. But does postgreSQL have a INTERSECTION operator?
#
# 2) Use nested subselects:
#
# Select person from test where fruit='pears' and person in (
# Select person from test where fruit='apples' and person in (
# Select person from test where fruit='oranges'
# )
# )
#
# But how efficient will this be if I start looking for 6 or seven fruits in
# a table with hundreds of entries?
#
# 3) Am I storing this sort of data in to wrong kind of form (should I
# somehow denormalise? if so, how?)?
#
# Any suggestions????
#
# thanks for any help out there!
#
# Stuart.
#
# PS. Code to cut and paste for table:
#
# create table test (person varchar(25), fruit varchar(25));
# insert into test values ('stuart','apples');
# insert into test values ('stuart','pears');
# insert into test values ('stuart','bananas');
# insert into test values ('stuart','kumquats');
# insert into test values ('peter','oranges');
# insert into test values ('peter','prunes');
# insert into test values ('lucy','mandarins');
# insert into test values ('lucy','tomatoes');
# insert into test values ('peter','apples');
# insert into test values ('lucy','apples');
# insert into test values ('peter','pears');
# insert into test values ('lucy','pears');
# insert into test values ('lucy','oranges');
# insert into test values ('stuart','prunes');
# insert into test values ('robert','figs');
# insert into test values ('robert','dates');
#
# +-------------------------+--------------------------------------+
# | Stuart Rison | Ludwig Institute for Cancer Research |
# +-------------------------+ 91 Riding House Street |
# | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. |
# | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk |
# +-------------------------+--------------------------------------+
#
#
#
#
--
SA, beyond.com My girlfriend asked me which one I like better.
pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
| Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________
IMHO the best table layout for your data would be:
DROP TABLE person;
DROP TABLE fruit;
DROP TABLE person_fruit;
CREATE TABLE person (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE fruit (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE person_fruit (p_id INT, f_id INT);
CREATE UNIQUE INDEX pk_person_fruit ON person_fruit (p_id, f_id);
INSERT INTO person (name) VALUES ('lucy');
INSERT INTO person (name) VALUES ('peter');
INSERT INTO person (name) VALUES ('robert');
INSERT INTO person (name) VALUES ('stuart');
INSERT INTO fruit (name) VALUES ('mandarins');
INSERT INTO fruit (name) VALUES ('tomatoes');
INSERT INTO fruit (name) VALUES ('pears');
INSERT INTO fruit (name) VALUES ('oranges');
INSERT INTO fruit (name) VALUES ('apples');
INSERT INTO fruit (name) VALUES ('prunes');
INSERT INTO fruit (name) VALUES ('figs');
INSERT INTO fruit (name) VALUES ('dates');
INSERT INTO fruit (name) VALUES ('bananas');
INSERT INTO fruit (name) VALUES ('kumquats');
INSERT INTO person_fruit (p_id, f_id) VALUES (1,1);
INSERT INTO person_fruit (p_id, f_id) VALUES (1,2);
INSERT INTO person_fruit (p_id, f_id) VALUES (1,3);
INSERT INTO person_fruit (p_id, f_id) VALUES (1,4);
INSERT INTO person_fruit (p_id, f_id) VALUES (1,5);
INSERT INTO person_fruit (p_id, f_id) VALUES (2,3);
INSERT INTO person_fruit (p_id, f_id) VALUES (2,5);
INSERT INTO person_fruit (p_id, f_id) VALUES (2,4);
INSERT INTO person_fruit (p_id, f_id) VALUES (2,6);
INSERT INTO person_fruit (p_id, f_id) VALUES (3,7);
INSERT INTO person_fruit (p_id, f_id) VALUES (3,8);
INSERT INTO person_fruit (p_id, f_id) VALUES (4,5);
INSERT INTO person_fruit (p_id, f_id) VALUES (4,3);
INSERT INTO person_fruit (p_id, f_id) VALUES (4,6);
INSERT INTO person_fruit (p_id, f_id) VALUES (4,9);
INSERT INTO person_fruit (p_id, f_id) VALUES (4,10);
SELECT p.name AS person, f.name AS fruit
FROM person p, fruit f, person_fruit pf
WHERE p.id = pf.p_id AND f.id = pf.f_id
ORDER BY p.name; --your 1st select;
SELECT DISTINCT p.name FROM person p
WHERE EXISTS(SELECT 1
FROM person_fruit pf
WHERE pf.p_id = p.id AND
EXISTS(SELECT 1
FROM fruit f
WHERE f.id = pf.f_id AND
f.name IN ('pears', 'apples',
'oranges'))); --Answer to your actual question
I think that this is 4th(or 5th) Normal Form (never had a DB class :^P).
Hope this helps,
-DEJ
> -----Original Message-----
> From: Dustin Sallings [mailto:dustin@spy.net]
> Sent: Wednesday, January 13, 1999 11:48 AM
> To: Stuart Rison
> Cc: pgsql-general@postgreSQL.org
> Subject: Re: [GENERAL] Tricky SQL query (tried [SQL])
>
>
> On Wed, 13 Jan 1999, Stuart Rison wrote:
>
> # dev_brecard=> select * from test order by person;
> # person|fruit
> # - ------+---------
> # lucy |mandarins
> # lucy |tomatoes
> # lucy |pears
> # lucy |oranges
> # lucy |apples
> # peter |pears
> # peter |apples
> # peter |oranges
> # peter |prunes
> # robert|figs
> # robert|dates
> # stuart|apples
> # stuart|pears
> # stuart|prunes
> # stuart|bananas
> # stuart|kumquats
> # (16 rows)
>
> # You can assume that the table is appropriately normalised
> and that there is
> # a composite primary key for it (i.e. each COMBINATION of
> person and fruit
> # will appear only once and neither of the fields can be NULL)
>
> Actually, it would be normalized a little better if you weren't
> replicating person names and fruit names for every row.
>
> # How do I select from all person who like 'pears' and
> 'apples' (in this
> # case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in
> # this case, lucy and peter)?
>
> I got this to work:
>
> select distinct person from test
> where likes(person, 'oranges')
> and likes(person, 'apples')
> and likes(person, 'pears')
>
> Where likes is defined as follows:
>
> create function likes(text, text) returns bool as
> '
> declare
> ret bool;
> cnt integer;
> begin
> select count(*) into cnt from test where person = $1
> and fruit = $2;
> if cnt = 0 then
> ret=0;
> else
> ret=1;
> end if;
> return(ret);
> end;
> ' language 'plpgsql';
>
>
> # I re-read my SQL books but I am still somewhat stumped.
> Things I could
> # think of for that sort of query:
> #
> # 1) Select all persons who like 'pears'; Select all persons who like
> # 'apples'; Select all persons who like 'oranges'; Calculate
> the INTERSECTION
> # of these sets. But does postgreSQL have a INTERSECTION operator?
> #
> # 2) Use nested subselects:
> #
> # Select person from test where fruit='pears' and person in (
> # Select person from test where fruit='apples' and person in (
> # Select person from test where fruit='oranges'
> # )
> # )
> #
> # But how efficient will this be if I start looking for 6 or
> seven fruits in
> # a table with hundreds of entries?
> #
> # 3) Am I storing this sort of data in to wrong kind of form (should I
> # somehow denormalise? if so, how?)?
> #
> # Any suggestions????
> #
> # thanks for any help out there!
> #
> # Stuart.
> #
> # PS. Code to cut and paste for table:
> #
> # create table test (person varchar(25), fruit varchar(25));
> # insert into test values ('stuart','apples');
> # insert into test values ('stuart','pears');
> # insert into test values ('stuart','bananas');
> # insert into test values ('stuart','kumquats');
> # insert into test values ('peter','oranges');
> # insert into test values ('peter','prunes');
> # insert into test values ('lucy','mandarins');
> # insert into test values ('lucy','tomatoes');
> # insert into test values ('peter','apples');
> # insert into test values ('lucy','apples');
> # insert into test values ('peter','pears');
> # insert into test values ('lucy','pears');
> # insert into test values ('lucy','oranges');
> # insert into test values ('stuart','prunes');
> # insert into test values ('robert','figs');
> # insert into test values ('robert','dates');
> #
> # +-------------------------+--------------------------------------+
> # | Stuart Rison | Ludwig Institute for Cancer Research |
> # +-------------------------+ 91 Riding House Street |
> # | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. |
> # | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk |
> # +-------------------------+--------------------------------------+
> #
> #
> #
> #
>
> --
> SA, beyond.com My girlfriend asked me which one I
> like better.
> pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65
> 51 98 D8 BE
> L_______________________ I hope the answer won't upset her.
> ____________
>
>