Обсуждение: Negating the list of selected rows of a join
Hello, I want to list the rows of a table with a text field whose values do not exist in a similar field of another table. Basically what I want to get is negated results of a join. Lets say the tables table_a and table_b have the field name. table_a table_b name age name ----- --- ----- Peter 27 Paul Paul 42 Mary 20 If I asked for a join like this: SELECT table_a.name,table_a.age FROM table_a,table_b WHERE table_a.name=table_b.name I would get: name age ----- --- Paul 42 But I want the opposite. I tried a non-equi join like this: SELECT table_a.name,table_a.age FROM table_a,table_b WHERE table_a.name<>table_b.name and I got: name age ----- --- Peter 27 Mary 20 It worked except for the case when table_b is empty. In this case the nothing was returned. Is this the expected behaviour or is it a bug in PostgreSQL? How can I make a query that works the way I want all the time, even for the case when table_b is empty? Regards, Manuel Lemos E-mail: mlemos@acm.org URL: http://www.e-na.net/the_author.html PGP key: finger://mlemos@zeus.ci.ua.pt --
Manuel Lemos wrote:
> How can I make a query that works the way I want all the time, even for the
> case when table_b is empty?
SELECT table_a.name, table_a.age
FROM table_a
WHERE NOT EXISTS (
SELECT 'x'
FROM table_b
WHERE table_b.name = table_a.name
);
Hope this will do the trick.
:) Clark Evans
Manuel Lemos <mlemos@acm.org> wrote:
> I want to list the rows of a table with a text field whose values do not
> exist in a similar field of another table. Basically what I want to get
> is negated results of a join. [...]
> It worked except for the case when table_b is empty. In this case the
> nothing was returned. Is this the expected behaviour or is it a bug in
> PostgreSQL?
If you list two (or more) tables in the 'from' clause of a select
(that is, if you do a 'join'), a result table is built, in which each
row of the first table is combined with each row from (all) the other
table(s). To clarify, do simply
SELECT table_a.name,table_b.name FROM table_a,table_b;
on your table. When one of the tables has no rows, all the rows from
the other(s) are combined with *nothing*; this gives nothing!
('combined' may be the wrong word; it's like a multiplication, and
people speak of a 'Cartesian product' of the tables)
The 'where' clause can restrict the rows of the result table to
something useful, e.g., you can restrict to 'table_a.name =
table_b.name'. A feature that probably will help you is the
construction of a so-called 'sub-select' in the where clause:
SELECT name FROM table_a
WHERE name NOT IN (SELECT name FROM table_b);
Hope it helps!
Ulf
--
======================================================================
Ulf Mehlig <umehlig@zmt.uni-bremen.de>
Center for Tropical Marine Ecology/ZMT, Bremen, Germany
----------------------------------------------------------------------
Clark Evans <clark.evans@manhattanproject.com>:
> Manuel Lemos wrote:
> > How can I make a query that works the way I want all the time, even for the
> > case when table_b is empty?
>
> SELECT table_a.name, table_a.age
> FROM table_a
> WHERE NOT EXISTS (
> SELECT 'x'
> FROM table_b
> WHERE table_b.name = table_a.name
> );
>
> Hope this will do the trick.
Maybe not -- doesn't that mean, that the query won't return a single
row in case there is *any* pair of equal names in both tables?!
Have a nice (sun)day,
Ulf
--
======================================================================
Ulf Mehlig <umehlig@zmt.uni-bremen.de>
Center for Tropical Marine Ecology/ZMT, Bremen, Germany
----------------------------------------------------------------------
Manuel Lemos wrote: > > I want to list the rows of a table with a text field whose values do not > exist in a similar field of another table. Basically what I want to get > is negated results of a join. Ulf Mehlig wrote: > SELECT name FROM table_a > WHERE name NOT IN (SELECT name FROM table_b); Clark Evans wrote: > SELECT table_a.name, table_a.age > FROM table_a > WHERE NOT EXISTS ( > SELECT 'x' > FROM table_b > WHERE table_b.name = table_a.name > ); I'm not sure about how well PostgreSQL handles these two. I'd try them both with your data set. If table_b is small (less than a few thousand rows) then Ulf's approach would work best. However, if table_b is large (more than a thousand) then I think the other approach may work better if table_b.name is indexed. Clark
Clark Evans <clark.evans@manhattanproject.com>/Ulf Mehlig
<umehlig@pandora3.uni-bremen.de> wrote:
> > > Hope this will do the trick.
> >
> > Maybe not -- doesn't that mean, that the query won't return a
> > single row in case there is *any* pair of equal names in both
> > tables?!
>
> No. It's a correlated sub-query. It's actually much more efficient
> with large tables than the other approach (which has to put
> the entire result set of table_b on the heap before it can
> process table_a). Your approach is, however, much better (by a
> large factor) if table_a is very large and table_b is very small,
> since you don't have to hit the index for table_b on every
> row of table_a...
Yes, it *will* do the trick!! Sorry, Clark, I misinterpreted your
sub-query ... have to read more carefully ...
Thanks for your correction!
Ulf
--
======================================================================
Ulf Mehlig <umehlig@zmt.uni-bremen.de>
Center for Tropical Marine Ecology/ZMT, Bremen, Germany
----------------------------------------------------------------------