Обсуждение: Trouble with strange OUTER JOIN syntax

Поиск
Список
Период
Сортировка

Trouble with strange OUTER JOIN syntax

От
Farid Hajji
Дата:
I'm trying to port a program written in ESQL/C for Informix to
PostgreSQL using ecpg [ecpg is a great helper here, despite its
shortcomings]. Right now,  I'm having a hard time with OUTER JOINs,
even when using newest PostgreSQL-7.1.1.

How would you translate the following SELECT statements with
outer joins to PostgreSQL syntax?

SELECT ...
   FROM tab1 alias_tab1, OUTER ot1 alias_ot1, OUTER ot2 alias_ot2
   WHERE ...

SELECT ...
   FROM tab1 alias_tab1, OUTER (ot1 alias_ot1, ot2 alias_ot2)
   WHERE ...

It is even possible to have multiple tables specified at FROM
before the OUTER keyword, like here:

SELECT ...
  FROM tab1 alias_tab1, tab2 alias_tab2,
       OUTER ot1 alias_ot1,
       OUTER ot2 alias_ot2
  WHERE ...

SELECT
  FROM tab1 alias_tab1, tab2 alias_tab2,
       OUTER (ot1 alias_ot1, ot2 alias_ot2)
  WHERE ...

Another situation is here (same table 'otboth' with two aliases):

SELECT ...
  FROM tab1 alias_tab1, OUTER otboth alias1, OUTER otboth alias2
  WHERE ...

I didn't find any clue in Bruce's book nor in the postgreSQL docs/faq
as of 7.1.1. If at all possible, I'd like to stick to OUTER JOINs
because they are used extensively throughout the program.

Thank you for your help,

-Farid.

--
Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | farid.hajji@ob.kamp.net
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
One OS To Rule Them All And In The Darkness Bind Them... --Bill Gates.


Re: Trouble with strange OUTER JOIN syntax

От
Tom Lane
Дата:
Farid Hajji <farid.hajji@ob.kamp.net> writes:
> How would you translate the following SELECT statements with
> outer joins to PostgreSQL syntax?

> SELECT ...
>   FROM tab1 alias_tab1, tab2 alias_tab2,
>        OUTER ot1 alias_ot1,
>        OUTER ot2 alias_ot2
>   WHERE ...

> SELECT
>   FROM tab1 alias_tab1, tab2 alias_tab2,
>        OUTER (ot1 alias_ot1, ot2 alias_ot2)
>   WHERE ...

Tell me what that means, and I'll tell you how to translate it.
What is being joined to what, on what keys, and which side is
allowed to become null in the join?

            regards, tom lane

Re: Trouble with strange OUTER JOIN syntax

От
Farid Hajji
Дата:
Hi Tom,

> > How would you translate the following SELECT statements with
> > outer joins to PostgreSQL syntax?
>
> > SELECT ...
> >   FROM tab1 alias_tab1, tab2 alias_tab2,
> >        OUTER ot1 alias_ot1,
> >        OUTER ot2 alias_ot2
> >   WHERE ...
>
> > SELECT
> >   FROM tab1 alias_tab1, tab2 alias_tab2,
> >        OUTER (ot1 alias_ot1, ot2 alias_ot2)
> >   WHERE ...
>
> Tell me what that means, and I'll tell you how to translate it.
> What is being joined to what, on what keys, and which side is
> allowed to become null in the join?
Having examined the statements in more detail, it seems to me like
some kind of "multijoin" is required here:

  * outer-joining more than one table to a primary table:
     SELECT ... FROM tabmain, OUTER tab2, OUTER tab3

Here, I'll guess that both tab2 and tab3 are being outer-joined
to tabmain. Therefore tab2 and tab3 columns are allowed to
be null, whereas tabmain column's are not.

  * outer-joining one (or more than one) table to a cartesian
    product of other tables:
     SELECT ... FROM tab1, tab2, OUTER tab3
     SELECT ... FROM tab1, tab2, OUTER (tab3, tab4)
     SELECT ... FROM tab1, tab2, OUTER tab3, OUTER tab4

In the first example, tab3 is being joind to the cartesian product
(inner join) of both tab1 and tab2. Therefore tab1 x tab2 is not
allowed to be null, whereas tab3 is allowed to be.

The next examples seem to generalize this: two tables (tab3 and tab4)
are being outer-joined to existing cartesian product tab1 x tab2.
I'm not sure what the difference may be between:
  OUTER (tab3, tab4)
and
  OUTER tab3, OUTER tab4.

If PostgreSQL doesn't support this feature (yet?), I'll have to simulate
it programatically. I just hoped to avoid the trouble of doing so,
because the program I'm porting contains a lot of such "multijoins".

Thanks for your help,

-Farid.

--
Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | farid.hajji@ob.kamp.net
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
One OS To Rule Them All And In The Darkness Bind Them... --Bill Gates.


Re: Trouble with strange OUTER JOIN syntax

От
"Eric G. Miller"
Дата:
On Fri, May 25, 2001 at 05:24:18PM +0200, Farid Hajji wrote:
> Hi Tom,
>
> > > How would you translate the following SELECT statements with
> > > outer joins to PostgreSQL syntax?
> >
> > > SELECT ...
> > >   FROM tab1 alias_tab1, tab2 alias_tab2,
> > >        OUTER ot1 alias_ot1,
> > >        OUTER ot2 alias_ot2
> > >   WHERE ...
> >
> > > SELECT
> > >   FROM tab1 alias_tab1, tab2 alias_tab2,
> > >        OUTER (ot1 alias_ot1, ot2 alias_ot2)
> > >   WHERE ...
> >
> > Tell me what that means, and I'll tell you how to translate it.
> > What is being joined to what, on what keys, and which side is
> > allowed to become null in the join?
> Having examined the statements in more detail, it seems to me like
> some kind of "multijoin" is required here:
>
>   * outer-joining more than one table to a primary table:
>      SELECT ... FROM tabmain, OUTER tab2, OUTER tab3
>
> Here, I'll guess that both tab2 and tab3 are being outer-joined
> to tabmain. Therefore tab2 and tab3 columns are allowed to
> be null, whereas tabmain column's are not.
>
>   * outer-joining one (or more than one) table to a cartesian
>     product of other tables:
>      SELECT ... FROM tab1, tab2, OUTER tab3
>      SELECT ... FROM tab1, tab2, OUTER (tab3, tab4)
>      SELECT ... FROM tab1, tab2, OUTER tab3, OUTER tab4
>
> In the first example, tab3 is being joind to the cartesian product
> (inner join) of both tab1 and tab2. Therefore tab1 x tab2 is not
> allowed to be null, whereas tab3 is allowed to be.
>
> The next examples seem to generalize this: two tables (tab3 and tab4)
> are being outer-joined to existing cartesian product tab1 x tab2.
> I'm not sure what the difference may be between:
>   OUTER (tab3, tab4)
> and
>   OUTER tab3, OUTER tab4.
>
> If PostgreSQL doesn't support this feature (yet?), I'll have to simulate
> it programatically. I just hoped to avoid the trouble of doing so,
> because the program I'm porting contains a lot of such "multijoins".

How 'bout:

SELECT * FROM foo LEFT JOIN bar ON foo.foo_id = bar.foo_id
LEFT JOIN baz on bar.id = baz.bar_id;

simple example output:

 foo_id | data | foo_id | bar_id |  data   | baz_id | bar_id |    data
--------+------+--------+--------+---------+--------+--------+-------------
      1 | one  |      1 |      1 | one-one |      1 |      1 | one-one-one
      1 | one  |      1 |      1 | one-one |      2 |      1 | one-one-two
      1 | one  |      1 |      2 | one-two |        |        |
      2 | two  |        |        |         |        |        |
(4 rows)

"baz" has no candidate with "bar_id" = 2, and "bar" has no candidate where
"foo_id" = 2. But it's not a product, as baz has a {3,3,'three-three-three'}
tuple that won't join.  You have to specify a join condition, or you get a
product.

--
Eric G. Miller <egm2@jps.net>

Re: Trouble with strange OUTER JOIN syntax

От
Tom Lane
Дата:
Farid Hajji <farid.hajji@ob.kamp.net> writes:
> The next examples seem to generalize this: two tables (tab3 and tab4)
> are being outer-joined to existing cartesian product tab1 x tab2.
> I'm not sure what the difference may be between:
>   OUTER (tab3, tab4)
> and
>   OUTER tab3, OUTER tab4.

I suspect that these correspond to the standard syntaxes

SELECT ... FROM
    (tab1 JOIN tab2 ON condition12)
    LEFT JOIN
    (tab3 JOIN tab4 ON condition34)
    ON condition1234;

and

SELECT ... FROM
    ((tab1 JOIN tab2 ON condition12)
    LEFT JOIN
    tab3 ON condition123)
    LEFT JOIN
    tab4 ON condition1234;

respectively.  In the first case, tab1 is inner-joined to tab2 and
tab3 is separately inner-joined to tab4, then the results are
outer-joined (with the tab3*tab4 product being the nullable side).
In the second case, tab1 is inner-joined to tab2, then tab3 is
outer-joined to this product, and finally tab4 is outer-joined to
the result.  Obviously these orderings can yield different results
because of null-row addition (whereas it wouldn't really matter if
all the joins were inner joins).

Note that the standard syntax makes you attach a join condition
(for example, "tab1.x = tab2.y") to each of these operations, rather
than intuiting which parts of the WHERE clause are to be taken as the
join condition.  Again, this wouldn't matter for inner joins but it
makes a big difference for outer joins.  Example:

select * from tab1 left join tab2 on (tab1.a = tab2.b and tab2.c = 0);

is not at all the same as

select * from tab1 left join tab2 on (tab1.a = tab2.b) where tab2.c = 0;

The WHERE clause is not the join condition, but is applied after the
join is done (and null rows are inserted).  So, for example, if tab1
contains just A=1 and tab2 contains just B=1, C=2, the first case
produces output 1,NULL,NULL because there are no tab2 rows that meet
the outer-join condition with tab1's row.  But the second case produces
no output rows at all --- the outer-join produces 1,1,2 which is then
removed by the WHERE filter.

The standard's syntax is rather verbose and ugly, but it has the great
virtue of handling outer joins unambiguously.  None of the vendor-
specific syntaxes I've seen are very clear about the implications of
an outer join condition.

BTW, you need PG 7.1 or later to work with outer joins.

            regards, tom lane

Re: Trouble with strange OUTER JOIN syntax

От
Farid Hajji
Дата:
> > Having examined the statements in more detail, it seems to me like
> > some kind of "multijoin" is required here:
> >
> >   * outer-joining more than one table to a primary table:
> >      SELECT ... FROM tabmain, OUTER tab2, OUTER tab3
> >
> > Here, I'll guess that both tab2 and tab3 are being outer-joined
> > to tabmain. Therefore tab2 and tab3 columns are allowed to
> > be null, whereas tabmain column's are not.
> >
> >   * outer-joining one (or more than one) table to a cartesian
> >     product of other tables:
> >      SELECT ... FROM tab1, tab2, OUTER tab3
> >      SELECT ... FROM tab1, tab2, OUTER (tab3, tab4)
> >      SELECT ... FROM tab1, tab2, OUTER tab3, OUTER tab4
> >
> > In the first example, tab3 is being joind to the cartesian product
> > (inner join) of both tab1 and tab2. Therefore tab1 x tab2 is not
> > allowed to be null, whereas tab3 is allowed to be.
> >
> > The next examples seem to generalize this: two tables (tab3 and tab4)
> > are being outer-joined to existing cartesian product tab1 x tab2.
> > I'm not sure what the difference may be between:
> >   OUTER (tab3, tab4)
> > and
> >   OUTER tab3, OUTER tab4.
> >
> > If PostgreSQL doesn't support this feature (yet?), I'll have to simulate
> > it programatically. I just hoped to avoid the trouble of doing so,
> > because the program I'm porting contains a lot of such "multijoins".
>
> How 'bout:
>
> SELECT * FROM foo LEFT JOIN bar ON foo.foo_id = bar.foo_id
> LEFT JOIN baz on bar.id = baz.bar_id;
>
> simple example output:
>
>  foo_id | data | foo_id | bar_id |  data   | baz_id | bar_id |    data
> --------+------+--------+--------+---------+--------+--------+-------------
>       1 | one  |      1 |      1 | one-one |      1 |      1 | one-one-one
>       1 | one  |      1 |      1 | one-one |      2 |      1 | one-one-two
>       1 | one  |      1 |      2 | one-two |        |        |
>       2 | two  |        |        |         |        |        |
> (4 rows)
>
> "baz" has no candidate with "bar_id" = 2, and "bar" has no candidate where
> "foo_id" = 2. But it's not a product, as baz has a {3,3,'three-three-three'}
> tuple that won't join.  You have to specify a join condition, or you get a
> product.
Yes, chaining the outer join(s) did the trick for me here!

> Eric G. Miller <egm2@jps.net>

Many thanks,

-Farid.

--
Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | farid.hajji@ob.kamp.net
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
One OS To Rule Them All And In The Darkness Bind Them... --Bill Gates.


Re: Trouble with strange OUTER JOIN syntax

От
Farid Hajji
Дата:
> Farid Hajji <farid.hajji@ob.kamp.net> writes:
> > The next examples seem to generalize this: two tables (tab3 and tab4)
> > are being outer-joined to existing cartesian product tab1 x tab2.
> > I'm not sure what the difference may be between:
> >   OUTER (tab3, tab4)
> > and
> >   OUTER tab3, OUTER tab4.
>
> I suspect that these correspond to the standard syntaxes
>
> SELECT ... FROM
>     (tab1 JOIN tab2 ON condition12)
>     LEFT JOIN
>     (tab3 JOIN tab4 ON condition34)
>     ON condition1234;
>
> and
>
> SELECT ... FROM
>     ((tab1 JOIN tab2 ON condition12)
>     LEFT JOIN
>     tab3 ON condition123)
>     LEFT JOIN
>     tab4 ON condition1234;
>
> respectively.  In the first case, tab1 is inner-joined to tab2 and
> tab3 is separately inner-joined to tab4, then the results are
> outer-joined (with the tab3*tab4 product being the nullable side).
> In the second case, tab1 is inner-joined to tab2, then tab3 is
> outer-joined to this product, and finally tab4 is outer-joined to
> the result.  Obviously these orderings can yield different results
> because of null-row addition (whereas it wouldn't really matter if
> all the joins were inner joins).
Hmmm... yes, this could be so. I'll cross-check with Informix to
make sure though.

> Note that the standard syntax makes you attach a join condition
> (for example, "tab1.x = tab2.y") to each of these operations, rather
> than intuiting which parts of the WHERE clause are to be taken as the
> join condition.  Again, this wouldn't matter for inner joins but it
> makes a big difference for outer joins.  Example:
>
> select * from tab1 left join tab2 on (tab1.a = tab2.b and tab2.c = 0);
>
> is not at all the same as
>
> select * from tab1 left join tab2 on (tab1.a = tab2.b) where tab2.c = 0;
>
> The WHERE clause is not the join condition, but is applied after the
> join is done (and null rows are inserted).  So, for example, if tab1
> contains just A=1 and tab2 contains just B=1, C=2, the first case
> produces output 1,NULL,NULL because there are no tab2 rows that meet
> the outer-join condition with tab1's row.  But the second case produces
> no output rows at all --- the outer-join produces 1,1,2 which is then
> removed by the WHERE filter.
Ahh... that was the missing part in the puzzle. I was already wondering
why my queries all yielded empty result sets. This was actually the
exact reason! Many thanks for pointing this out.

A section on multi-joins in the user's manual, that covers these issues
would be rather useful, IMHO.

> The standard's syntax is rather verbose and ugly, but it has the great
> virtue of handling outer joins unambiguously.  None of the vendor-
> specific syntaxes I've seen are very clear about the implications of
> an outer join condition.
Indeed...

> BTW, you need PG 7.1 or later to work with outer joins.
I'm using 7.1.1 and it works fine now...

>             regards, tom lane

Many thanks,

-Farid.

--
Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | farid.hajji@ob.kamp.net
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
One OS To Rule Them All And In The Darkness Bind Them... --Bill Gates.