Обсуждение: Weird behaviour on a join with multiple keys

Поиск
Список
Период
Сортировка

Weird behaviour on a join with multiple keys

От
Charlie Clark
Дата:
Hi,

I'm getting unexpected results on a query which involves joining two
tables on two common variables (firstname and lastname).

This is the basic query:

SELECT table1.lastname, table1.firstname
FROM  table1
INNER JOIN table2 ON
(table2.name = table1.name
AND
table2.vorname = table1.vorname)

This is returning many rows fewer than I expect and is ignoring a lot
where table1.firstname = table2.firstname AND table1.lastname =
table2.lastname. In fact when I extend the query by a WHERE clause
such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are
not returned by the original query.

I'm not very au fait with the inner workings of PostgreSQL but
EXPLAIN does not seem, to me at least, to provide an explanation for
the missing results.

"Merge Join  (cost=1987.97..2121.24 rows=34 width=22)"
"  Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND
("outer"."?column4?" = "inner"."?column4?"))"
"  ->  Sort  (cost=364.97..375.99 rows=4409 width=22)"
"        Sort Key: (table1.lastname)::text, (table1.firstname)::text"
"        ->  Seq Scan on table1  (cost=0.00..98.09 rows=4409 width=22)"
"  ->  Sort  (cost=1623.00..1667.00 rows=17599 width=21)"
"        Sort Key: (table2.lastname)::text, (table2.firstname)::text"
"        ->  Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)"

Am I missing something big and obvious here?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




Re: Weird behaviour on a join with multiple keys

От
Omar Eljumaily
Дата:
What happens if you do an outer join instead of an inner join?

Charlie Clark wrote:
> Hi,
>
> I'm getting unexpected results on a query which involves joining two
> tables on two common variables (firstname and lastname).
>
> This is the basic query:
>
> SELECT table1.lastname, table1.firstname
> FROM  table1
> INNER JOIN table2 ON
> (table2.name = table1.name
> AND
> table2.vorname = table1.vorname)
>
> This is returning many rows fewer than I expect and is ignoring a lot
> where table1.firstname = table2.firstname AND table1.lastname =
> table2.lastname. In fact when I extend the query by a WHERE clause
> such as WHERE table1.lastname IS LIKE 'Sch%' I get results which are
> not returned by the original query.
>
> I'm not very au fait with the inner workings of PostgreSQL but EXPLAIN
> does not seem, to me at least, to provide an explanation for the
> missing results.
>
> "Merge Join  (cost=1987.97..2121.24 rows=34 width=22)"
> "  Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND
> ("outer"."?column4?" = "inner"."?column4?"))"
> "  ->  Sort  (cost=364.97..375.99 rows=4409 width=22)"
> "        Sort Key: (table1.lastname)::text, (table1.firstname)::text"
> "        ->  Seq Scan on table1  (cost=0.00..98.09 rows=4409 width=22)"
> "  ->  Sort  (cost=1623.00..1667.00 rows=17599 width=21)"
> "        Sort Key: (table2.lastname)::text, (table2.firstname)::text"
> "        ->  Seq Scan on table2 (cost=0.00..381.99 rows=17599 width=21)"
>
> Am I missing something big and obvious here?
>
> Charlie
> --
> Charlie Clark
> Helmholtzstr. 20
> Düsseldorf
> D- 40215
> Tel: +49-211-938-5360
> GSM: +49-178-782-6226
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>


Re: Weird behaviour on a join with multiple keys

От
Ron Johnson
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/08/07 16:09, Charlie Clark wrote:
> Hi,
>
> I'm getting unexpected results on a query which involves joining two
> tables on two common variables (firstname and lastname).
>
> This is the basic query:
>
> SELECT table1.lastname, table1.firstname
> FROM  table1
> INNER JOIN table2 ON
> (table2.name = table1.name
> AND
> table2.vorname = table1.vorname)
>
> This is returning many rows fewer than I expect and is ignoring a lot
> where table1.firstname = table2.firstname AND table1.lastname =
> table2.lastname.

Huh?  Why should you?  You're not joining on firstname and lastname.

What happens if you do it like this:
SELECT T1.LASTNAME, T2.FIRSTNAME
FROM TABLE1 T1,
     TABLE2 T2
WHERE T1.NAME = T2.NAME
  AND T1.VORNAME = T2.VORNAME;


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF8N1zS9HxQb37XmcRAp3wAKCRJ1kuoqbc8YPOZwx+53+JRqvD/ACfVvFy
zK8u0+RYuMiBxEnURVc74Jc=
=9oxj
-----END PGP SIGNATURE-----

Re: Weird behaviour on a join with multiple keys

От
Tom Lane
Дата:
Charlie Clark <charlie@begeistert.org> writes:
> I'm getting unexpected results on a query which involves joining two
> tables on two common variables (firstname and lastname).

That looks like it should work.  Given that you describe the columns as
"names" I'm supposing they are of textual datatypes.  Maybe you have a
messed-up encoding or locale situation that is causing the sorts to not
work properly?  What PG version is this exactly, on what platform, and
what do "show lc_collate" and "show server_encoding" say?

            regards, tom lane

Re: Weird behaviour on a join with multiple keys

От
Charlie Clark
Дата:
Am 09.03.2007 um 05:30 schrieb Tom Lane:

> Charlie Clark <charlie@begeistert.org> writes:
>> I'm getting unexpected results on a query which involves joining two
>> tables on two common variables (firstname and lastname).
>
> That looks like it should work.  Given that you describe the
> columns as
> "names" I'm supposing they are of textual datatypes.  Maybe you have a
> messed-up encoding or locale situation that is causing the sorts to
> not
> work properly?  What PG version is this exactly, on what platform, and
> what do "show lc_collate" and "show server_encoding" say?

I'm running PostgreSQL 8.1.4 on Mac OS X

psytec=# show lc_collate;
lc_collate
-------------
de_DE.UTF-8
(1 row)

psytec=# show server_encoding;
server_encoding
-----------------
LATIN1
(1 row)

I thought that it might be something to do with the encoding - one of
the tables has just been imported and I had some "fun" doing that but
it "looks" okay now. Is there a way of checking?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




Re: Weird behaviour on a join with multiple keys

От
Tom Lane
Дата:
Charlie Clark <charlie@begeistert.org> writes:
> psytec=# show lc_collate;
> lc_collate
> -------------
> de_DE.UTF-8
> (1 row)

> psytec=# show server_encoding;
> server_encoding
> -----------------
> LATIN1
> (1 row)

There's your problem right there.  The string comparison routines are
built on strcoll(), which is going to expect UTF8-encoded data because
of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
characters in the database, they will most likely look like invalid
encoding to strcoll(), and on most platforms that causes it to behave
very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
server_encoding.

            regards, tom lane

Re: Weird behaviour on a join with multiple keys

От
Charlie Clark
Дата:
Am 09.03.2007 um 16:15 schrieb Tom Lane:

>> psytec=# show lc_collate;
>> lc_collate
>> -------------
>> de_DE.UTF-8
>> (1 row)
>
>> psytec=# show server_encoding;
>> server_encoding
>> -----------------
>> LATIN1
>> (1 row)
>
> There's your problem right there.  The string comparison routines are
> built on strcoll(), which is going to expect UTF8-encoded data because
> of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
> characters in the database, they will most likely look like invalid
> encoding to strcoll(), and on most platforms that causes it to behave
> very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
> server_encoding.

That does indeed seem to have been the problem even though the
examples I was looking at were all using plain ASCII characters. Glad
to know it wasn't a bug and to have learned something new.

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226




Re: Weird behaviour on a join with multiple keys

От
Tom Lane
Дата:
Charlie Clark <charlie@begeistert.org> writes:
> Am 09.03.2007 um 16:15 schrieb Tom Lane:
>> There's your problem right there.  The string comparison routines are
>> built on strcoll(), which is going to expect UTF8-encoded data because
>> of the LC_COLLATE setting.  If there are any high-bit-set LATIN1
>> characters in the database, they will most likely look like invalid
>> encoding to strcoll(), and on most platforms that causes it to behave
>> very oddly.  You need to keep lc_collate (and lc_ctype) in sync with
>> server_encoding.

> That does indeed seem to have been the problem even though the
> examples I was looking at were all using plain ASCII characters. Glad
> to know it wasn't a bug and to have learned something new.

Well, it *is* a bug: we really shouldn't let you select incompatible
locale and encoding settings.  This gotcha has been known for a long
time, but it's not clear that there's a bulletproof, portable way to
determine which encoding a particular locale setting implies ...

            regards, tom lane