Обсуждение: Subselect problem

Поиск
Список
Период
Сортировка

Subselect problem

От
"Wellmann, Harald"
Дата:
I'm not sure if I'm making wrong assumptions on the semantics of
subselect or if this a problem in PostgreSQL:

I'm using query with the following structure

select a1.id, c1.foo
from a as a1
join b on ...
join c as c1 on ...
where a.id in (
    select a2.id
    from a as a2
    left join c as c2 on ...
    where c2.bla is null)

The query unexpectedly returns an empty result set. When I replace the
subselect by the equivalent list of integer literals (i.e. just the ID
values returned by the subselect run as a separate query), the result
set is no longer empty:

select a.id, c.foo
from a
join b on ...
join c on ...
where a.id in (123, 456, ... )

My impression is that PostgreSQL somehow confuses the references to the
tables a and c from the outer and the inner select.

I have tried to isolate the problem with a small set of entries, but
when I delete unrelated entries from my tables, the query plan changes
and the problem no longer occurs, which again confirms my suspicion that
this a problem in Postgres.

The problem occurs with PostgreSQL 8.4.0. I cannot reproduce it with
PostgreSQL 8.3.7.

If this is indeed a bug in Postgres, please let me know the best way to
provide more information. The tables in question have a few hundred rows
each, so I could probably come up with a script demonstrating the
problem.

Best regards,

Harald

*******************************************
innovative systems GmbH Navigation-Multimedia
Geschaeftsfuehrung: Edwin Summers - Michael Juergen Mauser
Sitz der Gesellschaft: Hamburg - Registergericht: Hamburg HRB 59980

*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat
sindoder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese
Mail.Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. 
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have
receivedthis e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying,
disclosureor distribution of the contents in this e-mail is strictly forbidden. 
*******************************************

Re: Subselect problem

От
Tom Lane
Дата:
"Wellmann, Harald" <harald.wellmann@harman.com> writes:
> The problem occurs with PostgreSQL 8.4.0. I cannot reproduce it with
> PostgreSQL 8.3.7.

There are known bugs in 8.4.0 having to do with improperly exchanging
the ordering of semijoins (IN joins) and other joins.  You haven't
provided enough information to test whether your case is one of them.
If you can try CVS branch tip or a recent nightly snapshot, there
might still be enough time to do something about it for 8.4.1,
if it isn't fixed already.

            regards, tom lane

Re: Subselect problem

От
"Wellmann, Harald"
Дата:
Ok, here's some more details.

This is my query:

SELECT ah.FEATURE_ID, FG.TILE_NUM FROM
nndb.admin_hierarchy ah
JOIN NNDB.LINK_ADMIN LA
ON ah.FEATURE_ID = LA.ADMIN_ID
JOIN psi.FEATURE_GEOMETRY FG
ON LA.LINK_ID = FG.NNDB_FEATURE_ID
WHERE  fg.tile_num = 8613949  and ah.feature_id in (SELECT AH2.FEATURE_ID
FROM NNDB.ADMIN_HIERARCHY AH2
LEFT JOIN psi.FEATURE_GEOMETRY FG2
ON AH2.FEATURE_ID = FG2.NNDB_FEATURE_ID
WHERE FG2.nndb_feature_id IS NULL)


This is the output of EXPLAIN:


"Hash Join  (cost=87.20..105.37 rows=13 width=8)"
"  Hash Cond: (fg.nndb_feature_id = la.link_id)"
"  ->  Bitmap Heap Scan on feature_geometry fg  (cost=4.45..22.51 rows=26 width=8)"
"        Recheck Cond: (tile_num = 8613949)"
"        ->  Bitmap Index Scan on nx_featuregeometry_tilenum  (cost=0.00..4.45 rows=26 width=0)"
"              Index Cond: (tile_num = 8613949)"
"  ->  Hash  (cost=82.59..82.59 rows=13 width=8)"
"        ->  Nested Loop Semi Join  (cost=73.89..82.59 rows=13 width=8)"
"              Join Filter: (ah.feature_id = la.admin_id)"
"              ->  Seq Scan on admin_hierarchy ah  (cost=0.00..1.13 rows=13 width=4)"
"              ->  Materialize  (cost=73.89..99.95 rows=2606 width=12)"
"                    ->  Nested Loop  (cost=50.20..71.29 rows=2606 width=12)"
"                          ->  HashAggregate  (cost=50.20..50.21 rows=1 width=4)"
"                                ->  Hash Anti Join  (cost=48.95..50.19 rows=1 width=4)"
"                                      Hash Cond: (ah2.feature_id = fg2.nndb_feature_id)"
"                                      ->  Seq Scan on admin_hierarchy ah2  (cost=0.00..1.13 rows=13 width=4)"
"                                      ->  Hash  (cost=31.20..31.20 rows=1420 width=4)"
"                                            ->  Seq Scan on feature_geometry fg2  (cost=0.00..31.20 rows=1420
width=4)"
"                          ->  Index Scan using linkadmin_adminid on link_admin la  (cost=0.00..17.82 rows=261
width=8)"
"                                Index Cond: (la.admin_id = ah2.feature_id)"


The query result is empty.

However, using the query

SELECT ah.FEATURE_ID, FG.TILE_NUM FROM
nndb.admin_hierarchy ah
JOIN NNDB.LINK_ADMIN LA
ON ah.FEATURE_ID = LA.ADMIN_ID
JOIN psi.FEATURE_GEOMETRY FG
ON LA.LINK_ID = FG.NNDB_FEATURE_ID
WHERE  fg.tile_num = 8613949  and ah.feature_id in (170303063)


where 170303063 is one of the values returned by the subselect in the first query when run in isolation, you get a
non-emptyresult set, so there seems to be a problem with the subselect. 

If you need any other information to decide whether this is one of the known bugs or a new one, just let me know what
exactlyyou need. I can provide a backup of the three tables in question, which should be enough to isolate the problem. 

Best regards,
Harald



> -----Ursprüngliche Nachricht-----
> Von: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Gesendet: Mittwoch, 2. September 2009 20:09
> An: Wellmann, Harald
> Cc: pgsql-general@postgresql.org
> Betreff: Re: [GENERAL] Subselect problem
>
> "Wellmann, Harald" <harald.wellmann@harman.com> writes:
> > The problem occurs with PostgreSQL 8.4.0. I cannot
> reproduce it with
> > PostgreSQL 8.3.7.
>
> There are known bugs in 8.4.0 having to do with improperly
> exchanging the ordering of semijoins (IN joins) and other
> joins.  You haven't provided enough information to test
> whether your case is one of them.
> If you can try CVS branch tip or a recent nightly snapshot,
> there might still be enough time to do something about it for
> 8.4.1, if it isn't fixed already.
>
>             regards, tom lane
>

*******************************************
innovative systems GmbH Navigation-Multimedia
Geschaeftsfuehrung: Edwin Summers - Michael Juergen Mauser
Sitz der Gesellschaft: Hamburg - Registergericht: Hamburg HRB 59980

*******************************************
Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte Informationen. Wenn Sie nicht der richtige Adressat
sindoder diese E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den Absender und loeschen Sie diese
Mail.Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. 
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have
receivedthis e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying,
disclosureor distribution of the contents in this e-mail is strictly forbidden. 
*******************************************