Re: problem (bug?) with "in (subquery)"

Поиск
Список
Период
Сортировка
От Luca Pireddu
Тема Re: problem (bug?) with "in (subquery)"
Дата
Msg-id 200507150952.01732.lucap@shaw.ca
обсуждение исходный текст
Ответ на Re: problem (bug?) with "in (subquery)"  (Michael Fuhr <mike@fuhr.org>)
Ответы Re: problem (bug?) with "in (subquery)"  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
On July 15, 2005 07:34, Michael Fuhr wrote:
> On Thu, Jul 14, 2005 at 01:34:21AM -0600, Luca Pireddu wrote:
> > I have the following query that isn't behaving like I would expect:

Thanks for creating the reduced test case Michael.  My apologies for not doing 
it myself.

> > 
> > select * from strains s where s.id in (select strain_id from 
pathway_strains);
> 
> Any reason the subquery isn't doing "SELECT DISTINCT strain_id"?

because I don't need to according to the specification of "in".  However, it 
does generate the correct output.  So does

select distinct * from strains s where s.id in (select strain_id from 
pathway_strains);

> 
> > I would expect each strain record to appear only once.  Instead I get 
output 
> > like this, where the same strain id appears many times:
> > 
> >   id   |     name     | organism
> > -------+--------------+----------
> >     83 | common       |       82 
> >     83 | common       |       82 
> >     83 | common       |       82 
> 
> What happens when you try each of the following?  Do they give the
> expected results?  I did some tests and I'm wondering if the planner's
> hash join is responsible for the duplicate rows.
> 
> SELECT * FROM strains WHERE id IN (
>   SELECT strain_id FROM pathway_strains ORDER BY strain_id
> );

With the "order by"  it works as it should, not generating duplicate rows.

> 
> CREATE TEMPORARY TABLE foo AS SELECT strain_id FROM pathway_strains;
> SELECT * FROM strains WHERE id IN (SELECT strain_id FROM foo);

This one's interesting.  It only returns the unique rows.  

> 
> SET enable_hashjoin TO off;
> SELECT * FROM strains WHERE id IN (SELECT strain_id FROM pathway_strains);

With hashjoin off the query returns the correct output.

On July 15, 2005 08:58, Tom Lane wrote:
> Ah-hah: this one is the fault of create_unique_path, which quoth
> 
>     /*
>      * If the input is a subquery whose output must be unique already, we
>      * don't need to do anything.
>      */
> 
> Of course, that needs to read "... unique already, *and we are using all
> of its output columns in our DISTINCT list*, we don't need to do
> anything."
> 
>             regards, tom lane

In any case, it looks like Tom has already found the problem :-)  Thanks guys!

Luca


В списке pgsql-sql по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: problem (bug?) with "in (subquery)"
Следующее
От: Tom Lane
Дата:
Сообщение: Re: problem (bug?) with "in (subquery)"