Re: Help with sql

Поиск
Список
Период
Сортировка
От Steve Crawford
Тема Re: Help with sql
Дата
Msg-id 4FF78947.5080902@pinpointresearch.com
обсуждение исходный текст
Ответ на Help with sql  (Perry Smith <pedzsan@gmail.com>)
Ответы Re: Help with sql  (Perry Smith <pedzsan@gmail.com>)
Список pgsql-general
On 07/06/2012 02:34 PM, Perry Smith wrote:
> Hi Guys,
>
> This isn't a PostgreSQL specific question but just a SQL question.  If this is not an appropriate question for this
list,please let me know. 
>
> It is also, perhaps, a really silly question.
>
> This query (without the 'explain' keyword) , when executed takes forever and a day:
>
>> condor_development=> explain select id from filesets where id not in ( select fileset_id from
service_pack_fileset_maps); 
>>                                            QUERY PLAN
>> ----------------------------------------------------------------------------------------------
>>   Seq Scan on filesets  (cost=0.00..71937742.00 rows=26088 width=4)
>>     Filter: (NOT (SubPlan 1))
>>     SubPlan 1
>>       ->  Materialize  (cost=0.00..2517.78 rows=95852 width=4)
>>             ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 rows=95852 width=4)
>> (5 rows)
> This query returns within a second:
>
>> condor_development=> explain select id from filesets where id not in ( select distinct fileset_id from
service_pack_fileset_maps); 
>>                                            QUERY PLAN
>> ----------------------------------------------------------------------------------------------
>>   Seq Scan on filesets  (cost=2102.31..3153.53 rows=26088 width=4)
>>     Filter: (NOT (hashed SubPlan 1))
>>     SubPlan 1
>>       ->  HashAggregate  (cost=1903.15..2062.48 rows=15933 width=4)
>>             ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 rows=95852 width=4)
>> (5 rows)
> The difference is the "distinct" keyword in the inner select.
>
> What I'm confused about is why isn't the "distinct" implicit?  I thought the construct "blah in ( select ... )" was
using"sets" and an item (I thought) can not be in a set more than once. 
>
> Perhaps my question is the opposite really?  Why would you not always use "distinct" in the inner select when the
operatoris "in" or "not in" ? 
>
> And if I can throw in another question on top: is there a different method other than "not in" that would work
better?
Actually it is *very* PostgreSQL specific. In fact, it may even be
PostgreSQL *version* specific as you are delving into how the planner
decides how to handle a query.

It appears that the planner is assuming, based on collected stats and
available indexes, that there will be roughly 1/6 the records returned
by the "distinct" query and thus chose a different method to join the
records. One useful piece of information would be the indexes on the two
tables.

As to other methods, you can use:
... where not exists (select 1 from service_pack_fileset_maps where
fileset_id = filesets.id)...
(Note: as alluded to above, ...not in... works better in some releases
and ...not exists... better in others due to improvements over time.)

Still another method:
select id from filesets except select fileset_id from
service_pack_fileset_maps;

Cheers,
Steve


В списке pgsql-general по дате отправления:

Предыдущее
От: ach
Дата:
Сообщение: Re: index and data tablespaces on two separate drives or one RAID 0?
Следующее
От: John R Pierce
Дата:
Сообщение: Re: Re: index and data tablespaces on two separate drives or one RAID 0?