Re: Help with sql

Поиск
Список
Период
Сортировка
От Perry Smith
Тема Re: Help with sql
Дата
Msg-id 1E64A6FD-FF31-48DE-B1BA-0C50F7AB4521@gmail.com
обсуждение исходный текст
Ответ на Re: Help with sql  (Steve Crawford <scrawford@pinpointresearch.com>)
Ответы Re: Help with sql  (Chris Angelico <rosuav@gmail.com>)
Список pgsql-general
On Jul 6, 2012, at 7:56 PM, Steve Crawford wrote:

> On 07/06/2012 02:34 PM, Perry Smith wrote:
>> Hi Guys,
>>
>> This isn't a PostgreSQL specific question but just a SQL question.  If this is not an appropriate question for this
list,please let me know. 
>>
>> It is also, perhaps, a really silly question.
>>
>> This query (without the 'explain' keyword) , when executed takes forever and a day:
>>
>>> condor_development=> explain select id from filesets where id not in ( select fileset_id from
service_pack_fileset_maps); 
>>>                                           QUERY PLAN
>>> ----------------------------------------------------------------------------------------------
>>>  Seq Scan on filesets  (cost=0.00..71937742.00 rows=26088 width=4)
>>>    Filter: (NOT (SubPlan 1))
>>>    SubPlan 1
>>>      ->  Materialize  (cost=0.00..2517.78 rows=95852 width=4)
>>>            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 rows=95852 width=4)
>>> (5 rows)
>> This query returns within a second:
>>
>>> condor_development=> explain select id from filesets where id not in ( select distinct fileset_id from
service_pack_fileset_maps); 
>>>                                           QUERY PLAN
>>> ----------------------------------------------------------------------------------------------
>>>  Seq Scan on filesets  (cost=2102.31..3153.53 rows=26088 width=4)
>>>    Filter: (NOT (hashed SubPlan 1))
>>>    SubPlan 1
>>>      ->  HashAggregate  (cost=1903.15..2062.48 rows=15933 width=4)
>>>            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 rows=95852 width=4)
>>> (5 rows)
>> The difference is the "distinct" keyword in the inner select.
>>
>> What I'm confused about is why isn't the "distinct" implicit?  I thought the construct "blah in ( select ... )" was
using"sets" and an item (I thought) can not be in a set more than once. 
>>
>> Perhaps my question is the opposite really?  Why would you not always use "distinct" in the inner select when the
operatoris "in" or "not in" ? 
>>
>> And if I can throw in another question on top: is there a different method other than "not in" that would work
better?
> Actually it is *very* PostgreSQL specific. In fact, it may even be PostgreSQL *version* specific as you are delving
intohow the planner decides how to handle a query. 
>
> It appears that the planner is assuming, based on collected stats and available indexes, that there will be roughly
1/6the records returned by the "distinct" query and thus chose a different method to join the records. One useful piece
ofinformation would be the indexes on the two tables. 
>
> As to other methods, you can use:
> ... where not exists (select 1 from service_pack_fileset_maps where fileset_id = filesets.id)...
> (Note: as alluded to above, ...not in... works better in some releases and ...not exists... better in others due to
improvementsover time.) 
>
> Still another method:
> select id from filesets except select fileset_id from service_pack_fileset_maps;

Thanks guys.

Small side note: I thought I saw "set difference" in the documentation but I couldn't find it.  It appears "EXCEPT" is
setdifference.  Thank you for that tidbit. 

The database is mostly static.  I run through a very lengthy process to populate the database maybe once a month and
thenit is 99% read-only.  By far, most of the accesses are via a view that I have that is rather long and ugly so I
won'tpaste it in.  I've tried to make this particular view as fast as possible so the indexes, etc are what I think
willhelp that out. 

The version is psql (PostgreSQL) 9.0.4

Each table has a key of "id" -- this database was created by / used by Ruby on Rails and that is how it likes to do
things.

> condor_development=> \d service_pack_fileset_maps
>                                         Table "public.service_pack_fileset_maps"
>      Column      |            Type             |                               Modifiers
  
>
-----------------+-----------------------------+------------------------------------------------------------------------
>  id              | integer                     | not null default
nextval('service_pack_fileset_maps_id_seq'::regclass)
>  service_pack_id | integer                     | not null
>  fileset_id      | integer                     | not null
>  created_at      | timestamp without time zone |
>  updated_at      | timestamp without time zone |
> Indexes:
>     "service_pack_fileset_maps_pkey" PRIMARY KEY, btree (id)
>     "service_pack_fileset_maps_service_pack_id_key" UNIQUE, btree (service_pack_id, fileset_id)
>     "index_service_pack_fileset_maps_on_fileset_id" btree (fileset_id)
> Foreign-key constraints:
>     "service_pack_fileset_maps_fileset_id_fkey" FOREIGN KEY (fileset_id) REFERENCES filesets(id) ON DELETE CASCADE
DEFERRABLE
>     "service_pack_fileset_maps_service_pack_id_fkey" FOREIGN KEY (service_pack_id) REFERENCES service_packs(id) ON
DELETECASCADE DEFERRABLE 

> condor_development=> \d filesets
>                                      Table "public.filesets"
>    Column   |            Type             |                       Modifiers
> ------------+-----------------------------+-------------------------------------------------------
>  id         | integer                     | not null default nextval('filesets_id_seq'::regclass)
>  lpp_id     | integer                     | not null
>  vrmf       | character varying(255)      | not null
>  created_at | timestamp without time zone |
>  updated_at | timestamp without time zone |
> Indexes:
>     "filesets_pkey" PRIMARY KEY, btree (id)
>     "filesets_lpp_id_key" UNIQUE, btree (lpp_id, vrmf)
> Foreign-key constraints:
>     "filesets_lpp_id_fkey" FOREIGN KEY (lpp_id) REFERENCES lpps(id) ON DELETE CASCADE DEFERRABLE
> Referenced by:
>     TABLE "fileset_aix_file_maps" CONSTRAINT "fileset_aix_file_maps_fileset_id_fkey" FOREIGN KEY (fileset_id)
REFERENCESfilesets(id) ON DELETE CASCADE DEFERRABLE 
>     TABLE "fileset_ptf_maps" CONSTRAINT "fileset_ptf_maps_fileset_id_fkey" FOREIGN KEY (fileset_id) REFERENCES
filesets(id)ON DELETE CASCADE DEFERRABLE 
>     TABLE "package_fileset_maps" CONSTRAINT "package_fileset_maps_fileset_id_fkey" FOREIGN KEY (fileset_id)
REFERENCESfilesets(id) ON DELETE CASCADE DEFERRABLE 
>     TABLE "service_pack_fileset_maps" CONSTRAINT "service_pack_fileset_maps_fileset_id_fkey" FOREIGN KEY (fileset_id)
REFERENCESfilesets(id) ON DELETE CASCADE DEFERRABLE 
>     TABLE "upd_pc_views" CONSTRAINT "upd_pc_views_fileset_id_fkey" FOREIGN KEY (fileset_id) REFERENCES filesets(id)
ONDELETE CASCADE DEFERRABLE 

Thank you again for your help,
pedz



В списке pgsql-general по дате отправления:

Предыдущее
От: Joe Miller
Дата:
Сообщение: Re: Re: index and data tablespaces on two separate drives or one RAID 0?
Следующее
От: Chris Angelico
Дата:
Сообщение: Re: Help with sql