От Tilo Buschmann
Тема Re: fast DISTINCT or EXIST
Msg-id 20070407182407.297e7e35@wonderland
обсуждение исходный текст
Ответ на Re: fast DISTINCT or EXIST  (Tom Lane)
Ответы Re: fast DISTINCT or EXIST  (Tom Lane)
Re: fast DISTINCT or EXIST  (Arjen van der Meijden)
Список pgsql-performance
Дерево обсуждения
fast DISTINCT or EXIST  (Tilo Buschmann, )
 Re: fast DISTINCT or EXIST  (Arjen van der Meijden, )
  Re: fast DISTINCT or EXIST  (Tom Lane, )
   Re: fast DISTINCT or EXIST  (Tilo Buschmann, )
    Re: fast DISTINCT or EXIST  (Tom Lane, )
    Re: fast DISTINCT or EXIST  (Arjen van der Meijden, )
Hi everyone,

On Sat, 07 Apr 2007 11:54:08 -0400
Tom Lane <> wrote:

> Arjen van der Meijden <> writes:
> > If that is your main culprit, you could also use two limits based on the
> > fact that there will be at most X songs per cd which would match your
> > title (my not very educated guess is 3x). Its a bit ugly... but if that
> > is what it takes to make postgresql not scan your entire index, so be it...
> > SELECT ... FROM cd
> >    JOIN tracks ...
> > WHERE IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
> >       WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30)
> > as foo LIMIT 10)
> I think that's the only way.  There is no plan type in Postgres that
> will generate unique-ified output without scanning the whole input
> first, except for Uniq on pre-sorted input, which we can't use here
> because the tsearch scan isn't going to deliver the rows in cd_id order.

Unfortunately, the query above will definitely not work correctly, if
someone searches for "a" or "the".

The correct query does not perform as well as I hoped.

cddb=# EXPLAIN ANALYSE SELECT cd.cd_id,cd.artist,cd.title,tracks.title FROM cd JOIN tracks USING (cd_id) WHERE cd_id IN
(SELECTDISTINCT tracks.cd_id FROM tracks WHERE tracks.tstitle @@ plainto_tsquery('simple','sympathy') LIMIT 10); 
                                                                              QUERY PLAN

 Nested Loop  (cost=61031.41..64906.58 rows=139 width=69) (actual time=31236.562..31810.940 rows=166 loops=1)
   ->  Nested Loop  (cost=61031.41..61176.20 rows=10 width=50) (actual time=31208.649..31388.289 rows=10 loops=1)
         ->  Limit  (cost=61031.41..61089.74 rows=10 width=4) (actual time=31185.972..31186.024 rows=10 loops=1)
               ->  Unique  (cost=61031.41..61124.74 rows=16 width=4) (actual time=31185.967..31186.006 rows=10 loops=1)
                     ->  Sort  (cost=61031.41..61078.07 rows=18665 width=4) (actual time=31185.961..31185.977 rows=11
                           Sort Key: public.tracks.cd_id
                           ->  Bitmap Heap Scan on tracks  (cost=536.76..59707.31 rows=18665 width=4) (actual
time=146.222..30958.057rows=1677 loops=1) 
                                 Recheck Cond: (tstitle @@ '''sympathy'''::tsquery)
                                 ->  Bitmap Index Scan on tstitle_tracks_idx  (cost=0.00..532.09 rows=18665 width=0)
(actualtime=126.328..126.328 rows=1677 loops=1) 
                                       Index Cond: (tstitle @@ '''sympathy'''::tsquery)
         ->  Index Scan using cd_id_key on cd  (cost=0.00..8.62 rows=1 width=46) (actual time=20.218..20.219 rows=1
               Index Cond: (cd.cd_id = "IN_subquery".cd_id)
   ->  Index Scan using cdid_tracks_idx on tracks  (cost=0.00..358.08 rows=1197 width=27) (actual time=39.935..42.247
         Index Cond: (cd.cd_id = public.tracks.cd_id)
 Total runtime: 31811.256 ms
(15 rows)

It gets better when the rows are in memory (down to 10.452 ms), but
Murphy tells me, that the content that I need will never be in memory.

I think I disregarded this variant at first, because it limits the
possibility to restrict the cd artist and title.

> I can see how to build one: make a variant of HashAggregate that returns
> each input row immediately after hashing it, *if* it isn't a duplicate
> of one already in the hash table.  But it'd be a lot of work for what
> seems a rather specialized need.


Actually, I hoped to find an alternative, that does not involve

Best Regards,


В списке pgsql-performance по дате отправления:

Сообщение: Re: SCSI vs SATA
От: Ron
Сообщение: Re: SCSI vs SATA