Re: Query plan, nested EXISTS

Поиск
Список
Период
Сортировка
От Matt Daw
Тема Re: Query plan, nested EXISTS
Дата
Msg-id CAA2LLOFkL2VepKAUsRUirfgJeT5P=SMUNntTOd194cR_1xooMA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Query plan, nested EXISTS  (Matt Daw <matt@shotgunsoftware.com>)
Список pgsql-performance
Hi Tom, v9.2.1 looks good!

 Aggregate  (cost=420808.99..420809.00 rows=1 width=0) (actual time=147.345..147.345 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..420786.71 rows=8914 width=0) (actual time=13.847..147.219 rows=894 loops=1)
         ->  Index Scan using notes_retirement_date_project on notes a  (cost=0.00..67959.22 rows=12535 width=4) (actual time=13.811..71.741 rows=12469 loops=1)
               Index Cond: (project_id = 114)
         ->  Nested Loop Semi Join  (cost=0.00..28.14 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=12469)
               ->  Index Scan using note_links_note on note_links b  (cost=0.00..12.37 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=12469)
                     Index Cond: (note_id = a.id)
                     Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))
                     Rows Removed by Filter: 1
               ->  Index Scan using versions_pkey on versions c  (cost=0.00..15.76 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=11794)
                     Index Cond: (id = b.entity_id)
                     Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text))
                     Rows Removed by Filter: 1
 Total runtime: 147.411 ms
(14 rows)

On Fri, Sep 28, 2012 at 2:47 PM, Matt Daw <matt@shotgunsoftware.com> wrote:
Hi Tom, thank you very much. I'll load these tables onto a 9.2 instance and report back.

Matt


On Fri, Sep 28, 2012 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Daw <matt@shotgunsoftware.com> writes:
> Howdy, I've been debugging a client's slow query today and I'm curious
> about the query plan. It's picking a plan that hashes lots of rows from the
> versions table (on v9.0.10)...

> EXPLAIN ANALYZE
> SELECT COUNT(*) FROM notes a WHERE
> a.project_id = 114 AND
> EXISTS (
>     SELECT 1 FROM note_links b
>     WHERE
>     b.note_id = a.id AND
>     b.entity_type = 'Version' AND
>     EXISTS (
>         SELECT 1 FROM versions c
>         WHERE
>         c.id = b.entity_id AND
>         c.code ILIKE '%comp%' AND
>         c.retirement_date IS NULL
>     ) AND
>     b.retirement_date IS NULL
> )

I think the real problem here is that 9.0 is incapable of avoiding a
full table scan on "note_links", which means it doesn't really have any
better option than to do the inner EXISTS as a full-table semijoin.
This is because it can't push a.id down through two levels of join, and
because the semijoins don't commute, there's no way to get a.id into the
scan of note_links to pull out only the useful rows.  The hack with
LIMIT avoids this problem by preventing the inner EXISTS from being
treated as a full-fledged semijoin; but of course that hack leaves you
vulnerable to very bad plans if the statistics are such that a nestloop
join isn't the best bet for the inner EXISTS.

The work I did for parameterized paths in 9.2 was intended to address
exactly this type of scenario.  I would be interested to know if 9.2
does this any better for you.

                        regards, tom lane


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Possible Performance Regression with Transitive Comparisons vs. Constants
Следующее
От: Виктор Егоров
Дата:
Сообщение: NestedLoops over BitmapScan question