Обсуждение: Exists, limit and alternate plans

Поиск
Список
Период
Сортировка

Exists, limit and alternate plans

От
Віталій Тимчишин
Дата:
Hello.

Today I've found out strange results for query below.
select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.2 on amd64-portbld-freebsd8.0, compiled by GCC cc (GCC) 4.2.1 20070719  [FreeBSD], 64-bit

--Original query:
explain analyze select exists(select * from investor i where i.company_id = this_.id) from COMPANY this_ order by this_.rank desc, this_.id asc limit 10;
 Limit  (cost=0.00..50.67 rows=10 width=16) (actual time=144.489..144.556 rows=10 loops=1)
   ->  Index Scan using comp_rank_id on company this_  (cost=0.00..34616009.08 rows=6831169 width=16) (actual time=144.484..144.524 rows=10 loops=1)
         SubPlan 1
           ->  Index Scan using company_invs on investor i  (cost=0.00..9.52 rows=2 width=0) (never executed)
                 Index Cond: ((company_id)::bigint = $0)
         SubPlan 2
           ->  Seq Scan on investor i  (cost=0.00..1836.17 rows=41717 width=8) (actual time=0.006..72.364 rows=41722 loops=1)
 Total runtime: 144.975 ms
(8 rows)

--set enable_seqscan=false;
explain analyze select exists(select * from investor i where i.company_id = this_.id) from COMPANY this_ order by this_.rank desc, this_.id asc limit 10;
 Limit  (cost=0.00..50.67 rows=10 width=16) (actual time=0.045..0.177 rows=10 loops=1)
   ->  Index Scan using comp_rank_id on company this_  (cost=0.00..34616009.08 rows=6831169 width=16) (actual time=0.041..0.146 rows=10 loops=1)
         SubPlan 1
           ->  Index Scan using company_invs on investor i  (cost=0.00..9.52 rows=2 width=0) (actual time=0.007..0.007 rows=1 loops=10)
                 Index Cond: ((company_id)::bigint = $0)
         SubPlan 2
           ->  Seq Scan on investor i  (cost=10000000000.00..10000001836.17 rows=41717 width=8) (never executed)
 Total runtime: 0.253 ms
(8 rows)

--limit inside exists
explain analyze select exists(select * from investor i where i.company_id = this_.id limit 1) from COMPANY this_ order by this_.rank desc, this_.id asc limit 10;
 Limit  (cost=0.00..50.67 rows=10 width=16) (actual time=0.052..0.219 rows=10 loops=1)
   ->  Index Scan using comp_rank_id on company this_  (cost=0.00..34616009.08 rows=6831169 width=16) (actual time=0.049..0.189 rows=10 loops=1)
         SubPlan 1
           ->  Limit  (cost=0.00..4.76 rows=1 width=422) (actual time=0.011..0.011 rows=1 loops=10)
                 ->  Index Scan using company_invs on investor i  (cost=0.00..9.52 rows=2 width=422) (actual time=0.007..0.007 rows=1 loops=10)
                       Index Cond: ((company_id)::bigint = $0)
 Total runtime: 0.291 ms
(7 rows)

So, my Qs:
1) Do we really have alternative plans for SubPlan that are selected at runtime? Wow.
2) Why "Seq scan" plan is selected by default? Is it because of outer limit not being applied when calculating costs for subplans at runtime?
3) Why does limit inside exists helps? Is it simply because new "alternative" logic in not applied for "complex case"?

--
Best regards,
 Vitalii Tymchyshyn

Re: Exists, limit and alternate plans

От
Tom Lane
Дата:
=?KOI8-U?B?96bUwcymyiD0yc3eydvJzg==?= <tivv00@gmail.com> writes:
> So, my Qs:
> 1) Do we really have alternative plans for SubPlan that are selected at
> runtime? Wow.

Yup, see the AlternativeSubPlan stuff.

> 2) Why "Seq scan" plan is selected by default? Is it because of outer limit
> not being applied when calculating costs for subplans at runtime?

It's currently driven off the estimated rowcount for the parent plan
node --- 6831169 in your example.  The subplan cannot see that the
parent plan node will be terminated short of full execution, so it
thinks that hashing the whole investor table will be a win.
Obviously it'd be nice to improve that for cases like upper LIMITs.

> 3) Why does limit inside exists helps?

I think it defeats the applicability of the hash-the-whole-subtable
approach.

            regards, tom lane