Re: why query plan for the inner SELECT of WHERE x IN is wrong, but when run the inner query alone is OK?

Поиск
Список
Период
Сортировка
От Miernik
Тема Re: why query plan for the inner SELECT of WHERE x IN is wrong, but when run the inner query alone is OK?
Дата
Msg-id 20080809223208.7297.0.NOFFLE@turbacz.local
обсуждение исходный текст
Ответ на why query plan for the inner SELECT of WHERE x IN is wrong, but when run the inner query alone is OK?  (Miernik <public@public.miernik.name>)
Список pgsql-performance
Miernik <public@public.miernik.name> wrote:
> Something goes wrong that this query plan thinks there is only gonna be
> 1 row from (SELECT uid FROM alog ... ) so chooses such query plan, and
> thus it runs forever (at least so long that I didn't bother to wait,
> like 10 minutes):
>
>
> miernik=> EXPLAIN UPDATE cnts SET p0 = FALSE WHERE uid IN (SELECT uid FROM alog WHERE pid = 3452654 AND o = 1);
>                                          QUERY PLAN
> -----------------------------------------------------------------------------------------------
> Nested Loop IN Join  (cost=0.00..3317.34 rows=1 width=44)
>   ->  Seq Scan on cnts  (cost=0.00..36.00 rows=2000 width=44)
>   ->  Index Scan using alog_uid_idx on alog  (cost=0.00..296.95 rows=1 width=4)
>         Index Cond: ((alog.uid)::integer = (cnts.uid)::integer)
>         Filter: ((alog.pid = 3452654::numeric) AND (alog.o = 1::numeric))
> (5 rows)

Well, in fact its not only the autovacuum/manual VACUUM ANALYZE that
changed, its a new copy of the cnts table with only 1000 rows, and
before it was a 61729 row table. The new, smaller, 1000 row table is
recreated, but I have a copy of the old 61729 row table, and guess what?
It runs correctly! And the query plan of the exactly the same query, on
a table of the exactly same structure and indexes, differing only by
having 61729 rows instead of 1000 rows, is like this:

I've done a SELECT uid plan, instead of an UPDATE plan, but it should
be no difference. This is a plan that is quick:

miernik=> EXPLAIN SELECT uid FROM cnts_old WHERE uid IN (SELECT uid FROM alog WHERE pid = 3452654 AND o = 1);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Nested Loop  (cost=9077.07..9238.61 rows=12 width=4)
   ->  HashAggregate  (cost=9077.07..9077.29 rows=22 width=4)
         ->  Bitmap Heap Scan on alog  (cost=93.88..9069.00 rows=3229 width=4)
               Recheck Cond: ((pid = 3452654::numeric) AND (o = 1::numeric))
               ->  Bitmap Index Scan on alog_pid_o  (cost=0.00..93.07 rows=3229 width=0)
                     Index Cond: ((pid = 3452654::numeric) AND (o = 1::numeric))
   ->  Index Scan using cnts_old_pkey on cnts_old  (cost=0.00..7.32 rows=1 width=4)
         Index Cond: ((cnts_old.uid)::integer = (alog.uid)::integer)
(8 rows)


I present a SELECT uid plan with the 1000 table also below, just to be
sure, this is the "bad" plan, that takes forever:

miernik=> EXPLAIN SELECT uid FROM cnts WHERE uid IN (SELECT uid FROM alog WHERE pid = 3452654 AND o = 1);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..3532.70 rows=1 width=4)
   ->  Seq Scan on cnts  (cost=0.00..26.26 rows=1026 width=4)
   ->  Index Scan using alog_uid_idx on alog  (cost=0.00..297.32 rows=1 width=4)
         Index Cond: ((alog.uid)::integer = (cnts.uid)::integer)
         Filter: ((alog.pid = 3452654::numeric) AND (alog.o = 1::numeric))
(5 rows)


I've also got a version of the cnts table with only 14 rows, called
cnts_small, and the query plan on that one is below:

miernik=> EXPLAIN SELECT uid FROM cnts_small WHERE uid IN (SELECT uid FROM alog WHERE pid = 3452654 AND o = 1);
                                                                          QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=99.05..1444.29 rows=1 width=4)
   ->  Seq Scan on cnts_small  (cost=0.00..1.14 rows=14 width=4)
   ->  Bitmap Heap Scan on alog  (cost=99.05..103.07 rows=1 width=4)
         Recheck Cond: (((alog.uid)::integer = (cnts_small.uid)::integer) AND (alog.pid = 3452654::numeric) AND (alog.o
=1::numeric)) 
         ->  BitmapAnd  (cost=99.05..99.05 rows=1 width=0)
               ->  Bitmap Index Scan on alog_uid_idx  (cost=0.00..5.21 rows=80 width=0)
                     Index Cond: ((alog.uid)::integer = (cnts_small.uid)::integer)
               ->  Bitmap Index Scan on alog_pid_o  (cost=0.00..92.78 rows=3229 width=0)
                     Index Cond: ((alog.pid = 3452654::numeric) AND (alog.o = 1::numeric))
(9 rows)

That one is fast too. And the structure and indexes of cnts_small is
exactly the same as of cnts and cnts_old. So it works OK if I use a 14
row table and if I use a 61729 row table, but breaks when I use a 1000
row table. Any ideas?

--
Miernik
http://miernik.name/

В списке pgsql-performance по дате отправления:

Предыдущее
От: Miernik
Дата:
Сообщение: Re: why query plan for the inner SELECT of WHERE x IN is wrong, but when run the inner query alone is OK?
Следующее
От: Miernik
Дата:
Сообщение: Re: why query plan for the inner SELECT of WHERE x IN is wrong, but when run the inner query alone is OK?