Re: Optimizer improvements: to do or not to do?

Поиск
Список
Период
Сортировка
От Say42
Тема Re: Optimizer improvements: to do or not to do?
Дата
Msg-id 1158145203.102440.55670@b28g2000cwb.googlegroups.com
обсуждение исходный текст
Ответ на Re: Optimizer improvements: to do or not to do?  (Peter Eisentraut <peter_e@gmx.net>)
Ответы Re: Optimizer improvements: to do or not to do?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Peter Eisentraut wrote:
> But you haven't offered any analysis about the cause of this problem, so any
> speculation about normalization, usual cases, caching effects and so on are
> unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain what I mean when I spoke about caching
effect. Let's take my pervious example (I repost query and some lines
from 'explain' here for convenience):

select count(*) from conn.conn20060803 c where   exists (select code from belg_mobile tc       where c.bnum >= tc.code
andc.bnum like tc.code || '%'       order by tc.code desc limit 1)
 

Index Scan Backward using belg_mobile_pkey on belg_mobile tc
(cost=0.00..6.42 rows=1 width=10)
(actual time=0.012..0.012 rows=0 loops=494527)

Seq Scan on belg_mobile tc
(cost=0.00..2.19 rows=1 width=10)
(actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows (1 heap page) and has PK on code
column (2 index pages)). indexCorrelation is equal to 0.0445 and almost
don't affect cost estimation result.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap pages) * 4
(random_page_cost) + ( 0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001
(cpu_index_tuple_cost) + 0.01 (cpu_tuple_cost) ) * 68 (record count) * 0.5 (selectivity of
subquery) ~ 6 (pages fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1(heap page) + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.01 (cpu_tuple_cost)) * 68 (record count) = 1
(pagesfetch cost) + 1.19 (cpu cost) = 2.19
 

The estimation is ok if we touch the belg_mobile table only once. In
the subquery we do it many times. After the first iteration of the
subquery all the belg_mobile's heap and index pages are in the cache
and cost per iteration should be estimated using formulae:

Index scan cost = ( 6 (pages fetch cost) + 0.42 (cpu cost) * 500K (conn table row count) ) / 500K  ~ 0.42

Seq scan cost = ( 1 (pages fetch cost) + 1.19 (cpu cost) * 500K (conn table row count) ) / 500K  ~ 1.19

Index scan actually more cheaper because less than one tenth of conn
rows have appropriate codes in the belg_mobile table.

That's what I want to say. I am not a veteran DBMS user so I can not
gauge importance of this cost inaccuracy in the whole. I hope you help
me to look at the problem (?) more widely than I can at the moment.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Teodor Sigaev
Дата:
Сообщение: GIN documentation
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Getting a move on for 8.2 beta