Обсуждение: cost_nonsequential_access()
The comment describing cost_nonsequential_access() says that the two
functions "meet in the middle". They meet at random_page_cost/2,
however, not in the middle between 1 and random_page_cost. For
random_page_cost < 2 the result can be less than 1 for relpages near
effective_cache_size. I don't think that this was intended.
This patch makes sure that cost_nonsequential_access() is always between
1 and randon_page_cost and the functions meet a (1+random_page_cost)/2.
Servus
Manfred
diff -Ncr ../base/src/backend/optimizer/path/costsize.c src/backend/optimizer/path/costsize.c
*** ../base/src/backend/optimizer/path/costsize.c Tue Jun 1 05:02:52 2004
--- src/backend/optimizer/path/costsize.c Tue Jun 8 08:34:27 2004
***************
*** 189,199 ****
* for now by assuming we are given an effective_cache_size parameter.
*
* Given a guesstimated cache size, we estimate the actual I/O cost per page
! * with the entirely ad-hoc equations:
* if relpages >= effective_cache_size:
! * random_page_cost * (1 - (effective_cache_size/relpages)/2)
* if relpages < effective_cache_size:
! * 1 + (random_page_cost/2-1) * (relpages/effective_cache_size) ** 2
* These give the right asymptotic behavior (=> 1.0 as relpages becomes
* small, => random_page_cost as it becomes large) and meet in the middle
* with the estimate that the cache is about 50% effective for a relation
--- 189,200 ----
* for now by assuming we are given an effective_cache_size parameter.
*
* Given a guesstimated cache size, we estimate the actual I/O cost per page
! * with the entirely ad-hoc equations (writing rpc for random_page_cost and
! * relsize for relpages/effective_cache_size):
* if relpages >= effective_cache_size:
! * rpc - (rpc-1)/2 * 1/relsize
* if relpages < effective_cache_size:
! * 1 + (rpc-1)/2 * relsize
* These give the right asymptotic behavior (=> 1.0 as relpages becomes
* small, => random_page_cost as it becomes large) and meet in the middle
* with the estimate that the cache is about 50% effective for a relation
***************
*** 213,221 ****
relsize = relpages / effective_cache_size;
if (relsize >= 1.0)
! return random_page_cost * (1.0 - 0.5 / relsize);
else
! return 1.0 + (random_page_cost * 0.5 - 1.0) * relsize * relsize;
}
/*
--- 214,222 ----
relsize = relpages / effective_cache_size;
if (relsize >= 1.0)
! return random_page_cost - (random_page_cost - 1.0) / 2.0 / relsize;
else
! return 1.0 + (random_page_cost - 1.0) / 2.0 * relsize;
}
/*
Manfred Koizar <mkoi-pg@aon.at> writes:
> The comment describing cost_nonsequential_access() says that the two
> functions "meet in the middle". They meet at random_page_cost/2,
> however, not in the middle between 1 and random_page_cost. For
> random_page_cost < 2 the result can be less than 1 for relpages near
> effective_cache_size. I don't think that this was intended.
You're right, I failed to consider that random_page_cost might be less
than 2.
> This patch makes sure that cost_nonsequential_access() is always between
> 1 and randon_page_cost and the functions meet a (1+random_page_cost)/2.
This patch seems to do considerably more violence to the equations than
is needed to cover that oversight, though. The old behavior was
intentionally nonlinear in relsize; this is not.
regards, tom lane
On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >This patch seems to do considerably more violence to the equations than >is needed to cover that oversight, though. The old behavior was >intentionally nonlinear in relsize; this is not. The comment says "entirely ad-hoc" and I didn't see a particular reason why the lower half should be nonlinear in relsize while the upper half is linear in 1/relsize. So I opted for the "more esthetic" symmetrical function. :-) http://www.pivot.at/pg/costsize.jpg original http://www.pivot.at/pg/costsize_2.jpg nonlinear/linear http://www.pivot.at/pg/costsize_3a.jpg nonlinear http://www.pivot.at/pg/costsize_3b.jpg linear I don't have a strong opinion for either relsize or relsize^2. So please add " * relsize" or " / relsize" as appropriate before you apply (if you intend to apply). Servus Manfred
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Tue, 08 Jun 2004 11:13:43 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This patch seems to do considerably more violence to the equations than
>> is needed to cover that oversight, though. The old behavior was
>> intentionally nonlinear in relsize; this is not.
> The comment says "entirely ad-hoc" and I didn't see a particular reason
> why the lower half should be nonlinear in relsize while the upper half
> is linear in 1/relsize.
Incremental changes in the relsize fraction are not going to change the
cost much except near 1, so I was after a curve that went like this
(pardon the crude artwork):
rpc ***********************
*******
****
**
cost *
*
*
**
****
********
1.0 *************
0 1 large
relsize fraction
I don't think replacing the lower half of this with a straight line
segment is an improvement.
Possibly the relsize axis ought to be measured on a log scale, or
something like that, but that didn't seem to work nicely when relsize
approaches zero.
regards, tom lane
On Tue, 08 Jun 2004 13:13:01 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Possibly the relsize axis ought to be measured on a log scale, or >something like that, but that didn't seem to work nicely when relsize >approaches zero. In my experiments I used log(relsize) on the x axis, and I don't think that the graph looks unpleasant for small relsize. My thought was (and is) that we are much more interested in whether relpages is 1/100, 1/10, 1, 10, 100 times effective_cache_size than whether it is relpages +/- 1000, 2000, 3000, ... I played around with some numbers that could be considered fairly realistic. You might want to look at the graphs I linked to in the previous message or download http://www.pivot.at/pg/costsize.sxc. But I think we are wasting too much effort. The graphs don't look too different, whether you use relsize or relsize^2. Maybe relsize^3 is optimal? Nobody knows. The important part of the patch is that the result is scaled and shifted into the range 1 to random_page_cost. Whatever you decide to do is ok with me. Servus Manfred