"Jim C. Nasby" <decibel@decibel.org> writes:
> On Thu, May 19, 2005 at 09:31:47AM -0700, Josh Berkus wrote:
>> can test our formula for accuracy and precision. However, such a formula
>> *does* need to take into account concurrent activity, updates, etc ... that
>> is, it needs to approximately estimate the relative cost on a live database,
>> not a test one.
> Well, that raises an interesting issue, because AFAIK none of the cost
> estimate functions currently do that.
I'm unconvinced that it'd be a good idea, either. People already
complain that the planner's choices change when they ANALYZE; if the
current load factor or something like that were to be taken into account
then you'd *really* have a problem with irreproducible behavior.
It might make sense to have something a bit more static, perhaps a GUC
variable that says "plan on the assumption that there's X amount of
concurrent activity". I'm not sure what scale to measure X on, nor
exactly how this would factor into the estimates anyway --- but at least
this approach would maintain reproducibility of behavior.
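
To make the static-setting idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: no `assumed_concurrency` setting exists in PostgreSQL, the `contention_penalty` factor and the linear scaling are arbitrary, and the open questions above (what scale to use for X, how it should enter the estimates) are not answered here. The only point shown is that the estimate depends on fixed settings rather than on measured load, so the same inputs always give the same cost.

```python
# Hypothetical sketch only: assumed_concurrency and contention_penalty are
# made-up names, not PostgreSQL settings.
RANDOM_PAGE_COST = 4.0  # PostgreSQL's default random_page_cost


def page_fetch_cost(pages, assumed_concurrency=0.0, contention_penalty=0.5):
    """Estimate the cost of fetching `pages` pages non-sequentially.

    assumed_concurrency is a fixed, administrator-chosen value (the "X"
    above), never a measurement of current load. The linear penalty is an
    arbitrary modelling choice for this sketch.
    """
    if pages < 0 or assumed_concurrency < 0:
        raise ValueError("pages and assumed_concurrency must be non-negative")
    per_page = RANDOM_PAGE_COST * (1.0 + contention_penalty * assumed_concurrency)
    return pages * per_page
```

Because the function reads only its arguments and a constant, planning the same query twice under the same settings yields the same estimate, whatever the server happens to be doing at the time.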
> Another issue is: what state should the buffers/disk cache be in?
The current cost models are all based on the assumption that every query
starts cold: nothing in cache. That is pretty bogus in
most real-world scenarios. We need to think about ways to tune that
assumption, too. Maybe this is actually the same discussion, because
certainly one of the main impacts of a concurrent environment is on what
you can expect to find in cache.
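
One way to picture a tunable cache assumption is a per-page cost blended between a disk fetch and a cache hit. This is a rough sketch under stated assumptions, not how the planner works: `assumed_cached_fraction` and `CACHED_PAGE_COST` are hypothetical names and values, and only `random_page_cost` (default 4.0) corresponds to a real setting.

```python
# Hypothetical sketch only: CACHED_PAGE_COST and assumed_cached_fraction are
# made-up for illustration and do not exist in PostgreSQL.
RANDOM_PAGE_COST = 4.0   # PostgreSQL's default random_page_cost
CACHED_PAGE_COST = 0.1   # assumed cost of a page already in cache


def blended_fetch_cost(pages, assumed_cached_fraction=0.0):
    """Estimate the cost of fetching `pages` pages when a fixed fraction of
    them is assumed to be in cache already.

    A fraction of 0.0 reproduces the "nothing in cache" assumption.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if not 0.0 <= assumed_cached_fraction <= 1.0:
        raise ValueError("assumed_cached_fraction must be between 0 and 1")
    per_page = ((1.0 - assumed_cached_fraction) * RANDOM_PAGE_COST
                + assumed_cached_fraction * CACHED_PAGE_COST)
    return pages * per_page
```

With the fraction at 0.0 the estimate is the cold-cache cost; raising it makes plans that touch many pages look cheaper. Heavier concurrent activity would presumably argue for a lower assumed fraction, which is one way the two discussions could turn out to be the same.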
regards, tom lane