Kevin,
> I have to keep a very narrow focus on this, or there is likely that
> nothing will come of it. The particular area which is my target
> here is the accuracy of the cost values on the subplans
> considered by the optimizer.
Sure. What the rest of us are focused on is helping you build a generally
useful tool which can be used to solve future problems and unexpected
performance issues as well. I really see needing to collect all of the
information possible from EXPLAIN ANALYZE ... not collecting just three bits
and throwing the rest of the stuff away.
I'd use a structure more like:
query_instances (
query text
run_instance int
estimated_cost float
analyze_time float
actual_time float
time_run timestamp
GUCs text[]
)
query_steps (
query_instance FK
step_id SERIAL
parent_step FK
node_name text
node_type text FK
cost_start float
cost_end float
est_rows INT8
time_start float
time_end float
actual_rows INT8
loops INT8
db_object name
condition_type text FK
condition_detail text
)
so, for example, the query step:
" -> Seq Scan on detail0009 (cost=0.00..20500.11
rows=26 width=1005) (actual time=453.000..5983.000 rows=53588 loops=1)"
" Filter: ((txcontenttype ~~* '%html%'::text)
AND ((vchost)::text ~~* '%www.%'::text))"
Could be displayed as:
query_instance 12008
step_id 14701
parent_step 14698
node_name Seq Scan on detail0009
node_type Seq Scan
cost_start 0
cost_end 20500.11
est_rows 26
time_start 453.0
time_end 5983.0
actual_rows 53588
loops 1
db_object detail009
condition_type Filter
condition_detail ((txcontenttype ~~* '%html%'::text) AND ((vchost)::text ~~*
'%www.%'::text))
By collecting all of this data, you make it possible to perform other sorts of
analysis on the cost estimates. For example, statistical analysis might
tell us that 3-or-more-condition filters take significantly longer to execute
than single-condition filters, which would be important to know for the cost
model. Limiting it to collecting only 3 of the 13 bits of node data produced
by EA would very much limit the usefulness of the tool and the reliability of
its statistics.
--
Josh Berkus
Aglio Database Solutions
San Francisco