Proposal for new SET variables for optimizer costs
От | Tom Lane |
---|---|
Тема | Proposal for new SET variables for optimizer costs |
Дата | |
Msg-id | 14601.949786166@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: [HACKERS] Proposal for new SET variables for optimizer costs
(Bruce Momjian <pgman@candle.pha.pa.us>)
Re: [HACKERS] Proposal for new SET variables for optimizer costs (Philip Warner <pjw@rhyme.com.au>) |
Список | pgsql-hackers |
I am about to implement some changes to the planner/optimizer's cost model, following up to the thread on pghackers beginning on 20 Jan. The main conclusion of that thread was that we needed to charge more for a page fetched nonsequentially than for a page fetched sequentially. After further investigation I have concluded that it is also appropriate to include explicit modeling of the cost of evaluation of WHERE clauses. For example, using the regression database and a query like select * from tenk1 where (unique1 = 1 and unique2 = 101) or (unique1 = 2 and unique2 = 102) or (unique1 = 3 and unique2 = 103) or ... 100 OR clauses ... (unique1 = 100 and unique2 = 200); (which is not too implausible for certain automatic query generators), I observe that a sequential scan takes about 6 seconds, vs. less than a second for a similar query with only 10 clauses. That says that the cost of evaluating a WHERE clause this large is far from negligible. The optimizer needs to account for this because different query plans can have a considerable impact on the number of tuples that the WHERE clause is evaluated for --- in this example, if we use indexscans to pull out just the tuples with the right values of 'unique1', then the WHERE clause need only be checked at 100 tuples, not all 10000. I believe it would be reasonable to charge a certain amount per operator or function appearing in the WHERE clause in order to account for this effect. (Currently I see no need to model the cost of evaluating the targetlist expressions. The same expressions should get evaluated for the same tuples no matter what query plan the optimizer picks, so we might as well just leave that cost out of our comparisons.) Also, as was previously mentioned on pghackers, I would like to add SET variables to control enabling/disabling of particular query plan types, so that different plans can be checked with less hassle than restarting psql with a new PGOPTIONS setting. This all leads to the following proposal for redoing the optimizer plan cost SET variables. The variables proposed below would replace COST_HEAP and COST_INDEX, which are poorly named IMHO and are definitely very misleadingly documented at present. (Note that all costs will still be referenced to the cost of a disk page fetch. We will take 1.0 as the cost of a sequential page fetch.) SET variable name Internal variable Proposed default RANDOM_PAGE_COST random_page_cost 4.0 Cost of fetching a disk block nonsequentially (as a multiple of the cost of a sequential block fetch). CPU_TUPLE_COST cpu_tuple_cost 0.01 Cost of CPU time per tuple processed within a query (as a fraction of the cost of a sequential disk block fetch). This renames the existing SET variable COST_HEAP (cpu_page_weight); but the default value is smaller than it used to be, since WHERE clause evaluation will now be accounted for separately. CPU_INDEX_TUPLE_COST cpu_index_tuple_cost 0.001 Cost of CPU time per index tuple processed within a query (as a fraction of the cost of a sequential disk block fetch). This renames the existing SET variable COST_INDEX (cpu_index_page_weight); but the default value is much smaller than it used to be, since the operator evaluation cost will account for the bulk of the cost of visiting an index tuple. CPU_OPERATOR_COST cpu_operator_cost 0.0025 Cost of CPU time per operator or function evaluated in a WHERE clause. Note that this would apply to operators evaluated at index tuples as well as those evaluated against heap tuples. (The proposed default corresponds to a ratio of 5 microsec against 2 millisec for a sequential block fetch, which seems to be about right on my workstation.) ENABLE_SEQSCAN enable_seqscan ON ENABLE_INDEXSCAN enable_indexscan ON ENABLE_TIDSCAN enable_tidscan ON ENABLE_SORT enable_sort ON ENABLE_NESTLOOP enable_nestloop ON ENABLE_MERGEJOIN enable_mergejoin ON ENABLE_HASHJOIN enable_hashjoin ON Provide access via SET to the already-existing internal optimizer control flags. Currently, it is possible to have COST_HEAP and COST_INDEX set automatically during connection startup; libpq will do that if the environment variables PGCOSTHEAP and/or PGCOSTINDEX are defined on the client side. If we want to continue that behavior, the environment variables for these variables would be named PGRANDOMPAGECOST etc (remove underscores and prepend PG). I'm not sure if we want to continue inventing client-side environment variables, however. Comments? Ideas for better names? Anyone object to renaming the existing variables? (BTW, although it could be argued that this might break existing scripts that set COST_HEAP or COST_INDEX, I doubt that there are any ... and given the existing doco, I doubt even more that anyone is setting appropriate values ...) regards, tom lane
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Dmitry SamersoffДата:
Сообщение: RE: [HACKERS] Linux MANDRAKE startup startup script is broken ?
Следующее
От: Bruce MomjianДата:
Сообщение: Re: [HACKERS] Proposal for new SET variables for optimizer costs