Discussion: how to calibrate the cost model parameters
Hello,
On my system, the estimated query cost differs hugely from the real query cost. For example, for a simple select query the estimated cost is 100x bigger than the real cost. I think I need to calibrate the cost model parameters below (I still use the default values):
#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
I googled it and only found some suggestions, for example setting random_page_cost to 1 or 2 for SSDs. However, there is no documentation about how to calibrate the cost model parameters so that the estimated query cost comes close to the real query cost.
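For reference, the gap being described here is visible when comparing the planner's cost estimate with the measured runtime. A minimal sketch (the table name is made up for illustration):

```sql
-- Compare the planner's estimated cost (arbitrary units) with the
-- actual execution time (milliseconds). "my_table" is hypothetical.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE id = 42;
-- The top plan node shows "cost=..." (estimated, arbitrary units)
-- and "actual time=..." (measured milliseconds); the two numbers
-- are not on the same scale.
```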
Thanks,
James
Thanks Laurenz. It is OK that these values are on an "arbitrary scale", but we still need to calibrate them to get the correct "arbitrary scale". For example, if seq_page_cost is 1 and cpu_tuple_cost is 0.01, how do we verify that a sequential page read really is 100 times more expensive than processing one tuple?
"enable_seqscan" and "enable_nestloop" can work in some cases, but they don't work in all cases.
Does anyone have experience calibrating the cost model parameters to get correct values? Thanks
James
From: Laurenz Albe
Sent: Monday, May 30, 2022 7:23 AM
To: jian xu; pgsql-admin@postgresql.org
Subject: Re: how to calibrate the cost model parameters
On Sun, 2022-05-29 at 15:29 +0000, jian xu wrote:
> On my system, the estimated query cost has huge difference with the real query cost.
> For example, a simple select query, the estimated query cost is 100X bigger than real cost,
> I think I need to calibrate the cost model parameters below(I still use default value)
>
> #seq_page_cost = 1.0 # measured on an arbitrary scale
> #random_page_cost = 4.0 # same scale as above
> #cpu_tuple_cost = 0.01 # same scale as above
> #cpu_index_tuple_cost = 0.005 # same scale as above
> #cpu_operator_cost = 0.0025 # same scale as above
>
> I googled it, and only find some suggestions, for example set random_page_cost to 1 or 2
> for SSD, however, there is no doc about how to calibrate the cost model parameters to make
> the estimated query cost close to the real query cost.
That is as designed, see that "arbitrary scale" comment on "seq_page_cost".
Sure, you could adjust all cost parameters so that 1 roughly corresponds to
the number of milliseconds for the actual query, but there is little benefit
in that.
The numbers are only used to compare plans with each other. Moreover, settings
like "enable_seqscan" and "enable_nestloop" can artificially distort the numbers.
Yours,
Laurenz Albe
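Laurenz's point about enable_seqscan distorting the numbers can be illustrated. This is a sketch (table name hypothetical); the size of the penalty is an implementation detail:

```sql
-- With enable_seqscan = off, the planner does not forbid sequential
-- scans outright; it adds a very large penalty to their estimated
-- cost. The reported cost of a forced seq scan therefore bears no
-- relation to its real runtime.
SET enable_seqscan = off;
EXPLAIN SELECT count(*) FROM my_table;
-- If no index is usable, the plan still uses a seq scan, but with an
-- enormous cost number, which distorts any cost comparison.
RESET enable_seqscan;
```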
On Mon, 2022-05-30 at 17:12 -0400, Jeff Janes wrote:
> On Mon, May 30, 2022 at 3:40 PM jian xu <jamesxu@outlook.com> wrote:
> > Thanks Laurenz. It is ok that these values are "arbitrary scale", but we still need
> > to calibrate them to get the correct "arbitrary scale". For example if seq_page_cost
> > is 1 and cpu_tuple_cost is 0.01, how to verify the io seq cost is 100 times slower than cpu access cost?
> > "enable_seqscan" and "enable_nestloop" can work in some cases, but it doesn’t work with any cases.
> > Does anyone have experience to calibrate the cost model parameters to get the correct value? Thanks
>
> I put a lot of time into it, and I think you will find that there is no consistent way to do so.
> For cpu_tuple_cost, for example, are the hint bits already set? Do they need to get set?
> Are the clog pages still in memory? Is there contention on the lock used to determine if a given
> transaction is still running? How many columns does the table have? How many of them need to be
> accessed for the current case, and how far left and right are they in the table and are they toasted?
>
> For seq reads, how much contention is there for the buffer mapping partition lock? What is the
> throughput of your IO system? How many other seq reads will be occurring at the same time? Etc.
>
> For any change you intend to make, do you have a good enough load generator and test system set
> up so you can test that it doesn't make something else worse?

+1

I think that the values of those parameters are more set by experience and tradition than
by measurable physical evidence. In a way, it just happens to work (mostly).

For example, who says that on a spinning disk, random I/O is four times as slow as
sequential I/O? Very likely, you have to factor in that part of the I/O requests are
satisfied from the kernel page cache.

Yours,
Laurenz Albe
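As a rough illustration of the kind of repeatable test setup Jeff describes, pgbench (shipped with PostgreSQL) can serve as a simple load generator, so the effect of a cost parameter change can be judged against a consistent workload. A sketch; the database name is hypothetical:

```shell
# Initialize a pgbench test database (scale factor 50, roughly 750 MB)
# and run a 60-second load with 8 concurrent clients. Re-run the same
# load after each parameter change and compare throughput/latency.
pgbench -i -s 50 testdb
pgbench -c 8 -T 60 testdb
```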
Thanks everyone. The reason I asked this question is that my system has a complex query which uses a CTE and then joins 7 other tables. It also has a WHERE clause that filters on JSON data. Sometimes it runs slowly (5 minutes), but sometimes it runs fast (2 seconds), with the same query text and no parameters. I know this is because the data distribution changes; the table is updated frequently.
I compared the execution plans: the fast query joins the other tables first and then joins the CTE, while the slow query joins the CTE first and then joins the other tables. I analyzed the tables; it didn't help. I increased the statistics target from 100 to 10000 for all the columns used in the WHERE clause; that didn't help either. I set random_page_cost to 1; it didn't help.
I am not able to modify the query. The only solutions I have found are:
- Use join_collapse_limit to force the join order. However, this has a limitation: I need to modify the code to set it before submitting the query and reset it to the original value after the query runs; if it is not changed back, other queries will be affected.
- Set both seq_page_cost and random_page_cost to 0.1, which always generates the correct plan.
So I suspect the cost model parameters are not fully optimized on my system: the costs of the fast and the slow execution plans might be very close, so a small change in the data causes the plan to change and a wrong plan to be generated.
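One way around the "setting it back" problem with join_collapse_limit: SET LOCAL confines a setting to the current transaction, so it reverts automatically on commit or rollback and cannot leak into other queries. A sketch (the query itself is a placeholder):

```sql
BEGIN;
-- Applies only inside this transaction; no manual reset needed,
-- and other sessions and later queries are unaffected.
SET LOCAL join_collapse_limit = 1;  -- preserve the written join order
SELECT ... ;  -- the problematic query goes here
COMMIT;  -- the setting reverts automatically here
```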
Thanks
James
Sent from Mail for Windows
From: Tom Lane
Sent: Tuesday, May 31, 2022 10:44 AM
To: Laurenz Albe
Cc: Jeff Janes; jian xu; pgsql-admin@postgresql.org
Subject: Re: how to calibrate the cost model parameters
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I think that the values of those parameters are more set by experience and tradition than
> by measurable physical evidence. In a way, it just happens to work (mostly).
> For example, who says that on a spinning disk, random I/O is four times as slow as
> sequential I/O? Very likely, you have to factor in that part of the I/O requests are
> satisfied from the kernel page cache.
The 4.0 value *is* based on some real evidence --- many years ago, I put
in a lot of machine time to get an estimate of that for non-cached fetches
on the hardware I was using at the time. If somebody did that over again
today, they'd likely get a somewhat different value, even assuming they
were still using spinning-rust storage. But it doesn't matter too much.
A value around that is probably fine for rotating storage, while if you
are using SSD it's likely better to use something around 1.0, and beyond
that any inaccuracies in the cost constants are usually swamped by
imperfections of the cost models and other data. For instance, it's not
that useful to sweat about what the right value is for cpu_operator_cost
when we've never spent any serious effort on assigning plausible procost
factors to different functions. (Worse, the whole notion that a given
function has a fixed cost independent of its input data is sadly
inadequate.)
regards, tom lane
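Tom's procost remark refers to the per-function cost factor stored in pg_proc, which can be inspected and adjusted. A sketch, with a hypothetical function name:

```sql
-- procost is a multiplier on cpu_operator_cost for each call of the
-- function; the default for user-defined functions is 100.
SELECT proname, procost FROM pg_proc WHERE proname = 'my_expensive_fn';

-- Tell the planner this function is much more expensive than an
-- ordinary operator, so filters using it are evaluated as late as
-- possible.
ALTER FUNCTION my_expensive_fn(int) COST 10000;
```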
Thanks Ron. This is what we are doing right now: we have a job that analyzes the tables every 2 hours. However, the statistics updates cause the execution plan to change, and the query sometimes runs slowly.
I think the problem is that the costs of the "good" and "bad" execution plans are close. When the data changes, ANALYZE updates the statistics and makes the plan costs shift, and because the cost model parameters are not fully optimized, the planner picks the "bad" execution plan.
Thanks,
James
From: Ron
Sent: Thursday, June 2, 2022 2:17 AM
To: pgsql-admin@lists.postgresql.org
Subject: Re: how to calibrate the cost model parameters
Manually (aka through a cron script) ANALYZE those seven tables on a regular basis. During the business day, I ANALYZE some tables every two hours.
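Ron's approach can be scripted with a crontab entry. A minimal sketch; the database, schema, and table names are made up:

```shell
# Crontab entry: re-ANALYZE the hot tables every 2 hours during the
# business day so the planner's statistics stay current.
0 8-18/2 * * * psql -d mydb -c 'ANALYZE myschema.orders; ANALYZE myschema.order_items;'
```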
On 5/31/22 21:20, jian xu wrote:
--
Angular momentum makes the world go 'round.