RE: how to calibrate the cost model parameters

Поиск
Список
Период
Сортировка
От jian xu
Тема RE: how to calibrate the cost model parameters
Дата
Msg-id BL0PR02MB4561653CDBBBC250097DB2BFA1DF9@BL0PR02MB4561.namprd02.prod.outlook.com
обсуждение исходный текст
Ответ на Re: how to calibrate the cost model parameters  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: how to calibrate the cost model parameters  (Ron <ronljohnsonjr@gmail.com>)
Список pgsql-admin

Thanks everyone. The reason I asked this question is, my system has a complex query, which uses CTE, then join 7 other tables. It also has where clause using json data filter. Sometimes it runs slowly(5 minutes), but sometimes it runs fast(2 seconds), (query text is same , no parameter). I know it is because data distribution change. The table is  updated frequently.

I compared the execution plan, the fast query joins other tables first, then join CTE, however, the slow question joins CTE first, then join other tables. I analyzed the tables, it didn’t work, I increased the column sampling from 100 to 10000 for all the columns used in where clause, didn’t work. I set the random io cost to 1, it didn’t work.

I am not able to modify the query. The only solution I find is

  1. use join_collapse_limit to force the join order, however, it has limitation, I need to modify the code to set it before submitting the query, and set it back to original value after query run, in case it didn’t change back, other query will affect.
  2. Set both sequence and random io cost to 0.1, it can always generate correct plan.

So I suspect the cost model parameters are not fully optimized on my system, the fast and slow execution plan cost might be very close, so a little bit data change will cause the plan change, and generated a wrong plan.

Thanks

James

 

Sent from Mail for Windows

 

From: Tom Lane
Sent: Tuesday, May 31, 2022 10:44 AM
To: Laurenz Albe
Cc: Jeff Janes; jian xu; pgsql-admin@postgresql.org
Subject: Re: how to calibrate the cost model parameters

 

Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I think that the values of those parameters are more set by experience and tradition than
> by measurable physical evidence.  In a way, it just happens to work (mostly).
> For example, who says that on a spinning disk, random I/O is four times as slow as
> sequential I/O?  Very likely, you have to factor in that part of the I/O requests are
> satisfied from the kernel page cache.

The 4.0 value *is* based on some real evidence --- many years ago, I put
in a lot of machine time to get an estimate of that for non-cached fetches
on the hardware I was using at the time.  If somebody did that over again
today, they'd likely get a somewhat different value, even assuming they
were still using spinning-rust storage.  But it doesn't matter too much.
A value around that is probably fine for rotating storage, while if you
are using SSD it's likely better to use something around 1.0, and beyond
that any inaccuracies in the cost constants are usually swamped by
imperfections of the cost models and other data.  For instance, it's not
that useful to sweat about what the right value is for cpu_operator_cost
when we've never spent any serious effort on assigning plausible procost
factors to different functions.  (Worse, the whole notion that a given
function has a fixed cost independent of its input data is sadly
inadequate.)

                        regards, tom lane

 

В списке pgsql-admin по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Casting json (or jsonb) to real
Следующее
От: Wells Oliver
Дата:
Сообщение: Re: Casting json (or jsonb) to real