Overriding the optimizer
| От | Craig A. James |
|---|---|
| Тема | Overriding the optimizer |
| Дата | |
| Msg-id | 43A1F6DB.3080805@modgraph-usa.com обсуждение исходный текст |
| Ответ на | Re: Simple Join (Mark Kirkwood <markir@paradise.net.nz>) |
| Ответы |
Re: Overriding the optimizer
Re: Overriding the optimizer Re: Overriding the optimizer Re: Overriding the optimizer |
| Список | pgsql-performance |
I asked a while back if there were any plans to allow developers to override the optimizer's plan and force certain
plans,and received a fairly resounding "No". The general feeling I get is that a lot of work has gone into the
optimizer,and by God we're going to use it!
I think this is just wrong, and I'm curious whether I'm alone in this opinion.
Over and over, I see questions posted to this mailing list about execution plans that don't work out well. Many times
thereare good answers - add an index, refactor the design, etc. - that yield good results. But, all too often the
answercomes down to something like this recent one:
> Right on. Some of these "coerced" plans may perform
> much better. If so, we can look at tweaking your runtime
> config: e.g.
>
> effective_cache_size
> random_page_cost
> default_statistics_target
>
> to see if said plans can be chosen "naturally".
I see this over and over. Tweak the parameters to "force" a certain plan, because there's no formal way for a
developerto say, "I know the best plan."
There isn't a database in the world that is as smart as a developer, or that can have insight into things that only a
developercan possibly know. Here's a real-life example that caused me major headaches. It's a trivial query, but
Postgrestotally blows it:
select * from my_table
where row_num >= 50000 and row_num < 100000
and myfunc(foo, bar);
How can Postgres possibly know what "myfunc()" does? In this example, my_table is about 10 million rows and row_num is
indexed. When the row_num range is less than about 30,000, Postgres (correctly) uses an row_num index scan, then
filtersby myfunc(). But beyond that, it chooses a sequential scan, filtering by myfunc(). This is just wrong.
Postgrescan't possibly know that myfunc() is VERY expensive. The correct plan would be to switch from index to
filteringon row_num. Even if 99% of the database is selected by row_num, it should STILL at least filter by row_num
first,and only filter by myfunc() as the very last step.
How can a database with no ability to override a plan possibly cope with this?
Without the explicit ability to override the plan Postgres generates, these problems dominate our development efforts.
Postgresdoes an excellent job optimizing on 90% of the SQL we write, but the last 10% is nearly impossible to get
right. We spend huge amounts of time on trial-and-error queries, second guessing Postgress, creating unnecessary
temporarytables, sticking in the occasional OFFSET in a subquery to prevent merging layers, and so forth.
This same application also runs on Oracle, and although I've cursed Oracle's stupid planner many times, at least I can
forceit to do it right if I need to.
The danger of forced plans is that inexperienced developers tend to abuse them. So it goes -- the documentation should
beclear that forced plans are always a last resort.
But there's no getting around the fact that Postgres needs a way for a developer to specify the execution plan.
Craig
В списке pgsql-performance по дате отправления: