Re: anti-join chosen even when slower than old plan

Поиск
Список
Период
Сортировка
От Bob Lunney
Тема Re: anti-join chosen even when slower than old plan
Дата
Msg-id 926836.80178.qm@web39707.mail.mud.yahoo.com
обсуждение исходный текст
Ответ на Re: anti-join chosen even when slower than old plan  (Mladen Gogala <mladen.gogala@vmsinfo.com>)
Список pgsql-performance
--- On Thu, 11/11/10, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:

> From: Mladen Gogala <mladen.gogala@vmsinfo.com>
> Subject: Re: [PERFORM] anti-join chosen even when slower than old plan
> To: "Kenneth Marshall" <ktm@rice.edu>
> Cc: "Robert Haas" <robertmhaas@gmail.com>, "Tom Lane" <tgl@sss.pgh.pa.us>, "Kevin Grittner"
<Kevin.Grittner@wicourts.gov>,"pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org> 
> Date: Thursday, November 11, 2010, 9:15 AM
> Kenneth Marshall wrote:
> > I agree with the goal of avoiding the need for a GUC.
> This needs to
> > be as automatic as possible. One idea I had had was
> computing a value
> > for the amount of cache data in the system by keeping
> a sum or a
> > weighted sum of the table usage in the system. Smaller
> tables and
> > indexes would contribute a smaller amount to the
> total, while larger
> > indexes and tables would contribute a larger amount.
> Then by comparing
> > this running total to the effective_cache_size, set
> the random and
> > sequential costs for a query. This would allow the
> case of many 4MB
> > tables to favor disk I/O more than memory I/O. The
> weighting could
> > be a function of simultaneous users of the table. I
> know this is a
> > bit of hand-waving but some sort of dynamic feedback
> needs to be
> > provided to the planning process as system use
> increases.
> >
> > Regards,
> > Ken
> >
> >   
> Kenneth, you seem to be only concerned with the accuracy of
> the planning process, not with the plan stability. As a DBA
> who has to monitor real world applications, I find things
> like an execution plan changing with the use of the system
> to be my worst nightmare. The part where you say that "this
> needs to be as automatic as possible" probably means that I
> will not be able to do anything about it, if the optimizer,
> by any chance, doesn't get it right. That looks to me like
> an entirely wrong way to go.
> When application developer tunes the SQL both him and me
> expect that SQL to always perform that way, not to change
> the execution plan because the system is utilized more than
> it was 1 hour ago. Nobody seems to have taken my suggestion
> about having a parameter
> which would simply "invent" the percentage out of thin air
> seriously, because it's obviously not accurate.
> However, the planner accuracy is not the only concern.
> Running applications on the system usually requires plan
> stability. Means of
> external control of the execution plan, DBA knobs and
> buttons that can be turned and pushed to produce the desired
> plan are also very much desired.
>
> -- Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>

Mladen,

Been there, done that with Oracle for more years than I care to remember or admit.  Having the necessary knobs was both
dauntingand a godsend, depending on if you could find the right one(s) to frob during production use, and you turned
themthe right way and amount.  I personally find having less knobbage with PostgreSQL to be a huge benefit over Oracle.
In that spirit, I offer the following suggestion: (Ken's original suggestion inspired me, so if I misunderstand it,
Ken,please correct me.) 

What if the code that managed the shared buffer cache kept track of how many buffers were in the cache for each table
andindex?  Then the optimizer could know the ratio of cached to non-cached table of index buffers (how many pages are
inPG's buffer cache vs. the total number of pages required for the entire table, assuming autovacuum is working well)
andplan accordingly.  It would even be possible to skew the estimate based on the ratio of shared_buffers to
effective_cache_size. The optimizer could then dynamically aadjust the random and sequential costs per query prior to
planning,with (hopefully) plans optimized to the current condition of the server and host caches just prior to
execution.

There are lots of assumptions here, the primary ones being the shared buffer cache's state doesn't change significantly
betweenthe start of planning and actual execution time, and the host is dedicated to running the database and nothing
elsethat would trash the host's file system cache.  I admit that I haven't looked at the code for this yet, so I don't
knowif I'm on to something or off in the weeds. 

Regards,

Bob Lunney






В списке pgsql-performance по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: anti-join chosen even when slower than old plan
Следующее
От: Mladen Gogala
Дата:
Сообщение: Re: anti-join chosen even when slower than old plan