Re: How does PG know if data is in memory?

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: How does PG know if data is in memory?
Дата
Msg-id 4CA9488C.2080502@2ndquadrant.com
обсуждение исходный текст
Ответ на How does PG know if data is in memory?  (Fabrício dos Anjos Silva<fabricio.silva@linkcom.com.br>)
Ответы Re: How does PG know if data is in memory?  (Jeremy Harris <jgh@wizmail.org>)
Список pgsql-performance
Fabrício dos Anjos Silva wrote:
> After reading lots of documentation, I still don't understand fully
> how PG knows if some needed data is in memory or in second storage.
> While choosing the best query plan, the optimizer must take this into
> account. Does PG consider this? If so, how does it know?

There are actually two different questions here, and I'm not sure both
have been completely clarified for you by the discussion yet.

PostgreSQL has its own dedicated pool of memory, sized by
shared_buffers.  When you request a page, if it's already in memory, you
get pointed toward that copy and no physical I/O happens.  If it's not,
PostgreSQL asks the OS to read the page.  It's possible that will return
a page that's in the OS cache; the database currently has no idea when
this does or doesn't happen though.  The hit % counters in the database
only reflect shared_buffers hits, not OS ones.  Some work to integrate
the OS cache information into the database has been done, the current
leading project in that area is pgfincore:
http://pgfoundry.org/projects/pgfincore/

However, none of this information is considered at all by the query
optimizer.  It makes plans without any knowledge of what is or isn't in
RAM right now, either the dedicated database memory or the OS cache.
Only the ratios of the planner constants are really considered.  You can
set those on a query by query basis to provide subtle hints when you
know something the planner doesn't, but you have to be very careful
about doing that as those plans tend to get obsolete eventually when you
do that trick.

I had a brain-storming session on this subject with a few of the hackers
in the community in this area a while back I haven't had a chance to do
something with yet (it exists only as a pile of scribbled notes so
far).  There's a couple of ways to collect data on what's in the
database and OS cache, and a couple of ways to then expose that data to
the optimizer.  But that needs to be done very carefully, almost
certainly as only a manual process at first, because something that's
producing cache feedback all of the time will cause plans to change all
the time, too.  Where I suspect this is going is that we may end up
tracking various statistics over time, then periodically providing a way
to export a mass of "typical % cached" data back to the optimizer for
use in plan cost estimation purposes.  But the idea of monitoring
continuously and always planning based on the most recent data available
has some stability issues, both from a "too many unpredictable plan
changes" and a "bad short-term feedback loop" perspective, as mentioned
by Tom and Kevin already.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: How does PG know if data is in memory?
Следующее
От: Greg Smith
Дата:
Сообщение: Re: How does PG know if data is in memory?