Re: Protect syscache from bloating with negative cache entries

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Protect syscache from bloating with negative cache entries
Дата
Msg-id 2365.1547577156@sss.pgh.pa.us
обсуждение исходный текст
Ответ на RE: Protect syscache from bloating with negative cache entries  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Ответы Re: Protect syscache from bloating with negative cache entries  ("andres@anarazel.de" <andres@anarazel.de>)
Re: Protect syscache from bloating with negative cache entries  (Kenneth Marshall <ktm@rice.edu>)
RE: Protect syscache from bloating with negative cache entries  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Список pgsql-hackers
"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> But the syscache/relcache bloat still remains a problem, when there are many live tables and application connections.
Would you agree to solve this in some way?  I thought Horiguchi-san's latest patches would solve this and the negative
entries. Can we consider that his patch and yours are orthogonal, i.e., we can pursue Horiguchi-san's patch after yours
iscommitted? 

Certainly, what I've done here doesn't preclude adding some wider solution to
the issue of extremely large catcaches.  I think it takes the pressure off
for one rather narrow problem case, and the mechanism could be used to fix
other ones.  But if you've got an application that just plain accesses a
huge number of objects, this isn't going to make your life better.

> (As you said, some parts of Horiguchi-san's patches may be made simpler.  For example, the ability to change another
session'sGUC variable can be discussed in a separate thread.) 

Yeah, that idea seems just bad from here ...

> I think we need some limit to the size of the relcache, syscache, and plancache.  Oracle and MySQL both have it,
usingLRU to evict less frequently used entries.  You seem to be concerned about the LRU management based on your
experience,but would it really cost so much as long as each postgres process can change the LRU list without
coordinationwith other backends now?  Could you share your experience? 

Well, we *had* an LRU mechanism for the catcaches way back when.  We got
rid of it --- see commit 8b9bc234a --- because (a) maintaining the LRU
info was expensive and (b) performance fell off a cliff in scenarios where
the cache size limit was exceeded.  You could probably find some more info
about that by scanning the mail list archives from around the time of that
commit, but I'm too lazy to do so right now.

That was a dozen years ago, and it's possible that machine performance
has moved so much since then that the problems are gone or mitigated.
In particular I'm sure that any limit we would want to impose today will
be far more than the 5000-entries-across-all-caches limit that was in use
back then.  But I'm not convinced that a workload that would create 100K
cache entries in the first place wouldn't have severe problems if you
tried to constrain it to use only 80K entries.  I fear it's just wishful
thinking to imagine that the behavior of a larger cache won't be just
like a smaller one.  Also, IIRC some of the problem with the LRU code
was that it resulted in lots of touches of unrelated data, leading to
CPU cache miss problems.  It's hard to see how that doesn't get even
worse with a bigger cache.

As far as the relcache goes, we've never had a limit on that, but there
are enough routine causes of relcache flushes --- autovacuum for instance
--- that I'm not really convinced relcache bloat can be a big problem in
production.

The plancache has never had a limit either, which is a design choice that
was strongly influenced by our experience with catcaches.  Again, I'm
concerned about the costs of adding a management layer, and the likelihood
that cache flushes will simply remove entries we'll soon have to rebuild.

> FYI, Oracle provides one parameter, shared_pool_size, that determine the
> size of a memory area that contains SQL plans and various dictionary
> objects.  Oracle decides how to divide the area among constituents.  So
> it could be possible that one component (e.g. table/index metadata) is
> short of space, and another (e.g. SQL plans) has free space.  Oracle
> provides a system view to see the free space and hit/miss of each
> component.  If one component suffers from memory shortage, the user
> increases shared_pool_size.  This is similar to what Horiguchi-san is
> proposing.

Oracle seldom impresses me as having designs we ought to follow.
They have a well-earned reputation for requiring a lot of expertise to
operate, which is not the direction this project should be going in.
In particular, I don't want to "solve" cache size issues by exposing
a bunch of knobs that most users won't know how to twiddle.

            regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mitar
Дата:
Сообщение: Re: [PATCH] Allow UNLISTEN during recovery
Следующее
От: "andres@anarazel.de"
Дата:
Сообщение: Re: Protect syscache from bloating with negative cache entries