RE: Protect syscache from bloating with negative cache entries

From    Tsunakawa, Takayuki
Subject RE: Protect syscache from bloating with negative cache entries
Date
Msg-id  0A3221C70F24FB45833433255569204D1FB9723F@G01JPEXMBYT05
In reply to     Re: Protect syscache from bloating with negative cache entries  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses       Re: Protect syscache from bloating with negative cache entries  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List    pgsql-hackers
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> I'm not sure what you mean by "necessary" and "unnecessary" here. What
> matters is how often an entry is accessed - if it's accessed often, it makes sense
> to keep it in the cache. Otherwise evict it. Entries not accessed for 5 minutes are
> clearly not accessed very often, so getting rid of them will not hurt the
> cache hit ratio very much.

Right, "necessary" and "unnecessary" were imprecise, and it matters how frequent the entries are accessed.  What made
mesay "unnecessary" is the pg_statistic entry left by CREATE/DROP TEMP TABLE which is never accessed again.
 


> So I agree with Robert a time-based approach should work well here. It does
> not have the issues with setting exact syscache size limit, it's kinda self-adaptive
> etc.
> 
> In a way, this is exactly what the 5 minute rule [1] says about caching.
> 
> [1] http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf

Then, can we just set syscache_prune_min_age to 5min?  Otherwise, how can users set the expiration period?


> > The idea of expiration applies to the case where we want possibly
> > stale entries to vanish and load newer data upon the next access.
> > For example, the TTL (time-to-live) of Memcached, Redis, DNS, ARP.
> > Is the catcache based on the same idea with them?  No.
> >
> 
> I'm not sure what has this to do with those other databases.

I meant that time-based eviction is not very good, because it could cause less frequently accessed entries to vanish
even when memory is not short.  Time-based eviction reminds me of Memcached, Redis, DNS, etc., which evict long-lived
entries to avoid stale data, not to free space for other entries.  I think size-based eviction is sufficient, like
shared_buffers, the OS page cache, the CPU cache, the disk cache, etc.
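
To make this concrete, here is a rough sketch of what I mean by size-based eviction, in the spirit of shared_buffers:
nothing is evicted until a configured byte limit is exceeded, and then the least recently used entries go first.  The
names here (CatEntry, lru_list, total_bytes, catcache_mem_limit) are made up for illustration, not the actual catcache
structures.

#include "postgres.h"
#include "lib/ilist.h"

typedef struct CatEntry
{
    dlist_node  lru_node;       /* position in the LRU list; head = most recently used */
    Size        size;           /* memory consumed by this entry */
    /* ... key, tuple, refcount, etc. ... */
} CatEntry;

static dlist_head lru_list = DLIST_STATIC_INIT(lru_list);
static Size total_bytes = 0;
static Size catcache_mem_limit = 0;     /* 0 means no limit (the default) */

static void
evict_if_over_limit(void)
{
    /* Evict from the LRU tail only while the configured limit is exceeded. */
    while (catcache_mem_limit > 0 &&
           total_bytes > catcache_mem_limit &&
           !dlist_is_empty(&lru_list))
    {
        CatEntry   *victim = dlist_container(CatEntry, lru_node,
                                             dlist_tail_node(&lru_list));

        dlist_delete(&victim->lru_node);
        total_bytes -= victim->size;
        pfree(victim);
    }
}

When memory is not short, this evicts nothing, so less frequently accessed entries are left alone.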
 


> I'm certainly worried about the performance aspect of it. The syscache is in a
> plenty of hot paths, so adding overhead may have significant impact. But that
> depends on how complex the eviction criteria will be.

The LRU chain manipulation, dlist_move_head() in SearchCatCacheInternal(), may certainly incur some overhead.  If it
has visible impact, then we can do the manipulation only when the user sets an upper limit on the cache size.
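
For the hit path, the check could be as small as this, reusing the hypothetical names from the sketch above, so that
the default case with no limit configured pays almost nothing:

static inline void
note_catcache_access(CatEntry *entry)
{
    /* Touch the LRU list only when the user has set an upper limit. */
    if (catcache_mem_limit > 0)
        dlist_move_head(&lru_list, &entry->lru_node);
}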
 

> And then there may be cases conflicting with the criteria, i.e. running into
> just-evicted entries much more often. This is the issue with the initially
> proposed hard limits on cache sizes, where it'd be trivial to under-size it just a
> little bit.

In that case, the user can just enlarge the catcache.


> Not sure which mail you're referring to - this seems to be the first e-mail from
> you in this thread (per our archives).

Sorry, MauMau is me, Takayuki Tsunakawa.


> I personally don't find explicit limit on cache size very attractive, because it's
> rather low-level and difficult to tune, and very easy to get it wrong (at which
> point you fall from a cliff). All the information is in backend private memory, so
> how would you even identify syscache is the thing you need to tune, or how
> would you determine the correct size?

Just like other caches, we can present a view that shows the hits, misses, and hit ratio of all the catcaches.  If the
hit ratio is low, the user can enlarge the catcache size.  That's what Oracle and MySQL do, as I mentioned earlier in
this thread.  The tuning parameter is the size.  That's all.  Besides, the v13 patch has as many as four parameters:
cache_memory_target, cache_prune_min_age, cache_entry_limit, and cache_entry_limit_prune_ratio.  I don't think I can
give the user good intuitive advice on how to tune these.
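
As a rough illustration of what such a view could be built on -- hypothetical per-cache counters, not the existing
CATCACHE_STATS fields:

#include "postgres.h"

typedef struct CatCacheCounters
{
    uint64      searches;       /* total lookups against this cache */
    uint64      hits;           /* lookups satisfied from the cache */
} CatCacheCounters;

static inline double
catcache_hit_ratio(const CatCacheCounters *c)
{
    return (c->searches == 0) ? 0.0 : (double) c->hits / (double) c->searches;
}

If the ratio reported for the production workload stays low, the advice is simply to raise the single size parameter.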
 


> > https://en.wikipedia.org/wiki/Cache_(computing)
> >
> > "To be cost-effective and to enable efficient use of data, caches must
> > be relatively small."
> >
> 
> Relatively small compared to what? It's also a question of how expensive cache
> misses are.

I guess the author meant that the cache is "relatively small" compared to the underlying storage: the CPU cache is
smaller than DRAM, and DRAM is smaller than SSD/HDD.  In our case, we have to pay more attention to limiting the
catcache memory consumption, especially because the caches are duplicated in multiple backend processes.
 


> I don't know, but that does not seem very attractive. Each memory context has
> some overhead, and it does not solve the issue of never releasing memory to
> the OS. So we'd still have to rebuild the contexts at some point, I'm afraid.

I think there is little additional overhead on each catcache access -- the processing overhead is the same as when
using aset, and the memory overhead is only several dozen MemoryContext structs (one per catcache).  The slab context
(slab.c) returns empty blocks to the OS, unlike the allocation set context (aset.c).
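
A minimal sketch of that arrangement, assuming each catcache gets its own context under CacheMemoryContext; the
surrounding names are hypothetical, and since slab requires a fixed chunk size while real catcache tuples are
variable-length, the chunk size here is a simplification:

#include "postgres.h"
#include "utils/memutils.h"

/* hypothetical per-catcache bookkeeping */
typedef struct MyCatCacheArea
{
    MemoryContext entry_context;    /* holds this catcache's entries */
} MyCatCacheArea;

static void
my_catcache_init(MyCatCacheArea *area, Size entry_size)
{
    /*
     * Per the point above: slab.c returns fully empty blocks, while aset.c
     * keeps freed space on the context's own freelists.
     */
    area->entry_context = SlabContextCreate(CacheMemoryContext,
                                            "catcache entries",
                                            SLAB_DEFAULT_BLOCK_SIZE,
                                            entry_size);
}

static void
my_catcache_release_all(MyCatCacheArea *area)
{
    /* Dropping every entry at once releases whole blocks. */
    MemoryContextReset(area->entry_context);
}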
 


Regards
Takayuki Tsunakawa

