Hi,
On 2017-05-05 14:24:07 -0600, Mathieu Fenniak wrote:
> The stalls occur unpredictably on my production system, but generally seem
> to be correlated with schema operations. My source database has about
> 100,000 tables; it's a one-schema-per-tenant multi-tenant SaaS system.
I'm unfortunately not entirely surprised you're seeing some issues in
that case. We're invalidating internal caches a bit
overjudiciously, and that invalidation is triggered by schema changes.
> I've performed a CPU sampling with the OSX `sample` tool based upon
> reproduction approach #1:
> https://gist.github.com/mfenniak/366d7ed19b2d804f41180572dc1600d8 It
> appears that most of the time is spent in the
> RelfilenodeMapInvalidateCallback and CatalogCacheIdInvalidate cache
> invalidation callbacks, both of which appear to be invalidating caches
> based upon the cache value.
I think optimizing those has some value (and I see Tom is looking at
that aspect), but the bigger thing would probably be to do fewer lookups.
> Has anyone else run into this kind of performance problem? Any thoughts on
> how it might be resolved? I don't mind putting in the work if someone
> could describe what is happening here, and have a discussion with me about
> what kind of changes might be necessary to improve the performance.
If you could provide an easily runnable SQL script that reproduces the
issue, I'll have a look. I think I have a rough idea what to do.
Greetings,
Andres Freund