Re: hyrax vs. RelationBuildPartitionDesc

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: hyrax vs. RelationBuildPartitionDesc
Дата
Msg-id CA+Tgmob-cska+-WUC7T-G4zkSJp7yum6M_bzYd4YFzwQ51qiow@mail.gmail.com
обсуждение исходный текст
Ответ на Re: hyrax vs. RelationBuildPartitionDesc  (Andres Freund <andres@anarazel.de>)
Ответы Re: hyrax vs. RelationBuildPartitionDesc  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Wed, May 1, 2019 at 11:59 AM Andres Freund <andres@anarazel.de> wrote:
> The message I'm replying to is marked as an open item. Robert, what do
> you think needs to be done here before release?  Could you summarize,
> so we then can see what others think?

The original issue on this thread was that hyrax started running out
of memory when it hadn't been doing so previously.  That happened
because, for complicated reasons, commit
898e5e3290a72d288923260143930fb32036c00c resulted in the leak being
hit lots of times in CLOBBER_CACHE_ALWAYS builds instead of just once.
Commits 2455ab48844c90419714e27eafd235a85de23232 and
d3f48dfae42f9655425d1f58f396e495c7fb7812 fixed that problem.

In the email at issue, Tom complains about two things.  First, he
complains about the fact that I try to arrange things so that relcache
data remains valid for as long as required instead of just copying it.
Second, he complains about the fact repeated ATTACH and DETACH
PARTITION operations can produce a slow session-lifespan memory leak.

I think it's reasonable to fix the second issue for v12.  I am not
sure how important it is, because (1) the leak is small, (2) it seems
unlikely that anyone would execute enough ATTACH/DETACH PARTITION
operations in one backend for the leak to amount to anything
significant, and (3) if a relcache flush ever happens when you don't
have the relation open, all of the "leaked" memory will be un-leaked.
However, I believe that we could fix it as follows.  First, invent
rd_pdoldcxt and put the first old context there; if that pointer is
already in use, then parent the new context under the old one.
Second, in RelationDecrementReferenceCount, if the refcount hits 0,
nuke rd_pdoldcxt and set the pointer back to NULL.  With that change,
you would only keep the old PartitionDesc around until the ref count
hits 0, whereas at present, you keep the old PartitionDesc around
until an invalidation happens while the ref count is 0.

I think the first issue is not v12 material.  Tom proposed to fix it
by copying all the relevant data out of the relcache, but his own
performance results show a pretty significant hit, and AFAICS he
hasn't pointed to anything that's actually broken by the current
approach.  What I think should be done is actually generalize the
approach I took in this patch, so that instead of the partition
directory holding a reference count, the planner or executor holds
one.  Then not only could people who want information about the
PartitionDesc avoid copying stuff from the relcache, but maybe other
things as well.  I think that would be significantly better than
continuing to double-down on the current copy-everything approach,
which really only works well in a world where a table can't have all
that much metadata, which is clearly not true for PostgreSQL any more.
I'm not sure that Tom is actually opposed to this approach -- although
I may have misunderstood his position -- but where we disagree, I
think, is that I see what I did in this commit as a stepping-stone
toward a better world, and he sees it as something that should be
killed with fire until that better world has fully arrived.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: REINDEX INDEX results in a crash for an index of pg_class since9.6
Следующее
От: Robert Haas
Дата:
Сообщение: Re: pgsql: Compute XID horizon for page level index vacuum on primary.