Re: [HACKERS] POC: Sharing record typmods between backends

From: Thomas Munro
Subject: Re: [HACKERS] POC: Sharing record typmods between backends
Date:
Msg-id: CAEepm=2Vs5iR4MO4LWe3Ap0jiQngoT3jrSAGPV3BiGd3i3n6ig@mail.gmail.com
In reply to: Re: [HACKERS] POC: Sharing record typmods between backends  (Andres Freund <andres@anarazel.de>)
Responses: Re: [HACKERS] POC: Sharing record typmods between backends  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Thanks for the review and commits so far.  Here's a rebased, debugged
and pgindented version of the remaining patches.  I ran pgindent with

--list-of-typedefs="SharedRecordTableKey,SharedRecordTableEntry,SharedTypmodTableEntry,SharedRecordTypmodRegistry,Session"
to fix some weirdness around these new typenames.

While rebasing the 0002 patch (removal of tqueue.c's remapping logic),
I modified the interface of the newly added
ExecParallelCreateReaders() function from commit 51daa7bd because it
no longer has any reason to take a TupleDesc.

On Fri, Aug 25, 2017 at 1:46 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-08-21 11:02:52 +1200, Thomas Munro wrote:
>> 2.  Andres didn't like what I did to DecrTupleDescRefCount, namely
>> allowing to run when there is no ResourceOwner.  I now see that this
>> is probably an indication of a different problem; even if there were a
>> worker ResourceOwner as he suggested (or perhaps a session-scoped one,
>> which a worker would reset before being reused), it wouldn't be the
>> one that was active when the TupleDesc was created.  I think I have
>> failed to understand the contracts here and will think/read about it
>> some more.
>
> Maybe I'm missing something, but isn't the issue here that using
> DecrTupleDescRefCount() simply is wrong, because we're not actually
> necessarily tracking the TupleDesc via the resowner mechanism?

Yeah.  Thanks.

> If you look at the code, in the case it's a previously unknown tupledesc
> it's registered with:
>
>         entDesc = CreateTupleDescCopy(tupDesc);
> ...
>         /* mark it as a reference-counted tupdesc */
>         entDesc->tdrefcount = 1;
> ...
>         RecordCacheArray[newtypmod] = entDesc;
> ...
>
> Note that there's no PinTupleDesc(), IncrTupleDescRefCount() or
> ResourceOwnerRememberTupleDesc() managing the reference from the
> array. Nor was there one before.
>
> We have other code managing TupleDesc lifetimes similarly, and look at
> how they're freeing it:
>                 /* Delete tupdesc if we have it */
>                 if (typentry->tupDesc != NULL)
>                 {
>                         /*
>                          * Release our refcount, and free the tupdesc if none remain.
>                          * (Can't use DecrTupleDescRefCount because this reference is not
>                          * logged in current resource owner.)
>                          */
>                         Assert(typentry->tupDesc->tdrefcount > 0);
>                         if (--typentry->tupDesc->tdrefcount == 0)
>                                 FreeTupleDesc(typentry->tupDesc);
>                         typentry->tupDesc = NULL;
>                 }

Right.  I have changed shared_record_typmod_registry_worker_detach()
to be more like that, with an explanation.

> This also made me think about how we're managing the lookup from the
> shared array:
>
>                                         /*
>                                          * Our local array can now point directly to the TupleDesc
>                                          * in shared memory.
>                                          */
>                                         RecordCacheArray[typmod] = tupdesc;
>
> Uhm. Isn't that highly highly problematic? E.g. tdrefcount manipulations
> which are done by all lookups (cf. lookup_rowtype_tupdesc()) would in
> that case manipulate shared memory in a concurrency unsafe manner.

No.  See this change, in that and similar code paths:

-       IncrTupleDescRefCount(tupDesc);
+       PinTupleDesc(tupDesc);

The difference between IncrTupleDescRefCount() and PinTupleDesc() is
that the latter recognises non-refcounted tuple descriptors
(tdrefcount == -1) and does nothing.  Shared tuple descriptors are not
reference counted (see TupleDescCopy() which initialises
dst->tdrefcount to -1).  It was for foolish symmetry that I was trying
to use ReleaseTupleDesc() in shared_record_typmod_registry_detach()
before, since it also knows about non-refcounted tuple descriptors,
but that's not appropriate: it calls DecrTupleDescRefCount() which
assumes that we're using resource owners.  We're not.

To summarise the object lifetime management situation created by this
patch: shared TupleDesc objects accumulate in per-session DSM memory
until eventually the session ends and the DSM memory goes away.  A bit
like CacheMemoryContext: there is no retail cleanup of shared
TupleDesc objects.  BUT: the DSM detach callback is used to clear out
backend-local pointers to that stuff (and any non-shared reference
counted TupleDesc objects that might be found), in anticipation of
being able to reuse a worker process one day (which will involve
attaching to a new session, so we mustn't retain any traces of the
previous session in our local state).  Maybe I'm trying to be a little
too clairvoyant there...

I improved the cleanup code: now it frees RecordCacheArray and
RecordCacheHash and reinstalls NULL pointers.  Also it deals with
errors in GetSessionDsmHandle() better.

I renamed the members of Session to include a "shared_" prefix, which
seems a bit clearer.

I refactored it so that it never makes needless local copies of
TupleDesc objects (previously assign_record_type_typmod() would create
an extra local copy and cache that, which was wasteful).  That
actually makes much of the discussion above moot: on detach, a worker
should now ONLY find shared non-refcounted TupleDesc objects in the
local caches, so the FreeTupleDesc() case is unreachable...

The leader on the other hand can finish up with a mixture of local and
shared TupleDesc objects in its cache, if it had some before it ran a
parallel query.  Its detach hook doesn't try to free those so it
doesn't matter.

-- 
Thomas Munro
http://www.enterprisedb.com

