Re: [HACKERS] POC: Sharing record typmods between backends

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: [HACKERS] POC: Sharing record typmods between backends
Дата
Msg-id CAEepm=0H176UhBedyS455ABJvQFeB+qA4WOAPP=D2oc9k8pqKQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] POC: Sharing record typmods between backends  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Will respond to the actionable code review points separately with a
new patch set, but first:

On Wed, Aug 16, 2017 at 10:06 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-08-15 17:44:55 +1200, Thomas Munro wrote:
>> > @@ -99,12 +72,9 @@ CreateTemplateTupleDesc(int natts, bool hasoid)
>> >
>> >  /*
>> >   * CreateTupleDesc
>> > - *             This function allocates a new TupleDesc pointing to a given
>> > + *             This function allocates a new TupleDesc by copying a given
>> >   *             Form_pg_attribute array.
>> >   *
>> > - * Note: if the TupleDesc is ever freed, the Form_pg_attribute array
>> > - * will not be freed thereby.
>> > - *
>> >
>> > I'm leaning towards no, but you could argue that we should just change
>> > that remark to be about constr?
>>
>> I don't see why.
>
> Because for that the freeing bit is still true, ie. it's still
> separately allocated.

It's true of struct tupleDesc in general but not true of objects
returned by this function in respect of the arguments to the function.
In master, that comment is a useful warning that the object will hold
onto but never free the attrs array you pass in.  The same doesn't
apply to constr so I don't think we need to say anything.

>> > Review of 0003:
>> >
>> > I'm not doing a too detailed review, given I think there's some changes
>> > in the pipeline.
>>
>> Yep.  In the new patch set the hash table formerly known as DHT is now
>> in patch 0004 and I made the following changes based on your feedback:
>>
>> 1.  Renamed it to "dshash".  The files are named dshash.{c,h}, and the
>> prefix on identifiers is dshash_.  You suggested dsmhash, but the "m"
>> didn't seem to make much sense.  I considered dsahash, but dshash
>> seemed better.  Thoughts?
>
> WFM.  Just curious, why didn't m make sense? I was referring to dynamic
> shared memory hash - seems right. Whether there's an intermediary dsa
> layer or not...

I think of DSA as a defining characteristic that dshash exists to work
with (it's baked into dshash's API), but DSM as an implementation
detail which dshash doesn't directly depend on.  Therefore I don't
like the "m".

I speculate that in future we might have build modes where DSA doesn't
use DSM anyway: it could use native pointers and maybe even a
different allocator in a build that either uses threads or
non-portable tricks to carve out a huge amount of virtual address
space so that it can map memory in at the same location in each
backend.  In that universe DSA would still be providing the service of
grouping allocations together into a scope for "rip cord" cleanup
(possibly by forwarding to MemoryContext stuff) but otherwise compile
away to nearly nothing.

>> > +static int32
>> > +find_or_allocate_shared_record_typmod(TupleDesc tupdesc)
>> > +{
>> >
>> > +       /*
>> > +        * While we still hold the atts_index entry locked, add this to
>> > +        * typmod_index.  That's important because we don't want anyone to be able
>> > +        * to find a typmod via the former that can't yet be looked up in the
>> > +        * latter.
>> > +        */
>> > +       PG_TRY();
>> > +       {
>> > +               typmod_index_entry =
>> > +                       dht_find_or_insert(CurrentSharedRecordTypmodRegistry.typmod_index,
>> > +                                                          &typmod, &found);
>> > +               if (found)
>> > +                       elog(ERROR, "cannot create duplicate shared record typmod");
>> > +       }
>> > +       PG_CATCH();
>> > +       {
>> > +               /*
>> > +                * If we failed to allocate or elog()ed, we have to be careful not to
>> > +                * leak the shared memory.  Note that we might have created a new
>> > +                * atts_index entry above, but we haven't put anything in it yet.
>> > +                */
>> > +               dsa_free(CurrentSharedRecordTypmodRegistry.area, shared_dp);
>> > +               PG_RE_THROW();
>> > +       }
>> >
>> > Not entirely related, but I do wonder if we don't need abetter solution
>> > to this. Something like dsa pointers that register appropriate memory
>> > context callbacks to get deleted in case of errors?
>>
>> Huh, scope guards.  I have had some ideas about some kind of
>> destructor mechanism that might replace what we're doing with DSM
>> detach hooks in various places and also work in containers like hash
>> tables (ie entries could have destructors), but doing it with the
>> stack is another level...
>
> Not sure what you mean with 'stack'?

I probably read too much into your words.  I was imagining something
conceptually like the following, since the "appropriate memory
context" in the code above is actually a stack frame:
 dsa_pointer p = ...;
 ON_ERROR_SCOPE_EXIT(dsa_free, area, p); /* yeah, I know, no variadic macros */
 elog(ERROR, "boo"); /* this causes p to be freed */

The point being that if the caller of this function catches the error
then ITS on-error-cleanup stack mustn't run, but this one's must.
Hence requirement for awareness of stack.  I'm not sure it's actually
any easier to use this than the existing try/catch macros and I'm not
proposing it, but I thought perhaps you were.

> +/*
> + * A struct encapsulating some elements of a user's session.  For now this
> + * manages state that applies to parallel query, but it principle it could
> + * include other things that are currently global variables.
> + */
> +typedef struct Session
> +{
> +       dsm_segment        *segment;            /* The session-scoped DSM segment. */
> +       dsa_area           *area;                       /* The session-scoped DSA area. */
> +
> +       /* State managed by typcache.c. */
> +       SharedRecordTypmodRegistry *typmod_registry;
> +       dshash_table   *record_table;   /* Typmods indexed by tuple descriptor */
> +       dshash_table   *typmod_table;   /* Tuple descriptors indexed by typmod */
> +} Session;
>
>
> Interesting. I was apparently thinking slightly differently. I'd have
> thought we'd have Session struct in statically allocated shared
> memory. Which'd then have dsa_handle, dshash_table_handle, ... members.

A session needs (1) some backend-private state to hold the addresses
of stuff in this process's memory map and (2) some shared state worth
pointing to.  My patch introduces both of those things, but doesn't
need to make the shared state part 'discoverable'.  Workers get their
hands on it by receiving it explicitly from a leader.

I think that you're right about a cluster-wide PGSESSION array being
useful if we divorce session from processes completely and write some
kind of scheduler.  But for the purposes of this patch set we don't
seem to need to any decisions about that.  Leaders passing DSM handles
over to workers seems to be enough for now.  Does this make sense?

-- 
Thomas Munro
http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] The error message "sorry, too many clients already" isimprecise
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization