Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Robert Haas
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date:
Msg-id: CA+Tgmoac_z4QCbb+un8sMMCqU_XGjt6cD6zYV1D1xD5Y=32eAw@mail.gmail.com
In reply to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Wed, Feb 8, 2017 at 5:36 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Thinking a bit harder about this, I suppose there could be a kind of
> object called a SharedBufFileManager (insert better name) which you
> can store in a DSM segment.  The leader backend that initialises a DSM
> segment containing one of these would then call a constructor function
> that sets an internal refcount to 1 and registers an on_dsm_detach
> callback for its on-detach function.  All worker backends that attach
> to the DSM segment would need to call an attach function for the
> SharedBufFileManager to increment a refcount and also register the
> on_dsm_detach callback, before any chance that an error might be
> thrown (is that difficult?); failure to do so could result in file
> leaks.  Then, when a BufFile is to be shared (AKA exported, made
> unifiable), a SharedBufFile object can be initialised somewhere in the
> same DSM segment and registered with the SharedBufFileManager.
> Internally all registered SharedBufFile objects would be linked
> together using offsets from the start of the DSM segment for link
> pointers.  Now when SharedBufFileManager's on-detach function runs, it
> decrements the refcount in the SharedBufFileManager, and if that
> reaches zero then it runs a destructor that spins through the list of
> SharedBufFile objects deleting files that haven't already been deleted
> explicitly.
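
For concreteness, that scheme might look roughly like the sketch below.  All
of the type and function names are placeholders invented for illustration,
not actual code, and unlink() stands in for whatever real BufFile cleanup
would be needed.

#include "postgres.h"

#include <unistd.h>

#include "port/atomics.h"
#include "storage/dsm.h"

typedef struct SharedBufFile
{
	Size		next_offset;		/* offset of next SharedBufFile, or 0 */
	bool		deleted;			/* file already deleted explicitly? */
	char		path[MAXPGPATH];	/* enough to identify the file on disk */
} SharedBufFile;

typedef struct SharedBufFileManager
{
	pg_atomic_uint32 refcount;		/* number of attached backends */
	Size		first_offset;		/* offset of first SharedBufFile, or 0 */
} SharedBufFileManager;

/*
 * on_dsm_detach callback: when the last backend detaches, walk the list
 * and delete any files that weren't already cleaned up explicitly.
 */
static void
shared_buf_file_manager_detach(dsm_segment *seg, Datum arg)
{
	SharedBufFileManager *mgr = (SharedBufFileManager *) DatumGetPointer(arg);
	char	   *base = (char *) dsm_segment_address(seg);

	if (pg_atomic_sub_fetch_u32(&mgr->refcount, 1) == 0)
	{
		Size		off = mgr->first_offset;

		while (off != 0)
		{
			SharedBufFile *file = (SharedBufFile *) (base + off);

			if (!file->deleted)
				unlink(file->path);	/* placeholder for real BufFile cleanup */
			off = file->next_offset;
		}
	}
}

/* Leader: constructor sets the refcount to 1 and registers the callback. */
static void
shared_buf_file_manager_init(dsm_segment *seg, SharedBufFileManager *mgr)
{
	pg_atomic_init_u32(&mgr->refcount, 1);
	mgr->first_offset = 0;
	on_dsm_detach(seg, shared_buf_file_manager_detach, PointerGetDatum(mgr));
}

/*
 * Workers: bump the refcount and register the same callback, before any
 * chance that an error might be thrown.
 */
static void
shared_buf_file_manager_attach(dsm_segment *seg, SharedBufFileManager *mgr)
{
	pg_atomic_add_fetch_u32(&mgr->refcount, 1);
	on_dsm_detach(seg, shared_buf_file_manager_detach, PointerGetDatum(mgr));
}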

I think this is approximately reasonable, but I think it could be made
simpler by having fewer separate objects.  Let's assume the leader can
put an upper bound on the number of shared BufFiles at the time it's
sizing the DSM segment (i.e. before InitializeParallelDSM).  Then it
can allocate a big ol' array with a header indicating the array size
and each element containing enough space to identify the relevant
details of 1 shared BufFile.  Now you don't need to do any allocations
later on, and you don't need a linked list.  You just loop over the
array and do what needs doing.
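
Rough sketch of what I have in mind (again, placeholder names only; a real
version would need to cope with multi-segment BufFiles and whatever else is
needed to open the underlying files):

#include "postgres.h"

#include <unistd.h>

#include "port/atomics.h"
#include "storage/shmem.h"

typedef struct SharedBufFileSlot
{
	bool		in_use;				/* does this slot hold an exported file? */
	bool		deleted;			/* already cleaned up explicitly? */
	char		path[MAXPGPATH];	/* enough to identify the file on disk */
} SharedBufFileSlot;

typedef struct SharedBufFileArray
{
	pg_atomic_uint32 refcount;		/* number of attached backends */
	int			nslots;				/* fixed when the DSM segment is sized */
	SharedBufFileSlot slots[FLEXIBLE_ARRAY_MEMBER];
} SharedBufFileArray;

/* Leader: how much space to ask for, before InitializeParallelDSM(). */
static Size
shared_buf_file_array_size(int nslots)
{
	return add_size(offsetof(SharedBufFileArray, slots),
					mul_size(sizeof(SharedBufFileSlot), nslots));
}

/* Cleanup is just a loop over the array; no list walking, no allocation. */
static void
shared_buf_file_array_cleanup(SharedBufFileArray *array)
{
	int			i;

	for (i = 0; i < array->nslots; i++)
	{
		SharedBufFileSlot *slot = &array->slots[i];

		if (slot->in_use && !slot->deleted)
			unlink(slot->path);		/* placeholder for real BufFile cleanup */
	}
}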

> There are a couple of problems with the above though.  Firstly, doing
> reference counting in DSM segment on-detach hooks is really a way to
> figure out when the DSM segment is about to be destroyed by keeping a
> separate refcount in sync with the DSM segment's refcount, but it
> doesn't account for pinned DSM segments.  It's not your use-case or
> mine currently, but someone might want a DSM segment to live even when
> it's not attached anywhere, to be reattached later.  If we're trying
> to use DSM segment lifetime as a scope, we'd be ignoring this detail.
> Perhaps instead of adding our own refcount we need a new kind of hook
> on_dsm_destroy.

I think it's good enough to plan for current needs now.  It's not
impossible to change this stuff later, but we need something that
works robustly right now without being too invasive.  Inventing whole
new system concepts because of stuff we might someday want to do isn't
a good idea because we may easily guess wrong about what direction
we'll want to go in the future.  This is more like building a wrench
than a 747: a 747 needs to be extensible and reconfigurable and
upgradable because it costs $350 million.  A wrench costs $10 at
Walmart and if it turns out we bought the wrong one, we can just throw
it out and get a different one later.

> Secondly, I might not want to be constrained by a
> fixed-sized DSM segment to hold my SharedBufFile objects... there are
> cases where I need to share a number of batch files that is unknown
> at the start of execution time when the DSM segment is sized (I'll
> write about that shortly on the Parallel Shared Hash thread).  Maybe I
> can find a way to get rid of that requirement.  Or maybe it could
> support DSA memory too, but I don't think it's possible to use
> on_dsm_detach-based cleanup routines that refer to DSA memory because
> by the time any given DSM segment's detach hook runs, there's no
> telling which other DSM segments have been detached already, so the
> DSA area may already have partially vanished; some other kind of hook
> that runs earlier would be needed...

Again, wrench.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


