Re: Dynamic Shared Memory stuff

From Robert Haas
Subject Re: Dynamic Shared Memory stuff
Date
Msg-id CA+TgmoayUzQ6Kjs5osEV+JNpVvK=b3mDg=dLDeiTeFJ+97BNRA@mail.gmail.com
In response to Re: Dynamic Shared Memory stuff  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Dynamic Shared Memory stuff
List pgsql-hackers
On Thu, Dec 5, 2013 at 11:12 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Hmm. Those two use cases are quite different. For message-passing, you want
> a lot of small queues, but for parallel sort, you want one huge allocation.
> I wonder if we shouldn't even try a one-size-fits-all solution.
>
> For message-passing, there isn't much need to even use dynamic shared
> memory. You could just assign one fixed-sized, single-reader multiple-writer
> queue for each backend.

True, although if the queue needs to be 1MB, or even 128kB, that would
bloat the static shared-memory footprint of the server pretty
significantly.  And I'm not sure we know that a small queue will
be adequate in all cases.  If you've got a worker backend feeding data
back to the user backend, the size of the queue limits how far ahead
of the user backend that worker can get.  Big is good, because then
the user backend won't stall on read, but small is also good, in case
the query is cancelled or hits an error.  It is far from obvious to me
that one-size-fits-all is the right solution.
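
To make that tradeoff concrete, a fixed-size queue is really just a
ring buffer in shared memory, along the lines of the purely
illustrative sketch below (hypothetical names, not actual code): a
writer can get at most queue_size bytes ahead of the reader before it
has to block, so the queue size directly bounds how far ahead the
worker can run.

/* Illustrative sketch only: header of a fixed-size byte queue that
 * lives in shared memory.  read_pos and write_pos only ever grow, so
 * write_pos - read_pos is the number of unread bytes in the buffer. */
#include <stdint.h>

typedef struct toy_shm_queue
{
    uint64_t    read_pos;    /* bytes consumed by the single reader */
    uint64_t    write_pos;   /* bytes produced by the writer(s) */
    uint64_t    queue_size;  /* ring buffer capacity, in bytes */
    char        data[];      /* the ring buffer itself follows */
} toy_shm_queue;

/* Space a writer may fill right now without clobbering unread data;
 * once this hits zero the writer must wait for the reader. */
static inline uint64_t
toy_queue_free_space(const toy_shm_queue *q)
{
    return q->queue_size - (q->write_pos - q->read_pos);
}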

> For parallel sort, you'll want to utilize all the available memory and all
> CPUs for one huge sort. So all you really need is a single huge shared
> memory segment. If one process is already using that 512GB segment to do a
> sort, you do *not* want to allocate a second 512GB segment. You'll want to
> wait for the first operation to finish first. Or maybe you'll want to have
> 3-4 somewhat smaller segments in use at the same time, but not more than
> that.

This is all true, but it has basically nothing to do with parallelism.
work_mem is a poor model, but I didn't invent it.  Hopefully some day
someone will fix it, maybe even me, but that's a separate project.

> I really think we need to do something about it. To use your earlier example
> of parallel sort, it's not acceptable to permanently leak a 512 GB segment
> on a system with 1 TB of RAM.
>
> One idea is to create the shared memory object with shm_open, and wait until
> all the worker processes that need it have attached to it. Then,
> shm_unlink() it, before using it for anything. That way the segment will be
> automatically released once all the processes close() it, or die. In
> particular, kill -9 will release it. (This is a variant of my earlier idea
> to create a small number of anonymous shared memory file descriptors in
> postmaster startup with shm_open(), and pass them down to child processes
> with fork()). I think you could use that approach with SysV shared memory as
> well, by destroying the segment with shmctl(IPC_RMID) immediately after all
> processes have attached to it.
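
(At the POSIX level, the create / attach / unlink lifecycle being
described would look roughly like the standalone sketch below --
hypothetical segment name and size, most error handling omitted, not
dsm.c code:)

/* Create a POSIX shared memory segment, let the workers attach, then
 * shm_unlink() it.  The kernel keeps the memory alive only as long as
 * at least one mapping or file descriptor remains, so it is reclaimed
 * automatically when the last process exits -- kill -9 included. */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_NAME  "/dsm_example"        /* hypothetical name */
#define SEG_SIZE  (64 * 1024 * 1024)    /* hypothetical size */

void *
create_and_unlink_segment(void)
{
    int     fd;
    void   *addr;

    fd = shm_open(SEG_NAME, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, SEG_SIZE) != 0)
        return NULL;

    addr = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return NULL;

    /* ... wait here until every worker has shm_open()ed and mmap()ed
     * the segment ... */

    shm_unlink(SEG_NAME);   /* remove the name; mappings stay valid */
    close(fd);              /* the mapping survives the close() */

    return addr;
}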

That's a very interesting idea.  I've been thinking that we needed to
preserve the property that new workers could attach to the shared
memory segment at any time, but that might not be necessary in all
cases.  We could introduce a new dsm operation that means "I promise no
one else needs to attach to this segment".  Further attachments would
be disallowed by dsm.c regardless of the implementation in use, and
dsm_impl.c would also be given a chance to perform
implementation-specific operations, like shm_unlink and
shmctl(IPC_RMID).  This new operation, when used, would help to reduce
the chance of leaks and perhaps catch other programming errors as
well.
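
The System V analogue of that implementation-specific step would
presumably be the same trick with shmctl(IPC_RMID): mark the segment
for removal as soon as everyone has attached, and it goes away when
the last attachment does.  A standalone sketch (hypothetical key and
size, error handling omitted):

/* Once all cooperating processes have called shmat(), marking the
 * segment with IPC_RMID makes the kernel destroy it automatically
 * when the last process detaches or dies, even via kill -9. */
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define SEG_KEY   0x50474453            /* hypothetical key */
#define SEG_SIZE  (64 * 1024 * 1024)    /* hypothetical size */

void *
create_attach_and_mark_for_removal(void)
{
    int     shmid;
    void   *addr;

    shmid = shmget(SEG_KEY, SEG_SIZE, IPC_CREAT | IPC_EXCL | 0600);
    if (shmid < 0)
        return NULL;

    addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1)
        return NULL;

    /* ... wait here until every worker has called shmat() on shmid ... */

    shmctl(shmid, IPC_RMID, NULL);  /* destroyed when last process detaches */

    return addr;
}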

What should we call it?  dsm_finalize() is the first thing that comes
to mind, but I'm not sure I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


