Re: Dynamic Shared Memory stuff

From: Heikki Linnakangas
Subject: Re: Dynamic Shared Memory stuff
Date:
Msg-id: 52A0A600.5080805@vmware.com
In reply to: Re: Dynamic Shared Memory stuff  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Dynamic Shared Memory stuff
Re: Dynamic Shared Memory stuff
List: pgsql-hackers
On 11/20/2013 09:58 PM, Robert Haas wrote:
> On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> How many allocations? What size will they typically have, minimum and
>> maximum?
>
> The facility is intended to be general, so the answer could vary
> widely by application.  The testing that I have done so far suggests
> that for message-passing, relatively small queue sizes (a few kB,
> perhaps 1 MB at the outside) should be sufficient.  However,
> applications such as parallel sort could require vast amounts of
> shared memory.  Consider a machine with 1TB of memory performing a
> 512GB internal sort.  You're going to need 512GB of shared memory for
> that.

Hmm. Those two use cases are quite different. For message-passing, you 
want a lot of small queues, but for parallel sort, you want one huge 
allocation. I wonder whether we should even be trying a one-size-fits-all 
solution.

For message-passing, there isn't much need to even use dynamic shared 
memory. You could just assign one fixed-sized, single-reader 
multiple-writer queue for each backend.
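Roughly something like this, just as a sketch (the names are made up, a
process-shared pthread mutex stands in for a real lock, and message
framing and memory barriers are left out):

/*
 * Sketch only -- one fixed-size, single-reader multiple-writer queue per
 * backend, living in the main shared memory segment.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BACKEND_QUEUE_SIZE 8192     /* "a few kB" per backend */

typedef struct BackendQueue
{
    pthread_mutex_t write_lock;     /* init'd with PTHREAD_PROCESS_SHARED */
    uint32_t    read_pos;           /* advanced only by the owning backend */
    uint32_t    write_pos;          /* advanced by writers, under write_lock */
    char        data[BACKEND_QUEUE_SIZE];
} BackendQueue;

/* Append raw bytes to the queue; returns false if there is no room. */
static bool
queue_put(BackendQueue *q, const void *msg, uint32_t len)
{
    bool    ok = false;

    pthread_mutex_lock(&q->write_lock);
    if (len <= BACKEND_QUEUE_SIZE - (q->write_pos - q->read_pos))
    {
        /* copy in two pieces if the message wraps around the buffer end */
        uint32_t    off = q->write_pos % BACKEND_QUEUE_SIZE;
        uint32_t    first = BACKEND_QUEUE_SIZE - off;

        if (first > len)
            first = len;
        memcpy(q->data + off, msg, first);
        memcpy(q->data, (const char *) msg + first, len - first);
        q->write_pos += len;
        ok = true;
    }
    pthread_mutex_unlock(&q->write_lock);
    return ok;
}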

For parallel sort, you'll want to utilize all the available memory and 
all CPUs for one huge sort. So all you really need is a single huge 
shared memory segment. If one process is already using that 512GB 
segment to do a sort, you do *not* want to allocate a second 512GB 
segment. You'll want to wait for the first operation to finish. Or 
maybe you'll want to have 3-4 somewhat smaller segments in use at the 
same time, but not more than that.

>> * As discussed in the "Something fishy happening on frogmouth" thread, I
>> don't like the fact that the dynamic shared memory segments will be
>> permanently leaked if you kill -9 postmaster and destroy the data directory.
>
> Your test elicited different behavior for the dsm code vs. the main
> shared memory segment because it involved running a new postmaster
> with a different data directory but the same port number on the same
> machine, and expecting that that new - and completely unrelated -
> postmaster would clean up the resources left behind by the old,
> now-destroyed cluster.  I tend to view that as a defect in your test
> case more than anything else, but as I suggested previously, we could
> potentially change the code to use something like 1000000 + (port *
> 100) with a forward search for the control segment identifier, instead
> of using a state file, mimicking the behavior of the main shared
> memory segment.  I'm not sure we ever reached consensus on whether
> that was overall better than what we have now.

I really think we need to do something about it. To use your earlier 
example of parallel sort, it's not acceptable to permanently leak a 512 
GB segment on a system with 1 TB of RAM.

One idea is to create the shared memory object with shm_open, and wait 
until all the worker processes that need it have attached to it. Then, 
shm_unlink() it, before using it for anything. That way the segment will 
be automatically released once all the processes close() it, or die. In 
particular, kill -9 will release it. (This is a variant of my earlier 
idea to create a small number of anonymous shared memory file 
descriptors in postmaster startup with shm_open(), and pass them down to 
child processes with fork()). I think you could use that approach with 
SysV shared memory as well, by destroying the segment with 
shmctl(IPC_RMID) immediately after all processes have attached to it.
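
As a sketch, with the POSIX shm calls (error handling condensed, the
function name made up, and the wait-for-workers step only indicated by a
comment):

/*
 * The parent creates and sizes the object, the workers open and mmap()
 * the same name, and once everyone is attached the name is
 * shm_unlink()ed.  The kernel then frees the memory when the last
 * process unmaps it or dies -- kill -9 included.
 */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void *
create_transient_segment(const char *name, size_t size)
{
    void   *addr;
    int     fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) size) != 0)
        goto fail;

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto fail;

    /* ... wait here until all workers have shm_open()ed and mmap()ed ... */

    shm_unlink(name);   /* name is gone; memory lives until last detach */
    close(fd);          /* the mapping survives close() */
    return addr;

fail:
    close(fd);
    shm_unlink(name);
    return NULL;
}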

- Heikki


