Discussion: Shared memory and memory context question


Shared memory and memory context question

From: richard@playford.net
Date: Sun, 5 Feb 2006 14:03:59 +0000
Dear all,

I am writing a C-language shared-object file which is dynamically linked with 
postgres, and uses the various SPI functions for executing queries from 
numerous trigger functions.

My question is thus: what is the best method for a dynamically linked object 
to share memory with the same object running in other backends? Am I right in 
thinking that if I allocate memory in the "upper executor context" via 
SPI_palloc(), this is not shared with the other processes?

I thought of a few ways of doing this (please forgive me if these appear 
idiotic, as I am fairly new to postgres):

1. Change memory context to TopMemoryContext and palloc everything there. 
(However, I believe this still isn't shared between processes? See the 
sketch after this list.)

2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a 
chunk of shared memory and use this (Although I would like to avoid writing 
my own memory manager to carve up the space).

3. Somehow create shared memory using the shmem functions, and set a memory 
context to live *inside* this shared memory, which my trigger functions can 
then switch to. Then use palloc() and pfree() without worrying..
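
For concreteness, option 1 would look roughly like this. This is only a
minimal sketch; the function and the cached rule are hypothetical:

#include "postgres.h"
#include "utils/memutils.h"

static char *saved_rule = NULL;  /* hypothetical per-backend cache */

void
save_rule(const char *rule_text)
{
    /* Switch to TopMemoryContext so the allocation outlives the query... */
    MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);

    saved_rule = pstrdup(rule_text);
    MemoryContextSwitchTo(oldctx);
    /* ...but the memory is still private to this one backend process. */
}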

Please let me know if this problem has been solved before, as I have searched 
through the mailing lists and through the source, but am not sure which is 
the best way to resolve it. Thanks for your help.

Regards,

Richard


Re: Shared memory and memory context question

From: Martijn van Oosterhout
Date: Sun, 5 Feb 2006 14:11 +0000
On Sun, Feb 05, 2006 at 02:03:59PM +0000, richard@playford.net wrote:
> 1. Change memory context to TopMemoryContext and palloc everything there.
> (However, I believe this still isn't shared between processes?)

Not shared, correct.

> 2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
> chunk of shared memory and use this (Although I would like to avoid writing
> my own memory manager to carve up the space).

This is the generally accepted method. Please remember that when
sharing structures you have to worry about concurrency. So you need
locking.
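
A minimal sketch of that approach, assuming the ShmemInitStruct() API from
shmem.c and a spinlock for the locking. The structure name, layout and arena
size here are hypothetical, and the space has to be reserved at postmaster
startup:

#include "postgres.h"
#include "storage/shmem.h"
#include "storage/spin.h"

typedef struct MyRulesShared
{
    slock_t mutex;        /* protects everything below */
    int     nrules;
    char    arena[8192];  /* fixed-size space, carved up by hand */
} MyRulesShared;

static MyRulesShared *my_rules = NULL;

static void
my_rules_attach(void)
{
    bool found;

    my_rules = (MyRulesShared *)
        ShmemInitStruct("my_rules", sizeof(MyRulesShared), &found);
    if (!found)
    {
        /* first process to attach initializes the shared struct */
        SpinLockInit(&my_rules->mutex);
        my_rules->nrules = 0;
    }
}

Every reader and writer then brackets its access with
SpinLockAcquire(&my_rules->mutex) and SpinLockRelease(&my_rules->mutex);
for anything longer than a few instructions an LWLock would be the better
fit.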

> 3. Somehow create shared memory using the shmem functions, and set a memory
> context to live *inside* this shared memory, which my trigger functions can
> then switch to. Then use palloc() and pfree() without worrying..

Nope, palloc/pfree don't deal with concurrency.

> Please let me know if this problem has been solved before, as I have searched
> through the mailing lists and through the source, but am not sure which is
> the best way to resolve it. Thanks for your help.

Most people allocate chunks of shared memory and don't use
palloc/pfree. What are you doing that requires such management? Most
shared structures in PostgreSQL are allocated once and never freed...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Shared memory and memory context question

From: Doug McNaught
Date:
richard@playford.net writes:

> 1. Change memory context to TopMemoryContext and palloc everything there. 
> (However, I believe this still isn't shared between processes?)

Nope.

> 2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a 
> chunk of shared memory and use this (Although I would like to avoid writing 
> my own memory manager to carve up the space).
>
> 3. Somehow create shared memory using the shmem functions, and set a memory 
> context to live *inside* this shared memory, which my trigger functions can 
> then switch to. Then use palloc() and pfree() without worrying..

You'd have to do one of the above, but #2 is probably out because all
shared memory is allocated to various purposes at startup and there is
none free at runtime (as I understand it).

For #3, how do you plan to have a memory context shared by multiple
backends with no synchronization?  If two backends try to do
allocation or deallocation at the same time you will get corruption,
as I don't think palloc() and pfree() do any locking (they currently
never allocate from shared memory).

You should probably think very carefully about whether you can get
along without using additional shared memory, because it's not that
easy to do.

-Doug


Re: Shared memory and memory context question

From: Richard Hills
Date: Sun, 5 Feb 2006 14:31:23 +0000
On Sun February 5 2006 14:11, Martijn van Oosterhout wrote:
> This is the generally accepted method. Please remember that when
> sharing structures you have to worry about concurrency. So you need
> locking.

Of course - I have already implemented locking with semaphores (I may simply 
use one big lock and carefully avoid reentry).

> Nope, palloc/pfree don't deal with concurrency.

Indeed, although if I lock the shared memory then I can palloc() and pfree() 
without worrying. The problem I see is that new memory contexts have their 
memory assigned to them when they are created. I can't tell them "go here!"

> Most people allocate chunks of shared memory and don't use
> palloc/pfree. What are you doing that requires such management? Most
> shared structures in PostgreSQL are allocated once and never freed...

I have a number of functions which modify tables based on complex rules stored 
in script-files. I wrote a parser for these files as a separate program first 
before incorporating it as a shared object; subsequently it loads and 
executes rules from memory. As anything can be read from the files, and rules 
can be unloaded later, I was hoping for flexibility in allocating memory to 
store it all.

Another option is to load the files but store the rules within the database, 
which should be possible, but appears to be a slightly messy way of doing it. 
Then again, messing about with shared memory allocation may be messier. 
Asking as a fairly inexperienced postgres person, what would you suggest?



Re: Shared memory and memory context question

From: Martijn van Oosterhout
Date: Sun, 5 Feb 2006 14:43 +0000
On Sun, Feb 05, 2006 at 02:31:23PM +0000, Richard Hills wrote:
> I have a number of functions which modify tables based on complex rules stored
> in script-files. I wrote a parser for these files as a separate program first
> before incorporating it as a shared object; subsequently it loads and
> executes rules from memory. As anything can be read from the files, and rules
> can be unloaded later, I was hoping for flexibility in allocating memory to
> store it all.

So what you load are the already processed rules? In that case you
could probably use the buffer management system. Ask it to load the
blocks and they'll be in the buffer cache. As long as you have the
buffer pinned they'll stay there. That's pretty much a read-only
approach.

If you're talking about things that don't come from disk, well, hmm...
If you want you could use a file on disk as backing and mmap() it into
each process's address space...

> Another option is to load the files but store the rules within the database,
> which should be possible, but appears to be a slightly messy way of doing it.
> Then again, messing about with shared memory allocation may be messier.
> Asking as an fairly inexperienced postgres person, what would you suggest?

The real question is: does it need to be shared-writable?
Shared-readonly is much easier (i.e. one writer, multiple readers). Using
a file as backing store for mmap() may be the easiest....
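
A minimal sketch of that mmap() route, outside any PostgreSQL API. The path
and size are hypothetical, and anything stored inside the region should use
offsets rather than raw pointers, since the mapping address can differ from
process to process:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define RULES_FILE "/var/lib/myext/rules.bin"   /* hypothetical path */
#define RULES_SIZE (1024 * 1024)                /* fixed 1 MB region */

static void *
map_rules_file(void)
{
    void *base;
    int   fd = open(RULES_FILE, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return NULL;

    /* make sure the backing store is large enough before mapping it */
    if (ftruncate(fd, RULES_SIZE) < 0)
    {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED makes writes visible to every process mapping the file */
    base = mmap(NULL, RULES_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the fd is closed */

    return (base == MAP_FAILED) ? NULL : base;
}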

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Shared memory and memory context question

From: richard@playford.net
Date:
On Sun February 5 2006 14:43, Martijn van Oosterhout wrote:
> So what you load are the already processed rules? In that case you
> could probably use the buffer management system. Ask it to load the
> blocks and they'll be in the buffer cache. As long as you have the
> buffer pinned they'll stay there. That's pretty much a read-only
> approach.
>
> If you're talking about things that don't come from disk, well, hmm...
> If you want you could use a file on disk as backing and mmap() it into
> each process's address space...<...>
> The real question is: does it need to be shared-writable?
> Shared-readonly is much easier (i.e. one writer, multiple readers). Using
> a file as backing store for mmap() may be the easiest....

I load the rules from a script and parse them, storing them in a forest of 
linked malloced structures. These structures are created by one writer but 
then read by a number of readers, and later may be removed by the original 
writer.

So, as you can imagine, I could store the forest in the db, although it might 
be a mess. First I will look through the buffer management system, and see if 
that will do the job.

Thanks for your help,

Regards,

Richard



Re: Shared memory and memory context question

From: Tom Lane
Date: Sun, 5 Feb 2006 16:16 +0000
Martijn van Oosterhout <kleptog@svana.org> writes:
> So what you load are the already processed rules? In that case you
> could probably use the buffer management system. Ask it to load the
> blocks and they'll be in the buffer cache. As long as you have the
> buffer pinned they'll stay there.

... until you get to the end of the transaction, where the buffer
manager will barf because somebody forgot an unpin.  Long-term buffer
pins are really not acceptable anyway --- you'd essentially be asserting
that your little facility is more important than any other use of shared
buffers, and I'm sorry but that ain't so.

AFAICT the data structures you are worried about don't have any readily
predictable size, which means there is no good way to keep them in
shared memory --- we can't dynamically resize shared memory.  So I think
storing the rules in a table and loading into private memory at need is
really the only reasonable solution.  Storing them in a table has a lot
of other advantages anyway, mainly that you can manipulate them from
SQL.
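
A hedged sketch of what that suggestion looks like from C, using SPI to load
the rules into a backend-private cache on demand. The table and column names
are hypothetical:

#include "postgres.h"
#include "executor/spi.h"
#include "nodes/pg_list.h"
#include "utils/memutils.h"

static List *rule_cache = NIL;   /* backend-private, survives across queries */

static void
load_rules(void)
{
    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    if (SPI_exec("SELECT rule_text FROM my_rules", 0) == SPI_OK_SELECT)
    {
        for (uint64 i = 0; i < SPI_processed; i++)
        {
            char *val = SPI_getvalue(SPI_tuptable->vals[i],
                                     SPI_tuptable->tupdesc, 1);

            if (val != NULL)
            {
                /* copy out of SPI memory into long-lived private memory */
                MemoryContext old = MemoryContextSwitchTo(TopMemoryContext);

                rule_cache = lappend(rule_cache, pstrdup(val));
                MemoryContextSwitchTo(old);
            }
        }
    }
    SPI_finish();
}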

You can find some prior discussion of similar issues in the archives;
IIRC the idea of a shared plan cache was being kicked around for awhile
some years back.
        regards, tom lane


Re: Shared memory and memory context question

From: Richard Hills
Date:
On Sun February 5 2006 16:16, Tom Lane wrote:
> AFAICT the data structures you are worried about don't have any readily
> predictable size, which means there is no good way to keep them in
> shared memory --- we can't dynamically resize shared memory.  So I think
> storing the rules in a table and loading into private memory at need is
> really the only reasonable solution.  Storing them in a table has a lot
> of other advantages anyway, mainly that you can manipulate them from
> SQL.

I have come to the conclusion that storing the rules and various other bits in 
tables is the best solution, although this will require a much more complex 
db structure than I had originally planned. Trying to allocate and free 
memory in shared memory is fairly straightforward, but likely to become 
incredibly messy.

Seeing as some of the rules already include load-value-from-db-on-demand, it 
should be fairly straightforward to extend it to load-rule-from-db-on-demand.

Thanks for all your help,

Regards,

Richard


Re: Shared memory and memory context question

From: Neil Conway
Date:
On Sun, 2006-02-05 at 14:03 +0000, richard@playford.net wrote:
> 3. Somehow create shared memory using the shmem functions, and set a memory 
> context to live *inside* this shared memory, which my trigger functions can 
> then switch to. Then use palloc() and pfree() without worrying..

This has been done before, by the TelegraphCQ folks: they implemented a
shared memory MemoryContext on top of OSSP MM[1]. The code is in the
v0.2 TelegraphCQ tarball[2] -- see shmctx.c and shmset.c in
src/backend/utils/mmgr/. I'm not aware of an independent distribution,
but you could probably separate it out without too much pain.
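
For reference, raw OSSP mm usage looks roughly like this. A sketch only: the
size and path are made up, and the pool has to be created before the backends
fork so that they all inherit the same mapping:

#include <mm.h>

int
main(void)
{
    /* create a 1 MB shared arena; the file is used for locking/backing */
    MM   *pool = mm_create(1024 * 1024, "/tmp/myext-mm");
    char *p;

    if (pool == NULL)
        return 1;

    mm_lock(pool, MM_LOCK_RW);   /* serialize against other processes */
    p = mm_malloc(pool, 128);    /* allocate from the shared arena */
    mm_unlock(pool);

    /* ... p is now visible to every process attached to the pool ... */

    mm_lock(pool, MM_LOCK_RW);
    mm_free(pool, p);
    mm_unlock(pool);

    mm_destroy(pool);
    return 0;
}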

(Of course, the comments elsewhere in the thread about using an
alternative are probably still true...)

-Neil

[1] http://www.ossp.org/pkg/lib/mm/
[2] http://telegraph.cs.berkeley.edu/downloads/TelegraphCQ-0.2.tar.gz



Re: Shared memory and memory context question

From: "Mark Woodward"
Date:
Hi!!

I was just browsing the messages and saw yours. I have actually written a
shared memory system for PostgreSQL.

I've done some basic bench testing, and it seems to work, but I haven't
given it the big QA push yet.

My company, Mohawk Software, is going to release a bunch of PostgreSQL
extensions for text search, shared memory, interfacing, etc.

Here's the source for the shared memory module. Mind you, it has not been
through rigorous QA yet!!! Also, this is the UNIX/Linux/SHM version; the
Win32 version has not been written yet.

http://www.mohawksoft.org
Attachments

Re: Shared memory and memory context question

From: "Mark Woodward"
Date: Mon, 6 Feb 2006 05:17 +0000
> On Sun February 5 2006 16:16, Tom Lane wrote:
>> AFAICT the data structures you are worried about don't have any readily
>> predictable size, which means there is no good way to keep them in
>> shared memory --- we can't dynamically resize shared memory.  So I think
>> storing the rules in a table and loading into private memory at need is
>> really the only reasonable solution.  Storing them in a table has a lot
>> of other advantages anyway, mainly that you can manipulate them from
>> SQL.
>
> I have come to the conclusion that storing the rules and various other bits
> in tables is the best solution, although this will require a much more
> complex db structure than I had originally planned. Trying to allocate and
> free memory in shared memory is fairly straightforward, but likely to become
> incredibly messy.
>
> Seeing as some of the rules already include load-value-from-db-on-demand, it
> should be fairly straightforward to extend it to load-rule-from-db-on-demand.
>

I posted some source to a shared memory sort of thing to the group, as
well as to you, I believe.

For variables and values that change very infrequently, using the DB is
the right idea. PostgreSQL, as well as most databases, crumbles under a
rapidly changing workload. By changing, I mean a lot of UPDATEs and
DELETEs; INSERTs are not so bad. PostgreSQL has fairly poor (IMHO) UPDATE
behaviour. Most transaction-aware databases do, but PostgreSQL seems quite
bad.

For an example, if you are doing a scoreboard sort of thing for a website,
updating a single variable in a table 20 times a second will quickly make
that simple and normally fast update/query take a very long time. You have
to run VACUUM a whole lot.

The next example is a session table for a website: you may have a few
hundred or a few thousand active session rows, but each row may get many
updates, and you may have tens of thousands of sessions which are
inactive. Unless you vacuum very frequently, you are doing a lot of disk
I/O for every session, because the query has to walk the table file to
find a valid row.

A database is a BAD system for managing data like sessions on an active
website. It is a good tool for almost everything else, but if you are
implementing an eBay or a Yahoo, you'll swamp your DB quickly.

The issue with a shared memory system is that you don't get the data
security that you do with disk storage.



Re: Shared memory and memory context question

From: Richard Hills
Date:
On Mon February 6 2006 05:17, Mark Woodward wrote:
> I posted some source to a shared memory sort of thing to the group, as
> well as to you, I believe.

Indeed, and it looks rather interesting. I'll have a look through it when I 
have a chance...

So, after more discussion and experimentation, the possible methods in order 
of +elegance/-difficulty/-complexity are:

=1. OSSP supported shared mem, possibly with a pg memory context or Mark's 
shared memory manager.
=1. Separate application which the postgres backends talk to over tcp (which 
actually turns out to be quite a clean way of doing it).
3. Storing rules in db and reloading them each time (which turns out to be an 
utter bastard to do).
4. Shared memory with my own memory manager.

I am *probably* going to go for the separate network application, as I 
believe this is easy and relatively clean, as the required messages should be 
fairly straightforward. Each postgres backend opens a connection to the 
single separate "rules-server", which sends back a series of commands 
(probably SQL), before the connection is closed again.

If this is Clearly Insane - please let me know!
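
For what it's worth, the client side of that plan is only a few dozen lines
of plain sockets code. A sketch; the host, port and read-until-close protocol
are all hypothetical:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int
fetch_rules(char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    ssize_t n, total = 0;
    int     fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5433);                      /* hypothetical port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  /* local rules-server */

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        close(fd);
        return -1;
    }

    /* read commands until the server closes the connection */
    while (total < (ssize_t) buflen - 1 &&
           (n = read(fd, buf + total, buflen - 1 - total)) > 0)
        total += n;

    buf[total] = '\0';
    close(fd);
    return (int) total;
}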

Regards,

Richard


Re: Shared memory and memory context question

From: "Mark Woodward"
Date:
> On Mon February 6 2006 05:17, Mark Woodward wrote:
>> I posted some source to a shared memory sort of thing to the group, as
>> well as to you, I believe.
>
> Indeed, and it looks rather interesting. I'll have a look through it when I
> have a chance...
>
> So, after more discussion and experimentation, the possible methods in order
> of +elegance/-difficulty/-complexity are:
>
> =1. OSSP supported shared mem, possibly with a pg memory context or Mark's
> shared memory manager.
> =1. Separate application which the postgres backends talk to over tcp (which
> actually turns out to be quite a clean way of doing it).

If you hop on over to http://www.mohawksoft.org, you'll see a server
application called "MCache." MCache is written to handle *exactly* the
sort of information you are looking to manage. Its primary duty is to
manage highly concurrent/active sessions for a large web cluster. I have
also been working on a PostgreSQL extension for it. It needs to be fleshed
out and, again, needs some heavy-duty QA, but it "works on my machine."

I alluded to releasing an extension module for PostgreSQL; I'm actually
working on a much larger set of projects intended to tightly integrate
PostgreSQL, web servers (PHP right now), and a set of service applications
including search and recommendations. In another thread I wanted to add an
extension, "xmldbx," to postgresql's contrib dir. Anyway, I digress.

If anyone is interested in lending a hand in QA, examples, and so on, I'd
be glad to take this off line.


> 3. Storing rules in db and reloading them each time (which turns out to be
> an utter bastard to do).
> 4. Shared memory with my own memory manager.

If you have the time and the inclination to do so, it is a fun sort of thing
to write.

>
> I am *probably* going to go for the separate network application, as I
> believe this is easy and relatively clean, as the required messages should be
> fairly straightforward. Each postgres backend opens a connection to the
> single separate "rules-server", which sends back a series of commands
> (probably SQL), before the connection is closed again.
>
> If this is Clearly Insane - please let me know!

It isn't a bad idea at all. For MCache, I leave the socket connection open
for the next use of the PostgreSQL session. Web environments usually keep
a cache of active database connections to save the overhead of connecting
each time. You just need to be careful when you clean up.