Discussion: Shared sequence-like objects in PostgreSQL

Shared sequence-like objects in PostgreSQL

From
Vlad Arkhipov
Date:
Hello all,

I'm writing a C-language function that is similar to nextval() but 
should return the next member of the recurrent sequence:
T(n+1) = f(T(n), T(n-1), ..., T(n-k)), where f is some function and k is 
a constant.
The state of this object should be persistent between database restarts 
and should be easily recovered if the database crashes.
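
For illustration, the state described above (n plus the k+1 most recent
terms) could be laid out as in the following plain C sketch. It is only an
assumption about one possible layout: the names RecState, rec_f, rec_init
and rec_next, the fixed order REC_K = 1, and the choice of f as a sum (which
gives the Fibonacci numbers) are all hypothetical and do not come from the
original message.

```c
#include <assert.h>
#include <stdint.h>

#define REC_K 1                     /* hypothetical order: f sees T(n) .. T(n-K) */

typedef struct RecState {
    uint64_t n;                     /* index of the newest stored term */
    int64_t  t[REC_K + 1];          /* ring buffer: T(i) lives in slot i % (K+1) */
} RecState;

/* Example f (an assumption): the sum of the K+1 most recent terms. */
static int64_t rec_f(const RecState *s)
{
    int64_t sum = 0;
    for (int i = 0; i <= REC_K; i++)
        sum += s->t[i];
    return sum;
}

/* Seed the state with T(0) .. T(K). */
static void rec_init(RecState *s, const int64_t *seed, int count)
{
    assert(count == REC_K + 1);
    for (int i = 0; i <= REC_K; i++)
        s->t[i] = seed[i];
    s->n = REC_K;
}

/* Compute T(n+1), store it over the oldest term, and return it. */
static int64_t rec_next(RecState *s)
{
    int64_t next = rec_f(s);
    s->n++;
    s->t[s->n % (REC_K + 1)] = next;
    return next;
}
```

Seeding with T(0) = 0 and T(1) = 1 and calling rec_next() repeatedly yields
1, 2, 3, 5, and so on. The whole state is a small fixed-size struct, which
is what makes it a candidate for a single shared-memory allocation.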

So the first problem I encountered was where to store the current state 
of this object (n and values T(n), T(n-1), ... T(n-k)). I believe that 
TopMemoryContext is not shared between processes, therefore I must use 
shmem functions from backend/storage/ipc/shmem.c to create a structure 
in shared memory.

The next issue is how to synchronize backends' reads and writes to this 
chunk of shared memory. I suppose the Postgres code already has some 
facility for this, presumably built on semaphores.
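
The locking pattern in question can be sketched outside the server as
follows. PostgreSQL backends are separate processes, and the server has its
own lightweight lock (LWLock) facility for shared memory, so this sketch
uses a POSIX mutex purely as a stand-in. SharedSeq, shared_seq_init and
shared_seq_next are hypothetical names, and f is again assumed to be a sum
of the last two terms.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct SharedSeq {
    pthread_mutex_t lock;   /* stand-in for a server-side lock */
    int64_t prev;           /* T(n-1) */
    int64_t cur;            /* T(n)   */
} SharedSeq;

static void shared_seq_init(SharedSeq *s, int64_t t0, int64_t t1)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    (void) rc;
    s->prev = t0;
    s->cur = t1;
}

/*
 * Read the state, compute the next term and write it back, all under
 * one exclusive lock, so that two callers never see the same state.
 */
static int64_t shared_seq_next(SharedSeq *s)
{
    pthread_mutex_lock(&s->lock);
    int64_t next = s->prev + s->cur;    /* hypothetical f */
    s->prev = s->cur;
    s->cur = next;
    pthread_mutex_unlock(&s->lock);
    return next;
}
```

The point of the sketch is that the read, the evaluation of f and the write
must happen inside one critical section. Locking only the write would let
two callers compute the same T(n+1).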

Then I periodically need to persist the state of this object to the 
database, for example for every 100 generated values, as well as on the 
postmaster's shutdown. What is the best method for doing that?
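
One way to picture the "persist every 100 values" policy is the standalone
sketch below. It is not a PostgreSQL implementation: the durable copy is
just a second struct standing in for whatever stable storage is used, and
PersistentSeq, the pseq_* functions and the modulus SEQ_MOD (added only to
keep the example values small) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PERSIST_EVERY 100       /* flush interval taken from the message */
#define SEQ_MOD 1000003         /* hypothetical modulus, keeps values small */

typedef struct SeqState {
    uint64_t n;                 /* number of values generated so far */
    int64_t  prev;              /* T(n-1) */
    int64_t  cur;               /* T(n)   */
} SeqState;

typedef struct PersistentSeq {
    SeqState live;              /* working copy, lost in a crash */
    SeqState durable;           /* stand-in for the persisted copy */
    unsigned since_flush;       /* values generated since the last flush */
    unsigned flushes;           /* number of flushes performed */
} PersistentSeq;

/* Copy the working state to the durable copy. */
static void pseq_flush(PersistentSeq *p)
{
    p->durable = p->live;
    p->since_flush = 0;
    p->flushes++;
}

static void pseq_init(PersistentSeq *p, int64_t t0, int64_t t1)
{
    assert(t0 >= 0 && t0 < SEQ_MOD && t1 >= 0 && t1 < SEQ_MOD);
    p->live.n = 0;
    p->live.prev = t0;
    p->live.cur = t1;
    p->flushes = 0;
    pseq_flush(p);              /* persist the initial state */
}

/* Generate the next value; flush after every PERSIST_EVERY values. */
static int64_t pseq_next(PersistentSeq *p)
{
    int64_t next = (p->live.prev + p->live.cur) % SEQ_MOD;
    p->live.prev = p->live.cur;
    p->live.cur = next;
    p->live.n++;
    if (++p->since_flush >= PERSIST_EVERY)
        pseq_flush(p);
    return next;
}

/* Crash recovery: resume from the last durable state. */
static void pseq_recover(PersistentSeq *p)
{
    p->live = p->durable;
    p->since_flush = 0;
}
```

A clean shutdown would call pseq_flush() once more. Note the consequence of
this policy: after a crash, up to PERSIST_EVERY - 1 values generated since
the last flush are handed out again. If repeats are unacceptable but gaps
are fine, the durable copy would instead have to hold a state that is ahead
of the live one.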

Please let me know if this problem has been solved before. Thanks for 
your help.


Re: Shared sequence-like objects in PostgreSQL

From
Greg Stark
Date:
On Wed, Sep 21, 2011 at 8:19 AM, Vlad Arkhipov <arhipov@dc.baikal.ru> wrote:
> I'm writing a C-language function that is similar to nextval() but should
> return the next member of the recurrent sequence:
> T(n+1) = f(T(n), T(n-1), ..., T(n-k)), where f is some function and k is a
> constant.
> The state of this object should be persistent between database restarts and
> should be easily recovered if the database crashes.

The purpose of nextval() is to provide an escape hatch from the normal
transactional guarantees which would normally serialize everything
using it. Avoiding the performance impact of that is the only reason
it needs to use shared memory and so on.

If this function isn't performance critical and doesn't need to be
highly concurrent then you would be better off storing this
information in a table and updating the table using regular database
updates. The way you've defined it also makes me wonder whether you
can afford to skip values. If not then you don't really get an option
of avoiding the serialization.

If you can, one short-cut you could consider would be to populate a
table with the values of the sequence, and periodically populate more
values when you run short of unused values. Then you can use a regular
postgres sequence to generate indexes into that table. That would not
perform quite as well as a shared memory native implementation like
you describe but wouldn't require nearly as much Postgres-specific C
code.
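
The shape of that short-cut can be simulated in plain C as below. In the
database it would be an ordinary table plus an ordinary sequence; here an
array stands in for the table and a counter for the sequence. ValueTable,
the vt_* functions, the refill thresholds and the example recurrence
(a sum of two terms modulo 1000003) are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_CAP  4096     /* hypothetical fixed capacity of the "table" */
#define REFILL_LOW 16       /* refill when fewer unused values remain */
#define REFILL_ADD 64       /* how many values to precompute per refill */

typedef struct ValueTable {
    int64_t values[TABLE_CAP];  /* precomputed terms, values[i] = T(i) */
    size_t  filled;             /* number of terms computed so far */
    size_t  next_idx;           /* stand-in for a regular sequence */
} ValueTable;

/* Append `count` more terms using the example recurrence. */
static void vt_extend(ValueTable *t, size_t count)
{
    assert(t->filled >= 2 && t->filled + count <= TABLE_CAP);
    for (size_t i = 0; i < count; i++) {
        size_t n = t->filled;
        t->values[n] = (t->values[n - 1] + t->values[n - 2]) % 1000003;
        t->filled++;
    }
}

static void vt_init(ValueTable *t, int64_t t0, int64_t t1)
{
    t->values[0] = t0;
    t->values[1] = t1;
    t->filled = 2;
    t->next_idx = 0;
    vt_extend(t, REFILL_ADD);
}

/* Hand out the next index and return the term stored at it. */
static int64_t vt_take(ValueTable *t)
{
    if (t->filled - t->next_idx < REFILL_LOW)
        vt_extend(t, REFILL_ADD);
    return t->values[t->next_idx++];
}
```

Only the refill step evaluates the recurrence, and it can run in the
background. Callers just fetch by index, so an index that is handed out and
then discarded leaves a gap rather than breaking the recurrence.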

Perhaps if you can explain what the problem you're actually trying to
solve is it might be clearer whether it justifies working at such a
low level.



-- 
greg