Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

From Robert Haas
Subject Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date
Msg-id CA+TgmoYPec_Awn+NM-ETnzOwyiYMmH-JaH1-LDOvFDqsFojsTw@mail.gmail.com
In response to Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL  (knizhnik <knizhnik@garret.ru>)
Responses Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL  (james <james@mansionfamily.plus.com>)
Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL  (knizhnik <knizhnik@garret.ru>)
List pgsql-hackers
On Sat, Jan 4, 2014 at 3:27 PM, knizhnik <knizhnik@garret.ru> wrote:
> 1. I want IMCS to work with PostgreSQL versions not supporting DSM (dynamic
> shared memory), like 9.2, 9.3.1,...

Yeah.  If it's loaded at postmaster start time, then it can work with
any version.  On 9.4+, you could possibly make it work even if it's
loaded on the fly by using the dynamic shared memory facilities.
However, there are currently some limitations to those facilities that
make some things you might want to do tricky.  There are pending
patches to lift some of these limitations.

> 2. IMCS is using PostgreSQL hash table implementation (ShmemInitHash,
> hash_search,...)
> Maybe I missed something - I just noticed DSM and have had no chance to
> investigate it, but it looks like a hash table cannot be allocated in DSM...

It wouldn't be very difficult to write an analog of ShmemInitHash() on
top of the dsm_toc patch that is currently pending.  A problem,
though, is that it's not currently possible to put LWLocks in dynamic
shared memory, and even spinlocks will be problematic if
--disable-spinlocks is used.  I'm due to write a post about these
problems; perhaps I should go do that.
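
As a rough illustration of why a hash table can live in a dynamically mapped region at all, here is a minimal sketch of a fixed-size, open-addressing table that stores slot indexes instead of pointers, so it stays valid wherever the region is mapped. It is not ShmemInitHash(), not dynahash, and not built on dsm_toc; every name in it (FlatHash, flat_hash_put, and so on) is hypothetical. It also does no locking at all, which is exactly the part the paragraph above identifies as the hard problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not a PostgreSQL API. */
#define FLAT_EMPTY UINT64_MAX          /* reserved key marking an unused slot */

typedef struct FlatSlot {
    uint64_t key;
    uint64_t value;
} FlatSlot;

typedef struct FlatHash {
    uint32_t nslots;                   /* must be a power of two */
    uint32_t nused;                    /* number of occupied slots */
    FlatSlot slots[];                  /* slots follow the header in the same region */
} FlatHash;

/* Bytes of region needed for a table with nslots slots. */
size_t flat_hash_size(uint32_t nslots)
{
    return sizeof(FlatHash) + (size_t) nslots * sizeof(FlatSlot);
}

/* Initialize a table in caller-supplied memory of at least flat_hash_size(). */
FlatHash *flat_hash_init(void *mem, uint32_t nslots)
{
    assert(nslots != 0 && (nslots & (nslots - 1)) == 0);
    FlatHash *h = mem;
    h->nslots = nslots;
    h->nused = 0;
    for (uint32_t i = 0; i < nslots; i++)
        h->slots[i].key = FLAT_EMPTY;
    return h;
}

/* Multiplicative hash reduced to a slot index. */
static uint32_t flat_slot_for(const FlatHash *h, uint64_t key)
{
    uint64_t x = key * UINT64_C(0x9E3779B97F4A7C15);
    return (uint32_t) (x >> 32) & (h->nslots - 1);
}

/* Insert or overwrite. Returns 0 on success, -1 if the table is full. */
int flat_hash_put(FlatHash *h, uint64_t key, uint64_t value)
{
    assert(key != FLAT_EMPTY);
    uint32_t i = flat_slot_for(h, key);
    for (uint32_t probes = 0; probes < h->nslots; probes++) {
        FlatSlot *s = &h->slots[i];
        if (s->key == key) {
            s->value = value;
            return 0;
        }
        if (s->key == FLAT_EMPTY) {
            s->key = key;
            s->value = value;
            h->nused++;
            return 0;
        }
        i = (i + 1) & (h->nslots - 1);  /* linear probing */
    }
    return -1;
}

/* Look up a key. Returns 1 and fills *value_out if found, else 0. */
int flat_hash_get(const FlatHash *h, uint64_t key, uint64_t *value_out)
{
    uint32_t i = flat_slot_for(h, key);
    for (uint32_t probes = 0; probes < h->nslots; probes++) {
        const FlatSlot *s = &h->slots[i];
        if (s->key == key) {
            *value_out = s->value;
            return 1;
        }
        if (s->key == FLAT_EMPTY)
            return 0;
        i = (i + 1) & (h->nslots - 1);
    }
    return 0;
}
```

Because the structure contains no absolute addresses, a byte-for-byte copy of the region at a different address can be searched unchanged. A real shared version would still need some form of mutual exclusion around put and get.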

> 3. IMCS is allocating memory using ShmemAlloc. In case of using DSM I have
> to provide my own allocator (although creation of a non-releasing memory
> allocator should not be a big issue).

The dsm_toc infrastructure would solve this problem.
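
For readers unfamiliar with the term, a "non-releasing" allocator of the kind point 3 mentions can be sketched in a few lines. The following is only an illustration under assumed names (BumpArena, arena_alloc); it is not the dsm_toc interface and not ShmemAlloc. It hands out offsets from the start of the region rather than pointers, and it does no locking, so it is safe only while a single process is allocating.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header stored at the very start of the region. */
typedef struct BumpArena {
    size_t total_size;   /* size of the whole region, header included */
    size_t used;         /* bytes handed out so far, header included */
} BumpArena;

#define ARENA_ALIGN ((size_t) 8)

/* Round n up to the next multiple of ARENA_ALIGN. */
static size_t arena_align_up(size_t n)
{
    return (n + (ARENA_ALIGN - 1)) & ~(ARENA_ALIGN - 1);
}

/* Set up an arena in caller-supplied memory. Returns NULL if it is too small. */
BumpArena *arena_init(void *base, size_t size)
{
    if (base == NULL || size < arena_align_up(sizeof(BumpArena)))
        return NULL;
    BumpArena *a = base;
    a->total_size = size;
    a->used = arena_align_up(sizeof(BumpArena));
    return a;
}

/*
 * Allocate nbytes. Returns the offset of the new chunk from the start of the
 * region, or 0 if the region is exhausted. Offset 0 can serve as "null"
 * because the header always occupies it. Nothing is ever freed.
 */
size_t arena_alloc(BumpArena *a, size_t nbytes)
{
    size_t need = arena_align_up(nbytes);
    assert(a->used <= a->total_size);
    if (need < nbytes || need > a->total_size - a->used)
        return 0;                      /* overflow or out of space */
    size_t off = a->used;
    a->used += need;
    return off;
}

/* Turn an offset back into an address valid in the calling process. */
void *arena_ptr(BumpArena *a, size_t off)
{
    return off == 0 ? NULL : (char *) a + off;
}
```

Returning offsets instead of pointers is a deliberate choice here: it keeps stored references meaningful even if another process maps the same region at a different address.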

> 4. The current implementation of DSM still suffers from the 256GB problem.
> Certainly I can create multiple segments and so provide a workaround without
> using huge pages, but it complicates the allocator.

So it sounds like DSM should also support huge pages somehow.  I'm not
sure what that should look like.
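
One reason the multi-segment workaround in point 4 complicates the allocator is that a reference can no longer be a single offset: it has to say which segment it points into as well. A minimal sketch of such a reference follows. The 16-bit/48-bit split, the SegMap table, and all function names are assumptions made for illustration; none of this is an existing PostgreSQL interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 16
#define OFFSET_BITS  48

/*
 * Hypothetical cross-segment reference: the high 16 bits hold the segment
 * index plus one, the low 48 bits hold a byte offset. The value 0 is null.
 */
typedef uint64_t seg_ref;

/* Per-process table recording where each segment happens to be mapped. */
typedef struct SegMap {
    void  *base[MAX_SEGMENTS];   /* NULL if the segment is not mapped here */
    size_t size[MAX_SEGMENTS];
} SegMap;

/* Build a reference from a segment index and an offset within it. */
seg_ref seg_ref_make(unsigned seg, uint64_t off)
{
    assert(seg < MAX_SEGMENTS);
    assert(off < (UINT64_C(1) << OFFSET_BITS));
    return ((uint64_t) (seg + 1) << OFFSET_BITS) | off;
}

/*
 * Resolve a reference against this process's mappings. Returns NULL for the
 * null reference, for an unmapped segment, or for an out-of-range offset.
 */
void *seg_ref_resolve(const SegMap *map, seg_ref ref)
{
    if (ref == 0)
        return NULL;
    unsigned seg = (unsigned) (ref >> OFFSET_BITS) - 1;
    uint64_t off = ref & ((UINT64_C(1) << OFFSET_BITS) - 1);
    if (seg >= MAX_SEGMENTS || map->base[seg] == NULL || off >= map->size[seg])
        return NULL;
    return (char *) map->base[seg] + off;
}
```

The same seg_ref value resolves to different addresses in processes whose SegMap entries differ, which is the property a store spread over several independently mapped segments would need.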

> 5. I wonder, if I dynamically add a new DSM segment, will it be available to
> other PostgreSQL processes? For example, I run a query which loads data into
> IMCS and so needs more space and allocates a new DSM segment. Then another
> query is executed by another PostgreSQL process which tries to access this
> data. This process is not forked from the process that created the new DSM
> segment, so I do not understand how this segment will be mapped into the
> address space of this process, preserving addresses... Certainly I can
> prohibit dynamic extension of IMCS storage (hoping that in this case there
> will be no such problem with DSM). But in this case we will lose the main
> advantage of using DSM instead of the old scheme of the plugin's private
> shared memory.

You can definitely dynamically add a new DSM segment; that's the point
of making it *dynamic* shared memory.  What's a bit tricky as things
stand today is making sure that it sticks around.  The current model
is that the DSM segment is destroyed when the last process unmaps it.
It would be easy enough to lift that limitation on systems other than
Windows; we could just add a dsm_keep_until_shutdown() API or
something similar.  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.
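
The lifetime rule described above can be modeled as simple bookkeeping. The sketch below is a toy model only, with hypothetical names (SegState, seg_keep_until_shutdown); it does not call any operating system or PostgreSQL facility. It shows the current behavior (destroy when the last mapping goes away) and what a "keep until shutdown" flag would change. It cannot capture the Windows case, where the operating system itself applies the unpinned rule no matter what flag the server keeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bookkeeping for one segment; not an actual DSM control structure. */
typedef struct SegState {
    int  mappings;    /* processes that currently have the segment mapped */
    bool pinned;      /* hypothetical "keep until shutdown" flag */
    bool destroyed;
} SegState;

/* The creating process starts out with the segment mapped. */
void seg_create(SegState *s)
{
    s->mappings = 1;
    s->pinned = false;
    s->destroyed = false;
}

/* Another process maps an existing segment. */
void seg_attach(SegState *s)
{
    assert(!s->destroyed);
    s->mappings++;
}

/* Ask for the segment to outlive its last mapping. */
void seg_keep_until_shutdown(SegState *s)
{
    assert(!s->destroyed);
    s->pinned = true;
}

/* A process unmaps the segment; the last unmap destroys it unless pinned. */
void seg_detach(SegState *s)
{
    assert(s->mappings > 0);
    s->mappings--;
    if (s->mappings == 0 && !s->pinned)
        s->destroyed = true;
}

/* Server shutdown drops the pin, so an unmapped segment goes away. */
void seg_shutdown(SegState *s)
{
    s->pinned = false;
    if (s->mappings == 0)
        s->destroyed = true;
}
```

In this model an unpinned segment created by one backend is gone as soon as that backend detaches, while a pinned one can still be attached later by an unrelated backend, which is the behavior IMCS would need for data loaded by one query and read by another.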

> 6. IMCS has some configuration parameters which have to be set through
> postgresql.conf. So in any case the user has to edit the postgresql.conf file.
> In case of using DSM it will not be necessary to add IMCS to the
> shared_preload_libraries list. But I do not think that it is such a
> restrictive and critical requirement, is it?

I don't really see a problem here.  One of the purposes of dynamic
shared memory (and dynamic background workers) is precisely that you
don't *necessarily* need to put extensions that use shared memory in
shared_preload_libraries - or in other words, you can add the
extension to a running server without restarting it.  If you know in
advance that you will want it, you probably still *want* to put it in
shared_preload_libraries, but part of the idea is that we can get away
from requiring that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


