[GSoC] artbufmgr

From: pantilimonov misha
Subject: [GSoC] artbufmgr
Msg-id: 7546461561465811@myt2-69536e66ccf5.qloud-c.yandex.net
Replies: Re: [GSoC] artbufmgr  (pantilimonov michael <pantlimon@yandex.ru>)
List: pgsql-hackers
Hello everyone,

I am participating in GSoC and would like to share the current status of my project,
which is devoted to an alternative data structure for buffer management.

To get things going, I started with a basic implementation of the ART tree for a single
backend process, using an open-source implementation of the algorithm. That was
a fairly straightforward task that helped me better understand the
data structure and PostgreSQL's convenient mechanism for local memory allocation.
For validation purposes, I also ported a couple of tests as a simple extension with a function
that reads a file of test data and runs the basic operations on the tree: insert/search/delete.
The described changes are present in the first two commits [0].

It is worth noting that this "src/backend/lib/artree.c" implementation will not go any further
and will be thrown away in the final patch. The reason is the example set by the current dynahash
implementation (which I examined later, while trying to move the tree into shared memory),
which supports both access patterns: shared and local memory.
I therefore think the proper place for the tree data structure is next to dynahash, i.e. in
"src/backend/utils/tree/...",
and eventually, with some restrictions for the shared variant (fixed key length), it could be used
in other places in the system.

The last two commits implement the simplest form of the buf-tree in shared memory, alongside the
existing hashtable.
SharedBufTree simply repeats all the actions that are performed on the hashtable and compares the results.
For now, it relies entirely on the synchronization performed at the layer above for the hashtable,
i.e. in BufferAlloc(...). Also, the amount of memory used for the tree's nodes/leaves is currently
just guessed at (working with the default 128MB shared memory size).
Operations that work with buffers in bulk (drop/truncate/checkpoint/...) are not touched yet.

So, here is the plan I would like to follow next:

1) Work on synchronization.
2) Examine bulk operations on buffers and replace them with a tree (prefix) iterator.
3) Find a reasonable way to size the memory needed for tree maintenance.

Ideas, suggestions and comments are welcome.

A dynamic data structure like a tree brings problems that are not relevant to the current hashtable
implementation. The number of LWLocks in the hashtable is fixed and equal to the number of partitions (128),
and the memory footprint of buckets and hash entries can easily be precomputed.

Currently, I am thinking about decomposing the tree, so that it is really a
tree of trees: RelFileNode [12 bytes] (maybe + ForkNum) would key the top tree, and
the rest of the key would go into per-relation trees of blocks. Each bottom tree could then contain
a single private LWLock used to protect its contents. Wouldn't that be overkill? If we take a
hypothetical example with 2 tablespaces (hdd + ssd), each with 500+ objects, then the number of LWLocks
would have to scale accordingly, or we would be forced to reuse them by unloading block trees,
which would have to be done in some intelligent way.

Another approach would be to use lock coupling... with the penalty of cache invalidation.

-- 
[0] https://github.com/mvpant/postgres/commits/artbufmgr



