Re: LSM tree for Postgres

Поиск
Список
Период
Сортировка
От AJG
Тема Re: LSM tree for Postgres
Дата
Msg-id 1597446860793-0.post@n3.nabble.com
обсуждение исходный текст
Ответ на Re: LSM tree for Postgres  (Dmitry Dolgov <9erthalion6@gmail.com>)
Список pgsql-hackers
Dmitry Dolgov wrote
>> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>>
>> Then I think about implementing ideas of LSM using standard Postgres
>> nbtree.
>>
>> We need two indexes: one small for fast inserts and another - big
>> (main) index. This top index is small enough to fit in memory so
>> inserts in this index are very fast.  Periodically we will merge data
>> from top index to base index and truncate the top index. To prevent
>> blocking of inserts in the table while we are merging indexes we can
>> add ... on more index, which will be used during merge.
>>
>> So final architecture of Lsm3 is the following: two top indexes used
>> in cyclic way and one main index. When top index reaches some
>> threshold value we initiate merge with main index, done by bgworker
>> and switch to another top index.  As far as merging indexes is done in
>> background, it doesn't  affect insert speed.  Unfortunately Postgres
>> Index AM has not bulk insert operation, so we have to perform normal
>> inserts.  But inserted data is already sorted by key which should
>> improve access locality and partly solve random reads problem for base
>> index.
>>
>> Certainly to perform search in Lsm3 we have to make lookups in all
>> three indexes and merge search results.
>
> Thanks for sharing this! In fact this reminds me more of partitioned
> b-trees [1] (and more older [2]) rather than LSM as it is (although
> could be that the former was influenced by the latter). What could be
> interesting is that quite often in these and many other whitepapers
> (e.g. [3]) to address the lookup overhead the design includes bloom
> filters in one or another way to avoid searching not relevant part of an
> index. Tomas mentioned them in this thread as well (in the different
> context), probably the design suggested here could also benefit from it?
>
> [1]: Riegger Christian, Vincon Tobias, Petrov Ilia. Write-optimized
> indexing with partitioned b-trees. (2017). 296-300.
> 10.1145/3151759.3151814.
> [2]: Graefe Goetz. Write-Optimized B-Trees. (2004). 672-683.
> 10.1016/B978-012088469-8/50060-7.
> [3]: Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky,
> Lin Ma, and Rui Shen. Reducing the Storage Overhead of Main-Memory OLTP
> Databases with Hybrid Indexes. (2016). 1567–1581. 10.1145/2882903.2915222.


I found this 2019 paper recently, might be worth a skim read for some
different ideas. too technical for me :)
"Jungle: Towards Dynamically Adjustable Key-Value Storeby Combining LSM-Tree
and Copy-On-Write B+-Tree"
https://www.usenix.org/system/files/hotstorage19-paper-ahn.pdf



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Switch to multi-inserts for pg_depend