Re: LSM tree for Postgres

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: LSM tree for Postgres
Дата
Msg-id 4ab882f6-e29a-4a97-14a6-f82a81c55a65@postgrespro.ru
обсуждение исходный текст
Ответ на Re: LSM tree for Postgres  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers

On 09.08.2020 04:53, Alexander Korotkov wrote:
>>
>> I realize that it is not true LSM.
>> But still I wan to notice that it is able to provide ~10 times increase
>> of insert speed when size of index is comparable with RAM size.
>> And "true LSM" from RocksDB shows similar results.
> It's very far from being shown.  All the things you've shown is a
> naive benchmark.  I don't object that your design can work out some
> cases.  And it's great that we have the lsm3 extension now.  But I
> think for PostgreSQL core we should think about better design.

Sorry, I mean that at particular benchmark and hardware Lsm3 and RocksDB 
shows similar performance.
It definitely doesn't mean that it will be true in all other cases.
This is one of the reasons why I have published this Lsm3 and RockDB FDW 
extensions:
anybody can try to test them at their workload.
It will be very interesting to me to know this results, because I 
certainly understand
that measuring of random insert performance in dummy table is not enough 
to make some
conclusions.

And I certainly do not want to say that we do not need "right" LSM 
implementation inside Postgres core.
It just requires an order of magnitude more efforts.
And there are many questions and challenges. For example Postgres buffer 
size (8kb) seems to be too small for LSM.
Should LSM implementation bypass Postgres buffer cache? There pros and 
contras...

Another issue is logging. Should we just log all operations with LSM in 
WAL in usual way (as it is done for nbtree and Lsm3)?
It seems to me that for LSM alternative and more efficient solutions may 
be proposed.
For example we may not log inserts in top index at all and just replay 
them during recovery, assuming that this operation with
small index is fast enough. And merge of top index with base index can 
be done in atomic way and so also doesn't require WAL.

As far as I know Anastasia Lubennikova several years ago has implemented 
LSM for Postgres.
There was some performance issues (with concurrent access?).
This is why the first thing I want to clarify for myself is what are the 
bottlenecks of LSM architecture
and are them caused by LSM itself or its integration in Postgres 
infrastructure.

I any case, before thinking about details of in-core LSM implementation 
for Postgres, I think that
it is necessary to demonstrate workloads at which RocksDB (or any other 
existed DBMS with LSM)
shows significant performance advantages comparing with Postgres with 
nbtree/Lsm3.

>> May be if size of
>> index will be 100 times larger then
>> size of RAM, RocksDB will be significantly faster than Lsm3. But modern
>> servers has 0.5-1Tb of RAM.
>> Can't believe that there are databases with 100Tb indexes.
> Comparison of whole RAM size to single index size looks plain wrong
> for me.  I think we can roughly compare whole RAM size to whole
> database size.  But also not the whole RAM size is always available
> for caching data.  Let's assume half of RAM is used for caching data.
> So, a modern server with 0.5-1Tb of RAM, which suffers from random
> B-tree insertions and badly needs LSM-like data-structure, runs a
> database of 25-50Tb.  Frankly speaking, there is nothing
> counterintuitive for me.

There is actually nothing counterintuitive.
I just mean that there are not so much 25-50Tb OLTP databases.




В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Andrey M. Borodin"
Дата:
Сообщение: Re: Amcheck: do rightlink verification with lock coupling
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Unnecessary delay in streaming replication due to replay lag