Re: LSM tree for Postgres

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: LSM tree for Postgres
Дата
Msg-id CAH2-WzmxTn+9fgXJ8hm_jErfpKOSJNXhm3nhkHB4Czf9WrxZpQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: LSM tree for Postgres  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On Tue, Aug 4, 2020 at 8:24 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> So, in my point of view LSM as an index AM is far not a full power LSM
> for PostgreSQL, but it's still useful.  Large insert-only tables can
> benefit from LSM.  Large tables with many indexes could also benefit,
> because non-HOT updates will become cheaper.

Right -- this is why you usually have to choose one or the other. An
LSM design typically subsumes not just indexing and table storage, but
also checkpointing -- you cannot really compare an LSM to a B-Tree
because you really have to talk about other components to make a
sensible comparison (at which point you're actually comparing two
totally different *storage engines*). Roughly speaking, the compaction
process is the equivalent of checkpointing. So you either use (say)
InnoDB or RocksDB everywhere -- you usually can't have it both ways.
Well, maybe you can kind of get the benefits of both, but in practice
LSMs are usually highly optimized for the things that they're good at,
at the expense of other things. So in practice you kind of have to
make an up-front choice. An LSM is definitely not a natural fit for
the index access method interface in Postgres.

One thing that I don't think anyone else made reference to on the
thread (which is surprising) is that the performance of an LSM is
usually not measured using any of the conventional metrics that we
care about. For example, consider the Facebook MyRocks paper
"Optimizing Space Amplification in RocksDB" [1]. The reported RocksDB
throughput for an LSM-sympathetic workload is not really any faster
than InnoDB, and sometimes slower. That's not the point, though; the
main advantages of using an LSM are reductions in space amplification
and write amplification, particularly the latter. This isn't so much
about performance as it is about efficiency -- it enabled Facebook to
get a lot more out of the inexpensive flash storage that they use. It
lowered the total cost of ownership by *a lot*.

I personally think that we should care about efficiency in this sense
a lot more than we do now, but the fact remains that it hasn't really
been considered an independent problem that could be addressed by
accepting a tradeoff similar to the tradeoff LSMs make quite explicit
(apparently you can tune LSMs to get less write amplification but more
read amplification, or vice versa). In general, we almost always just
talk about throughout and latency without considering efficiency
specifically. I'm not suggesting that we need an LSM, but an
appreciation of LSMs could be helpful -- it could lead to better
designs elsewhere.

Mark Callaghan's blog is a pretty good resource for learning about
LSMs [2] (perhaps you've heard of him?). He wrote a bunch of stuff
about Postgres recently, which I enjoyed.

[1] http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
[2] https://smalldatum.blogspot.com/
-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Keep elog(ERROR) and ereport(ERROR) calls in the cold path
Следующее
От: David Rowley
Дата:
Сообщение: Re: pg13dev: explain partial, parallel hashagg, and memory use