Re: Different compression methods for FPI

From: Justin Pryzby
Subject: Re: Different compression methods for FPI
Message-ID: 20210614012412.GA31772@telsasoft.com
In reply to: Re: Different compression methods for FPI  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Different compression methods for FPI  (Andrey Borodin <x4mmm@yandex-team.ru>)
Re: Different compression methods for FPI  (Michael Paquier <michael@paquier.xyz>)
Re: Different compression methods for FPI  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
On Tue, Jun 01, 2021 at 11:06:53AM +0900, Michael Paquier wrote:
> - Speed and CPU usage.  We should worry about that for CPU-bounded
> environments.
> - Compression ratio, which is just monitoring the difference in WAL.
> - Effect of the level of compression perhaps?
> - Use a fixed amount of WAL generated, meaning a set of repeatable SQL
> queries, with one backend, no benchmarks like pgbench.
> - Avoid any I/O bottleneck, so run tests on a tmpfs or ramfs.
> - Avoid any extra WAL interference, like checkpoints, no autovacuum
> running in parallel.

I think it's more nuanced than just finding the algorithm with the least CPU
use.  The GUC is PGC_USERSET, so a data-loading process might want to use zlib
for a better compression ratio, while an interactive OLTP process might want to
use lz4 or no compression for better responsiveness.

Reducing WAL volume during loading can be important.  At one site, the SAN was
too slow to keep up during the period of heaviest loading: the checkpointer
fell behind, WAL couldn't be recycled as normal, the (local) WAL filesystem
overflowed, and the oversized WAL then had to be replayed to the slow SAN.  A
large fraction of their WAL is FPI, and compression has now made this a
non-issue.  We'd happily incur 2x more CPU cost if WAL were 25% smaller.

We're not proposing to enable it by default, so the threshold doesn't have to
be "no performance regression" relative to no compression.  The feature should
provide a faster alternative to PGLZ, and also a method with better compression
ratio to improve the case of heavy WAL writes, by reducing I/O from FPI.

In a CPU-bound environment, one would just disable WAL compression, or use LZ4
if it's cheap enough.  In the I/O-bound case, someone might enable zlib or zstd
compression.

I found this old thread about btree performance with WAL compression (+Peter,
+Andres).

https://www.postgresql.org/message-id/flat/540584F2-A554-40C1-8F59-87AF8D623BB7%40yandex-team.ru#94c0dcaa34e3170992749f6fdc8db35c

And the differences are pretty dramatic, so I ran a single test on my PC:

CREATE TABLE t AS SELECT generate_series(1,999999)a; VACUUM t;
SET wal_compression=off;
\set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Time: 1639.375 ms (00:01.639)
wal_bytes        | 20357193

pglz writes ~half as much, but takes twice as long as uncompressed:
|Time: 3362.912 ms (00:03.363)
|wal_bytes        | 11644224

zlib writes ~3.6x less than uncompressed, and is still much faster than pglz:
|Time: 2167.474 ms (00:02.167)
|wal_bytes        | 5611653

lz4 is as fast as uncompressed, and writes a bit more than pglz:
|Time: 1612.874 ms (00:01.613)
|wal_bytes        | 12397123

zstd(6) is slower than lz4, but compresses better than anything but zlib:
|Time: 1808.881 ms (00:01.809)
|wal_bytes        | 6395993
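
To put the five runs above on a common scale, here is a minimal sketch that
normalizes each result against the uncompressed run and then checks it against
the "2x more CPU for 25% less WAL" trade-off mentioned earlier.  The numbers
are copied verbatim from the test output; the function names and the two
threshold parameters are hypothetical, and wall-clock time from a single run is
used as a rough stand-in for CPU cost, so the result is only indicative.

```python
# Timings (ms) and wal_bytes copied from the single CREATE INDEX test above.
RESULTS = {
    "off":     (1639.375, 20357193),
    "pglz":    (3362.912, 11644224),
    "zlib":    (2167.474, 5611653),
    "lz4":     (1612.874, 12397123),
    "zstd(6)": (1808.881, 6395993),
}


def relative_to_uncompressed(results, baseline="off"):
    """Return {method: (time_ratio, wal_ratio)} relative to the baseline run."""
    base_ms, base_bytes = results[baseline]
    return {
        name: (ms / base_ms, nbytes / base_bytes)
        for name, (ms, nbytes) in results.items()
        if name != baseline
    }


def meets_threshold(ratios, max_time_ratio=2.0, min_wal_saving=0.25):
    """List methods that cost at most max_time_ratio in elapsed time
    (an assumed proxy for CPU) and shrink WAL by at least min_wal_saving."""
    return [
        name
        for name, (time_ratio, wal_ratio) in ratios.items()
        if time_ratio <= max_time_ratio and (1.0 - wal_ratio) >= min_wal_saving
    ]


ratios = relative_to_uncompressed(RESULTS)
for name, (time_ratio, wal_ratio) in ratios.items():
    print(f"{name:8s} time x{time_ratio:.2f}   WAL x{wal_ratio:.2f}")
print("within 2x time and >=25% smaller WAL:", meets_threshold(ratios))
```

With these figures, pglz is the only method that lands just over the 2x time
limit; a single run on one machine is too noisy to read much more into it.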

In this patch series, I added compression information to the errcontext from
xlog_block_info(), and allowed specifying compression levels like zlib-2.  I'll
move that commit earlier in the series if we decide it's desirable to include.
