Re: Zedstore - compressed in-core columnar storage

From: Konstantin Knizhnik
Subject: Re: Zedstore - compressed in-core columnar storage
Msg-id: 44cafbdf-0a04-64ef-3cf8-6d5b14b68643@postgrespro.ru
In reply to: Re: Zedstore - compressed in-core columnar storage  (Ashwin Agrawal <aagrawal@pivotal.io>)
Responses: Re: Zedstore - compressed in-core columnar storage  (Andreas Karlsson <andreas@proxel.se>)
List: pgsql-hackers


On 11.04.2019 8:03, Ashwin Agrawal wrote:
On Apr 10, 2019, at 9:08 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:


On 11/04/19 4:01 PM, Mark Kirkwood wrote:
On 9/04/19 12:27 PM, Ashwin Agrawal wrote:

Heikki and I have been hacking recently for a few weeks to implement
in-core columnar storage for PostgreSQL. Here's the design and initial
implementation of Zedstore, compressed in-core columnar storage (table
access method). Attaching the patch and a link to the github branch [1] to
follow along.


Very nice. I realize that it is very early days, but applying this patch I've managed to stumble over some compression bugs doing some COPYs:

benchz=# COPY dim1 FROM '/data0/dump/dim1.dat'
USING DELIMITERS ',';
psql: ERROR:  compression failed. what now?
CONTEXT:  COPY dim1, line 458

The log has:

2019-04-11 15:48:43.976 NZST [2006] ERROR:  XX000: compression failed. what now?
2019-04-11 15:48:43.976 NZST [2006] CONTEXT:  COPY dim1, line 458
2019-04-11 15:48:43.976 NZST [2006] LOCATION: zs_compress_finish, zedstore_compression.c:287
2019-04-11 15:48:43.976 NZST [2006] STATEMENT:  COPY dim1 FROM '/data0/dump/dim1.dat'   USING DELIMITERS ',';

The dataset is generated from an old DW benchmark I wrote (https://sourceforge.net/projects/benchw/). The row concerned looks like:

457,457th interesting measure,1th measure type,aqwycdevcmybxcnpwqgrdsmfelaxfpbhfxghamfezdiwfvneltvqlivstwralshsppcpchvdkdbraoxnkvexdbpyzgamajfp
458,458th interesting measure,2th measure type,bjgdsciehjvkxvxjqbhtdwtcftpfewxfhfkzjsdrdabbvymlctghsblxucezydghjrgsjjjnmmqhncvpwbwodhnzmtakxhsg


I'll see if changing to LZ4 makes any different.


The COPY works with LZ4 configured.
Thank you for trying it out. Yes, we noticed that for certain patterns pg_lzcompress() actually requires a much larger output buffer: for one 86-byte source it required a 2296-byte output buffer. The current zedstore code doesn't handle this case and errors out. LZ4 works fine for the same patterns, so I would highly recommend using LZ4 only, as it is very fast as well.
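
For illustration, the general way to make that safe is to size the destination buffer with the compressor's documented worst-case bound, so even an incompressible input cannot overflow it. A minimal sketch assuming liblz4's public API (this is not zedstore's actual code; the helper name is made up):

#include <stdlib.h>
#include <lz4.h>

/*
 * Sketch: compress src into a freshly allocated buffer sized with
 * LZ4's worst-case bound, so the call cannot fail for lack of space.
 * Returns NULL on allocation or compression failure.
 */
char *
lz4_compress_copy(const char *src, int slen, int *clen_out)
{
    int   cap = LZ4_compressBound(slen);   /* worst case, slightly > slen */
    char *dst = malloc(cap);

    if (dst == NULL)
        return NULL;

    /* LZ4_compress_default() returns 0 on failure, else compressed size. */
    *clen_out = LZ4_compress_default(src, dst, slen, cap);
    if (*clen_out == 0)
    {
        free(dst);
        return NULL;
    }
    return dst;
}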



The internal Postgres LZ compressor is really very inefficient compared with other compression algorithms.
But in any case, you should never assume that the size of compressed data will be smaller than the size of the plain data.
Moreover, if you try to compress already-compressed data, the result will almost always be larger.
If the compressed data is larger than (or even not significantly smaller than) the raw data, you should store the original data instead.
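
To illustrate this fallback, here is a minimal sketch assuming PostgreSQL's pglz API from common/pg_lzcompress.h (not zedstore's actual code; the one-byte payload marker is hypothetical):

#include "postgres.h"
#include "common/pg_lzcompress.h"

/* Hypothetical one-byte payload markers; a real on-disk format differs. */
#define PAYLOAD_RAW         0x00
#define PAYLOAD_COMPRESSED  0x01

/*
 * Compress src into dst, which must hold 1 + PGLZ_MAX_OUTPUT(slen) bytes,
 * falling back to a verbatim copy when compression does not win.
 * Returns the number of bytes written to dst.
 */
static int32
compress_or_store_raw(const char *src, int32 slen, char *dst)
{
    int32 clen = pglz_compress(src, slen, dst + 1, PGLZ_strategy_default);

    /* pglz_compress() returns -1 when it cannot shrink the input. */
    if (clen < 0 || clen >= slen)
    {
        dst[0] = PAYLOAD_RAW;
        memcpy(dst + 1, src, slen);
        return slen + 1;
    }

    dst[0] = PAYLOAD_COMPRESSED;
    return clen + 1;
}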

lz4 is actually very fast, but it doesn't provide a good compression ratio.
These are my results of compressing pgbench data with different compressors:

Configuration           Size (Gb)   Time (sec)
no compression            15.31         92
zlib (default level)       2.37        284
zlib (best speed)          2.43        191
postgres internal lz       3.89        214
lz4                        4.12         95
snappy                     5.18         99
lzfse (apple)              2.80       1099
zstd                       1.69        125

You can see that zstd provides an almost 2x better compression ratio at almost the same speed.
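
For anyone who wants to reproduce this kind of measurement, a minimal timing harness along these lines should do, assuming libzstd's single-shot API (the synthetic input is just a placeholder for real table data; build with -lzstd):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zstd.h>

/*
 * Minimal harness: compress one buffer with zstd and report the
 * compression ratio and elapsed wall-clock time.
 */
int
main(void)
{
    /* Placeholder input; a real test would load table data instead. */
    size_t  slen = 64 * 1024 * 1024;
    size_t  dcap = ZSTD_compressBound(slen);
    char   *src = malloc(slen);
    char   *dst = malloc(dcap);

    if (src == NULL || dst == NULL)
        return 1;

    for (size_t i = 0; i < slen; i++)
        src[i] = (char) (i % 251);          /* mildly compressible filler */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t  clen = ZSTD_compress(dst, dcap, src, slen, 1 /* level */);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (ZSTD_isError(clen))
    {
        fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(clen));
        return 1;
    }

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("ratio = %.2f, time = %.3f s\n",
           (double) slen / (double) clen, sec);

    free(src);
    free(dst);
    return 0;
}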

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
