Re: Optimizing pglz compressor

From: Daniel Farina
Subject: Re: Optimizing pglz compressor
Date:
Msg-id CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4YebMOA-wA@mail.gmail.com
In reply to: Re: Optimizing pglz compressor  (Joachim Wieland <joe@mcknight.de>)
List: pgsql-hackers
On Wed, Mar 6, 2013 at 6:32 AM, Joachim Wieland <joe@mcknight.de> wrote:
> On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> With these tweaks, I was able to make pglz-based delta encoding perform
>> roughly as well as Amit's patch.
>
> Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ?

This one is for the archives, as I found it surprising: the relative
performance of these algorithms can differ hugely depending on
architecture.  Here's a table reproduced from:
http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

"""
testdata/alice29.txt                     :
ZLIB:    [b 1M] bytes 152089 ->  54404 35.8%  comp   0.8 MB/s  uncomp   8.1 MB/s
LZO:     [b 1M] bytes 152089 ->  82721 54.4%  comp  14.5 MB/s  uncomp  43.0 MB/s
CSNAPPY: [b 1M] bytes 152089 ->  90965 59.8%  comp   2.1 MB/s  uncomp   4.4 MB/s
SNAPPY:  [b 4M] bytes 152089 ->  90965 59.8%  comp   1.8 MB/s  uncomp   2.8 MB/s
testdata/asyoulik.txt                    :
ZLIB:    [b 1M] bytes 125179 ->  48897 39.1%  comp   0.8 MB/s  uncomp   7.7 MB/s
LZO:     [b 1M] bytes 125179 ->  73224 58.5%  comp  15.3 MB/s  uncomp  42.4 MB/s
CSNAPPY: [b 1M] bytes 125179 ->  80207 64.1%  comp   2.0 MB/s  uncomp   4.2 MB/s
SNAPPY:  [b 4M] bytes 125179 ->  80207 64.1%  comp   1.7 MB/s  uncomp   2.7 MB/s

LZO was ~8x faster compressing and ~16x faster decompressing. Only on
uncompressible data was Snappy faster:

testdata/house.jpg                       :
ZLIB:    [b 1M] bytes 126958 -> 126513 99.6%  comp   1.2 MB/s  uncomp   9.6 MB/s
LZO:     [b 1M] bytes 126958 -> 127173 100.2%  comp   4.2 MB/s  uncomp  74.9 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803 99.9%  comp  24.6 MB/s  uncomp 381.2 MB/s
SNAPPY:  [b 4M] bytes 126958 -> 126803 99.9%  comp  22.8 MB/s  uncomp 354.4 MB/s
"""

So that's one more gotcha to worry about, since I surmise most numbers
are being taken on x86.  Apparently this has something to do with
alignment of accesses.  Some of it may be fixable by tweaking the
implementation rather than the compression encoding, although I am no
expert in the matter.
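
To illustrate the kind of implementation tweak I mean, here is a rough
sketch.  It is not taken from pglz, LZO, or Snappy; the helper names
load32_unaligned and match_length are made up for this example.  The
assumption is that a compressor compares input four bytes at a time at
arbitrary offsets.  Casting the pointer to uint32_t * can be slow or
can fault on architectures that do not handle unaligned loads well,
whereas going through memcpy leaves the choice of load sequence to the
compiler:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 32 bits from a possibly unaligned address.
 * memcpy lets the compiler emit whatever load is safe on the target. */
static inline uint32_t load32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: number of leading bytes on which a and b agree,
 * up to max, comparing four bytes at a time where possible. */
static size_t match_length(const unsigned char *a,
                           const unsigned char *b, size_t max)
{
    size_t n = 0;

    /* Word-at-a-time comparison; offsets need not be aligned. */
    while (n + 4 <= max &&
           load32_unaligned(a + n) == load32_unaligned(b + n))
        n += 4;

    /* Finish byte by byte to find the exact mismatch position. */
    while (n < max && a[n] == b[n])
        n++;

    return n;
}
```

Whether this actually closes the gap on a given architecture is
something one would have to measure; the compressed format is
unchanged either way.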

-- 
fdr


