Re: pluggable compression support

From Andres Freund
Subject Re: pluggable compression support
Date
Msg-id 20130621000900.GA12425@alap2.anarazel.de
In reply to Re: pluggable compression support  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: pluggable compression support  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2013-06-15 12:20:28 +0200, Andres Freund wrote:
> On 2013-06-14 21:56:52 -0400, Robert Haas wrote:
> > I don't think we need it.  I think what we need is to decide which
> > algorithm is legally OK to use.  And then put it in.
> >
> > In the past, we've had a great deal of speculation about that legal
> > question from people who are not lawyers.  Maybe it would be valuable
> > to get some opinions from people who ARE lawyers.  Tom and Heikki both
> > work for real big companies which, I'm guessing, have substantial
> > legal departments; perhaps they could pursue getting the algorithms of
> > possible interest vetted.  Or, I could try to find out whether it's
> > possible to do something similar through EnterpriseDB.
>
> I personally don't think the legal arguments hold all that much water
> for snappy and lz4. But then the opinion of a European non-lawyer doesn't
> hold much either.
> Both are widely used by a large number of open and closed projects, some of
> which have patent grant clauses in their licenses. E.g. Hadoop and
> Cassandra use lz4, and I'd be surprised if the companies behind those
> have opened themselves to litigation.
>
> I think we should preliminarily decide which algorithm to use before we
> get lawyers involved. I'd be surprised if they can make such an analysis
> faster than we can rule out one of them via benchmarks.
>
> Will post an updated patch that includes lz4 as well.

Attached.

Changes:
* add lz4 compression algorithm (2-clause BSD)
* move compression algorithms into own subdirectory
* clean up compression/decompression functions
* allow 258 compression algorithms, using 1 extra byte for any but the
  first three
* don't pass a varlena to pg_lzcompress.c anymore, but data directly
* add pglz_long as a test fourth compression method that uses the +1
  byte encoding
* use postgres' endian detection in snappy for compatibility with OS X
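
To illustrate the "+1 byte" idea from the list above: a small tag covers the first three algorithms, and a reserved tag value says that the real id follows in one extra byte. The sketch below is only a guess at such a scheme. The names (algo_id_encode, algo_id_decode, ALGO_TAG_EXTENDED) and the exact bit layout are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; the patch's real layout may differ. */
#define ALGO_INLINE_MAX    2    /* ids 0..2 fit directly in the 2-bit tag */
#define ALGO_TAG_EXTENDED  3    /* tag value meaning "id is in the next byte" */
#define ALGO_ID_MAX        (ALGO_TAG_EXTENDED + 255)

/*
 * Store a compression algorithm id in buf.
 * Returns the number of bytes written (1 or 2), or 0 if the id
 * cannot be represented. buf must have room for 2 bytes.
 */
static size_t
algo_id_encode(unsigned id, uint8_t *buf)
{
    assert(buf != NULL);

    if (id <= ALGO_INLINE_MAX)
    {
        buf[0] = (uint8_t) id;          /* short form, no extra byte */
        return 1;
    }
    if (id > ALGO_ID_MAX)
        return 0;

    buf[0] = ALGO_TAG_EXTENDED;         /* long form marker */
    buf[1] = (uint8_t) (id - ALGO_TAG_EXTENDED);
    return 2;
}

/*
 * Read an algorithm id back from buf.
 * Returns the number of bytes consumed (1 or 2).
 */
static size_t
algo_id_decode(const uint8_t *buf, unsigned *id)
{
    uint8_t tag;

    assert(buf != NULL && id != NULL);

    tag = buf[0] & 0x03;                /* only the low 2 bits carry the tag */
    if (tag != ALGO_TAG_EXTENDED)
    {
        *id = tag;
        return 1;
    }
    *id = (unsigned) ALGO_TAG_EXTENDED + buf[1];
    return 2;
}
```

In this sketch the common algorithms cost no extra space, and only the rarely used ids pay the additional byte, which is the trade-off pglz_long is there to exercise.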

Based on the benchmarks I think we should go with lz4 only for now. The
patch provides the infrastructure should somebody else want to add more
or even proper configurability.

Todo:
* windows build support
* remove toast_compression_algo guc
* remove either snappy or lz4 support
* remove pglz_long support (just there for testing)

New benchmarks:

Table size:
                          List of relations
 Schema |        Name        | Type  | Owner  |  Size  | Description
--------+--------------------+-------+--------+--------+-------------
 public | messages_pglz      | table | andres | 526 MB |
 public | messages_snappy    | table | andres | 523 MB |
 public | messages_lz4       | table | andres | 522 MB |
 public | messages_pglz_long | table | andres | 527 MB |
(4 rows)

Workstation (2xE5520, enough shared_buffers for everything):

Data load:
pglz:          36643.384 ms
snappy:        24626.894 ms
lz4:           23871.421 ms
pglz_long:     37097.681 ms

COPY messages_* TO '/dev/null' WITH BINARY;
pglz:           3116.083 ms
snappy:         2524.388 ms
lz4:            2349.396 ms
pglz_long:      3104.134 ms

COPY (SELECT rawtxt FROM messages_*) TO '/dev/null' WITH BINARY;
pglz:           1609.969 ms
snappy:         1031.696 ms
lz4:             886.782 ms
pglz_long:      1606.803 ms


On my elderly laptop (Core 2 Duo, too low shared_buffers):

Data load:
pglz:          39968.381 ms
snappy:        26952.330 ms
lz4:           29225.472 ms
pglz_long:     39929.568 ms

COPY messages_* TO '/dev/null' WITH BINARY;
pglz:           3920.588 ms
snappy:         3421.938 ms
lz4:            3311.540 ms
pglz_long:      3885.920 ms

COPY (SELECT rawtxt FROM messages_*) TO '/dev/null' WITH BINARY;
pglz:           2238.145 ms
snappy:         1753.403 ms
lz4:            1638.092 ms
pglz_long:      2227.804 ms
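
For comparing the timings above, it can help to express each one as the share of runtime saved relative to pglz. The helper below is just a convenience sketch; the function name time_saved_percent is made up here and is not part of the patch.

```c
#include <assert.h>

/*
 * Percentage of runtime saved by a candidate algorithm relative to a
 * baseline, both given in milliseconds. Positive means the candidate
 * is faster; negative means it is slower.
 */
static double
time_saved_percent(double baseline_ms, double candidate_ms)
{
    assert(baseline_ms > 0.0);
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0;
}
```

For example, time_saved_percent(36643.384, 23871.421) gives the saving of lz4 over pglz for the workstation data load, which comes out at roughly 35 percent.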


Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
