Re: pluggable compression support
From | Andres Freund |
---|---|
Subject | Re: pluggable compression support |
Date | |
Msg-id | 20130625184230.GH7716@awork2.anarazel.de |
In response to | Re: pluggable compression support (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On 2013-06-25 12:22:31 -0400, Robert Haas wrote:
> On Thu, Jun 20, 2013 at 8:09 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-06-15 12:20:28 +0200, Andres Freund wrote:
> >> On 2013-06-14 21:56:52 -0400, Robert Haas wrote:
> >> > I don't think we need it. I think what we need is to decide is which
> >> > algorithm is legally OK to use. And then put it in.
> >> >
> >> > In the past, we've had a great deal of speculation about that legal
> >> > question from people who are not lawyers. Maybe it would be valuable
> >> > to get some opinions from people who ARE lawyers. Tom and Heikki both
> >> > work for real big companies which, I'm guessing, have substantial
> >> > legal departments; perhaps they could pursue getting the algorithms of
> >> > possible interest vetted. Or, I could try to find out whether it's
> >> > possible do something similar through EnterpriseDB.
> >>
> >> I personally don't think the legal arguments holds all that much water
> >> for snappy and lz4. But then the opinion of a european non-lawyer doesn't
> >> hold much either.
> >> Both are widely used by a large number open and closed projects, some of
> >> which have patent grant clauses in their licenses. E.g. hadoop,
> >> cassandra use lz4, and I'd be surprised if the companies behind those
> >> have opened themselves to litigation.
> >>
> >> I think we should preliminarily decide which algorithm to use before we
> >> get lawyers involved. I'd surprised if they can make such a analysis
> >> faster than we can rule out one of them via benchmarks.
> >>
> >> Will post an updated patch that includes lz4 as well.
> >
> > Attached.
>
> Well, the performance of both snappy and lz4 seems to be significantly
> better than pglz. On these tests lz4 has a small edge but that might
> not be true on other data sets.

From what I've seen of independent benchmarks on more varied datasets,
and from what I tested (without pg in between), lz4 usually has a bigger
margin than this, especially on decompression.
The implementation also seems to be better prepared to run on more
platforms, e.g. it didn't require any fiddling with endian.h, in contrast
to snappy.
But yes, "even" snappy would be a big improvement should lz4 turn out to
be problematic, and the performance difference isn't big enough to rule
one out as I'd hoped.

> I still think the main issue is legal
> review: are there any license or patent concerns about including
> either of these algorithms in PG? If neither of them have issues, we
> might need to experiment a little more before picking between them.
> If one does and the other does not, well, then it's a short
> conversation.

True. So, how do we proceed on that?

The ASF decided it was safe to use lz4 in cassandra. Does anybody have
contacts over there?

Btw, I have the feeling we hold this topic to a higher standard wrt
patent issues than other work in postgres...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services