Обсуждение: LZ compressing data type
Hi,
I just committed some changes that require an initdb.
New are the discussed, simple LZ compressor, placed into
/utils/adt/pg_compress.c, and a new lztext data type based on
it. You'll find a fairly detailed description of the
compression algorithm in the comments at the top of
pg_lzcompress.c.
Not very surprisingly to me it turns out, that the compressor
does a very good job on rule action strings. I used the 48
rules that can be found in pg_rewrite after the regression
test. The original string sizes range from 820 to 4615 and
the compression rates from 35-76% with an average of 60%. The
4615 size rule action has been coded into a 1126
octet_length.
For the lztext type, there are conversion functions to/from
text and the length() and octet_length() functions available.
Length() returns the same as length on text would. While
octet_length returns the compressed size without VARHDRSZ.
The type does not support MULTIBYTE or CYR_ENCODE up to now.
It shouldn't be too hard to add it and after that, we might
add another lzbpchar type too. The latter is really
interesting, because an empty char(200) (thus containing 200
spaces) could result in an octet_length of 12 instead of 204
- that's a compression rate of 94.1%! It actually wouldn't,
because the compressors default is to start only if the input
is at least 256 bytes, but there is a mechanism so a lzbpchar
type could force this behaviour.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
>
> Hi,
>
> I just committed some changes that require an initdb.
>
> New are the discussed, simple LZ compressor, placed into
> /utils/adt/pg_compress.c, and a new lztext data type based on
> it. You'll find a fairly detailed description of the
> compression algorithm in the comments at the top of
> pg_lzcompress.c.
One question.
You say this is an LZ algorythm. Is this the same as LZW, as in the
same kind that gifs use. If so, are we sure that this algorythm is not
covered by the Unisys patent? Their patent is on the algorythm for
compression not on gifs themselves.
If it is, you are liable for a $5,000 fee to use it throughout your
site, or a per-licence fee if you are distributing (thay are worked out
on a per case basis but typical licences are $5 per unit sold from what
I am told).
I came across this problem with a gif manipulation program that *I
WROTE FROM SCRATCH* and had to switch to using the libungif
compression routines.
Just thought Id mention this, in case it has been overlooked.
~Michael
>
> >
> > Hi,
> >
> > I just committed some changes that require an initdb.
> >
> > New are the discussed, simple LZ compressor, placed into
> > /utils/adt/pg_compress.c, and a new lztext data type based on
> > it. You'll find a fairly detailed description of the
> > compression algorithm in the comments at the top of
> > pg_lzcompress.c.
>
> One question.
>
> You say this is an LZ algorythm. Is this the same as LZW, as in the
> same kind that gifs use. If so, are we sure that this algorythm is not
> covered by the Unisys patent? Their patent is on the algorythm for
> compression not on gifs themselves.
>
> If it is, you are liable for a $5,000 fee to use it throughout your
> site, or a per-licence fee if you are distributing (thay are worked out
> on a per case basis but typical licences are $5 per unit sold from what
> I am told).
It't an SLZ algorithm, not LZW. There are FLZ and LZ77 out
too (and I don't know how many other subtypes). At least, LZ
is a family of compression algorithms where LZW is just a
member of them.
I've written the entire code from scatch, inspired by an
article from Adisak Pochanayon dated back in 1993. If they
have a license on this algorithm, they have one on code that
can be coded from scatch in 20 hours after reading
information that's claimed to be free on the internet,
congrats.
This is M$ practice, I never thought that there's more than
one company out doing this kind of business.
But thanks for the info.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
> > If it is, you are liable for a $5,000 fee to use it throughout your
> > site, or a per-licence fee if you are distributing (thay are worked out
> > on a per case basis but typical licences are $5 per unit sold from what
> > I am told).
>
> It't an SLZ algorithm, not LZW. There are FLZ and LZ77 out
> too (and I don't know how many other subtypes). At least, LZ
> is a family of compression algorithms where LZW is just a
> member of them.
>
> [...]
>
> But thanks for the info.
From the very beginning of US patent No. 4,558,302 (the thing
behind Unisys's LZW):
The compressor searches the input stream to determine the
longest match to a stored string. Each stored string
comprises a prefix string and an extension character
where the extension character is the last character in
the string and the prefix string comprises all but the
extension character. Each string has a code signal
associated therewith and a string is stored in the string
table by, at least implicitly, storing the code signal
for the string, the code signal for the string prefix and
the extension character.
The format my code stores is different to this definition.
It only stores offset/length pairs or literal bytes, signaled
by a bit in a control unit. So there is no prefix string
and/or extension character at all.
If someone might state that storing TAG's and LITERAL chars
is still equivalent to PREFIX and EXTension character, I say
that my code possibly output's sequences of immediately
following TAG's, that aren't neccessarily PREFIX'es. One TAG
could code a sequence of characters, not previously occured
in any other TAG, without any intermediate LITERAL character
occured.
And my code forgets AUTOMATICALLY about too far behind
occurences of matching strings for speed AND compression rate
improvement.
Thus I think my implementation is not related to this patent.
When reading patent #4,558,302 I really remembered the
question, if anyone could ever have a patent on rectangle
front lights for cars. The definition is so general, that I
cannot think that gzip, bzip or any other existing
compression tool doesn't use it's METHOD. If such a GENERAL
patent is possible at all, car manufacturers all over the
world would have to pay millions of dollars license fee's
just to sell cars with rectangular lights.
I don't see any relationship between my code and what Unisys
has a patent for. If they see, I might be blind - or they see
a horizon from inside a black hole. If they need to, I'll
enlighten them.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
At 03:23 18/11/99 +0100, Jan Wieck wrote: > > From the very beginning of US patent No. 4,558,302 (the thing > behind Unisys's LZW): > The other thing to remember is that there are a very large number of countries in which this patent does not apply because it was not even applied for until after the alogorithm was made public. In the first publication of the LZW algorithm, there was no patent notice, but I think the US patent had been applied for. For US-based distribution sites, however, you may find the threat of legal action from a large company is enough to make it less desirable to distribute. FWIW, I think Unisys patented the LZW algorithm only. It's probably a bit late to ask, but how difficult would it be to generalize your code to use the compressor/encryption library of choice? zlib and pgp spring to mind. This would kill three birds with one stone. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes:
> FWIW, I think Unisys patented the LZW algorithm only.
More specifically, Unisys patented Terry Welch's variant of the LZ78
class of algorithms. Jan's code falls in the substantially different
LZ77 class. There could be problematic patents out there, but Unisys'
is certainly not one.
(If you don't know the difference between LZ77 and LZ78, see the
comp.compression FAQ ... but don't raise alarms about compression
patents if you don't know even that much about the field, eh?)
regards, tom lane
At 23:28 17/11/99 -0500, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> FWIW, I think Unisys patented the LZW algorithm only. > >More specifically, Unisys patented Terry Welch's variant of the LZ78 >class of algorithms. Hence the 'W' in LZW, no? >Jan's code falls in the substantially different >LZ77 class. There could be problematic patents out there, but Unisys' >is certainly not one. > Is this a legal opinion, or a personal one? >(If you don't know the difference between LZ77 and LZ78, see the >comp.compression FAQ ... but don't raise alarms about compression >patents if you don't know even that much about the field, eh?) 1. I did not raise the alarm, I just responded to it. 2. Since Unisys have moved to block code that does not even use LZW compression, but happens to be compatible with most GIF decompressors, I think your comment is misinformed, eh? (See the GD graphics library) The fear is not whether one wins court cases, but whether you can afford to fight them. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes:
> At 23:28 17/11/99 -0500, Tom Lane wrote:
>> Jan's code falls in the substantially different
>> LZ77 class. There could be problematic patents out there, but Unisys'
>> is certainly not one.
> Is this a legal opinion, or a personal one?
I'm not a lawyer, but I believe I qualify as an expert witness when
it comes to compression questions. It is not a matter of opinion
whether Jan's code is LZ77 or LZ78, nor is it a matter of opinion
which class Unisys has claims on a small piece of.
> The fear is not whether one wins court cases, but whether you can afford to
> fight them.
There are patents related to databases. Shall we therefore shut down
Postgres development and run screaming for the hills? If we let
ourselves be intimidated by irrelevant patents, the Microsofts and
Unisyses will win without a fight.
regards, tom lane
Tom Lane wrote:
> There are patents related to databases. Shall we therefore shut down
> Postgres development and run screaming for the hills? If we let
> ourselves be intimidated by irrelevant patents, the Microsofts and
> Unisyses will win without a fight.
It's definitely a LZ77 coder, and this technique is used in
many other compressors too (lha, arj, zlib, gzip, ...). So I
don't think we have to fear anything.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #