Discussion: lztext.c
I'm going to commit changes to make lztextlen() aware of multi-byte. While doing the work, I found that no POSITION() or SUBSTRING() for lztext has been implemented in the file. BTW, is anybody working on making lztext indexable? If not, I will take care of it along with the above additions.
--
Tatsuo Ishii
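To illustrate what "aware of multi-byte" means for a length function: the result should count characters, not bytes. The following is a minimal sketch that assumes UTF-8 as the encoding (the backend of this era handled its own set of encodings, so this is only a stand-in) and operates on an already decompressed buffer. `utf8_char_len()` and `mb_char_count()` are hypothetical names, not backend functions.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the character whose lead byte is c,
 * ASSUMING UTF-8. An invalid lead byte is counted as a 1-byte character. */
static int utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;
}

/* Count characters (not bytes) in a decompressed buffer of nbytes bytes.
 * A truncated final character still counts as one character. */
static size_t mb_char_count(const unsigned char *buf, size_t nbytes)
{
    size_t pos = 0, nchars = 0;
    while (pos < nbytes) {
        pos += (size_t) utf8_char_len(buf[pos]);
        nchars++;
    }
    return nchars;
}
```

A byte-oriented length would report 6 for the UTF-8 string "héllo", whereas the character count is 5.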
Tatsuo Ishii wrote:
> I'm going to commit changes to make lztextlen() aware of
> multi-byte. While doing the work, I found that no POSITION() or
> SUBSTRING() for lztext has been implemented in the file.
Thanks for that. I usually don't have multi-byte support
compiled in, and it's surely better if you do the extension
and tests.
I know that a lot of functions are still missing, especially
comparison and the ones you mentioned. I planned to get back
to it once the multi-byte support is in.
> BTW, is anybody working on making lztext indexable? If not, I will take
> care of it along with the above additions.
IMHO, that's questionable.
A compressed data type is the preferred way to store large
amounts of data. Indexing large fields, OTOH, is something
good database design should avoid. The new type at hand only
achieves reasonable compression rates above a certain input
size.
OTOOH, it might help someone work around the btree split
problems some of us have encountered, which I was able to
trigger with field contents of just over 2K. In such a case
it can be a last resort.
I'd like to know what others think.
Don't spend much effort on comparison and the SUBSTRING()
things right now. I already have an additional, generalized
decompressor in mind that can be used in the comparison, for
example, to decompress two values on the fly and stop the
comparison at the first difference, which usually occurs
early for two random datums.
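The on-the-fly comparison idea can be sketched as two incremental decoders advanced in lockstep, stopping at the first differing byte. This is a minimal sketch under loudly stated assumptions: the toy format (a 0x00 tag followed by one literal byte, a 0x01 tag followed by an offset byte and a length byte for a back-reference) is invented for illustration and is NOT the real lztext layout; input is assumed well-formed; and the caller supplies history buffers large enough for the decompressed data. `Decoder`, `dec_next()` and `lz_compare()` are hypothetical names.

```c
#include <stddef.h>

/* HYPOTHETICAL toy format, not the real lztext layout:
 *   0x00 b        -> literal byte b
 *   0x01 off len  -> copy len bytes starting off bytes back in the output
 * Input is assumed well-formed (1 <= off <= bytes decoded so far). */
typedef struct {
    const unsigned char *src;
    size_t srclen, srcpos;
    unsigned char *out;          /* caller-supplied history buffer */
    size_t outpos;
    size_t copy_off, copy_left;  /* pending back-reference, if any */
} Decoder;

static void dec_init(Decoder *d, const unsigned char *src, size_t srclen,
                     unsigned char *out)
{
    d->src = src; d->srclen = srclen; d->srcpos = 0;
    d->out = out; d->outpos = 0;
    d->copy_off = 0; d->copy_left = 0;
}

/* Produce the next decompressed byte (0..255), or -1 at end of input. */
static int dec_next(Decoder *d)
{
    while (d->copy_left == 0) {          /* loop also skips len-0 refs */
        if (d->srcpos >= d->srclen) return -1;
        if (d->src[d->srcpos] == 0x00) { /* literal */
            unsigned char lit = d->src[d->srcpos + 1];
            d->srcpos += 2;
            d->out[d->outpos++] = lit;
            return lit;
        }
        d->copy_off = d->src[d->srcpos + 1];  /* back-reference */
        d->copy_left = d->src[d->srcpos + 2];
        d->srcpos += 3;
    }
    /* Copy one byte of the back-reference; byte-at-a-time copying
     * also handles overlapping references (off < len). */
    unsigned char c = d->out[d->outpos - d->copy_off];
    d->out[d->outpos++] = c;
    d->copy_left--;
    return c;
}

/* memcmp-style comparison of two compressed values. Decompression stops
 * at the first difference; end of input (-1) sorts before any byte, so a
 * value that is a prefix of the other compares as smaller. */
static int lz_compare(const unsigned char *a, size_t alen, unsigned char *abuf,
                      const unsigned char *b, size_t blen, unsigned char *bbuf)
{
    Decoder da, db;
    dec_init(&da, a, alen, abuf);
    dec_init(&db, b, blen, bbuf);
    for (;;) {
        int ca = dec_next(&da);
        int cb = dec_next(&db);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca < 0) return 0;            /* both ended together: equal */
    }
}
```

Each decoder keeps only the history it has produced so far, so two values that differ in their first few bytes are decided after decoding just those bytes rather than after full decompression.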
Tell me when you have the multi-byte (and maybe cyrillic?)
stuff committed and I'll take my hands back on the code.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
> Don't spend much effort on comparison and the SUBSTRING()
> things right now. I already have an additional, generalized
> decompressor in mind that can be used in the comparison, for
> example, to decompress two values on the fly and stop the
> comparison at the first difference, which usually occurs
> early for two random datums.

Ok.

> Tell me when you have the multi-byte (and maybe cyrillic?)
> stuff committed and I'll take my hands back on the code.

I have committed the changes just now, though cyrillic support is not
included. I vaguely recall the discussion about the usefulness of
the cyrillic support.
--
Tatsuo Ishii
On Wed, 24 Nov 1999, Tatsuo Ishii wrote:

> Date: Wed, 24 Nov 1999 12:52:53 +0900
> From: Tatsuo Ishii <t-ishii@sra.co.jp>
> To: Jan Wieck <wieck@debis.com>
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] lztext.c
>
> > Don't spend much effort on comparison and the SUBSTRING()
> > things right now. I already have an additional, generalized
> > decompressor in mind that can be used in the comparison, for
> > example, to decompress two values on the fly and stop the
> > comparison at the first difference, which usually occurs
> > early for two random datums.
>
> Ok.
>
> > Tell me when you have the multi-byte (and maybe cyrillic?)
> > stuff committed and I'll take my hands back on the code.
>
> I have committed the changes just now, though cyrillic support is not
> included. I vaguely recall the discussion about the usefulness of
> the cyrillic support.

If you mean --recode, you're right.

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Tatsuo Ishii wrote:
> > Don't spend much effort on comparison and the SUBSTRING()
> > things right now. I already have an additional, generalized
> > decompressor in mind that can be used in the comparison, for
> > example, to decompress two values on the fly and stop the
> > comparison at the first difference, which usually occurs
> > early for two random datums.
>
> Ok.
>
> > Tell me when you have the multi-byte (and maybe cyrillic?)
> > stuff committed and I'll take my hands back on the code.
>
> I have committed the changes just now, though cyrillic support is not
> included. I vaguely recall the discussion about the usefulness of
> the cyrillic support.
I added the comparison functions, operators and the default
nbtree operator class for indexing.
As for SUBSTR() and STRPOS(), I just checked the current
setup: it automatically casts an lztext argument to text in
these functions. I assume lztext can now be used everywhere
text is allowed. Is it really worth bloating the catalogs
with rarely used functions whose only gain is skipping part
of the decompression?
Remember, the algorithm is optimized for decompression speed.
Partial decompression might save some time in a comparison
function used inside index scans or btree operations, where
it's likely to hit a difference early. But for something like
STRPOS(), using the default cast and changing the STRPOS()
match search itself to a KMP algorithm (instead of walking
through the text and comparing each position against the
pattern using strncmp()) would outperform it in any case.
With the byte-by-byte strncmp() method, we have certainly
implemented the slowest, though most readable, approach.
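To make the STRPOS() contrast concrete, here are two standalone sketches over NUL-terminated byte strings: a naive search that calls strncmp() at every position, as described above, and a Knuth-Morris-Pratt search that precomputes a failure table so the text is scanned once without backing up (O(n + m) instead of O(n*m) in the worst case). Neither is the backend's actual code. Assumptions: multibyte handling is ignored, positions are 1-based byte offsets with 0 meaning "not found" (mirroring SQL STRPOS()), an empty pattern matches at position 1, and the KMP version returns -1 if allocation fails. `strpos_naive()` and `strpos_kmp()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Naive search: compare the pattern against every text position. */
static int strpos_naive(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 1;
    for (size_t i = 0; i + m <= n; i++)
        if (strncmp(text + i, pat, m) == 0)
            return (int) (i + 1);
    return 0;
}

/* Knuth-Morris-Pratt: on a mismatch, fall back via the failure table
 * instead of re-scanning text bytes. */
static int strpos_kmp(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 1;
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) return -1;  /* out of memory */

    /* fail[i] = length of the longest proper prefix of pat[0..i]
     * that is also a suffix of pat[0..i]. */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) k++;
        fail[i] = k;
    }

    int result = 0;
    for (size_t i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) k++;
        if (k == m) {                    /* match ends at text[i] */
            result = (int) (i - m + 2);  /* 1-based start position */
            break;
        }
    }
    free(fail);
    return result;
}
```

On an input like "aaaaab" searched for "aab", the naive loop re-compares the same text bytes at each shifted start, while KMP only falls back within the pattern and never moves backwards in the text.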
I think our time would be better spent adding an lzbpchar
type, or working on compressed tables and tuple splitting to
get rid of the size limits altogether.
Jan