Re: Zedstore - compressed in-core columnar storage

From Ashwin Agrawal
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id CALfoeiuc_FFXO00qZgEVsahw4_R5OKNWbc4+VWMxUfWm=iSeMg@mail.gmail.com
In reply to Re: Zedstore - compressed in-core columnar storage  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Zedstore - compressed in-core columnar storage  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers

On Sun, Apr 14, 2019 at 9:40 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Thu, Apr 11, 2019 at 06:20:47PM +0300, Heikki Linnakangas wrote:
>On 11/04/2019 17:54, Tom Lane wrote:
>>Ashwin Agrawal <aagrawal@pivotal.io> writes:
>>>Thank you for trying it out. Yes, noticed for certain patterns pg_lzcompress() actually requires much larger output buffers. Like for one 86 len source it required 2296 len output buffer. Current zedstore code doesn’t handle this case and errors out. LZ4 for same patterns works fine, would highly recommend using LZ4 only, as anyways speed is very fast as well with it.
>>
>>You realize of course that *every* compression method has some inputs that
>>it makes bigger.  If your code assumes that compression always produces a
>>smaller string, that's a bug in your code, not the compression algorithm.
>
>Of course. The code is not making that assumption, although clearly
>there is a bug there somewhere because it throws that error. It's
>early days..
>
>In practice it's easy to weasel out of that, by storing the data
>uncompressed, if compression would make it longer. Then you need an
>extra flag somewhere to indicate whether it's compressed or not. It
>doesn't break the theoretical limit because the actual stored length
>is then original length + 1 bit, but it's usually not hard to find a
>place for one extra bit.
>

Don't we already have that flag, though? I see ZSCompressedBtreeItem has
t_flags, and there's ZSBT_COMPRESSED, but maybe it's more complicated.

The ZSBT_COMPRESSED flag differentiates between a container (compressed) item and a plain (uncompressed) item. The current code is written such that within a container (compressed) item, all the data is compressed. If we need to store some uncompressed data inside a container item, then an additional flag would be required to indicate that. Hence it's different from ZSBT_COMPRESSED. I am thinking one way could be to simply not store the datum in a container item if it can't be compressed, and store it as a plain item with uncompressed data instead; then the additional flag won't be required. I will know more once I write the code for this.
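
To make the "store it uncompressed if compression doesn't help" idea concrete, here is a rough sketch. It is not zedstore code: it is Python, with zlib standing in for pg_lzcompress()/LZ4, and FLAG_COMPRESSED, pack_datum() and unpack_datum() are made-up names for illustration only. It also spends a whole byte on the flag, where a real on-disk format would try to find a spare bit.

```python
import zlib

# Hypothetical flag bit, only loosely analogous to ZSBT_COMPRESSED.
FLAG_COMPRESSED = 0x01


def pack_datum(raw: bytes) -> bytes:
    """Return a 1-byte flag header followed by the payload."""
    compressed = zlib.compress(raw)
    if len(compressed) < len(raw):
        # Compression helped: keep the compressed bytes and set the flag.
        return bytes([FLAG_COMPRESSED]) + compressed
    # Compression did not help: store the bytes verbatim, flag cleared.
    return bytes([0]) + raw


def unpack_datum(stored: bytes) -> bytes:
    """Undo pack_datum(), using the flag byte to pick the code path."""
    flags, payload = stored[0], stored[1:]
    if flags & FLAG_COMPRESSED:
        return zlib.decompress(payload)
    return payload
```

With this layout the stored size is never more than the original length plus the one flag byte, no matter how badly the compressor does on a given input, which is the point Heikki made above.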
