Re: [PATCH] Compression dictionaries for JSONB

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [PATCH] Compression dictionaries for JSONB
Дата
Msg-id 20230206193328.zl5bzx54y4uc4nu3@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: [PATCH] Compression dictionaries for JSONB  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: [PATCH] Compression dictionaries for JSONB  (Nikita Malakhov <hukutoc@gmail.com>)
Re: [PATCH] Compression dictionaries for JSONB  (Aleksander Alekseev <aleksander@timescale.com>)
Список pgsql-hackers
Hi,

On 2023-02-06 16:16:41 +0100, Matthias van de Meent wrote:
> On Mon, 6 Feb 2023 at 15:03, Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> > Hi,
> >
> > I see your point regarding the fact that creating dictionaries on a
> > training set is too beneficial to neglect it. Can't argue with this.
> >
> > What puzzles me though is: what prevents us from doing this on a page
> > level as suggested previously?
> 
> The complexity of page-level compression is significant, as pages are
> currently a base primitive of our persistency and consistency scheme.

+many

It's also not all a panacea performance-wise, datum-level decompression can
often be deferred much longer than page level decompression. For things like
json[b], you'd hopefully normally have some "pre-filtering" based on proper
columns, before you need to dig into the json datum.

It's also not necessarily that good, compression ratio wise. Particularly for
wider datums you're not going to be able to remove much duplication, because
there's only a handful of tuples. Consider the case of json keys - the
dictionary will often do better than page level compression, because it'll
have the common keys in the dictionary, which means the "full" keys never will
have to appear on a page, whereas page-level compression will have the keys on
it, at least once.

Of course you can use a dictionary for page-level compression too, but the
gains when it works well will often be limited, because in most OLTP usable
page-compression schemes I'm aware of, you can't compress a page all that far
down, because you need a small number of possible "compressed page sizes".


> > More similar data you compress the more space and disk I/O you save.
> > Additionally you don't have to compress/decompress the data every time
> > you access it. Everything that's in shared buffers is uncompressed.
> > Not to mention the fact that you don't care what's in pg_attribute,
> > the fact that schema may change, etc. There is a table and a
> > dictionary for this table that you refresh from time to time. Very
> > simple.
> 
> You cannot "just" refresh a dictionary used once to compress an
> object, because you need it to decompress the object too.

Right. That's what I was trying to refer to when mentioning that we might need
to add a bit of additional information to the varlena header for datums
compressed with a dictionary.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Pluggable toaster
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Non-superuser subscription owners