Re: [PATCH] Compression dictionaries for JSONB


From	Nikita Malakhov
Subject	Re: [PATCH] Compression dictionaries for JSONB
Date	7 February 2023 09:11:52
Msg-id	CAN-LCVMg6ntnrjWFbHnuWEAMiJa_07+3bgHyaLApJu_igw9Y4w@mail.gmail.com
In reply to	Re: [PATCH] Compression dictionaries for JSONB (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers


Hi,

On updating a dictionary:

> You cannot "just" refresh a dictionary used once to compress an
> object, because you need it to decompress the object too.

And when there are many, updating an existing dictionary requires
going through every object compressed with it across the whole
database.

It's a very tricky question how to implement this feature correctly.
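To make the problem concrete, here is a minimal Python sketch that uses zlib's preset-dictionary support (the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`) as a stand-in for a JSONB compression dictionary. The dictionaries and the document are invented for illustration. A datum compressed with one dictionary cannot be read back with a "refreshed" one; zlib records an Adler-32 checksum of the dictionary in the stream and rejects a mismatch.

```python
import zlib

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress `data` with a preset dictionary (zlib stores the dictionary's Adler-32 id)."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress `blob`; zlib raises zlib.error if `zdict` is not the original dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# Hypothetical dictionaries built from common JSON keys (illustrative only).
dict_v1 = b'{"customer_id": "order_date": "shipping_address": "status": '
dict_v2 = b'{"product_sku": "warehouse": "quantity": "status": '  # the "refreshed" one

doc = b'{"customer_id": 42, "order_date": "2023-02-07", "status": "shipped"}'
stored = compress_with_dict(doc, dict_v1)

# Works only as long as the original dictionary is still available.
assert decompress_with_dict(stored, dict_v1) == doc

try:
    decompress_with_dict(stored, dict_v2)
except zlib.error as exc:
    print("cannot decompress with the refreshed dictionary:", exc)
```

So either the old dictionary is kept for as long as any datum references it, or every such datum has to be recompressed.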

Also, there are some thoughts on using JSON schema to optimize
storage for JSON objects.

(The same applies to TOAST, so at first glance we've decided
to forbid dropping or changing TOAST implementations already
registered in a particular database.)

In my experience, in the modern world, even with fast SSD storage
arrays, on large databases (about 40-50 TB) disk access was the
bottleneck more often than CPU, except for cases with a lot of
parallel execution threads for a single query (Oracle).

On Mon, Feb 6, 2023 at 10:33 PM Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2023-02-06 16:16:41 +0100, Matthias van de Meent wrote:
> > On Mon, 6 Feb 2023 at 15:03, Aleksander Alekseev
> > <aleksander@timescale.com> wrote:
> > >
> > > Hi,
> > >
> > > I see your point regarding the fact that creating dictionaries on a
> > > training set is too beneficial to neglect it. Can't argue with this.
> > >
> > > What puzzles me though is: what prevents us from doing this on a page
> > > level as suggested previously?
> >
> > The complexity of page-level compression is significant, as pages are
> > currently a base primitive of our persistency and consistency scheme.

> +many
>
> It's also not all a panacea performance-wise: datum-level decompression can
> often be deferred much longer than page-level decompression. For things like
> json[b], you'd hopefully normally have some "pre-filtering" based on proper
> columns, before you need to dig into the json datum.

> It's also not necessarily that good compression-ratio-wise. Particularly for
> wider datums you're not going to be able to remove much duplication, because
> there's only a handful of tuples. Consider the case of json keys: the
> dictionary will often do better than page-level compression, because it'll
> have the common keys in the dictionary, which means the "full" keys never will
> have to appear on a page, whereas page-level compression will have the keys on
> it, at least once.
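A rough Python illustration of the key-dictionary effect, again using zlib's `zdict` as a stand-in: a single wide JSON datum has little internal repetition, so compressing it alone saves little, while a dictionary that already contains the common keys lets the compressor replace each key with a short back-reference into the dictionary. The dictionary and document are made up, and this compares per-datum compression with and without a dictionary rather than modelling a real page.

```python
import zlib

# Hypothetical dictionary built from keys that recur across many JSONB values.
KEY_DICT = (b'"customer_identifier": "shipping_address_line": '
            b'"billing_address_line": "preferred_contact_method": ')

doc = (b'{"customer_identifier": 1017, "shipping_address_line": "12 Main St", '
       b'"billing_address_line": "12 Main St", "preferred_contact_method": "email"}')

def compressed_size(data: bytes, zdict=None) -> int:
    """Size of `data` after zlib compression, optionally primed with a preset dictionary."""
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        c = zlib.compressobj(level=9, zdict=zdict)
    return len(c.compress(data) + c.flush())

plain = compressed_size(doc)               # keys must be stored in full at least once
with_dict = compressed_size(doc, KEY_DICT)  # keys become references into the dictionary
print(f"raw {len(doc)} B, compressed alone {plain} B, with key dictionary {with_dict} B")
```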

> Of course you can use a dictionary for page-level compression too, but the
> gains when it works well will often be limited, because in most OLTP usable
> page-compression schemes I'm aware of, you can't compress a page all that far
> down, because you need a small number of possible "compressed page sizes".
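The size-class effect can be shown with simple arithmetic. The sketch below assumes, purely for illustration, an 8192-byte page that may be stored in 2048, 4096 or 8192 bytes; these classes are hypothetical and not taken from any particular scheme. A page that compresses to just over a class boundary gains nothing on disk.

```python
# Hypothetical size classes for an 8192-byte page (illustrative only).
PAGE_SIZE = 8192
SIZE_CLASSES = (2048, 4096, 8192)

def stored_size(compressed_bytes: int) -> int:
    """Smallest allowed size class that can hold the compressed page."""
    for size in SIZE_CLASSES:
        if compressed_bytes <= size:
            return size
    return PAGE_SIZE  # compression did not help: keep the page uncompressed

for compressed in (1500, 3000, 4200):
    stored = stored_size(compressed)
    print(f"{compressed} B -> stored in {stored} B, "
          f"effective ratio {PAGE_SIZE / stored:.1f}x")
# 1500 B -> stored in 2048 B, effective ratio 4.0x
# 3000 B -> stored in 4096 B, effective ratio 2.0x
# 4200 B -> stored in 8192 B, effective ratio 1.0x
```

The last page compresses by almost 2x, yet still occupies a full 8192-byte slot.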

> > > The more similar the data you compress, the more space and disk I/O
> > > you save. Additionally, you don't have to compress/decompress the data
> > > every time you access it. Everything that's in shared buffers is
> > > uncompressed. Not to mention the fact that you don't care what's in
> > > pg_attribute, the fact that schema may change, etc. There is a table
> > > and a dictionary for this table that you refresh from time to time.
> > > Very simple.
> >
> > You cannot "just" refresh a dictionary used once to compress an
> > object, because you need it to decompress the object too.
>
> Right. That's what I was trying to refer to when mentioning that we might need
> to add a bit of additional information to the varlena header for datums
> compressed with a dictionary.
>
> Greetings,
>
> Andres Freund
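On the extra varlena header information: a minimal Python sketch, assuming a made-up 5-byte prefix (a method tag plus a 4-byte dictionary id) rather than PostgreSQL's real varlena layout. Each datum records which dictionary compressed it, and "refreshing" a dictionary means registering a new one under a new id while keeping the old one for existing data, in line with the idea of never dropping or changing an already registered implementation. `HEADER`, `register_dictionary`, `compress_datum` and `decompress_datum` are hypothetical names.

```python
import struct
import zlib

# Hypothetical prefix: 1-byte method tag + 4-byte dictionary id (NOT PostgreSQL's varlena format).
HEADER = struct.Struct("<BI")
METHOD_ZLIB_DICT = 1

DICTIONARIES = {}  # dictionary id -> dictionary bytes; append-only

def register_dictionary(dict_id: int, zdict: bytes) -> None:
    """Add a dictionary; an id that is already in use can never be replaced."""
    if dict_id in DICTIONARIES:
        raise ValueError(f"dictionary {dict_id} already registered; use a new id")
    DICTIONARIES[dict_id] = zdict

def compress_datum(data: bytes, dict_id: int) -> bytes:
    """Compress with the given dictionary and record its id in the prefix."""
    c = zlib.compressobj(level=9, zdict=DICTIONARIES[dict_id])
    return HEADER.pack(METHOD_ZLIB_DICT, dict_id) + c.compress(data) + c.flush()

def decompress_datum(blob: bytes) -> bytes:
    """Read the prefix, look up the matching dictionary, then decompress the rest."""
    method, dict_id = HEADER.unpack_from(blob)
    if method != METHOD_ZLIB_DICT:
        raise ValueError(f"unknown compression method {method}")
    d = zlib.decompressobj(zdict=DICTIONARIES[dict_id])
    return d.decompress(blob[HEADER.size:]) + d.flush()

register_dictionary(1, b'"customer_id": "order_date": "status": ')
old = compress_datum(b'{"customer_id": 7, "status": "new"}', 1)

# "Refreshing" adds dictionary 2; datums compressed with dictionary 1 stay readable.
register_dictionary(2, b'"product_sku": "quantity": "status": ')
new = compress_datum(b'{"product_sku": "A-1", "quantity": 3}', 2)

assert decompress_datum(old) == b'{"customer_id": 7, "status": "new"}'
assert decompress_datum(new) == b'{"product_sku": "A-1", "quantity": 3}'
```

The cost of this design is that dictionaries accumulate: one can only be retired after confirming that no stored datum still carries its id.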

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/
