Re: [PATCH] Compression dictionaries for JSONB

Поиск

Список

Период

Сортировка

От	Simon Riggs
Тема	Re: [PATCH] Compression dictionaries for JSONB
Дата	23 июня 2022 г. 18:48:48
Msg-id	CANbhV-HpCF852WcZuU0wyh1jMU4p6XLbV6rCRkZpnpeKQ9OenQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PATCH] Compression dictionaries for JSONB (Aleksander Alekseev <aleksander@timescale.com>)
Ответы	Re: [PATCH] Compression dictionaries for JSONB (Aleksander Alekseev <aleksander@timescale.com>)
Список	pgsql-hackers

Дерево обсуждения

On Thu, 2 Jun 2022 at 14:30, Aleksander Alekseev
<aleksander@timescale.com> wrote:

> > I saw there was some previous discussion about dictionary size. It
> > looks like this approach would put all dictionaries into a shared OID
> > pool. Since I don't know what a "standard" use case is, is there any
> > risk of OID exhaustion for larger deployments with many dictionaries?
> > Or is 2**32 so comparatively large that it's not really a serious
> > concern?
>
> I agree, this is a drawback of the current implementation. To be honest,
> I simply followed the example of how ENUMs are implemented. I'm not 100% sure
> if we should be worried here (apparently, freed OIDs are reused). I'm OK with
> using a separate sequence if someone could second this. This is the first time
> I'm altering the catalog so I'm not certain what the best practices are.

The goal of this patch is great, thank you for working on this (and ZSON).

The approach chosen has a few downsides that I'm not happy with yet.

* Assigning OIDs for each dictionary entry is not a great idea. I
don't see why you would need to do that; just assign monotonically
ascending keys for each dictionary, as we do for AttrNums.

* There is a limit on SQL statement size, which will effectively limit
the size of dictionaries, but the examples are unrealistically small,
so this isn't clear as a limitation, but it would be in practice. It
would be better to specify a filename, which can be read in when the
DDL executes. This can be put into pg_dump output in a similar way to
the COPY data for a table is, so once read in it stays static.

* The dictionaries are only allowed for certain datatypes. This should
not be specifically limited by this patch, i.e. user defined types
should not be rejected.

* Dictionaries have no versioning. Any list of data items changes over
time, so how do we express that? Enums were also invented as static
lists originally, then had to be modified later to accomodate
additions and revisions, so let's think about that now, even if we
don't add all of the commands in one go. Currently we would have to
create a whole new dictionary if even one word changes. Ideally, we
want the dictionary to have a top-level name and then have multiple
versions over time. Let's agree how we are going do these things, so
we can make sure the design and code allows for those future
enhancements.
i.e. how will we do ALTER TABLE ... UPGRADE DICTIONARY without causing
a table rewrite?

* Does the order of entries in the dictionary allow us to express a
priority? i.e. to allow Huffman coding.

Thanks for your efforts - this is a very important patch.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Simon Riggs
Дата: 23 июня 2022 г., 18:19:45
Сообщение: Re: allow specifying action when standby encounters incompatible parameter settings

Следующее

От: Simon Riggs
Дата: 23 июня 2022 г., 20:13:26
Сообщение: Re: O(n) tasks cause lengthy startups and checkpoints

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] Compression dictionaries for JSONB

Предыдущее

Следующее