Re: RFC: compression dictionaries for JSONB

From: Aleksander Alekseev
Subject: Re: RFC: compression dictionaries for JSONB
Msg-id: CAJ7c6TM7z=cBbD8F76E3CnjxTgOoQGFmzPovs0hFjTPn4BO3+A@mail.gmail.com
In reply to: Re: RFC: compression dictionaries for JSONB  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses: Re: RFC: compression dictionaries for JSONB  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List: pgsql-hackers

Hi Matthias,

> Assuming this above is option 1. If I understand correctly, this
> option was 'adapt the data type so that it understands how to handle a
> shared dictionary, decreasing storage requirements'.
> [...]
> Assuming this was the 2nd option. If I understand correctly, this
> option is effectively 'adapt or wrap TOAST to understand and handle
> dictionaries for dictionary encoding common values'.

Yes, exactly.

> I think that an 'universal dictionary encoder' would be useful, but
> that a data type might also have good reason to implement their
> replacement methods by themselves for better overall performance (such
> as maintaining partial detoast support in dictionaried items, or
> overall lower memory footprint, or ...). As such, I'd really
> appreciate it if Option 1 is not ruled out by any implementation of
> Option 2.

I agree, having the benefits of both approaches in one feature would be
great. However, I'm having some difficulty imagining what the
implementation would look like in light of the pros and cons stated
above. I could use some help here.

One approach I can think of is introducing a new entity, let's call it
a "dictionary compression method". The idea is similar to index access
methods and table AMs. There is a set of callbacks the dictionary
compression method should implement; some are mandatory, some can be
set to NULL. Users can specify the compression method for the dictionary:

```
CREATE TYPE name AS DICTIONARY OF JSONB (
  compression_method = 'jsonb_best_compression'
  -- compression_method = 'jsonb_fastest_partial_decompression'
  -- if not specified, some default compression method is used
);
```

JSONB may not be the best example of a type for which people would
need multiple compression methods in practice. But I can imagine how
overriding the compression method for, let's say, arrays in an
extension could be beneficial depending on the application.

This approach would make the API well-defined and, more importantly,
extensible. In the future, we could add additional (optional) methods
for particular scenarios, like partial decompression.
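
To make the callback idea a bit more concrete, here is a minimal sketch
in C. It is only an illustration under stated assumptions: every name in
it (DictCompressionRoutine, dict_fetch_slice, toy_copy, toy_routine) is
hypothetical and does not exist in PostgreSQL, and the toy "compression"
simply copies bytes. It shows two mandatory callbacks, one optional
callback for partial decompression, and a caller that falls back to full
decompression when the optional callback is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; not an existing PostgreSQL structure. */
typedef struct DictCompressionRoutine
{
    /* Mandatory. Return the number of bytes written to dst, or -1. */
    long (*compress) (const char *src, size_t srclen,
                      char *dst, size_t dstcap);
    long (*decompress) (const char *src, size_t srclen,
                        char *dst, size_t dstcap);

    /* Optional, may be NULL: decompress only len bytes at offset. */
    long (*decompress_slice) (const char *src, size_t srclen,
                              size_t offset, size_t len,
                              char *dst, size_t dstcap);
} DictCompressionRoutine;

/* Fetch a slice, using the optional callback only if it is provided. */
static long
dict_fetch_slice(const DictCompressionRoutine *r,
                 const char *src, size_t srclen,
                 size_t offset, size_t len,
                 char *dst, size_t dstcap,
                 char *scratch, size_t scratchcap)
{
    long n;

    assert(r->compress != NULL && r->decompress != NULL);

    if (r->decompress_slice != NULL)
        return r->decompress_slice(src, srclen, offset, len, dst, dstcap);

    /* Fallback: decompress everything into scratch, then copy the slice. */
    n = r->decompress(src, srclen, scratch, scratchcap);
    if (n < 0 || offset > (size_t) n)
        return -1;
    if (len > (size_t) n - offset)
        len = (size_t) n - offset;  /* clamp to the available data */
    if (len > dstcap)
        return -1;
    memcpy(dst, scratch + offset, len);
    return (long) len;
}

/* Toy method: "compression" is a plain copy, for illustration only. */
static long
toy_copy(const char *src, size_t srclen, char *dst, size_t dstcap)
{
    if (srclen > dstcap)
        return -1;
    memcpy(dst, src, srclen);
    return (long) srclen;
}

/* No decompress_slice callback, so callers take the fallback path. */
static const DictCompressionRoutine toy_routine = {
    toy_copy, toy_copy, NULL
};
```

With this shape, a method like the hypothetical
'jsonb_fastest_partial_decompression' would fill in decompress_slice,
while a simpler method could leave it NULL and still work everywhere.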

Does it sound like a reasonable approach?

-- 
Best regards,
Aleksander Alekseev


