Re: Pluggable toaster

From Simon Riggs
Subject Re: Pluggable toaster
Date
Msg-id CANbhV-F4Vffu7hWDEK+yBjquq4EQGH3JAJ+K4tjHEopFU0a-kQ@mail.gmail.com
In reply to Pluggable toaster  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: Pluggable toaster  (Nikita Malakhov <hukutoc@gmail.com>)
Re: Pluggable toaster  (Teodor Sigaev <teodor@sigaev.ru>)
Re: Pluggable toaster  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor@sigaev.ru> wrote:

> We are working on a custom toaster for JSONB [1], because the current TOAST
> is universal for any data type and because of that it has some disadvantages:
>     - "one toast fits all" may not be the best solution for a particular
>       type and/or use case
>     - it doesn't know the internal structure of the data type, so it cannot
>       choose an optimal toast strategy
>     - it can't share common parts between different rows and even
>       versions of rows

Agreed, Oleg has presented a very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.

In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slice(s), without rewriting the existing slices
3. If toasted data is expanded, then allow new slices to be appended to
the object without rewriting the existing slices
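
A rough sketch of those three goals, purely illustrative and not taken
from any patch: an in-memory model of a value stored as fixed-size slices,
where read, update and append report how many slices they touched. Every
name here (SlicedValue, sv_read, sv_update, sv_append, SLICE_SIZE) is
hypothetical, and SLICE_SIZE is tiny only to keep the example readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SLICE_SIZE 8            /* hypothetical; tiny so the example is easy to trace */

/* Hypothetical model of a toasted value kept as fixed-size slices. */
typedef struct SlicedValue {
    char  **slices;             /* slices[i] holds up to SLICE_SIZE bytes */
    size_t  nslices;            /* number of allocated slices */
    size_t  len;                /* logical length of the value in bytes */
    size_t  touched;            /* slices read or written by the last call */
} SlicedValue;

/* Shared walker: copies n bytes at offset off, visiting only covering slices. */
static int sv_access(SlicedValue *sv, size_t off, size_t n, char *buf, int write)
{
    assert(sv != NULL);
    sv->touched = 0;
    if (off > sv->len || n > sv->len - off)
        return -1;              /* range falls outside the value */
    while (n > 0) {
        size_t idx  = off / SLICE_SIZE;
        size_t pos  = off % SLICE_SIZE;
        size_t room = SLICE_SIZE - pos;
        size_t take = n < room ? n : room;

        if (write)
            memcpy(sv->slices[idx] + pos, buf, take);
        else
            memcpy(buf, sv->slices[idx] + pos, take);
        buf += take; off += take; n -= take;
        sv->touched++;
    }
    return 0;
}

/* Goal 1: read a byte range one slice at a time. */
static int sv_read(SlicedValue *sv, size_t off, size_t n, char *out)
{
    return sv_access(sv, off, n, out, 0);
}

/* Goal 2: overwrite a byte range, rewriting only the slices that cover it. */
static int sv_update(SlicedValue *sv, size_t off, const char *data, size_t n)
{
    return sv_access(sv, off, n, (char *) data, 1);
}

/* Goal 3: append; fills the partial tail slice, then adds new slices.
 * No slice before the tail is ever touched. */
static int sv_append(SlicedValue *sv, const char *data, size_t n)
{
    assert(sv != NULL);
    sv->touched = 0;
    while (n > 0) {
        size_t used = sv->len % SLICE_SIZE;

        if (used == 0) {        /* tail is full, or there are no slices yet */
            char **grown = realloc(sv->slices, (sv->nslices + 1) * sizeof(char *));
            if (grown == NULL)
                return -1;
            sv->slices = grown;
            sv->slices[sv->nslices] = malloc(SLICE_SIZE);
            if (sv->slices[sv->nslices] == NULL)
                return -1;
            sv->nslices++;
        }
        size_t room = SLICE_SIZE - used;
        size_t take = n < room ? n : room;

        memcpy(sv->slices[sv->nslices - 1] + used, data, take);
        sv->len += take; data += take; n -= take;
        sv->touched++;
    }
    return 0;
}
```

With a real datatype the offsets would come from knowledge of the
structure (for example, where a given JSONB key lives) rather than from
the caller.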

> Modification of the current toaster for all tasks and cases looks too
> complex; moreover, it will not work for custom data types. Postgres
> is an extensible database, why not extend its extensibility even
> further, to have pluggable TOAST! We propose an idea to separate the
> toaster from the heap using a toaster API similar to the table AM API etc.
> The following patches are applicable over the patch in [1]

ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?

We already have the Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way (varatt_expanded). This would be usable by both JSON and
PostGIS.


Some other thoughts:

I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.
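
To make that idea concrete, here is one possible shape for such a pointer,
as a sketch only. None of this is an existing PostgreSQL structure: the
struct, its field names, the size limits and toc_find_slice() are all
assumptions made for illustration. The table of contents maps logical
offsets to slice ids, so a lookup can find the one slice it needs without
fetching the rest.

```c
#include <stddef.h>
#include <stdint.h>

#define PRIVATE_AREA_MAX 32     /* hypothetical budget for datatype-private bytes */
#define TOC_MAX          8      /* hypothetical cap on table-of-contents entries */

/* Hypothetical table-of-contents entry: where a slice starts in the value. */
typedef struct TocEntry {
    uint32_t start;             /* logical offset of the slice's first byte */
    uint32_t slice_id;          /* identifies the out-of-line slice */
} TocEntry;

/* Hypothetical toast pointer mixing inline and out-of-line data. */
typedef struct SketchToastPointer {
    uint64_t value_id;                        /* id of the out-of-line value */
    uint8_t  private_len;                     /* bytes used in private_area */
    uint8_t  toc_len;                         /* entries used in toc */
    uint8_t  private_area[PRIVATE_AREA_MAX];  /* e.g. a dictionary, owned by the datatype */
    TocEntry toc[TOC_MAX];                    /* sorted by ascending start */
} SketchToastPointer;

/* Return the slice_id of the slice containing offset off, or -1 if the
 * table of contents has no entry starting at or before off. */
static int64_t toc_find_slice(const SketchToastPointer *p, uint32_t off)
{
    size_t lo = 0, hi = p->toc_len;           /* binary search over [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (p->toc[mid].start <= off)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the count of entries with start <= off */
    return lo == 0 ? -1 : (int64_t) p->toc[lo - 1].slice_id;
}
```

Note that the sketch stores no total length, so the last entry is treated
as open-ended; bounds checking would have to happen when the slice itself
is read.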

I'm thinking we could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)
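
The O(N^2) point can be checked with simple arithmetic. Assuming a value
grows by k appends of s bytes each, rewriting the whole value on every
append writes s*(1 + 2 + ... + k) = s*k*(k+1)/2 bytes in total, while
appending new slices writes only s*k. The helper names below are
hypothetical and model only bytes written, nothing else:

```c
#include <stdint.h>

/* Total bytes written if every append rewrites the whole value so far. */
static uint64_t bytes_written_rewrite(uint64_t k, uint64_t s)
{
    uint64_t total = 0;

    for (uint64_t i = 1; i <= k; i++)
        total += i * s;         /* after the i-th append the value is i*s bytes */
    return total;
}

/* Total bytes written if each append only adds its own new data. */
static uint64_t bytes_written_append(uint64_t k, uint64_t s)
{
    return k * s;
}
```

For k = 1000 appends of s = 100 bytes, that is 50,050,000 bytes written
with rewriting versus 100,000 with append-only slices.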

--
Simon Riggs                http://www.EnterpriseDB.com/


