Обсуждение: Pluggable toaster

Поиск

Список

Период

Сортировка

Pluggable toaster

От

Teodor Sigaev

Дата:

30 декабря 2021 г., 19:40:09

Hi!

We are working on custom toaster for JSONB [1], because current TOAST is
universal for any data type and because of that it has some disadvantages:
- "one toast fits all" may be not the best solution for particular
type or/and use cases
- it doesn't know the internal structure of data type, so it cannot
choose an optimal toast strategy
- it can't share common parts between different rows and even
versions of rows

Modification of current toaster for all tasks and cases looks too
complex, moreover, it will not works for custom data types. Postgres
is an extensible database, why not to extent its extensibility even
further, to have pluggable TOAST! We propose an idea to separate
toaster from heap using toaster API similar to table AM API etc.
Following patches are applicable over patch in [1]

1) 1_toaster_interface_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_interface
Introduces syntax for storage and formal toaster API. Adds column
atttoaster to pg_attribute, by design this column should not be equal to
invalid oid for any toastable datatype, ie it must have correct oid for
any type (not column) with non-plain storage. Since toaster may support
only particular datatype, core should check correctness of toaster set
by toaster validate method. New commands could be found in
src/test/regress/sql/toaster.sql

On-disk toast pointer structure now has one more possible struct -
varatt_custom with fixed header and variable tail which uses as a
storage for custom toasters. Format of built-in toaster is kept to allow
simple pg_upgrade logic.

Since toaster for column could be changed during table's lifetime we had
two options about toaster's drop operation:
- if column's toaster has been changed, then we need to re-toast all
values, which could be extremely expensive. In any case,
functions/operators should be ready to work with values toasted by
different toasters, although any toaster should execute simple
toast/detoast operation, which allows any existing code to
work with the new approach. Tracking dependency of toasters and
rows looks as bad idea.
- disallow drop toaster. We don't believe that there will be many
toasters at the same time (number of AM isn't very high too and
we don't believe that it will be changed significantly in the near
future), so prohibition of dropping of toaster looks reasonable.
In this patch set we choose second option.

Toaster API includes get_vtable method, which is planned to access the
custom toaster features which isn't covered by this API. The idea is,
that toaster returns some structure with some values and/or pointers to
toaster's methods and caller could use it for particular purposes, see
patch 4). Kind of structure identified by magic number, which should be
a first field in this structure.

Also added contrib/dummy_toaster to simplify checking.

psql/pg_dump are modified to support toaster object concept.

2) 2_toaster_default_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_default
Built-in toaster implemented (with some refactoring) uisng toaster API
as generic (or default) toaster. dummy_toaster here is a minimal
workable example, it saves value directly in toast pointer and fails if
value is greater than 1kb.

3) 3_toaster_snapshot_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements technology to distinguish row's versions in toasted
values to share common parts of toasted values between different
versions of rows

4) 4_bytea_appendable_toaster_v1.patch.gz
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
Contrib module implements toaster for non-compressed bytea columns,
which allows fast appending to existing bytea value. Appended tail
stored directly in toaster pointer, if there is enough place to do it.

Note: patch modifies byteacat() to support contrib toaster. Seems, it's
looks ugly and contrib module should create new concatenation function.

We are open for any questions, discussions, objections and advices.
Thank you.

Peoples behind:
Oleg Bartunov
Nikita Gluhov
Nikita Malakhov
Teodor Sigaev

[1]
https://www.postgresql.org/message-id/flat/de83407a-ae3d-a8e1-a788-920eb334f25b@sigaev.ru
<https://www.postgresql.org/message-id/flat/de83407a-ae3d-a8e1-a788-920eb334f25b@sigaev.ru>

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Вложения

Re: Pluggable toaster

От

Simon Riggs

Дата:

05 января 2022 г., 17:45:56

On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor@sigaev.ru> wrote:

> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
>     - "one toast fits all"  may be not the best solution for particular
>       type or/and use cases
>     - it doesn't know the internal structure of data type, so it  cannot
>       choose an optimal toast strategy
>     - it can't  share common parts between different rows and even
>       versions of rows

Agreed, Oleg has made some very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.

In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slices(s), without rewriting the existing slices
3. If toasted data is expanded, then allownew slices to be appended to
the object without rewriting the existing slices

> Modification of current toaster for all tasks and cases looks too
> complex, moreover, it  will not works for  custom data types. Postgres
> is an extensible database,  why not to extent its extensibility even
> further, to have pluggable TOAST! We  propose an idea to separate
> toaster from  heap using  toaster API similar to table AM API etc.
> Following patches are applicable over patch in [1]

ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?

We already have Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way. (varatt_expanded). This would be usable by both JSON and
PostGIS.

Some other thoughts:

I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.

I'm thinking could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)

--
Simon Riggs                http://www.EnterpriseDB.com/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 января 2022 г., 15:25:53

Hi all!

Simon, thank you for your review.
I'll try to give a brief explanation on some topics you've mentioned.
My colleagues would correct me if I miss the point and provide some more details.

>Agreed, Oleg has made some very clear analysis of the value of having
>a higher degree of control over toasting from within the datatype.
Currently we see the biggest flaw in TOAST functionality is that it
does not provide any means for extension and modification except
modifying the core code itself. It is not possible to use any other
TOAST strategy except existing in the core, the same issue is with
assigning different TOAST methods to columns and datatypes.
The main point in this patch is actually to provide an open API and
syntax for creation of new Toasters as pluggable extensions, and
to make an existing (default) toaster to work via this API without
affecting its function. Also, the default toaster is strongly cross-tied
with Heap access, with somewhat unclear code relations (headers,
function locations and calls, etc.) that are not quite good logically
structured and ask to be straightened out.

>In my understanding, we want to be able to
>1. Access data from a toasted object one slice at a time, by using
>knowledge of the structure
>2. If toasted data is updated, then update a minimum number of
>slices(s), without rewriting the existing slices
>3. If toasted data is expanded, then allownew slices to be appended to
>the object without rewriting the existing slices
There are two main ideas behind Pluggable Toaster patch -
First - to provide an extensible API for all Postgres developers, to
be able to develop and plug in custom toasters as independent
extensions for different data types and columns, to use different
toast strategies, access and compression methods, and so on;
Second - to refactor current Toast functionality, to improve Toast
code structure and make it more logically structured and
understandable, to 'detach' default ('generic', as it is currently
named, or maybe the best naming for it to be 'heap') toaster from DBMS
core code, route it through new API and hide all existing internal
specific Toast functionality behind new API.

All the points you mentioned are made available for development by
this patch (and, actually, some are being developed - in the
bytea_appendable_toaster part of this patch or jSONb toaster by Nikita
Glukhov, he could provide much better explanation on this topic).

>> Modification of current toaster for all tasks and cases looks too
>> complex, moreover, it will not works for custom data types. Postgres
>> is an extensible database, why not to extent its extensibility even
>> further, to have pluggable TOAST! We propose an idea to separate
>> toaster from heap using toaster API similar to table AM API etc.
>> Following patches are applicable over patch in [1]

>ISTM that we would want the toast algorithm to be associated with the
>datatype, not the column?
>Can you explain your thinking?
This possibility is considered for future development.

>We already have Expanded toast format, in-memory, which was designed
>specifically to allow us to access sub-structure of the datatype
>in-memory. So I was expecting to see an Expanded, on-disk, toast
>format that roughly matched that concept, since Tom has already shown
>us the way. (varatt_expanded). This would be usable by both JSON and
>PostGIS.
The main disadvantage is that it does not suppose either usage of any
other toasting strategies, or compressions methods except plgz and
lz4.

>Some other thoughts:

>I imagine the data type might want to keep some kind of dictionary
>inside the main toast pointer, so we could make allowance for some
>optional datatype-specific private area in the toast pointer itself,
>allowing a mix of inline and out-of-line data, and/or a table of
>contents to the slices.
It is partly implemented in jSONb custom Toaster, as I mentioned
above, and also could be considered for future improvement of existing
Toaster as an extension.

>I'm thinking could also tackle these things at the same time:
>* We want to expand TOAST to 64-bit pointers, so we can have more
>pointers in a table
This issue is being discussed but not currently implemented, it was
considered as one of the possible future improvements.

>* We want to avoid putting the data length into the toast pointer, so
>we can allow the toasted data to be expanded without rewriting
>everything (to avoid O(N^2) cost)
May I correct you - actual relation is O(N), not O(N^2).
Currently data length is stored outside customized toaster data, in
the varatt_custom structure that is supposed to be used by all custom
(extended) toasters. Data that is specific to some custom Toasted will
be stored inside va_toasterdata structure.

Looking forward to your thoughts on our work.

--
Best regards,
Nikita A. Malakhov

On Wed, Jan 5, 2022 at 5:46 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor@sigaev.ru> wrote:

> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
> - "one toast fits all" may be not the best solution for particular
> type or/and use cases
> - it doesn't know the internal structure of data type, so it cannot
> choose an optimal toast strategy
> - it can't share common parts between different rows and even
> versions of rows

Agreed, Oleg has made some very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.

In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slices(s), without rewriting the existing slices
3. If toasted data is expanded, then allownew slices to be appended to
the object without rewriting the existing slices

> Modification of current toaster for all tasks and cases looks too
> complex, moreover, it will not works for custom data types. Postgres
> is an extensible database, why not to extent its extensibility even
> further, to have pluggable TOAST! We propose an idea to separate
> toaster from heap using toaster API similar to table AM API etc.
> Following patches are applicable over patch in [1]

ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?

We already have Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way. (varatt_expanded). This would be usable by both JSON and
PostGIS.

Some other thoughts:

I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.

I'm thinking could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)

--
Simon Riggs http://www.EnterpriseDB.com/

Re: Pluggable toaster

От

Teodor Sigaev

Дата:

14 января 2022 г., 21:41:55

> In my understanding, we want to be able to
> 1. Access data from a toasted object one slice at a time, by using
> knowledge of the structure
> 2. If toasted data is updated, then update a minimum number of
> slices(s), without rewriting the existing slices
> 3. If toasted data is expanded, then allownew slices to be appended to
> the object without rewriting the existing slices

There are more options:
1 share common parts between not only versions of row but between all 
rows in a column. Seems strange but examples:
   - urls often have a common prefix and so storing in a prefix tree (as
     SP-GiST does) allows significantly decrease storage size
   - the same for json - it's often use case with common part of its
     hierarchical structure
   - one more usecase for json. If json use only a few schemes
     (structure) it's possible to store in toast storage only values and
     don't store keys and structure
2 Current toast storage stores chunks in heap accesses method and to 
provide fast access by toast id it makes an index. Ideas:
   - store chunks directly in btree tree, pgsql's btree already has an
     INCLUDE columns, so, chunks and visibility data will be stored only
     in leaf pages. Obviously it reduces number of disk's access for
     "untoasting".
   - use another access method for chunk storage

> ISTM that we would want the toast algorithm to be associated with the
> datatype, not the column?
> Can you explain your thinking?
Hm. I'll try to explain my motivation.
1) Datatype could have more than one suitable toasters. For different
    usecases: fast retrieving, compact storage, fast update etc. As I
    told   above, for jsonb there are several optimal strategies for
    toasting:   for values with a few different structures, for close to
    hierarchical structures,  for values with different parts by access
    mode (easy to imagine json with some keys used for search and some
    keys only for   output to user)
2) Toaster could be designed to work with different data type. Suggested
    appendable toaster is designed to work with bytea but could work with
    text

Looking on this point I have doubts where to store connection between 
toaster and datatype. If we add toasteroid to pg_type how to deal with 
several toaster for one datatype? (And we could want to has different 
toaster on one table!) If we add typoid to pg_toaster then how it will 
work with several datatypes? An idea to add a new many-to-many 
connection table seems workable but here there are another questions, 
such as will any toaster work with any table access method?

To resolve this bundle of question we propose validate() method of 
toaster, which should be called during DDL operation, i.e. toaster is 
assigned to column or column's datatype is changed.

More thought:
Now postgres has two options for column: storage and compression and now 
we add toaster. For me it seems too redundantly. Seems, storage should 
be binary value: inplace (plain as now) and toastable. All other 
variation such as toast limit, compression enabling, compression kind 
should be an per-column option for toaster (that's why we suggest valid 
toaster oid for any column with varlena/toastable datatype). It looks 
like a good abstraction but we will have a problem with backward 
compatibility and I'm afraid I can't implement it very fast.



> 
> We already have Expanded toast format, in-memory, which was designed
> specifically to allow us to access sub-structure of the datatype
> in-memory. So I was expecting to see an Expanded, on-disk, toast
> format that roughly matched that concept, since Tom has already shown
> us the way. (varatt_expanded). This would be usable by both JSON and
> PostGIS.
Hm, I don't understand. varatt_custom has variable-length tail which 
toaster could use it by any way, appandable toaster use it to store 
appended tail.

> 
> 
> Some other thoughts:
> 
> I imagine the data type might want to keep some kind of dictionary
> inside the main toast pointer, so we could make allowance for some
> optional datatype-specific private area in the toast pointer itself,
> allowing a mix of inline and out-of-line data, and/or a table of
> contents to the slices.
> 
> I'm thinking could also tackle these things at the same time:
> * We want to expand TOAST to 64-bit pointers, so we can have more
> pointers in a table
> * We want to avoid putting the data length into the toast pointer, so
> we can allow the toasted data to be expanded without rewriting
> everything (to avoid O(N^2) cost)
Right

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/

Re: Pluggable toaster

От

Teodor Sigaev

Дата:

14 января 2022 г., 21:47:04

I'd like to ask your opinion for next small questions

1) May be, we could add toasteroid to pg_type to for default toaster for 
datatype.  ALTER TYPE type SET DEFAULT TOASTER toaster;

2) The name of default toaster is deftoaster, which was choosen at 
night, may be heap_toaster is better? heap because default toaster 
stores chunks in heap table.

Thank you!

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/

Re: Pluggable toaster

От

Tomas Vondra

Дата:

17 января 2022 г., 22:23:44


On 1/14/22 19:41, Teodor Sigaev wrote:
> 
>> In my understanding, we want to be able to
>> 1. Access data from a toasted object one slice at a time, by using
>> knowledge of the structure
>> 2. If toasted data is updated, then update a minimum number of
>> slices(s), without rewriting the existing slices
>> 3. If toasted data is expanded, then allownew slices to be appended to
>> the object without rewriting the existing slices
> 
> There are more options:
> 1 share common parts between not only versions of row but between all 
> rows in a column. Seems strange but examples:
>    - urls often have a common prefix and so storing in a prefix tree (as
>      SP-GiST does) allows significantly decrease storage size
>    - the same for json - it's often use case with common part of its
>      hierarchical structure
>    - one more usecase for json. If json use only a few schemes
>      (structure) it's possible to store in toast storage only values and
>      don't store keys and structure

This sounds interesting, but very much like column compression, which 
was proposed some time ago. If we haven't made much progrees with that 
patch (AFAICS), what's the likelihood we'll succeed here, when it's 
combined with yet more complexity?

Maybe doing that kind of compression in TOAST is somehow simpler, but I 
don't see it.

> 2 Current toast storage stores chunks in heap accesses method and to 
> provide fast access by toast id it makes an index. Ideas:
>    - store chunks directly in btree tree, pgsql's btree already has an
>      INCLUDE columns, so, chunks and visibility data will be stored only
>      in leaf pages. Obviously it reduces number of disk's access for
>      "untoasting".
>    - use another access method for chunk storage
> 

Maybe, but that probably requires more thought - e.g. btree requires the 
values to be less than 1/3 page, so I wonder how would that play with 
toasting of values.

>> ISTM that we would want the toast algorithm to be associated with the
>> datatype, not the column?
>> Can you explain your thinking?
> Hm. I'll try to explain my motivation.
> 1) Datatype could have more than one suitable toasters. For different
>     usecases: fast retrieving, compact storage, fast update etc. As I
>     told   above, for jsonb there are several optimal strategies for
>     toasting:   for values with a few different structures, for close to
>     hierarchical structures,  for values with different parts by access
>     mode (easy to imagine json with some keys used for search and some
>     keys only for   output to user)
> 2) Toaster could be designed to work with different data type. Suggested
>     appendable toaster is designed to work with bytea but could work with
>     text
> 
> Looking on this point I have doubts where to store connection between 
> toaster and datatype. If we add toasteroid to pg_type how to deal with 
> several toaster for one datatype? (And we could want to has different 
> toaster on one table!) If we add typoid to pg_toaster then how it will 
> work with several datatypes? An idea to add a new many-to-many 
> connection table seems workable but here there are another questions, 
> such as will any toaster work with any table access method?
> 
> To resolve this bundle of question we propose validate() method of 
> toaster, which should be called during DDL operation, i.e. toaster is 
> assigned to column or column's datatype is changed.
> 

Seems you'd need a mapping table, to allow M:N mapping between types and 
toasters, linking it to all "compatible" types. It's not clear to me how 
would this work with custom data types, domains etc.

Also, what happens to existing values when you change the toaster? What 
if the toasters don't use the same access method to store the chunks 
(heap vs. btree)? And so on.

> More thought:
> Now postgres has two options for column: storage and compression and now 
> we add toaster. For me it seems too redundantly. Seems, storage should 
> be binary value: inplace (plain as now) and toastable. All other 
> variation such as toast limit, compression enabling, compression kind 
> should be an per-column option for toaster (that's why we suggest valid 
> toaster oid for any column with varlena/toastable datatype). It looks 
> like a good abstraction but we will have a problem with backward 
> compatibility and I'm afraid I can't implement it very fast.
> 

So you suggest we move all of this to toaster? I'd say -1 to that, 
because it makes it much harder to e.g. add custom compression method, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

18 января 2022 г., 01:25:56

Hi,

>This sounds interesting, but very much like column compression, which
>was proposed some time ago. If we haven't made much progrees with that
>patch (AFAICS), what's the likelihood we'll succeed here, when it's
>combined with yet more complexity?
The main concern is that this patch provides open API for Toast functionality

and new Toasters could be written as extensions and plugged in instead of

default one, for any column (or datatype). It was not possible before, because

Toast functionality was part of the Postgres core code and was not meant to

be modified.

>Maybe doing that kind of compression in TOAST is somehow simpler, but I
>don't see it.

[Custom ]Toaster extension itself is not restricted to only toast data, it could be

used for compression, encryption, etc, just name it - any case when data meant

to be transformed in some complicated way before being stored and (possibly)

transformed backwards while being selected from the table, along with some

simpler but not so obvious transformations like removing common parts shared

by all data in column before storing it and restoring column value to full during

selection.

>Seems you'd need a mapping table, to allow M:N mapping between types and
>toasters, linking it to all "compatible" types. It's not clear to me how
>would this work with custom data types, domains etc.
Any suitable [custom] Toaster could be plugged in for any table column,

or [duscussible] for datatype and assigned by default to the according column

for any table using this datatype.

>Also, what happens to existing values when you change the toaster? What
>if the toasters don't use the same access method to store the chunks
>(heap vs. btree)? And so on.

For newer data there is no problem - it will be toasted by the newly assigned Toaster.

For detoasting - the Toaster ID is stored in Toast pointer and in principle all data

could be detoasted by the according toaster if it is available. But this is the topic

for discussion and we are open for proposals, because there are possible cases

where older Toaster is not available - the older used Toaster extension is not installed

at all or was uninstalled, it was upgraded to the newer version. Currently we see

two ways of handling this case - to restrict changing the toaster, and to re-toast

all toasted data which could be very heavy if Toaster is assigned to a widely used

datatype, and we're looking forward to any ideas.

>So you suggest we move all of this to toaster? I'd say -1 to that,
>because it makes it much harder to e.g. add custom compression method, etc.

Original compression methods, etc. are not affected by this patch.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

On Mon, Jan 17, 2022 at 10:23 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 1/14/22 19:41, Teodor Sigaev wrote:
>
>> In my understanding, we want to be able to
>> 1. Access data from a toasted object one slice at a time, by using
>> knowledge of the structure
>> 2. If toasted data is updated, then update a minimum number of
>> slices(s), without rewriting the existing slices
>> 3. If toasted data is expanded, then allownew slices to be appended to
>> the object without rewriting the existing slices
>
> There are more options:
> 1 share common parts between not only versions of row but between all
> rows in a column. Seems strange but examples:
> - urls often have a common prefix and so storing in a prefix tree (as
>     SP-GiST does) allows significantly decrease storage size
> - the same for json - it's often use case with common part of its
>     hierarchical structure
> - one more usecase for json. If json use only a few schemes
>     (structure) it's possible to store in toast storage only values and
>     don't store keys and structure

This sounds interesting, but very much like column compression, which
was proposed some time ago. If we haven't made much progrees with that
patch (AFAICS), what's the likelihood we'll succeed here, when it's
combined with yet more complexity?

Maybe doing that kind of compression in TOAST is somehow simpler, but I
don't see it.

> 2 Current toast storage stores chunks in heap accesses method and to
> provide fast access by toast id it makes an index. Ideas:
> - store chunks directly in btree tree, pgsql's btree already has an
>     INCLUDE columns, so, chunks and visibility data will be stored only
>     in leaf pages. Obviously it reduces number of disk's access for
>     "untoasting".
> - use another access method for chunk storage
>

Maybe, but that probably requires more thought - e.g. btree requires the
values to be less than 1/3 page, so I wonder how would that play with
toasting of values.

>> ISTM that we would want the toast algorithm to be associated with the
>> datatype, not the column?
>> Can you explain your thinking?
> Hm. I'll try to explain my motivation.
> 1) Datatype could have more than one suitable toasters. For different
>    usecases: fast retrieving, compact storage, fast update etc. As I
>    told   above, for jsonb there are several optimal strategies for
>    toasting:   for values with a few different structures, for close to
>    hierarchical structures, for values with different parts by access
>    mode (easy to imagine json with some keys used for search and some
>    keys only for   output to user)
> 2) Toaster could be designed to work with different data type. Suggested
>    appendable toaster is designed to work with bytea but could work with
>    text
>
> Looking on this point I have doubts where to store connection between
> toaster and datatype. If we add toasteroid to pg_type how to deal with
> several toaster for one datatype? (And we could want to has different
> toaster on one table!) If we add typoid to pg_toaster then how it will
> work with several datatypes? An idea to add a new many-to-many
> connection table seems workable but here there are another questions,
> such as will any toaster work with any table access method?
>
> To resolve this bundle of question we propose validate() method of
> toaster, which should be called during DDL operation, i.e. toaster is
> assigned to column or column's datatype is changed.
>

Seems you'd need a mapping table, to allow M:N mapping between types and
toasters, linking it to all "compatible" types. It's not clear to me how
would this work with custom data types, domains etc.

Also, what happens to existing values when you change the toaster? What
if the toasters don't use the same access method to store the chunks
(heap vs. btree)? And so on.

> More thought:
> Now postgres has two options for column: storage and compression and now
> we add toaster. For me it seems too redundantly. Seems, storage should
> be binary value: inplace (plain as now) and toastable. All other
> variation such as toast limit, compression enabling, compression kind
> should be an per-column option for toaster (that's why we suggest valid
> toaster oid for any column with varlena/toastable datatype). It looks
> like a good abstraction but we will have a problem with backward
> compatibility and I'm afraid I can't implement it very fast.
>

So you suggest we move all of this to toaster? I'd say -1 to that,
because it makes it much harder to e.g. add custom compression method, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pluggable toaster

От

Teodor Sigaev

Дата:

18 января 2022 г., 17:56:30

Hi!

> Maybe doing that kind of compression in TOAST is somehow simpler, but I 
> don't see it.
Seems, in ideal world, compression should be inside toaster.

> 
>> 2 Current toast storage stores chunks in heap accesses method and to 
>> provide fast access by toast id it makes an index. Ideas:
>>    - store chunks directly in btree tree, pgsql's btree already has an
>>      INCLUDE columns, so, chunks and visibility data will be stored only
>>      in leaf pages. Obviously it reduces number of disk's access for
>>      "untoasting".
>>    - use another access method for chunk storage
>>
> 
> Maybe, but that probably requires more thought - e.g. btree requires the 
> values to be less than 1/3 page, so I wonder how would that play with 
> toasting of values.
That's ok, because chunk size is 2000 bytes right now and its could be 
saved.
> 

> Seems you'd need a mapping table, to allow M:N mapping between types and 
> toasters, linking it to all "compatible" types. It's not clear to me how 
> would this work with custom data types, domains etc.
If toaster will look into internal structure then it should know type's 
binary format. So, new custom types have a little chance to work with 
old custom toaster. Default toaster works with any types.
> 
> Also, what happens to existing values when you change the toaster? What 
> if the toasters don't use the same access method to store the chunks 
> (heap vs. btree)? And so on.

vatatt_custom contains an oid of toaster and toaster is not allowed to 
delete (at least, in suggested patches). So, if column's toaster has 
been changed then old values will be detoasted  by toaster pointed in 
varatt_custom structure, not in column definition. This is very similar 
to storage attribute works: we we alter storage attribute only new 
values will be stored with pointed storage type.

> 
>> More thought:
>> Now postgres has two options for column: storage and compression and 
>> now we add toaster. For me it seems too redundantly. Seems, storage 
>> should be binary value: inplace (plain as now) and toastable. All 
>> other variation such as toast limit, compression enabling, compression 
>> kind should be an per-column option for toaster (that's why we suggest 
>> valid toaster oid for any column with varlena/toastable datatype). It 
>> looks like a good abstraction but we will have a problem with backward 
>> compatibility and I'm afraid I can't implement it very fast.
>>
> 
> So you suggest we move all of this to toaster? I'd say -1 to that, 
> because it makes it much harder to e.g. add custom compression method, etc.
Hmm, I suggested to leave only toaster at upper level. Compression kind 
could be chosen in toaster's options (not implemented yet) or even make 
an API interface to compression to make it configurable. Right now, 
module developer could not implement a module with new compression 
method and it is a disadvantage.
-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/

Re: Pluggable toaster

От

Tomas Vondra

Дата:

18 января 2022 г., 19:05:46

On 1/18/22 15:56, Teodor Sigaev wrote:
> Hi!
> 
>> Maybe doing that kind of compression in TOAST is somehow simpler, but
>> I don't see it.
> Seems, in ideal world, compression should be inside toaster.
> 

I'm not convinced that's universally true. Yes, I'm sure certain TOAST
implementations would benefit from tighter control over compression, but
does that imply compression and toast are redundant? I doubt that,
because we compress non-toasted types too, for example. And layering has
a value too, as makes it easier to replace the pieces.

>>
>>> 2 Current toast storage stores chunks in heap accesses method and to
>>> provide fast access by toast id it makes an index. Ideas:
>>>    - store chunks directly in btree tree, pgsql's btree already has an
>>>      INCLUDE columns, so, chunks and visibility data will be stored only
>>>      in leaf pages. Obviously it reduces number of disk's access for
>>>      "untoasting".
>>>    - use another access method for chunk storage
>>>
>>
>> Maybe, but that probably requires more thought - e.g. btree requires
>> the values to be less than 1/3 page, so I wonder how would that play
>> with toasting of values.
> That's ok, because chunk size is 2000 bytes right now and its could be
> saved.
>>

Perhaps. My main point is that we should not be making too many radical
changes at once - it makes it much harder to actually get anything done.
So yeah, doing TOAST through IOT might be interesting, but I'd leave
that for a separate patch.

> 
>> Seems you'd need a mapping table, to allow M:N mapping between types
>> and toasters, linking it to all "compatible" types. It's not clear to
>> me how would this work with custom data types, domains etc.
> If toaster will look into internal structure then it should know type's
> binary format. So, new custom types have a little chance to work with
> old custom toaster. Default toaster works with any types.

The question is what happens when you combine data type with a toaster
that is not designed for that type. I mean, imagine you have a JSONB
toaster and you set it for a bytea column. Naive implementation will
just crash, because it'll try to process bytea as if it was JSONB.

It seems better to prevent such incompatible combinations and restrict
each toaster to just compatible data types, and the mapping table
(linking toaster and data types) seems a way to do that.

However, it seems toasters are either generic (agnostic to data types,
treating everything as bytea) or specialized. I doubt any specialized
toaster can reasonably support multiple data types, so maybe each
toaster can have just one "compatible type" OID. If it's invalid, it'd
be "generic" and otherwise it's useful for that type and types derived
from it (e.g. domains).

So you'd have the toaster OID in two places:

pg_type.toaster_oid      - default toaster for the type
pg_attribute.toaster_oid - current toaster for this column

and then you'd have

pg_toaster.typid - type this toaster handles (or InvalidOid for generic)

>>
>> Also, what happens to existing values when you change the toaster?
>> What if the toasters don't use the same access method to store the
>> chunks (heap vs. btree)? And so on.
> 
> vatatt_custom contains an oid of toaster and toaster is not allowed to
> delete (at least, in suggested patches). So, if column's toaster has
> been changed then old values will be detoasted  by toaster pointed in
> varatt_custom structure, not in column definition. This is very similar
> to storage attribute works: we we alter storage attribute only new
> values will be stored with pointed storage type.
> 

IIRC we do this for compression methods, right?

>>
>>> More thought:
>>> Now postgres has two options for column: storage and compression and
>>> now we add toaster. For me it seems too redundantly. Seems, storage
>>> should be binary value: inplace (plain as now) and toastable. All
>>> other variation such as toast limit, compression enabling,
>>> compression kind should be an per-column option for toaster (that's
>>> why we suggest valid toaster oid for any column with
>>> varlena/toastable datatype). It looks like a good abstraction but we
>>> will have a problem with backward compatibility and I'm afraid I
>>> can't implement it very fast.
>>>
>>
>> So you suggest we move all of this to toaster? I'd say -1 to that,
>> because it makes it much harder to e.g. add custom compression method,
>> etc.
> Hmm, I suggested to leave only toaster at upper level. Compression kind
> could be chosen in toaster's options (not implemented yet) or even make
> an API interface to compression to make it configurable. Right now,
> module developer could not implement a module with new compression
> method and it is a disadvantage.

If you have to implement custom toaster to implement custom compression
method, doesn't that make things more complex? You'd have to solve all
the issues for custom compression methods and also all issues for custom
toaster. Also, what if you want to just compress the column, not toast?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

19 января 2022 г., 17:25:19

Hi,

>I'm not convinced that's universally true. Yes, I'm sure certain TOAST
>implementations would benefit from tighter control over compression, but
>does that imply compression and toast are redundant? I doubt that,
>because we compress non-toasted types too, for example. And layering has
>a value too, as makes it easier to replace the pieces.

Not exactly. It is a mean to control TOAST itself without changing the core

each time you want to change Toast strategy or method. Compression is

just an example. And no Toasters are available without the patch proposed,

there is the one and only.

>Perhaps. My main point is that we should not be making too many radical
>changes at once - it makes it much harder to actually get anything done.
>So yeah, doing TOAST through IOT might be interesting, but I'd leave
>that for a separate patch.

That's why 4 distinct patches with incremental changes were proposed -

1) just new Toaster API with some necessary core changes required by the API;

2) default toaster routed via new API (but all it's functions are not affected

and dummy toaster extension as an example);

3) 1+2+some refactoring and versioning;

4) extension module for bytea columns.

Toast through IOT is a topic for discussion but does not seem to give a major

advantage over existing storage method, according to tests.

>It seems better to prevent such incompatible combinations and restrict
>each toaster to just compatible data types, and the mapping table
>(linking toaster and data types) seems a way to do that.

To handle this case a validate function (toastervalidate_function) is proposed

in the TsrRoutine structure.

>If you have to implement custom toaster to implement custom compression
>method, doesn't that make things more complex? You'd have to solve all
>the issues for custom compression methods and also all issues for custom
>toaster. Also, what if you want to just compress the column, not toast?

Default compression is restricted to 2 compression methods, all other means

require extensions. Also, the name Toaster is a little bit misleading because

it intends that data is being sliced, but it is not always the case, to be toasted

a piece of bread must not necessarily be sliced.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

On Tue, Jan 18, 2022 at 7:06 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 1/18/22 15:56, Teodor Sigaev wrote:
> Hi!
>
>> Maybe doing that kind of compression in TOAST is somehow simpler, but
>> I don't see it.
> Seems, in ideal world, compression should be inside toaster.
>

I'm not convinced that's universally true. Yes, I'm sure certain TOAST
implementations would benefit from tighter control over compression, but
does that imply compression and toast are redundant? I doubt that,
because we compress non-toasted types too, for example. And layering has
a value too, as makes it easier to replace the pieces.

>>
>>> 2 Current toast storage stores chunks in heap accesses method and to
>>> provide fast access by toast id it makes an index. Ideas:
>>>    - store chunks directly in btree tree, pgsql's btree already has an
>>>      INCLUDE columns, so, chunks and visibility data will be stored only
>>>      in leaf pages. Obviously it reduces number of disk's access for
>>>      "untoasting".
>>>    - use another access method for chunk storage
>>>
>>
>> Maybe, but that probably requires more thought - e.g. btree requires
>> the values to be less than 1/3 page, so I wonder how would that play
>> with toasting of values.
> That's ok, because chunk size is 2000 bytes right now and its could be
> saved.
>>

Perhaps. My main point is that we should not be making too many radical
changes at once - it makes it much harder to actually get anything done.
So yeah, doing TOAST through IOT might be interesting, but I'd leave
that for a separate patch.

>
>> Seems you'd need a mapping table, to allow M:N mapping between types
>> and toasters, linking it to all "compatible" types. It's not clear to
>> me how would this work with custom data types, domains etc.
> If toaster will look into internal structure then it should know type's
> binary format. So, new custom types have a little chance to work with
> old custom toaster. Default toaster works with any types.

The question is what happens when you combine data type with a toaster
that is not designed for that type. I mean, imagine you have a JSONB
toaster and you set it for a bytea column. Naive implementation will
just crash, because it'll try to process bytea as if it was JSONB.

It seems better to prevent such incompatible combinations and restrict
each toaster to just compatible data types, and the mapping table
(linking toaster and data types) seems a way to do that.

However, it seems toasters are either generic (agnostic to data types,
treating everything as bytea) or specialized. I doubt any specialized
toaster can reasonably support multiple data types, so maybe each
toaster can have just one "compatible type" OID. If it's invalid, it'd
be "generic" and otherwise it's useful for that type and types derived
from it (e.g. domains).

So you'd have the toaster OID in two places:

pg_type.toaster_oid - default toaster for the type
pg_attribute.toaster_oid - current toaster for this column

and then you'd have

pg_toaster.typid - type this toaster handles (or InvalidOid for generic)

>>
>> Also, what happens to existing values when you change the toaster?
>> What if the toasters don't use the same access method to store the
>> chunks (heap vs. btree)? And so on.
>
> vatatt_custom contains an oid of toaster and toaster is not allowed to
> delete (at least, in suggested patches). So, if column's toaster has
> been changed then old values will be detoasted by toaster pointed in
> varatt_custom structure, not in column definition. This is very similar
> to storage attribute works: we we alter storage attribute only new
> values will be stored with pointed storage type.
>

IIRC we do this for compression methods, right?

>>
>>> More thought:
>>> Now postgres has two options for column: storage and compression and
>>> now we add toaster. For me it seems too redundantly. Seems, storage
>>> should be binary value: inplace (plain as now) and toastable. All
>>> other variation such as toast limit, compression enabling,
>>> compression kind should be an per-column option for toaster (that's
>>> why we suggest valid toaster oid for any column with
>>> varlena/toastable datatype). It looks like a good abstraction but we
>>> will have a problem with backward compatibility and I'm afraid I
>>> can't implement it very fast.
>>>
>>
>> So you suggest we move all of this to toaster? I'd say -1 to that,
>> because it makes it much harder to e.g. add custom compression method,
>> etc.
> Hmm, I suggested to leave only toaster at upper level. Compression kind
> could be chosen in toaster's options (not implemented yet) or even make
> an API interface to compression to make it configurable. Right now,
> module developer could not implement a module with new compression
> method and it is a disadvantage.

If you have to implement custom toaster to implement custom compression
method, doesn't that make things more complex? You'd have to solve all
the issues for custom compression methods and also all issues for custom
toaster. Also, what if you want to just compress the column, not toast?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Pluggable toaster

От

Robert Haas

Дата:

20 января 2022 г., 18:59:52

On Thu, Dec 30, 2021 at 11:40 AM Teodor Sigaev <teodor@sigaev.ru> wrote:
> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
>     - "one toast fits all"  may be not the best solution for particular
>       type or/and use cases
>     - it doesn't know the internal structure of data type, so it  cannot
>       choose an optimal toast strategy
>     - it can't  share common parts between different rows and even
>       versions of rows

I agree ... but I'm also worried about what happens when we have
multiple table AMs. One can imagine a new table AM that is
specifically optimized for TOAST which can be used with an existing
heap table. One can imagine a new table AM for the main table that
wants to use something different for TOAST. So, I don't think it's
right to imagine that the choice of TOASTer depends solely on the
column data type. I'm not really sure how this should work exactly ...
but it needs careful thought.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

20 января 2022 г., 21:24:56

Hi,

The patch provides assigning toaster to a data column separately. Assigning to a data type is considered worthy

but is also a topic for further discussion and is not included in patch.

We've been thinking how to integrate AMs and custom toasters, so any thoughts are welcome.

Regards,

On Thu, Jan 20, 2022 at 7:00 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 30, 2021 at 11:40 AM Teodor Sigaev <teodor@sigaev.ru> wrote:
> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
> - "one toast fits all" may be not the best solution for particular
> type or/and use cases
> - it doesn't know the internal structure of data type, so it cannot
> choose an optimal toast strategy
> - it can't share common parts between different rows and even
> versions of rows

I agree ... but I'm also worried about what happens when we have
multiple table AMs. One can imagine a new table AM that is
specifically optimized for TOAST which can be used with an existing
heap table. One can imagine a new table AM for the main table that
wants to use something different for TOAST. So, I don't think it's
right to imagine that the choice of TOASTer depends solely on the
column data type. I'm not really sure how this should work exactly ...
but it needs careful thought.

--
Robert Haas
EDB: http://www.enterprisedb.com

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Teodor Sigaev

Дата:

02 февраля 2022 г., 10:34:49

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

Right. that's why we propose a validate method  (may be, it's a wrong 
name, but I don't known better one) which accepts several arguments, one 
of which is table AM oid. If that method returns false then toaster 
isn't useful with current TAM, storage or/and compression kinds, etc.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

10 марта 2022 г., 11:47:32

Hi Hackers,

In addition to original patch set for Pluggable Toaster, we have two more patches

(actually, small, but important fixes), authored by Nikita Glukhov:

1) 0001-Fix-toast_tuple_externalize.patch

This patch fixes freeing memory in case of new toasted value is the same as old one,

this seems incorrect, and in this case the function just returns instead of freeing old value;

2) 0002-Fix-alignment-of-custom-TOAST-pointers.patch

This patch adds data alignment for new varatt_custom data structure in building tuples,

since varatt_custom must be aligned for custom toasters (in particular, this fix is very

important to JSONb Toaster).

These patches must be applied on top of the latter 4_bytea_appendable_toaster_v1.patch.

Please consider them in reviewing Pluggable Toaster.

Regards.

On Wed, Feb 2, 2022 at 10:35 AM Teodor Sigaev <teodor@sigaev.ru> wrote:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

Right. that's why we propose a validate method (may be, it's a wrong
name, but I don't known better one) which accepts several arguments, one
of which is table AM oid. If that method returns false then toaster
isn't useful with current TAM, storage or/and compression kinds, etc.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

22 марта 2022 г., 02:31:21

Hi Hackers,

Because of 3 months have passed since Pluggable Toaster presentation and a lot of

commits were pushed into v15 master - we would like to re-introduce this patch

rebased onto actual master. Last commit being used -

commit 641f3dffcdf1c7378cfb94c98b6642793181d6db (origin/master)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Mar 11 13:47:26 2022 -0500

Updated patch consists of 4 patch files, next version (v2) of original patch files

(please check original commit message from 30 Dec 2020):

1) 1_toaster_interface_v2.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_interface
Introduces syntax for storage and formal toaster API.

2) 2_toaster_default_v2.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_default
Built-in toaster implemented (with some refactoring) uisng toaster API
as generic (or default) toaster.

3) 3_toaster_snapshot_v2.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements technology to distinguish row's versions in toasted
values to share common parts of toasted values between different
versions of rows

4) 4_bytea_appendable_toaster_v2.patch.gz
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
Contrib module implements toaster for non-compressed bytea columns,
which allows fast appending to existing bytea value.

These patches also include 2 minor fixes made after commit fest presentation

1) Fix for freeing memory in case of new toasted value is the same as old one,

this seems incorrect, and in this case the function just returns instead of freeing old value;

2) Fix of data alignment for new varatt_custom data structure in building tuples,

since varatt_custom must be aligned for custom toasters (in particular, this fix is very

important to JSONb Toaster).

Thanks!

On Wed, Feb 2, 2022 at 10:35 AM Teodor Sigaev <teodor@sigaev.ru> wrote:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

Right. that's why we propose a validate method (may be, it's a wrong
name, but I don't known better one) which accepts several arguments, one
of which is table AM oid. If that method returns false then toaster
isn't useful with current TAM, storage or/and compression kinds, etc.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Andres Freund

Дата:

22 марта 2022 г., 03:51:41

Hi,

On 2022-03-22 02:31:21 +0300, Nikita Malakhov wrote:
> Hi Hackers,
> Because of 3 months have passed since Pluggable Toaster presentation and a
> lot of
> commits were pushed into v15 master - we would like to re-introduce
> this patch
> rebased onto actual master. Last commit being used -
> commit 641f3dffcdf1c7378cfb94c98b6642793181d6db (origin/master)
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Fri Mar 11 13:47:26 2022 -0500

It currently fails to apply: http://cfbot.cputube.org/patch_37_3490.log

Given the size of the patch, and the degree of review it has gotten so far, it
seems not realistically a fit for 15, but is marked as such.

Think it should be moved to the next CF. Marked as waiting-on-author for now.

Greetings,

Andres Freund

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

22 марта 2022 г., 15:18:55

Hi,

In the previous email I attached patches that were not sequential, all 4 files contained complete independent

patch to apply to master. Sorry, I re-created patch files to apply them in sequence as in was meant in original

mail by Teodor Sigaev.

Please check.

Thanks!

On Tue, Mar 22, 2022 at 3:51 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-03-22 02:31:21 +0300, Nikita Malakhov wrote:
> Hi Hackers,
> Because of 3 months have passed since Pluggable Toaster presentation and a
> lot of
> commits were pushed into v15 master - we would like to re-introduce
> this patch
> rebased onto actual master. Last commit being used -
> commit 641f3dffcdf1c7378cfb94c98b6642793181d6db (origin/master)
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Fri Mar 11 13:47:26 2022 -0500

It currently fails to apply: http://cfbot.cputube.org/patch_37_3490.log

Given the size of the patch, and the degree of review it has gotten so far, it
seems not realistically a fit for 15, but is marked as such.

Think it should be moved to the next CF. Marked as waiting-on-author for now.

Greetings,

Andres Freund

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Greg Stark

Дата:

31 марта 2022 г., 23:14:36

It looks like it's still not actually building on some compilers. I
see a bunch of warnings as well as an error:

[03:53:24.660] dummy_toaster.c:97:2: error: void function
'dummyDelete' should not return a value [-Wreturn-type]

Also the "publication" regression test needs to be adjusted as it
includes \d+ output which has changed to include the toaster.

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

01 апреля 2022 г., 17:11:16

Hi,

Have found code corrupted by merge went unnoticed in dummy_toaster.c.

Please look at the attached patches. They must be applied sequentially, one after another,

from the 1st one. I haven't seed any warnings during compilation (with gcc on Ubuntu 20),

and check-world goes with no errors.

On Thu, Mar 31, 2022 at 11:15 PM Greg Stark <stark@mit.edu> wrote:

It looks like it's still not actually building on some compilers. I
see a bunch of warnings as well as an error:

[03:53:24.660] dummy_toaster.c:97:2: error: void function
'dummyDelete' should not return a value [-Wreturn-type]

Also the "publication" regression test needs to be adjusted as it
includes \d+ output which has changed to include the toaster.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Greg Stark

Дата:

03 апреля 2022 г., 04:20:36

Hm. It compiles but it's failing regression tests:

diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
2022-04-02 16:02:47.874360253 +0000
+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
2022-04-02 16:07:20.878047769 +0000
@@ -20,186 +20,7 @@
....
+server closed the connection unexpectedly
+ This probably means the server terminated abnormally
+ before or while processing the request.
+connection to server was lost
I think this will require some real debugging, so I'm marking this
Waiting on Author.

Re: Pluggable toaster

От

Andres Freund

Дата:

03 апреля 2022 г., 05:06:13

Hi,

On April 2, 2022 6:20:36 PM PDT, Greg Stark <stark@mit.edu> wrote:
>Hm. It compiles but it's failing regression tests:
>
>diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>2022-04-02 16:02:47.874360253 +0000
>+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>2022-04-02 16:07:20.878047769 +0000
>@@ -20,186 +20,7 @@
>....
>+server closed the connection unexpectedly
>+ This probably means the server terminated abnormally
>+ before or while processing the request.
>+connection to server was lost
>I think this will require some real debugging, so I'm marking this
>Waiting on Author.

Yes, dumps core (just like in several previous runs):

https://cirrus-ci.com/task/4710272324599808?logs=cores#L44

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

03 апреля 2022 г., 16:15:19

Hi,

I'm checking. It seems that I've missed something while rebasing, we have had all tests clean before.

On Sun, Apr 3, 2022 at 5:06 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On April 2, 2022 6:20:36 PM PDT, Greg Stark <stark@mit.edu> wrote:
>Hm. It compiles but it's failing regression tests:
>
>diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>2022-04-02 16:02:47.874360253 +0000
>+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>2022-04-02 16:07:20.878047769 +0000
>@@ -20,186 +20,7 @@
>....
>+server closed the connection unexpectedly
>+ This probably means the server terminated abnormally
>+ before or while processing the request.
>+connection to server was lost
>I think this will require some real debugging, so I'm marking this
>Waiting on Author.

Yes, dumps core (just like in several previous runs):

https://cirrus-ci.com/task/4710272324599808?logs=cores#L44

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 апреля 2022 г., 19:12:59

Hi,

I'm really sorry. Messed up some code while merging rebased branches with previous (v1)

patches issued in December and haven't noticed that seg fault because of corrupted code

while running check-world.

I've fixed messed code in Dummy Toaster package and checked twice - all's correct now,

patches are applied correctly and tests are clean.

Thank you for pointing out the error and for your patience!

On Sun, Apr 3, 2022 at 5:06 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On April 2, 2022 6:20:36 PM PDT, Greg Stark <stark@mit.edu> wrote:
>Hm. It compiles but it's failing regression tests:
>
>diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>2022-04-02 16:02:47.874360253 +0000
>+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>2022-04-02 16:07:20.878047769 +0000
>@@ -20,186 +20,7 @@
>....
>+server closed the connection unexpectedly
>+ This probably means the server terminated abnormally
>+ before or while processing the request.
>+connection to server was lost
>I think this will require some real debugging, so I'm marking this
>Waiting on Author.

Yes, dumps core (just like in several previous runs):

https://cirrus-ci.com/task/4710272324599808?logs=cores#L44

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Robert Haas

Дата:

04 апреля 2022 г., 19:39:53

On Mon, Apr 4, 2022 at 12:13 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> I'm really sorry. Messed up some code while merging rebased branches with previous (v1)
> patches issued in December and haven't noticed that seg fault because of corrupted code
> while running check-world.
> I've fixed messed code in Dummy Toaster package and checked twice - all's correct now,
> patches are applied correctly and tests are clean.
> Thank you for pointing out the error and for your patience!

Hi,

This patch set doesn't seem anywhere close to committable to me. For example:

- It apparently adds a new command called CREATE TOASTER but that
command doesn't seem to be documented anywhere.

- contrib/dummy_toaster is added in patch 1 with a no implementation
and code comments that say "Bloom index utilities" and then those
comments are fixed and an implementation is added in later patches.

- What is supposedly patch 1 is actually 32 patch files concatenated
together. It doesn't apply properly either with 'patch' or 'git am' so
I don't understand what we would even do with this.

- None of these patches have appropriate commit messages.

- Many of these patches add 'XXX' comments or errors or other
obviously unfinished bits of code. Some of this may be cleaned up by
later patches but it's hard to tell because right now one can't even
apply the patch set properly. Even if that were possible, it's the job
of the person submitting the patch to organize the patch into
independent, separately committable chunks that are self-contained and
have good comments and good commit messages for each one.

I don't think based on the status of this patch set that it's even
possible to provide useful feedback on the design at this point, never
mind getting anything committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Pluggable toaster

От

Greg Stark

Дата:

04 апреля 2022 г., 22:32:45

Thanks for the review Robert. I think that gives some pretty
actionable advice on how to improve the patch and it doesn't seem
likely to get much more in this cycle.

I'll mark the patch Returned with Feedback. Hope to see it come back
with improvements in the next release.

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 апреля 2022 г., 23:05:16

Hi,

Thank you very much for your review, I'd like to get it much earlier. I'm currently

working on cleaning up code, and would try to comment as much as possible.

This patch set is really a large set of functionality, it was divided into 4 logically complete

parts, but anyway these parts contain a lot of changes themselves.

- Yes, you're right, new syntax is added. I'm also working on the documentation part for it;

- Patches actually consist of a lot of minor commits. As I see we have to squash them

to provide parts as 2-3 main commits without any unnecessary garbage;

- Is 'git apply' not a valid way to apply such patches?

We'll try to straighten these issues out asap.

Thank you!

On Mon, Apr 4, 2022 at 7:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 4, 2022 at 12:13 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> I'm really sorry. Messed up some code while merging rebased branches with previous (v1)
> patches issued in December and haven't noticed that seg fault because of corrupted code
> while running check-world.
> I've fixed messed code in Dummy Toaster package and checked twice - all's correct now,
> patches are applied correctly and tests are clean.
> Thank you for pointing out the error and for your patience!

Hi,

This patch set doesn't seem anywhere close to committable to me. For example:

- It apparently adds a new command called CREATE TOASTER but that
command doesn't seem to be documented anywhere.

- contrib/dummy_toaster is added in patch 1 with a no implementation
and code comments that say "Bloom index utilities" and then those
comments are fixed and an implementation is added in later patches.

- What is supposedly patch 1 is actually 32 patch files concatenated
together. It doesn't apply properly either with 'patch' or 'git am' so
I don't understand what we would even do with this.

- None of these patches have appropriate commit messages.

- Many of these patches add 'XXX' comments or errors or other
obviously unfinished bits of code. Some of this may be cleaned up by
later patches but it's hard to tell because right now one can't even
apply the patch set properly. Even if that were possible, it's the job
of the person submitting the patch to organize the patch into
independent, separately committable chunks that are self-contained and
have good comments and good commit messages for each one.

I don't think based on the status of this patch set that it's even
possible to provide useful feedback on the design at this point, never
mind getting anything committed.

--
Robert Haas
EDB: http://www.enterprisedb.com

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Robert Haas

Дата:

04 апреля 2022 г., 23:17:59

On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?

I have found that it never works. This case is no exception:

[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
 * CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
 * va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply

I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.

Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

05 апреля 2022 г., 10:58:59

Hi,

Thanks for advices.

We have 4 branches, for each patch provided, you can check them out -

(come copy-paste from the very fist email, where the patches were proposed)

1) 1_toaster_interface
https://github.com/postgrespro/postgres/tree/toaster_interface
Introduces syntax for storage and formal toaster API. Adds column
atttoaster to pg_attribute, by design this column should not be equal to
invalid oid for any toastable datatype, ie it must have correct oid for
any type (not column) with non-plain storage. Since toaster may support
only particular datatype, core should check correctness of toaster set
by toaster validate method. New commands could be found in
src/test/regress/sql/toaster.sql. Also includes modification of pg_dump.

2) 2_toaster_default
https://github.com/postgrespro/postgres/tree/toaster_default
Built-in toaster implemented (with some refactoring) using toaster API
as generic (or default) toaster. dummy_toaster here is a minimal
workable example, it saves value directly in toast pointer and fails if
value is greater than 1kb.

3) 3_toaster_snapshot
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements technology to distinguish row's versions in toasted
values to share common parts of toasted values between different
versions of rows

4) 4_bytea_appendable_toaster
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
Contrib module implements toaster for non-compressed bytea columns,
which allows fast appending to existing bytea value. Appended tail
stored directly in toaster pointer, if there is enough space to do it.

Working on refactoring according to your recommendations.

Thank you!

On Mon, Apr 4, 2022 at 11:18 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?

I have found that it never works. This case is no exception:

[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
* CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
* va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply

I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.

Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.

--
Robert Haas
EDB: http://www.enterprisedb.com

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 апреля 2022 г., 21:55:03

Hi,

I reworked previous patch set according to recommendations. Patches

are generated by format-patch and applied by git am. Patches are based on

master from 03.11. Also, now we've got clean branch with incremental commits

which could be easily rebased onto a fresh master.

Currently, there are 8 patches:

1) 0001_create_table_storage_v3.patch - SET STORAGE option for CREATE

TABLE command by Teodor Sigaev which is required by all the following functionality;

2) 0002_toaster_interface_v6.patch - Toaster API (SQL syntax for toasters + API)

with Dummy toaster as an example of how this API should be used, but with default

toaster left 'as-is';

3) 0003_toaster_default_v5.patch - default (regular) toaster is implemented

via new API;

4) 0004_toaster_snapshot_v5.patch - refactoring of default toaster and support

of versioned toasted rows;

5) 0005_bytea_appendable_toaster_v5.patch - bytea toaster by Nikita Glukhov

Custom toaster for bytea data with support of appending (instead of rewriting)

stored data;

6) 0006_toasterapi_docs_v1.patch - brief documentation on Toaster API in Pg docs;

7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's

alignment required by bytea toaster by Nikita Glukhov;

8) 0008_fix_toast_tuple_externalize.patch - fixes toast_tuple_externalize function

not to call toast if old data is the same as new one.

I would be grateful for feedback on the reworked patch set.

On Mon, Apr 4, 2022 at 11:18 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?

I have found that it never works. This case is no exception:

[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
* CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
* va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply

I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.

Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.

--
Robert Haas
EDB: http://www.enterprisedb.com

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 апреля 2022 г., 22:58:24

Hi,

For a pluggable toaster - in previous patch set part 7 patch file contains invalid string.

Fixup (v2 file should used instead of previous) patch:

7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's

alignment required by bytea toaster by Nikita Glukhov;

On Wed, Apr 13, 2022 at 9:55 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi,
I reworked previous patch set according to recommendations. Patches
are generated by format-patch and applied by git am. Patches are based on
master from 03.11. Also, now we've got clean branch with incremental commits
which could be easily rebased onto a fresh master.

Currently, there are 8 patches:

1) 0001_create_table_storage_v3.patch - SET STORAGE option for CREATE
TABLE command by Teodor Sigaev which is required by all the following functionality;

2) 0002_toaster_interface_v6.patch - Toaster API (SQL syntax for toasters + API)
with Dummy toaster as an example of how this API should be used, but with default
toaster left 'as-is';

3) 0003_toaster_default_v5.patch - default (regular) toaster is implemented
via new API;

4) 0004_toaster_snapshot_v5.patch - refactoring of default toaster and support
of versioned toasted rows;

5) 0005_bytea_appendable_toaster_v5.patch - bytea toaster by Nikita Glukhov
Custom toaster for bytea data with support of appending (instead of rewriting)
stored data;

6) 0006_toasterapi_docs_v1.patch - brief documentation on Toaster API in Pg docs;

7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's
alignment required by bytea toaster by Nikita Glukhov;

8) 0008_fix_toast_tuple_externalize.patch - fixes toast_tuple_externalize function
not to call toast if old data is the same as new one.

I would be grateful for feedback on the reworked patch set.

On Mon, Apr 4, 2022 at 11:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?

I have found that it never works. This case is no exception:

[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
* CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
* va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply

I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.

Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.

--
Robert Haas
EDB: http://www.enterprisedb.com

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

0007_fix_alignment_of_custom_toast_pointers_v2.patch.gz

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

17 июня 2022 г., 17:33:24

Hi hackers,

> For a pluggable toaster - in previous patch set part 7 patch file contains invalid string.
> Fixup (v2 file should used instead of previous) patch:
> 7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's
> alignment required by bytea toaster by Nikita Glukhov;

I finished digesting the thread and the referred presentations per
Matthias (cc:'ed) suggestion in [1] discussion. Although the patchset
got a fair amount of push-back above, I prefer to stay open minded and
invest some of my time into this effort as a tester/reviewer during
the July CF. Even if the patchset will not make it entirely to the
core, some of its parts can be useful.

Unfortunately, I didn't manage to find something that can be applied
and tested. cfbot is currently not happy with the patchset.
0001_create_table_storage_v3.patch doesn't apply to the current
origin/master manually either:

```
error: patch failed: src/backend/parser/gram.y:2318
error: src/backend/parser/gram.y: patch does not apply
```

Any chance we can see a rebased patchset for the July CF?

[1]: https://commitfest.postgresql.org/38/3626/

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

23 июня 2022 г., 14:46:07

Hi hackers,

We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF.

Thank you for your patience.

Best regards,

Nikita Malakhov

On Fri, Jun 17, 2022 at 5:33 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi hackers,

> For a pluggable toaster - in previous patch set part 7 patch file contains invalid string.
> Fixup (v2 file should used instead of previous) patch:
> 7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's
> alignment required by bytea toaster by Nikita Glukhov;

I finished digesting the thread and the referred presentations per
Matthias (cc:'ed) suggestion in [1] discussion. Although the patchset
got a fair amount of push-back above, I prefer to stay open minded and
invest some of my time into this effort as a tester/reviewer during
the July CF. Even if the patchset will not make it entirely to the
core, some of its parts can be useful.

Unfortunately, I didn't manage to find something that can be applied
and tested. cfbot is currently not happy with the patchset.
0001_create_table_storage_v3.patch doesn't apply to the current
origin/master manually either:

```
error: patch failed: src/backend/parser/gram.y:2318
error: src/backend/parser/gram.y: patch does not apply
```

Any chance we can see a rebased patchset for the July CF?

[1]: https://commitfest.postgresql.org/38/3626/

--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

23 июня 2022 г., 16:38:45

Hi Nikita,

> We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF.
> Thank you for your patience.

Just to clarify, does it include the dependent "CREATE TABLE ( ..
STORAGE .. )" patch [1]? I was considering changing the patch
according to the feedback it got, but if you are already working on
this I'm not going to interfere.

[1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

23 июня 2022 г., 16:53:47

Hi,

Alexander, thank you for your feedback and willingness to help. You can send a suggested fixup in this thread, I'll check the issue you've mentioned.

Best regards,

Nikita Malakhov

On Thu, Jun 23, 2022 at 4:38 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF.
> Thank you for your patience.

Just to clarify, does it include the dependent "CREATE TABLE ( ..
STORAGE .. )" patch [1]? I was considering changing the patch
according to the feedback it got, but if you are already working on
this I'm not going to interfere.

[1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

30 июня 2022 г., 23:26:46

Hi hackers!

Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Just to remind:

In Pluggable TOAST we suggest a way to make TOAST pluggable as Storage (in a way like Pluggable Access Methods) - we extracted

TOAST mechanics from Heap AM, and made it an independent pluggable and extensible part with our freshly developed TOAST API.

With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload

of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

We keep backwards compatibility and default TOAST mechanics works as it worked previously, working silently with any Toastable datatype

(and TOASTed values and tables from previous versions, no changes in this) and set as default Toaster is not stated otherwise, but through our TOAST API.

TOAST API does not have any noticeable overhead in comparison to the original (master). Proofs in our research materials (measured).

We've already presented out work at HighLoad, PgCon and PgConf conferences, you can find materials here

http://www.sai.msu.su/~megera/postgres/talks/

We have ready to plug in extension Toasters

- bytea appendable toaster for bytea datatype (impressive speedup with bytea append operation)

- JSONB toaster for JSONB (very cool performance improvements when dealing with TOASTed JSONB)

and prototype Toasters (in development) for PostGIS (much faster then default with geometric data), large binary objects

(like pg_largeobject, but much, much larger, and without existing large object limitations), default Toaster implementation without using Indexes.

Patch set consists of 9 incremental patches:

0001_create_table_storage_v4.patch - SQL syntax fix for CREATE TABLE clause, processing SET STORAGE... correctly;

0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)

and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

0003_toaster_default_v6.patch - Default TOAST implemented via TOAST API;

0004_toaster_snapshot_v6.patch - refactoring of Default TOAST and support for versioned Toast rows;

0005_bytea_appendable_toaster_v6.patch - contrib module bytea_appendable_toaster - special Toaster for bytea datatype with customized append operation;

0006_toasterapi_docs_v2.patch - documentation package for Pluggable TOAST;

0007_fix_alignment_of_custom_toast_pointers_v2.patch - fixes custom toast pointer's

alignment required by bytea toaster by Nikita Glukhov;

0008_fix_toast_tuple_externalize_v2.patch - fixes toast_tuple_externalize function

not to call toast if old data is the same as new one.

0009_bytea_contrib_and_varlena_v1.patch - several late fixups for 0005.

This patch set opens the following issues:

1) With TOAST independent of AM it is used by it makes sense to move compression from AM into Toaster and make Compression one of Toaster's options.

Actually, Toasters allow to use any compression methods independently of AM;

2) Implement default Toaster without using Indexes (currently in development)?

3) Allows different, SQL-accessed large objects of almost infinite size IN DATABASE, unlike current large_object functionality and does not limit their quantity;

4) Several already developed Toasters show impressive results for datatypes they were designed for.

We're gladly appreciate your feedback!

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

On Thu, Jun 23, 2022 at 4:53 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi,
Alexander, thank you for your feedback and willingness to help. You can send a suggested fixup in this thread, I'll check the issue you've mentioned.

Best regards,
Nikita Malakhov

On Thu, Jun 23, 2022 at 4:38 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi Nikita,

> We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF.
> Thank you for your patience.

Just to clarify, does it include the dependent "CREATE TABLE ( ..
STORAGE .. )" patch [1]? I was considering changing the patch
according to the feedback it got, but if you are already working on
this I'm not going to interfere.

[1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

30 июня 2022 г., 23:27:47

Rebased onto 15 REL BETA 2

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

01 июля 2022 г., 11:10:46

Hi Nikita,

> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks for the rebased patchset.

This is a huge effort though. I suggest splitting it into several CF
entries as we previously did with other patches (64-bit XIDs to name
one, which BTW is arguably much simpler, but still we had to split
it). This will simplify the review, limit the scope of the discussion
and simplify passing the CI. Cfbot is already not happy with the
patchset.

0001 - is already in a separate thread [1], that's good. I suggest
marking it in the patch description for clarity.
0002, 0003 - I suggest focusing on these two in this thread and keep
the rest of the changes for later discussion. Please submit 0004,
0005... next time, when we finish with 0001-0003.

The order of proposed changes IMO is wrong.

0002 should refactor the default TOASTer in a manner similar to a
pluggable one. Nothing should change from the user's perspective. If
you do benchmarks, I suggest not to reference the previous talks. I
familiarized myself with all the related talks linked before (took me
some time...) and found them useless for the discussion since they
don't provide exact steps to reproduce. Please provide exact scripts
and benchmarks results for 0002 in this thread.

0003 should add an interface that allows replacing the default TOASTer
with an alternative one. There is no need for contrib/dummy_toaster
similarly as there is no contrib/ for a dummy TableAM. The provided
example doesn't do much anyway since all the heavy lifting should be
done in the callbacks implementation. For test purposes please use
src/test/regress/.

[1]: https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru

--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

01 июля 2022 г., 15:14:50

Hi,

Alexander, thank you for your feedback!

I'd explain our thoughts:

We thought that refactoring default TOAST mechanics via TOAST API in p. 0002 would be too much because the API itself already

introduced a lot of changes, so we kept Default Toasters re-implementation for later patch.

0002 introduces custom Toast pointers with corresponding macro set (postgres.h), Dummy toaster as an example for developers

how the API should be used, but left default TOAST as-is. As I see, there are no lots of custom TAMs, despite of pluggable storage

interface introduced several years ago, so we thought that some simple example of how to use the new API would be nice to have.

We've done TOAST refactoring in 0003, but it has not replaced default TOAST, it just routed it default via TOAST API, but it still

is left as part of the core, and is used for TOASTing (set in atttoaster column) by default.

For performance testing we used a lot of manually corrected scripts, I have to put them in order but I would provide them later as

additional side patch for this patch set.

Best regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

On Fri, Jul 1, 2022 at 11:10 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks for the rebased patchset.

This is a huge effort though. I suggest splitting it into several CF
entries as we previously did with other patches (64-bit XIDs to name
one, which BTW is arguably much simpler, but still we had to split
it). This will simplify the review, limit the scope of the discussion
and simplify passing the CI. Cfbot is already not happy with the
patchset.

0001 - is already in a separate thread [1], that's good. I suggest
marking it in the patch description for clarity.
0002, 0003 - I suggest focusing on these two in this thread and keep
the rest of the changes for later discussion. Please submit 0004,
0005... next time, when we finish with 0001-0003.

The order of proposed changes IMO is wrong.

0002 should refactor the default TOASTer in a manner similar to a
pluggable one. Nothing should change from the user's perspective. If
you do benchmarks, I suggest not to reference the previous talks. I
familiarized myself with all the related talks linked before (took me
some time...) and found them useless for the discussion since they
don't provide exact steps to reproduce. Please provide exact scripts
and benchmarks results for 0002 in this thread.

0003 should add an interface that allows replacing the default TOASTer
with an alternative one. There is no need for contrib/dummy_toaster
similarly as there is no contrib/ for a dummy TableAM. The provided
example doesn't do much anyway since all the heavy lifting should be
done in the callbacks implementation. For test purposes please use
src/test/regress/.

[1]: https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru

--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Matthias van de Meent

Дата:

01 июля 2022 г., 15:27:16

On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing
internalsand/or workflow and workload

> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE
TOASTER...)

> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

11 июля 2022 г., 15:03:50

Hi!

We have branch with incremental commits worm where patches were generated with format-patch -

https://github.com/postgrespro/postgres/tree/toasterapi_clean

I'll clean up commits from garbage files asap, sorry, haven't noticed them while moving changes.

Best regards,

Nikita Malakhov

On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 июля 2022 г., 22:45:40

Hi hackers!

According to previous requests the patch branch was cleaned up from garbage, logs, etc. All conflicts' resolutions were merged

into patch commits where they appear, branch was rebased to present one commit for one patch. The branch was actualized,

and a fresh patch set was generated.

https://github.com/postgrespro/postgres/tree/toasterapi_clean

What we propose in short:

We suggest a way to make TOAST pluggable as Storage (in a way like Pluggable Access Methods) - detached TOAST

mechanics from Heap AM, and made it an independent pluggable and extensible part with our freshly developed TOAST API.

With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals

and/or workflow and workload of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

We keep backwards compatibility and default TOAST mechanics works as it worked previously, working silently with any

Toastable datatype

(and TOASTed values and tables from previous versions, no changes in this) and set as default Toaster is not stated otherwise,

but through our TOAST API.

We've already presented out work at HighLoad, PgCon and PgConf conferences, you can find materials here

http://www.sai.msu.su/~megera/postgres/talks/

Testing scripts used in talks are a bit scarce and have a lot of manual handling, so it is another bit of work to bunch them into

patch set, please be patient, I'll try to make it ASAP.

We have ready to plug in extension Toasters

- bytea appendable toaster for bytea datatype (impressive speedup with bytea append operation) is included in this patch set;

- JSONB toaster for JSONB (very cool performance improvements when dealing with TOASTed JSONB) will be provided later;

- Prototype Toasters (in development) for PostGIS (much faster then default with geometric data), large binary objects

(like pg_largeobject, but much, much larger, and without existing large object limitations), and currently we're checking default

Toaster implementation without using Indexes (direct access by TIDs, up to 3 times faster than default on smaller values,

less storage due to absence of index tree).

Patch set consists of 8 incremental patches:

0001_create_table_storage_v5.patch - SQL syntax fix for CREATE TABLE clause, processing SET STORAGE... correctly;

This patch is already discussed in a separate thread;

0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)

and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;

0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support for versioned Toast rows;

0005_bytea_appendable_toaster_v7.patch - contrib module bytea_appendable_toaster - special Toaster for bytea datatype with customized append operation;

0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;

0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes custom toast pointer's

alignment required by bytea toaster by Nikita Glukhov;

0008_fix_toast_tuple_externalize_v3.patch - fixes toast_tuple_externalize function

not to call toast if old data is the same as new one.

The example of usage the TOAST API:

CREATE EXTENSION bytea_toaster;CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;

This patch set opens the following issues:

1) With TOAST independent of AM it is used by it makes sense to move compression from AM into Toaster and make Compression one of Toaster's options.

Actually, Toasters allow to use any compression methods independently of AM;

2) Implement default Toaster without using Indexes (currently in development)?

3) Allows different, SQL-accessed large objects of almost infinite size IN DATABASE, unlike current large_object functionality and does not limit their quantity;

4) Several already developed Toasters show impressive results for datatypes they were designed for.

We're awaiting feedback.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi!
We have branch with incremental commits worm where patches were generated with format-patch -
https://github.com/postgrespro/postgres/tree/toasterapi_clean
I'll clean up commits from garbage files asap, sorry, haven't noticed them while moving changes.

Best regards,
Nikita Malakhov

On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

20 июля 2022 г., 12:15:41

Hi hackers!

We really need your feedback on the last patchset update!

On a previous question about TOAST API overhead - please check script in attach, we tested INSERT, UPDATE and SELECT

operations, and ran it on vanilla master and on patched master (vanilla with untouched TOAST implementation and patched

with default TOAST implemented via TOAST API, in this patch set - with patches up to 0005_bytea_appendable_toaster installed).

Some of the test scripts will be included in the patch set later, as an additional patch.

Currently I'm working on an update to the default Toaster (some internal optimizations, not affecting functionality)

and readme files explaining Pluggable TOAST.

An example of using custom Toaster:

Custom Toaster extension definition (developer):

CREATE FUNCTION custom_toaster_handler(internal)

RETURNS toaster_handler
AS 'MODULE_PATHNAME'
LANGUAGE C;

CREATE TOASTER custom_toaster HANDLER custom_toaster_handler;

User's POV:

Thanks!

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

On Wed, Jul 13, 2022 at 10:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!
According to previous requests the patch branch was cleaned up from garbage, logs, etc. All conflicts' resolutions were merged
into patch commits where they appear, branch was rebased to present one commit for one patch. The branch was actualized,
and a fresh patch set was generated.
https://github.com/postgrespro/postgres/tree/toasterapi_clean

What we propose in short:
We suggest a way to make TOAST pluggable as Storage (in a way like Pluggable Access Methods) - detached TOAST
mechanics from Heap AM, and made it an independent pluggable and extensible part with our freshly developed TOAST API.
With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals
and/or workflow and workload of data being TOASTed makes Custom Toasters much more efficient in performance and storage.
We keep backwards compatibility and default TOAST mechanics works as it worked previously, working silently with any
Toastable datatype
(and TOASTed values and tables from previous versions, no changes in this) and set as default Toaster is not stated otherwise,
but through our TOAST API.

We've already presented out work at HighLoad, PgCon and PgConf conferences, you can find materials here
http://www.sai.msu.su/~megera/postgres/talks/
Testing scripts used in talks are a bit scarce and have a lot of manual handling, so it is another bit of work to bunch them into
patch set, please be patient, I'll try to make it ASAP.

We have ready to plug in extension Toasters
- bytea appendable toaster for bytea datatype (impressive speedup with bytea append operation) is included in this patch set;
- JSONB toaster for JSONB (very cool performance improvements when dealing with TOASTed JSONB) will be provided later;
- Prototype Toasters (in development) for PostGIS (much faster then default with geometric data), large binary objects
(like pg_largeobject, but much, much larger, and without existing large object limitations), and currently we're checking default
Toaster implementation without using Indexes (direct access by TIDs, up to 3 times faster than default on smaller values,
less storage due to absence of index tree).

Patch set consists of 8 incremental patches:
0001_create_table_storage_v5.patch - SQL syntax fix for CREATE TABLE clause, processing SET STORAGE... correctly;
This patch is already discussed in a separate thread;

0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;

0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support for versioned Toast rows;

0005_bytea_appendable_toaster_v7.patch - contrib module bytea_appendable_toaster - special Toaster for bytea datatype with customized append operation;

0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;

0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes custom toast pointer's
alignment required by bytea toaster by Nikita Glukhov;

0008_fix_toast_tuple_externalize_v3.patch - fixes toast_tuple_externalize function
not to call toast if old data is the same as new one.

The example of usage the TOAST API:
CREATE EXTENSION bytea_toaster;CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;

This patch set opens the following issues:
1) With TOAST independent of AM it is used by it makes sense to move compression from AM into Toaster and make Compression one of Toaster's options.
Actually, Toasters allow to use any compression methods independently of AM;
2) Implement default Toaster without using Indexes (currently in development)?
3) Allows different, SQL-accessed large objects of almost infinite size IN DATABASE, unlike current large_object functionality and does not limit their quantity;
4) Several already developed Toasters show impressive results for datatypes they were designed for.

We're awaiting feedback.

Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi!
We have branch with incremental commits worm where patches were generated with format-patch -
https://github.com/postgrespro/postgres/tree/toasterapi_clean
I'll clean up commits from garbage files asap, sorry, haven't noticed them while moving changes.

Best regards,
Nikita Malakhov

On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com

Вложения

api_perf.sql

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

22 июля 2022 г., 14:05:30

Hi hackers!

I've reworked the patch set according to recommendations of Aleksander Alekseev, Robert Haas

and Matthias van de Meent, and decided, as it was recommended earlier, include only the most

important part in the first set. Also, I've added a large README on Pluggable TOAST to sources,

I'll be grateful for feedback on this README file. Also, some minor fixes were included in patches.

Now patch set consists of 3 incremental patches:

0001_create_table_storage_v6.patch - This patch adds important part of SQL syntax fix for

CREATE TABLE clause, which is mandatory for all Pluggable TOAST functionality - processing of

SET STORAGE option correctly.

This patch is presented by Teodor Sigaev and is already discussed and marked as ready for commit

in a separate thread;

0002_toaster_interface_v9.patch - The patch introduces TOAST API interface and SQL syntax

allowing creation of custom Toaster (CREATE TOASTER ...) and assigning Toaster to a table column

CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);

Default TOAST functionality is left intact, for the sake of not-so-big one-time changes, and nothing

changes from user's perspective, but here user already can develop and plug in custom Toasters;

0003_toaster_default_v8.patch - Introducing Default TOAST implemented via TOAST API, and

a large README on Pluggable TOAST functionality for developers, put into

/src/backend/access/toast/README.toastapi

With a respect to Aleksander Alekseev

>>There is no need for contrib/dummy_toaster
>>similarly as there is no contrib/ for a dummy TableAM. The provided
>>example doesn't do much anyway since all the heavy lifting should be
>>done in the callbacks implementation.

we decided to leave the dummy Toaster as it is, because it's purpose is not to show heavy lifting

done by internal Toaster implementation, but to demonstrate how Toaster extension is defined

and plugged in.

Thank you for your attention and feedbacks!

On Fri, Jul 1, 2022 at 11:10 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).

Thanks for the rebased patchset.

This is a huge effort though. I suggest splitting it into several CF
entries as we previously did with other patches (64-bit XIDs to name
one, which BTW is arguably much simpler, but still we had to split
it). This will simplify the review, limit the scope of the discussion
and simplify passing the CI. Cfbot is already not happy with the
patchset.

0001 - is already in a separate thread [1], that's good. I suggest
marking it in the patch description for clarity.
0002, 0003 - I suggest focusing on these two in this thread and keep
the rest of the changes for later discussion. Please submit 0004,
0005... next time, when we finish with 0001-0003.

The order of proposed changes IMO is wrong.

0002 should refactor the default TOASTer in a manner similar to a
pluggable one. Nothing should change from the user's perspective. If
you do benchmarks, I suggest not to reference the previous talks. I
familiarized myself with all the related talks linked before (took me
some time...) and found them useless for the discussion since they
don't provide exact steps to reproduce. Please provide exact scripts
and benchmarks results for 0002 in this thread.

0003 should add an interface that allows replacing the default TOASTer
with an alternative one. There is no need for contrib/dummy_toaster
similarly as there is no contrib/ for a dummy TableAM. The provided
example doesn't do much anyway since all the heavy lifting should be
done in the callbacks implementation. For test purposes please use
src/test/regress/.

[1]: https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

22 июля 2022 г., 14:54:22

Hi Nikita,

> I've reworked the patch set according to recommendations of Aleksander Alekseev, Robert Haas
> and Matthias van de Meent, and decided, as it was recommended earlier, include only the most
> important part in the first set. Also, I've added a large README on Pluggable TOAST to sources,
> I'll be grateful for feedback on this README file. Also, some minor fixes were included in patches.

Many thanks for accounting for the previous feedback. I will take a
look somewhere early next week.

> 0001 [...] This patch is presented by Teodor Sigaev and is already discussed and marked as ready for commit
in a separate thread;

FYI it was merged, see 784cedda [1].

Also, you forgot the attachments :)

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Matthias van de Meent

Дата:

22 июля 2022 г., 16:16:56

On Wed, 20 Jul 2022 at 11:16, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!

Hi,

Please don't top-post here.  See
https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics.

> We really need your feedback on the last patchset update!

This is feedback on the latest version that was shared on the mailing
list here [0]. Your mail from today didn't seem to have an attachment,
and I haven't checked the git repository for changes.

0001: Create Table Storage:
LGTM

---

0002: Toaster API interface

> tablecmds.c

> SetIndexStorageProperties(Relation rel, Relation attrelation,
>                           AttrNumber attnum,
>                           bool setstorage, char newstorage,
>                           bool setcompression, char newcompression,
> +                          bool settoaster, Oid toasterOid,
>                           LOCKMODE lockmode)

Indexes cannot index toasted values, so why would the toaster oid be
interesting for index storage properties?

>  static List *MergeAttributes(List *schema, List *supers, char relpersistence,
> -                             bool is_partition, List **supconstr);
> +                             bool is_partition, List **supconstr,
> +                             Oid accessMethodId);

> toasterapi.h:

> +SearchTsrCache(Oid    toasterOid)
> ...
> +    for_each_from(lc, ToasterCache, 0)
> +    {
> +        entry = (ToasterCacheEntry*)lfirst(lc);
> +
> +        if (entry->toasterOid == toasterOid)
> +        {
> +            /* remove entry from list, it will be added in a head of list below */
> +            foreach_delete_current(ToasterCache, lc);
> +            goto out;
> +        }
> +    }

Moving toasters seems quite expensive when compared to just index
checks. When you have many toasters, but only a few hot ones, this
currently will move the "cold" toasters around a lot. Why not use a
stack instead (or alternatively, a 'zipper' (or similar) data
structure), where the hottest toaster is on top, so that we avoid
larger memmove calls?

> postgres.h

> +/* varatt_custom uses 16bit aligment */
To the best of my knowledge varlena-pointers are unaligned; and this
is affirmed by the comment immediately under varattrib_1b_e. Assuming
alignment to 16 bits should/would be incorrect in some of the cases.
This is handled for normal varatt_external values by memcpy-ing the
value into local (aligned) fields before using them, but that doesn't
seem to be the case for varatt_custom?

---

0003: (re-implement default toaster using toaster interface)

I see significant changes to the dummy toaster (created in 0002),
could those be moved to patch 0002 in the next iteration?

detoast.c and detoast.h are effectively empty after this patch (only
imports and commented-out code remain), please fully remove them
instead - that saves on patch diff size.

With the new API, I'm getting confused about the location of the
various toast_* functions. They are spread around in various files
that have no clear distinction on why it is (still) located there:
some functions are moved to access/toast/*, while others are moved
around in catalog/toasting.c, access/common/toast_internals.c and
access/table/toast_helper.c.

> detoast.c / tableam.h
According to a quick search, all core usage of
table_relation_fetch_toast_slice is removed. Shouldn't we remove that
tableAM API (+ heapam implementation) instead of updating and
maintaining it? Same question for table_relation_toast_am - I'm not
sure that it remains the correct way of dealing with toast.

> toast_helper.c

toast_delete_external_datum:
Please clean up code that was commented out from the patches, it
detracts from the readability of a patch.

> toast_internals.c

This seems like a bit of a mess, considering the lack of
Can't we split this up into a heaptoast (or whatever we're going to
call the default toaster) and actual toast internals? It seems to me
that

> generic_toaster.c

Could you align name styles in this new file? It has both camelCase
and snake_case for function names.

> toasting.c

I'm not entirely sure that we should retain catalog/toasting.c if we
are going to depend on the custom toaster API. Shouldn't the creation
of toast tables be delegated to the toaster?

> + * toast_get_valid_index
> + *
> + *    Get OID of valid index associated to given toast relation. A toast
> + *    relation can have only one valid index at the same time.

Although this is code being moved around, the comment is demonstrably
false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
leave a toast relation with 2 valid indexes.

---

0004: refactoring and optimization of default toaster
0005: bytea appendable toaster

I dind't review these yet.

---

0006: docs

Seems like a good start, but I'm not sure that we need the struct
definition in the docs. I think the BRIN extensibility docs [1] are a
better example on what I think the documentation for this API should
look like.

---

0007: fix alignment of custom toast pointers
This is not a valid fix for the alignment requirement for custom toast
pointers. You now leave one byte empty if you are not already aligned,
which for on-disk toast pointers means that we're dealing with a
4-byte aligned value, which is not the case because this is a
2-byte-aligned value.

Regardless, this should be merged into 0002, not remain a seperate patch.

---

0008: fix tuple externalization
Should be merged into the relevant patch as well, not as a separate patch.

---

> Currently I'm working on an update to the default Toaster (some internal optimizations, not affecting functionality)
> and readme files explaining Pluggable TOAST.

That would be greatly appreciated - 0006 does not cover why we need
vtable, nor how it's expected to be used in type-aware code.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CAN-LCVNkU%2Bkdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg%40mail.gmail.com
[1] https://www.postgresql.org/docs/15/brin-extensibility.html

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

23 июля 2022 г., 10:15:05

Hi hackers!

Matthias, thank you very much for your feedback!

Sorry, I forgot to attach files.

Attaching here, but they are for the commit tagged "15beta2", I am currently

rebasing this branch onto the actual master and will provide rebased version,

with some corrections according to your feedback, in a day or two.

>Indexes cannot index toasted values, so why would the toaster oid be
>interesting for index storage properties?

Here Teodor might correct me. Toast tables are indexed, and Heap TOAST

mechanics accesses Toasted tuples by index, isn't it the case?

>Moving toasters seems quite expensive when compared to just index
>checks. When you have many toasters, but only a few hot ones, this
>currently will move the "cold" toasters around a lot. Why not use a
>stack instead (or alternatively, a 'zipper' (or similar) data
>structure), where the hottest toaster is on top, so that we avoid
>larger memmove calls?

This is a reasonable remark, we'll consider it for the next iteration. Our reason

is that we think there won't be a lot of custom Toasters, most likely less then

a dozen, for the most complex/heavy datatypes so we haven't considered

these expenses.

>To the best of my knowledge varlena-pointers are unaligned; and this
>is affirmed by the comment immediately under varattrib_1b_e. Assuming
>alignment to 16 bits should/would be incorrect in some of the cases.
>This is handled for normal varatt_external values by memcpy-ing the
>value into local (aligned) fields before using them, but that doesn't
>seem to be the case for varatt_custom?

Alignment correction seemed reasonable for us because structures are

anyway aligned in memory, so when we use 1 and 2-byte fields along

with 4-byte it may create a lot of padding. Am I wrong? Also, correct

alignment somewhat simplifies access to structure fields.

>0003: (re-implement default toaster using toaster interface)
>I see significant changes to the dummy toaster (created in 0002),
>could those be moved to patch 0002 in the next iteration?

Will do.

>detoast.c and detoast.h are effectively empty after this patch (only
>imports and commented-out code remain), please fully remove them
>instead - that saves on patch diff size.

Will do.

About the location of toast_ functions: these functions are part of Heap

TOAST mechanics, and they were scattered among other Heap internals

sources. I've tried to gather them and put them in more straight order, but

this work is not fully finished yet and will take some time. Working on it.

I'll check if table_relation_fetch_toast_slice could be removed, thanks for

the remark.

toast_helper - done, will be provided in rebased version.

toast_internals - this one is an internal part of TOAST implemented in

Heap AM, but I'll try to straighten it out as much as I could.

naming conventions in some sources - done, will be provided in rebased

patch set.

>Shouldn't the creation of toast tables be delegated to the toaster?

Yes, you're right, and actually, it is. I'll check that and correct in rebased

version.

>Although this is code being moved around, the comment is demonstrably
>false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
>leave a toast relation with 2 valid indexes.

This code is quite old, we've not changed it but thanks for the remark,

I'll check it more carefully.

Small fixes are already merged into larger patches in attached files. Also,

I appreciate your feedback on documentation - if you would have an opportunity

please check README provided in 0003. I've took your comments on documentation

into account and will include corrections according to them into rebased patch.

As Aleksander recommended, I've shortened the patch set and left only the most

important part - API and re-implemented default Toast. All bells and whistles are not

of so much importance and could be sent later after the API itself will be straightened

out and commited.

Thank you very much!

On Fri, Jul 22, 2022 at 4:17 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

On Wed, 20 Jul 2022 at 11:16, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!

Hi,

Please don't top-post here. See
https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics.

> We really need your feedback on the last patchset update!

This is feedback on the latest version that was shared on the mailing
list here [0]. Your mail from today didn't seem to have an attachment,
and I haven't checked the git repository for changes.

0001: Create Table Storage:
LGTM

---

0002: Toaster API interface

> tablecmds.c

> SetIndexStorageProperties(Relation rel, Relation attrelation,
> AttrNumber attnum,
> bool setstorage, char newstorage,
> bool setcompression, char newcompression,
> + bool settoaster, Oid toasterOid,
> LOCKMODE lockmode)

Indexes cannot index toasted values, so why would the toaster oid be
interesting for index storage properties?

> static List *MergeAttributes(List *schema, List *supers, char relpersistence,
> - bool is_partition, List **supconstr);
> + bool is_partition, List **supconstr,
> + Oid accessMethodId);

> toasterapi.h:

> +SearchTsrCache(Oid toasterOid)
> ...
> + for_each_from(lc, ToasterCache, 0)
> + {
> + entry = (ToasterCacheEntry*)lfirst(lc);
> +
> + if (entry->toasterOid == toasterOid)
> + {
> + /* remove entry from list, it will be added in a head of list below */
> + foreach_delete_current(ToasterCache, lc);
> + goto out;
> + }
> + }

Moving toasters seems quite expensive when compared to just index
checks. When you have many toasters, but only a few hot ones, this
currently will move the "cold" toasters around a lot. Why not use a
stack instead (or alternatively, a 'zipper' (or similar) data
structure), where the hottest toaster is on top, so that we avoid
larger memmove calls?

> postgres.h

> +/* varatt_custom uses 16bit aligment */
To the best of my knowledge varlena-pointers are unaligned; and this
is affirmed by the comment immediately under varattrib_1b_e. Assuming
alignment to 16 bits should/would be incorrect in some of the cases.
This is handled for normal varatt_external values by memcpy-ing the
value into local (aligned) fields before using them, but that doesn't
seem to be the case for varatt_custom?

---

0003: (re-implement default toaster using toaster interface)

I see significant changes to the dummy toaster (created in 0002),
could those be moved to patch 0002 in the next iteration?

detoast.c and detoast.h are effectively empty after this patch (only
imports and commented-out code remain), please fully remove them
instead - that saves on patch diff size.

With the new API, I'm getting confused about the location of the
various toast_* functions. They are spread around in various files
that have no clear distinction on why it is (still) located there:
some functions are moved to access/toast/*, while others are moved
around in catalog/toasting.c, access/common/toast_internals.c and
access/table/toast_helper.c.

> detoast.c / tableam.h
According to a quick search, all core usage of
table_relation_fetch_toast_slice is removed. Shouldn't we remove that
tableAM API (+ heapam implementation) instead of updating and
maintaining it? Same question for table_relation_toast_am - I'm not
sure that it remains the correct way of dealing with toast.

> toast_helper.c

toast_delete_external_datum:
Please clean up code that was commented out from the patches, it
detracts from the readability of a patch.

> toast_internals.c

This seems like a bit of a mess, considering the lack of
Can't we split this up into a heaptoast (or whatever we're going to
call the default toaster) and actual toast internals? It seems to me
that

> generic_toaster.c

Could you align name styles in this new file? It has both camelCase
and snake_case for function names.

> toasting.c

I'm not entirely sure that we should retain catalog/toasting.c if we
are going to depend on the custom toaster API. Shouldn't the creation
of toast tables be delegated to the toaster?

> + * toast_get_valid_index
> + *
> + * Get OID of valid index associated to given toast relation. A toast
> + * relation can have only one valid index at the same time.

Although this is code being moved around, the comment is demonstrably
false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
leave a toast relation with 2 valid indexes.

---

0004: refactoring and optimization of default toaster
0005: bytea appendable toaster

I dind't review these yet.

---

0006: docs

Seems like a good start, but I'm not sure that we need the struct
definition in the docs. I think the BRIN extensibility docs [1] are a
better example on what I think the documentation for this API should
look like.

---

0007: fix alignment of custom toast pointers
This is not a valid fix for the alignment requirement for custom toast
pointers. You now leave one byte empty if you are not already aligned,
which for on-disk toast pointers means that we're dealing with a
4-byte aligned value, which is not the case because this is a
2-byte-aligned value.

Regardless, this should be merged into 0002, not remain a seperate patch.

---

0008: fix tuple externalization
Should be merged into the relevant patch as well, not as a separate patch.

---

> Currently I'm working on an update to the default Toaster (some internal optimizations, not affecting functionality)
> and readme files explaining Pluggable TOAST.

That would be greatly appreciated - 0006 does not cover why we need
vtable, nor how it's expected to be used in type-aware code.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CAN-LCVNkU%2Bkdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg%40mail.gmail.com
[1] https://www.postgresql.org/docs/15/brin-extensibility.html

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

26 июля 2022 г., 00:20:23

Hi hackers!

Here is the patch set rebased to master from 22.07. I've got some trouble rebasing it due to

conflicts, so it took some time.

I've made some corrections according to Matthias and Aleksander comments, though not all

of them, because some require refactoring of very old code and would have to take much more

effort. I keep these recommended corrections in mind and already working on them but it will

require extensive testing and some more work, so the will be presented later, in next iteration

or next patch - these are optimization of heap AM according table_relation_fetch_toast_slice,

double-index problem and I'm continue to straighten out code related to TOAST functionality.

It's quite a task because as I mentioned before, this core was scattered over Heap AM and

reference implementation of TOAST is very tightly intertwined with Heap AM itself. Default

toaster uses Heap AM storage so it is unlikely that it will be possible to fully detach it from

Heap.

However, I've made some more refactoring, removed empty sources, corrected code according

to naming conventions, and extended README.toastapi document.

0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would,

as recommended, remove Dummy toaster and provide it as an extension), and default Toaster

was left as-is (reference implementation).

0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API,

so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality.

0004_toaster_snapshot_v8 continues refactoring and has some important changes (added

into README.toastapi new part related TOAST API extensibility - the virtual functions table).

Also, I'll provide documentation package corrected according to Matthias' remarks later,

in the next patch set.

Please check attached patch set.

Also, GIT branch with this patch resides here:

https://github.com/postgrespro/postgres/tree/toasterapi_clean

Thank all reviewers for feedback!

On Sat, Jul 23, 2022 at 10:15 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!

Matthias, thank you very much for your feedback!
Sorry, I forgot to attach files.
Attaching here, but they are for the commit tagged "15beta2", I am currently
rebasing this branch onto the actual master and will provide rebased version,
with some corrections according to your feedback, in a day or two.

>Indexes cannot index toasted values, so why would the toaster oid be
>interesting for index storage properties?

Here Teodor might correct me. Toast tables are indexed, and Heap TOAST
mechanics accesses Toasted tuples by index, isn't it the case?

>Moving toasters seems quite expensive when compared to just index
>checks. When you have many toasters, but only a few hot ones, this
>currently will move the "cold" toasters around a lot. Why not use a
>stack instead (or alternatively, a 'zipper' (or similar) data
>structure), where the hottest toaster is on top, so that we avoid
>larger memmove calls?

This is a reasonable remark, we'll consider it for the next iteration. Our reason
is that we think there won't be a lot of custom Toasters, most likely less then
a dozen, for the most complex/heavy datatypes so we haven't considered
these expenses.

>To the best of my knowledge varlena-pointers are unaligned; and this
>is affirmed by the comment immediately under varattrib_1b_e. Assuming
>alignment to 16 bits should/would be incorrect in some of the cases.
>This is handled for normal varatt_external values by memcpy-ing the
>value into local (aligned) fields before using them, but that doesn't
>seem to be the case for varatt_custom?

Alignment correction seemed reasonable for us because structures are
anyway aligned in memory, so when we use 1 and 2-byte fields along
with 4-byte it may create a lot of padding. Am I wrong? Also, correct
alignment somewhat simplifies access to structure fields.

>0003: (re-implement default toaster using toaster interface)
>I see significant changes to the dummy toaster (created in 0002),
>could those be moved to patch 0002 in the next iteration?
Will do.

>detoast.c and detoast.h are effectively empty after this patch (only
>imports and commented-out code remain), please fully remove them
>instead - that saves on patch diff size.
Will do.

About the location of toast_ functions: these functions are part of Heap
TOAST mechanics, and they were scattered among other Heap internals
sources. I've tried to gather them and put them in more straight order, but
this work is not fully finished yet and will take some time. Working on it.

I'll check if table_relation_fetch_toast_slice could be removed, thanks for
the remark.

toast_helper - done, will be provided in rebased version.

toast_internals - this one is an internal part of TOAST implemented in
Heap AM, but I'll try to straighten it out as much as I could.

naming conventions in some sources - done, will be provided in rebased
patch set.

>Shouldn't the creation of toast tables be delegated to the toaster?

Yes, you're right, and actually, it is. I'll check that and correct in rebased
version.

>Although this is code being moved around, the comment is demonstrably
>false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
>leave a toast relation with 2 valid indexes.

This code is quite old, we've not changed it but thanks for the remark,
I'll check it more carefully.

Small fixes are already merged into larger patches in attached files. Also,
I appreciate your feedback on documentation - if you would have an opportunity
please check README provided in 0003. I've took your comments on documentation
into account and will include corrections according to them into rebased patch.

As Aleksander recommended, I've shortened the patch set and left only the most
important part - API and re-implemented default Toast. All bells and whistles are not
of so much importance and could be sent later after the API itself will be straightened
out and commited.

Thank you very much!

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

26 июля 2022 г., 11:22:57

Hi Nikita,

Thanks for an update!

> 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would,
> as recommended, remove Dummy toaster and provide it as an extension), and default Toaster
> was left as-is (reference implementation).
>
> 0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API,
> so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality.
>
> 0004_toaster_snapshot_v8 continues refactoring and has some important changes (added
> into README.toastapi new part related TOAST API extensibility - the virtual functions table).

This numbering is confusing. Please use a command like:

```
git format-patch origin/master -v 42
```

This will produce a patchset with a consistent naming like:

```
v42-0001-foo-bar.patch
v42-0002-baz-qux.patch
... etc ...
```

Also cfbot [1] will know in which order to apply them.

> GIT branch with this patch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean

Unfortunately the three patches in question from this branch don't
pass `make check`. Please update
src/test/regress/expected/publication.out and make sure the patchset
passes the rest of the tests at least on one platform before
submitting.

Personally I have a little set of scripts for this [2]. The following
commands should pass:

```
# quick check
./quick-build.sh && ./single-install.sh && make installcheck

# full check
./full-build.sh && ./single-install.sh && make installcheck-world
```

Finally, please update the commit messages. Each commit message should
include a brief description (one line) , a detailed description (the
body), and also the list of the authors, the reviewers and a link to
the discussion. Please use [3] as a template.

[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

29 июля 2022 г., 09:16:08

Hi hackers!

Aleksander, thanks for the remark, seems that we've missed recent change - the pubication

test does not have the new column 'Toaster'. Will send a corrected patch tomorrow. Also, thanks

for the patch name note, I've changed it as you suggested.

I'm on vacation, so I read emails not very often and answers take some time, sorry.

On Tue, Jul 26, 2022 at 11:23 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

Thanks for an update!

> 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would,
> as recommended, remove Dummy toaster and provide it as an extension), and default Toaster
> was left as-is (reference implementation).
>
> 0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API,
> so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality.
>
> 0004_toaster_snapshot_v8 continues refactoring and has some important changes (added
> into README.toastapi new part related TOAST API extensibility - the virtual functions table).

This numbering is confusing. Please use a command like:

```
git format-patch origin/master -v 42
```

This will produce a patchset with a consistent naming like:

```
v42-0001-foo-bar.patch
v42-0002-baz-qux.patch
... etc ...
```

Also cfbot [1] will know in which order to apply them.

> GIT branch with this patch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean

Unfortunately the three patches in question from this branch don't
pass `make check`. Please update
src/test/regress/expected/publication.out and make sure the patchset
passes the rest of the tests at least on one platform before
submitting.

Personally I have a little set of scripts for this [2]. The following
commands should pass:

```
# quick check
./quick-build.sh && ./single-install.sh && make installcheck

# full check
./full-build.sh && ./single-install.sh && make installcheck-world
```

Finally, please update the commit messages. Each commit message should
include a brief description (one line) , a detailed description (the
body), and also the list of the authors, the reviewers and a link to
the discussion. Please use [3] as a template.

[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

02 августа 2022 г., 09:15:12

Hi hackers!

Sorry for the delay.

I've made changes according to Aleksander comments:

>This will produce a patchset with a consistent naming like:
>...
>Also cfbot [1] will know in which order to apply them.

Done.

>Unfortunately the three patches in question from this branch don't
>pass `make check`. Please update
>src/test/regress/expected/publication.out and make sure the patchset
>passes the rest of the tests at least on one platform before
>submitting.

Done, there was absent column in Publication tests.

>Finally, please update the commit messages. Each commit message should
>include a brief description (one line) , a detailed description (the
>body), and also the list of the authors, the reviewers and a link to
>the discussion. Please use [3] as a template.

Done.

Link to the rebased branch:

https://github.com/postgrespro/postgres/tree/toasterapi_clean

Please check.

Attach includes:

v11-0002-toaster-interface.patch - contains TOAST API with default Toaster

left as-is (reference implementation) and Dummy toaster as an example

(will be removed later as a part of refactoring?).

v11-0003-toaster-default.patch - implements reference TOAST as Default Toaster

via TOAST API, so Heap AM calls Toast only via API, and does not have direct

calls to Toast functionality.

v11-0004-toaster-snapshot.patch - supports row versioning for TOASTed values

and some refactoring.

Aleksander, thank you and Matthias for the very helpful feedback!

On Fri, Jul 29, 2022 at 9:16 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!

Aleksander, thanks for the remark, seems that we've missed recent change - the pubication
test does not have the new column 'Toaster'. Will send a corrected patch tomorrow. Also, thanks
for the patch name note, I've changed it as you suggested.
I'm on vacation, so I read emails not very often and answers take some time, sorry.

On Tue, Jul 26, 2022 at 11:23 AM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi Nikita,

Thanks for an update!

> 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would,
> as recommended, remove Dummy toaster and provide it as an extension), and default Toaster
> was left as-is (reference implementation).
>
> 0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API,
> so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality.
>
> 0004_toaster_snapshot_v8 continues refactoring and has some important changes (added
> into README.toastapi new part related TOAST API extensibility - the virtual functions table).

This numbering is confusing. Please use a command like:

```
git format-patch origin/master -v 42
```

This will produce a patchset with a consistent naming like:

```
v42-0001-foo-bar.patch
v42-0002-baz-qux.patch
... etc ...
```

Also cfbot [1] will know in which order to apply them.

> GIT branch with this patch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean

Unfortunately the three patches in question from this branch don't
pass `make check`. Please update
src/test/regress/expected/publication.out and make sure the patchset
passes the rest of the tests at least on one platform before
submitting.

Personally I have a little set of scripts for this [2]. The following
commands should pass:

```
# quick check
./quick-build.sh && ./single-install.sh && make installcheck

# full check
./full-build.sh && ./single-install.sh && make installcheck-world
```

Finally, please update the commit messages. Each commit message should
include a brief description (one line) , a detailed description (the
body), and also the list of the authors, the reviewers and a link to
the discussion. Please use [3] as a template.

[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3

--
Best regards,
Aleksander Alekseev

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Andres Freund

Дата:

02 августа 2022 г., 18:37:04

Hi,

On 2022-08-02 09:15:12 +0300, Nikita Malakhov wrote:
> Attach includes:
> v11-0002-toaster-interface.patch - contains TOAST API with default Toaster
> left as-is (reference implementation) and Dummy toaster as an example
> (will be removed later as a part of refactoring?).
> 
> v11-0003-toaster-default.patch - implements reference TOAST as Default
> Toaster
> via TOAST API, so Heap AM calls Toast only via API, and does not have direct
> calls to Toast functionality.
> 
> v11-0004-toaster-snapshot.patch - supports row versioning for TOASTed values
> and some refactoring.

I'm a bit confused by the patch numbering - why isn't there a patch 0001 and
0005?

I think 0002 needs to be split further - the relevant part isn't that it
introduces the "dummy toaster" module, it's a large change doing lots of
things, the addition of the contrib module is irrelevant comparatively.

As is the patches unfortunately are close to unreviewable. Lots of code gets
moved around in one patch, then again in the next patch, then again in the
next.


Unfortunately, scanning through these patches, it seems this is a lot of
complexity, with a (for me) comensurate benefit. There's a lot of more general
improvements to toasting and the json type that we can do, that are a lot less
complex than this.


> From 6b35d6091248e120d2361cf0a806dbfb161421cf Mon Sep 17 00:00:00 2001
> From: Nikita Malakhov <n.malakhov@postgrespro.ru>
> Date: Tue, 12 Apr 2022 18:37:21 +0300
> Subject: [PATCH] Pluggable TOAST API interface with dummy_toaster contrib
>  module
> 
> Pluggable TOAST API is introduced with implemented contrib example
> module.
> Pluggable TOAST API consists of 4 parts:
> 1) SQL syntax supports manipulations with toasters - CREATE TABLE ...
> (column type STORAGE storage_type TOASTER toaster), ALTER TABLE ALTER
> COLUMN column SET TOASTER toaster and Toaster definition.
> TOAST API requires earlier patch with CREATE TABLE SET STORAGE clause;
> New column atttoaster is added to pg_attribute.
> Toaster drop is not allowed for not to lose already toasted data;
> 2) New VARATT_CUSTOM data structure with fixed header and variable
> tail to store custom toasted data, with according macros set;

That's adding overhead to every toast interaction, independent of any new
infrastructure being used.



> 4) Dummy toaster implemented via new TOAST API to be used as sample.
> In this patch regular (default) TOAST function is left as-is and not
> yet implemented via new API.
> TOAST API syntax and code explanation provided in additional docs patch.

I'd make this a separate commit.



> @@ -445,6 +447,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
>              return false;
>          if (attr1->attstorage != attr2->attstorage)
>              return false;
> +        if (attr1->atttoaster != attr2->atttoaster)
> +            return false;

So we're increasing pg_attribute - often already the largest catalog table in
a database.

Am I just missing something, or is atttoaster not actually used in this patch?
So most of the contrib module added is unreachable code?


> +/*
> + * Toasters is very often called so syscache lookup and TsrRoutine allocation are
> + * expensive and we need to cache them.

Ugh.

> + * We believe what there are only a few toasters and there is high chance that
> + * only one or only two of them are heavy used, so most used toasters should be
> + * found as easy as possible. So, let us use a simple list, in future it could
> + * be changed to other structure. For now it will be stored in TopCacheContext
> + * and never destroed in backend life cycle - toasters are never deleted.
> + */

That seems not great.


> +typedef struct ToasterCacheEntry
> +{
> +    Oid            toasterOid;
> +    TsrRoutine *routine;
> +} ToasterCacheEntry;
> +
> +static List    *ToasterCache = NIL;
> +
> +/*
> + * SearchTsrCache - get cached toaster routine, emits an error if toaster
> + * doesn't exist
> + */
> +TsrRoutine*
> +SearchTsrCache(Oid    toasterOid)
> +{
> +    ListCell           *lc;
> +    ToasterCacheEntry  *entry;
> +    MemoryContext        ctx;
> +
> +    if (list_length(ToasterCache) > 0)
> +    {
> +        /* fast path */
> +        entry = (ToasterCacheEntry*)linitial(ToasterCache);
> +        if (entry->toasterOid == toasterOid)
> +            return entry->routine;
> +    }
> +
> +    /* didn't find in first position */
> +    ctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
> +    for_each_from(lc, ToasterCache, 0)
> +    {
> +        entry = (ToasterCacheEntry*)lfirst(lc);
> +
> +        if (entry->toasterOid == toasterOid)
> +        {
> +            /* remove entry from list, it will be added in a head of list below */
> +            foreach_delete_current(ToasterCache, lc);

That needs to move later list elements!


> +            goto out;
> +        }
> +    }
> +
> +    /* did not find entry, make a new one */
> +    entry = palloc(sizeof(*entry));
> +
> +    entry->toasterOid = toasterOid;
> +    entry->routine = GetTsrRoutineByOid(toasterOid, false);
> +
> +out:
> +    ToasterCache = lcons(entry, ToasterCache);

That move the whole list around! On a cache hit. Tthis would likely already be
slower than syscache.


> diff --git a/contrib/dummy_toaster/dummy_toaster.c b/contrib/dummy_toaster/dummy_toaster.c
> index 0d261f6042..02f49052b7 100644
> --- a/contrib/dummy_toaster/dummy_toaster.c
> +++ b/contrib/dummy_toaster/dummy_toaster.c

So this is just changing around the code added in the prior commit. Why was it
then included before?


> +++ b/src/include/access/generic_toaster.h

> +HeapTuple
> +heap_toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
> +                            int options);

The generic toast API has heap_* in its name?




> From 4112cd70b05dda39020d576050a98ca3cdcf2860 Mon Sep 17 00:00:00 2001
> From: Nikita Malakhov <n.malakhov@postgrespro.ru>
> Date: Tue, 12 Apr 2022 22:57:21 +0300
> Subject: [PATCH] Versioned rows in TOASTed values for Default Toaster support
> 
> Original TOAST mechanics does not support rows versioning
> for TOASTed values.
> Toaster snapshot - refactored generic toaster, implements
> rows versions check in toasted values to share common parts
> of toasted values between different versions of rows.

This misses explaining *WHY* this is changed.



> diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
> deleted file mode 100644
> index aff8042166..0000000000
> --- a/src/backend/access/common/detoast.c
> +++ /dev/null

These patches really move things around in a largely random way.


> -static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
> -static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
> -
> +static void
> +toast_extract_chunk_fields(Relation toastrel, TupleDesc toasttupDesc,
> +                           Oid valueid, HeapTuple ttup, int32 *seqno,
> +                           char **chunkdata, int *chunksize);
> +
> +static void
> +toast_write_slice(Relation toastrel, Relation *toastidxs,
> +                  int num_indexes, int validIndex,
> +                  Oid valueid, int32 value_size, int32 slice_offset,
> +                  int32 slice_length, char *slice_data,
> +                  int options,
> +                  void *chunk_header, int chunk_header_size,
> +                  ToastChunkVisibilityCheck visibility_check,
> +                  void *visibility_cxt);

What do all these changes have to do with "Versioned rows in TOASTed
values for Default Toaster support"?


> +static void *
> +toast_fetch_old_chunk(Relation toastrel, SysScanDesc toastscan, Oid valueid,
> +                      int32 expected_chunk_seq, int32 last_old_chunk_seq,
> +                      ToastChunkVisibilityCheck visibility_check,
> +                      void *visibility_cxt,
> +                      int32 *p_old_chunk_size, ItemPointer old_tid)
> +{
> +    for (;;)
> +    {
> +        HeapTuple    old_toasttup;
> +        char       *old_chunk_data;
> +        int32        old_chunk_seq;
> +        int32        old_chunk_data_size;
> +
> +        old_toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection);
> +
> +        if (old_toasttup)
> +        {
> +            /* Skip aborted chunks */
> +            if (!HeapTupleHeaderXminCommitted(old_toasttup->t_data))
> +            {
> +                TransactionId xmin = HeapTupleHeaderGetXmin(old_toasttup->t_data);
> +
> +                Assert(!HeapTupleHeaderXminInvalid(old_toasttup->t_data));
> +
> +                if (TransactionIdDidAbort(xmin))
> +                    continue;
> +            }

Why is there visibility logic in quite random places? Also, it's not "legal"
to call TransactionIdDidAbort() without having checked
TransactionIdIsInProgress() first. And what does this this have to do with
snapshots - it's pretty clearly not implementing snapshot logic.


Greetings,

Andres Freund

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 августа 2022 г., 23:18:42

Hi hackers!

I've made a rebase according to Andres and Aleksander comments.

Rebased branch resides here:

https://github.com/postgrespro/postgres/tree/toasterapi_clean

I've decided to leave only the first 2 patches for review and send less significant

changes after the main patch will be straightened out.

So, here is

v13-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST

mechanics left as-is.

v13-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API.

>I'm a bit confused by the patch numbering - why isn't there a patch 0001 and
>0005?
Sorry for the confusion, my fault. The first patch (CREATE TABLE .. SET STORAGE)

is already committed into v15, and several of the late patches weren't included.

I've rearranged patch numbers in this iteration.

>I think 0002 needs to be split further - the relevant part isn't that it
>introduces the "dummy toaster" module, it's a large change doing lots of
>things, the addition of the contrib module is irrelevant comparatively.

Done, contrib /dummy_toaster excluded from main patch and placed in branch

as a separate commit.

>As is the patches unfortunately are close to unreviewable. Lots of code gets
>moved around in one patch, then again in the next patch, then again in the
>next.

So I've decided to put here only the first one while I'm working on the latter to clean

this up - I agree, code in latter patches needs some refactoring. Working on it.

>Unfortunately, scanning through these patches, it seems this is a lot of
>complexity, with a (for me) comensurate benefit. There's a lot of more general
>improvements to toasting and the json type that we can do, that are a lot less
>complex than this.

We have very significant improvements for storing large JSON and a couple of

other TOAST improvements which make a lot of sense, but they are based on

this API. But in the first patch reference TOAST is left as-is, and does not use

TOAST API.

>> 2) New VARATT_CUSTOM data structure with fixed header and variable
>> tail to store custom toasted data, with according macros set;

>That's adding overhead to every toast interaction, independent of any new
>infrastructure being used.

We've performed some tests on this and haven't detected significant overhead,

>So we're increasing pg_attribute - often already the largest catalog table in

>a database.

A little bit, with an OID column storing Toaster OID. We do not see any other way

to keep track of Toaster used by the table's column, because it could be changed

any time by ALTER TABLE ... SET TOASTER.

>Am I just missing something, or is atttoaster not actually used in this patch?
>So most of the contrib module added is unreachable code?

It is necessary for Toasters implemented via TOAST API, the first patch does not

use it directly because reference TOAST is left unchanged. The second one which

implements reference TOAST via TOAST API uses it.

>That seems not great.

About Toasters deletion - we forbid dropping Toasters because if Toaster is dropped

the data TOASTed with it is lost, and as was mentioned before, we think that there

won't be a lot of custom Toasters, likely seems to be less then a dozen.

>That move the whole list around! On a cache hit. Tthis would likely already be
>slower than syscache.

Thank you for the remark, it is questionable approach. I've changed this in current iteration

(patch in attach) to keep Toaster list appended-only if Toaster was not found, and leave

Toaster cache as a straight list - first element in is the head of the list.

Also, documentation on TOAST API is provided in README.toastapi in the first patch -

I'd be grateful for comments on it.

Thanks for the feedback!

On Tue, Aug 2, 2022 at 6:37 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2022-08-02 09:15:12 +0300, Nikita Malakhov wrote:
> Attach includes:
> v11-0002-toaster-interface.patch - contains TOAST API with default Toaster
> left as-is (reference implementation) and Dummy toaster as an example
> (will be removed later as a part of refactoring?).
>
> v11-0003-toaster-default.patch - implements reference TOAST as Default
> Toaster
> via TOAST API, so Heap AM calls Toast only via API, and does not have direct
> calls to Toast functionality.
>
> v11-0004-toaster-snapshot.patch - supports row versioning for TOASTed values
> and some refactoring.

I'm a bit confused by the patch numbering - why isn't there a patch 0001 and
0005?

I think 0002 needs to be split further - the relevant part isn't that it
introduces the "dummy toaster" module, it's a large change doing lots of
things, the addition of the contrib module is irrelevant comparatively.

As is the patches unfortunately are close to unreviewable. Lots of code gets
moved around in one patch, then again in the next patch, then again in the
next.

Unfortunately, scanning through these patches, it seems this is a lot of
complexity, with a (for me) comensurate benefit. There's a lot of more general
improvements to toasting and the json type that we can do, that are a lot less
complex than this.

> From 6b35d6091248e120d2361cf0a806dbfb161421cf Mon Sep 17 00:00:00 2001
> From: Nikita Malakhov <n.malakhov@postgrespro.ru>
> Date: Tue, 12 Apr 2022 18:37:21 +0300
> Subject: [PATCH] Pluggable TOAST API interface with dummy_toaster contrib
> module
>
> Pluggable TOAST API is introduced with implemented contrib example
> module.
> Pluggable TOAST API consists of 4 parts:
> 1) SQL syntax supports manipulations with toasters - CREATE TABLE ...
> (column type STORAGE storage_type TOASTER toaster), ALTER TABLE ALTER
> COLUMN column SET TOASTER toaster and Toaster definition.
> TOAST API requires earlier patch with CREATE TABLE SET STORAGE clause;
> New column atttoaster is added to pg_attribute.
> Toaster drop is not allowed for not to lose already toasted data;
> 2) New VARATT_CUSTOM data structure with fixed header and variable
> tail to store custom toasted data, with according macros set;

That's adding overhead to every toast interaction, independent of any new
infrastructure being used.

> 4) Dummy toaster implemented via new TOAST API to be used as sample.
> In this patch regular (default) TOAST function is left as-is and not
> yet implemented via new API.
> TOAST API syntax and code explanation provided in additional docs patch.

I'd make this a separate commit.

> @@ -445,6 +447,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
> return false;
> if (attr1->attstorage != attr2->attstorage)
> return false;
> + if (attr1->atttoaster != attr2->atttoaster)
> + return false;

So we're increasing pg_attribute - often already the largest catalog table in
a database.

Am I just missing something, or is atttoaster not actually used in this patch?
So most of the contrib module added is unreachable code?

> +/*
> + * Toasters is very often called so syscache lookup and TsrRoutine allocation are
> + * expensive and we need to cache them.

Ugh.

> + * We believe what there are only a few toasters and there is high chance that
> + * only one or only two of them are heavy used, so most used toasters should be
> + * found as easy as possible. So, let us use a simple list, in future it could
> + * be changed to other structure. For now it will be stored in TopCacheContext
> + * and never destroed in backend life cycle - toasters are never deleted.
> + */

That seems not great.

> +typedef struct ToasterCacheEntry
> +{
> + Oid toasterOid;
> + TsrRoutine *routine;
> +} ToasterCacheEntry;
> +
> +static List *ToasterCache = NIL;
> +
> +/*
> + * SearchTsrCache - get cached toaster routine, emits an error if toaster
> + * doesn't exist
> + */
> +TsrRoutine*
> +SearchTsrCache(Oid toasterOid)
> +{
> + ListCell *lc;
> + ToasterCacheEntry *entry;
> + MemoryContext ctx;
> +
> + if (list_length(ToasterCache) > 0)
> + {
> + /* fast path */
> + entry = (ToasterCacheEntry*)linitial(ToasterCache);
> + if (entry->toasterOid == toasterOid)
> + return entry->routine;
> + }
> +
> + /* didn't find in first position */
> + ctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
> + for_each_from(lc, ToasterCache, 0)
> + {
> + entry = (ToasterCacheEntry*)lfirst(lc);
> +
> + if (entry->toasterOid == toasterOid)
> + {
> + /* remove entry from list, it will be added in a head of list below */
> + foreach_delete_current(ToasterCache, lc);

That needs to move later list elements!

> + goto out;
> + }
> + }
> +
> + /* did not find entry, make a new one */
> + entry = palloc(sizeof(*entry));
> +
> + entry->toasterOid = toasterOid;
> + entry->routine = GetTsrRoutineByOid(toasterOid, false);
> +
> +out:
> + ToasterCache = lcons(entry, ToasterCache);

That move the whole list around! On a cache hit. Tthis would likely already be
slower than syscache.

> diff --git a/contrib/dummy_toaster/dummy_toaster.c b/contrib/dummy_toaster/dummy_toaster.c
> index 0d261f6042..02f49052b7 100644
> --- a/contrib/dummy_toaster/dummy_toaster.c
> +++ b/contrib/dummy_toaster/dummy_toaster.c

So this is just changing around the code added in the prior commit. Why was it
then included before?

> +++ b/src/include/access/generic_toaster.h

> +HeapTuple
> +heap_toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
> + int options);

The generic toast API has heap_* in its name?

> From 4112cd70b05dda39020d576050a98ca3cdcf2860 Mon Sep 17 00:00:00 2001
> From: Nikita Malakhov <n.malakhov@postgrespro.ru>
> Date: Tue, 12 Apr 2022 22:57:21 +0300
> Subject: [PATCH] Versioned rows in TOASTed values for Default Toaster support
>
> Original TOAST mechanics does not support rows versioning
> for TOASTed values.
> Toaster snapshot - refactored generic toaster, implements
> rows versions check in toasted values to share common parts
> of toasted values between different versions of rows.

This misses explaining *WHY* this is changed.

> diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
> deleted file mode 100644
> index aff8042166..0000000000
> --- a/src/backend/access/common/detoast.c
> +++ /dev/null

These patches really move things around in a largely random way.

> -static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
> -static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
> -
> +static void
> +toast_extract_chunk_fields(Relation toastrel, TupleDesc toasttupDesc,
> + Oid valueid, HeapTuple ttup, int32 *seqno,
> + char **chunkdata, int *chunksize);
> +
> +static void
> +toast_write_slice(Relation toastrel, Relation *toastidxs,
> + int num_indexes, int validIndex,
> + Oid valueid, int32 value_size, int32 slice_offset,
> + int32 slice_length, char *slice_data,
> + int options,
> + void *chunk_header, int chunk_header_size,
> + ToastChunkVisibilityCheck visibility_check,
> + void *visibility_cxt);

What do all these changes have to do with "Versioned rows in TOASTed
values for Default Toaster support"?

> +static void *
> +toast_fetch_old_chunk(Relation toastrel, SysScanDesc toastscan, Oid valueid,
> + int32 expected_chunk_seq, int32 last_old_chunk_seq,
> + ToastChunkVisibilityCheck visibility_check,
> + void *visibility_cxt,
> + int32 *p_old_chunk_size, ItemPointer old_tid)
> +{
> + for (;;)
> + {
> + HeapTuple old_toasttup;
> + char *old_chunk_data;
> + int32 old_chunk_seq;
> + int32 old_chunk_data_size;
> +
> + old_toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection);
> +
> + if (old_toasttup)
> + {
> + /* Skip aborted chunks */
> + if (!HeapTupleHeaderXminCommitted(old_toasttup->t_data))
> + {
> + TransactionId xmin = HeapTupleHeaderGetXmin(old_toasttup->t_data);
> +
> + Assert(!HeapTupleHeaderXminInvalid(old_toasttup->t_data));
> +
> + if (TransactionIdDidAbort(xmin))
> + continue;
> + }

Why is there visibility logic in quite random places? Also, it's not "legal"
to call TransactionIdDidAbort() without having checked
TransactionIdIsInProgress() first. And what does this this have to do with
snapshots - it's pretty clearly not implementing snapshot logic.

Greetings,

Andres Freund

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

23 августа 2022 г., 11:27:40

Hi Nikita,

> I've decided to leave only the first 2 patches for review and send less significant
> changes after the main patch will be straightened out.
> So, here is
> v13-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST
> mechanics left as-is.
> v13-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API.
> [...]

Great! Thank you.

Unfortunately the patchset still seems to have difficulties passing
the CI checks (see http://cfbot.cputube.org/ ). Any chance we may see
a version rebased to the current `master` branch for the September CF?

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

24 августа 2022 г., 12:59:23

Hi hackers!

I've rebased actual branch onto the latest master and re-created patches. Checked with git am,

all applied correctly. Please check the attached patches.

Rebased branch resides here:

https://github.com/postgrespro/postgres/tree/toasterapi_clean

Just to remind - I've decided to leave only the first 2 patches for review and send less significant

changes after the main patch will be straightened out.

So, here is

v14-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST

mechanics left as-is.

v14-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API.

On Tue, Aug 23, 2022 at 11:27 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> I've decided to leave only the first 2 patches for review and send less significant
> changes after the main patch will be straightened out.
> So, here is
> v13-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST
> mechanics left as-is.
> v13-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API.
> [...]

Great! Thank you.

Unfortunately the patchset still seems to have difficulties passing
the CI checks (see http://cfbot.cputube.org/ ). Any chance we may see
a version rebased to the current `master` branch for the September CF?

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Jacob Champion

Дата:

13 сентября 2022 г., 00:39:16

On Wed, Aug 24, 2022 at 2:59 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
> I've rebased actual branch onto the latest master and re-created patches. Checked with git am,
> all applied correctly. Please check the attached patches.
> Rebased branch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean

I tried installing and using the dummy_toaster that's provided with
the gitlink. Upgrade of that cluster fails with the following message:

    pg_restore: creating TOASTER "dummy_toaster"
    pg_restore: while PROCESSING TOC:
    pg_restore: from TOC entry 2044; 9861 16390 TOASTER dummy_toaster (no owner)
    pg_restore: error: could not execute query: ERROR:  unrecognized
or unsupported class OID: 9861
    Command was: CREATE TOASTER "dummy_toaster" HANDLER
"public"."dummy_toaster_handler";

I was looking through the thread for a more in-depth description of
the "vtable" concept, but I didn't see one. It looks like it's just an
arbitrary extension point, and any new additions would require surgery
on whatever function needs the particular magic provided by the
toaster. E.g. your bytea-append toaster extension in the gitlink,
which still has to modify byteacat() in varlena.c to implement a very
specific optimization, and then declares its support for that
hardcoded optimization in the extension.

I'm skeptical that this would remain coherent as it grows. The patch
claims the vtable API is "powerful", which... I suppose it is, if you
get to make arbitrary modifications to the core whenever you implement
it. Did you already have thoughts about which operations would belong
under that umbrella? What would the procedure be for adding
functionality to that API? What happens if a toaster wants to
implement two magic performance optimizations instead of one?

--Jacob

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 сентября 2022 г., 09:44:50

Hi hackers!

Jacob, I agree that the bytea toaster makes a bad example due to core modification,

and actually is not a good example of an extension.

The vtable concept is intended for less invasive additional functionality - like providing

detoast iterators in addition to standard detoast mechanics - such modification requires

only adding iteration methods to toaster and registering them in vtable, without any

core modifications. I'll add this as a separate commit for generic (default) Toaster.

It would be more clear for complex data types like JSONB, where developers could

need some additional functionality to work with internal representation of data type,

and its full potential is revealed in our JSONB toaster extension. The JSONB toaster

is still in development but we plan to make it available soon.

For example, we can pass Toaster options with attoptions (I'm currently working on it)

and these options could, say, allow switching different optimizations in one toaster like

adding specific compression options or data processing directives, etc.

We doubt that there would be a lot of different custom toasters, because the Toaster

is quite a complex piece of machinery, but means for extending them would be heavily

demanded. I have to add some more in-depth explanation of the vtable concept to

README and the documentation package, the dummy toaster contrib does not cover

this topic at all.

On installing dummy_toaster contrib: I've just checked it by making a patch from commit

and applying onto my clone of master and 2 patches provided in previous email without

any errors and sll checks passed - applying with git am, configure with debug, cassert,

depend and enable-tap-tests flags and run checks.

Please advice what would cause such a behavior?

Thank you!

On Tue, Sep 13, 2022 at 12:39 AM Jacob Champion <jchampion@timescale.com> wrote:

On Wed, Aug 24, 2022 at 2:59 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
> I've rebased actual branch onto the latest master and re-created patches. Checked with git am,
> all applied correctly. Please check the attached patches.
> Rebased branch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean

I tried installing and using the dummy_toaster that's provided with
the gitlink. Upgrade of that cluster fails with the following message:

pg_restore: creating TOASTER "dummy_toaster"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2044; 9861 16390 TOASTER dummy_toaster (no owner)
pg_restore: error: could not execute query: ERROR: unrecognized
or unsupported class OID: 9861
Command was: CREATE TOASTER "dummy_toaster" HANDLER
"public"."dummy_toaster_handler";

I was looking through the thread for a more in-depth description of
the "vtable" concept, but I didn't see one. It looks like it's just an
arbitrary extension point, and any new additions would require surgery
on whatever function needs the particular magic provided by the
toaster. E.g. your bytea-append toaster extension in the gitlink,
which still has to modify byteacat() in varlena.c to implement a very
specific optimization, and then declares its support for that
hardcoded optimization in the extension.

I'm skeptical that this would remain coherent as it grows. The patch
claims the vtable API is "powerful", which... I suppose it is, if you
get to make arbitrary modifications to the core whenever you implement
it. Did you already have thoughts about which operations would belong
under that umbrella? What would the procedure be for adding
functionality to that API? What happens if a toaster wants to
implement two magic performance optimizations instead of one?

--Jacob

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Jacob Champion

Дата:

13 сентября 2022 г., 19:50:31

On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

    =# CREATE EXTENSION dummy_toaster;
    =# CREATE TABLE test (t TEXT
            STORAGE external
            TOASTER dummy_toaster_handler);
    =# \q
    $ initdb -D newdb
    $ pg_ctl -D olddb stop
    $ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

23 сентября 2022 г., 22:54:54

Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master

(15b4). Also providing patch with documentation package, and the second one contains large

README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -

move all files related to TOAST implementation into new folder /backend/access/toast where

Generic (default) Toaster resides.

Patchset consists of:

v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;

v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:

On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

24 сентября 2022 г., 15:50:24

Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current

master (from today). The third patch contains documentation package, and the second one contains large

README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -

move all files related to TOAST implementation into new folder /backend/access/toast where

Generic (default) Toaster resides.

Patchset consists of:

v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;

v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

25 сентября 2022 г., 01:41:34

Hi hackers!

Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,

sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current
master (from today). The third patch contains documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

27 сентября 2022 г., 00:26:32

Hi,

Meson build for the patchset failed, meson build files attached and README/Doc package

reworked with more detailed explanation of virtual function table along with other corrections.

On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!
Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,
sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current
master (from today). The third patch contains documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 октября 2022 г., 01:02:19

Hi hackers!

Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.

Patchset consists of:

v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but

reference TOAST is still unchanged;

v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi,
Meson build for the patchset failed, meson build files attached and README/Doc package
reworked with more detailed explanation of virtual function table along with other corrections.

On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,
sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current
master (from today). The third patch contains documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 октября 2022 г., 13:45:38

Hi hackers!

Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.

Patchset consists of:

v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but

reference TOAST is still unchanged;

v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Oct 4, 2022 at 1:02 AM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!
Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.

Patchset consists of:
v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi,
Meson build for the patchset failed, meson build files attached and README/Doc package
reworked with more detailed explanation of virtual function table along with other corrections.

On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,
sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current
master (from today). The third patch contains documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 октября 2022 г., 13:46:31

Hi hackers!

Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.

Sorry, forgot to attach patch files. My fault.

Patchset consists of:

v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but

reference TOAST is still unchanged;

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Oct 4, 2022 at 1:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!
Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.

Patchset consists of:
v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Oct 4, 2022 at 1:02 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.

Patchset consists of:
v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi,
Meson build for the patchset failed, meson build files attached and README/Doc package
reworked with more detailed explanation of virtual function table along with other corrections.

On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,
sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current
master (from today). The third patch contains documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

04 октября 2022 г., 22:41:22

Hi hackers!

cfbot is unhappy again, with documentation package.Here's corrected patchset.

Patchset consists of:

v21-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but

reference TOAST is still unchanged;

v21-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v21-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Oct 4, 2022 at 1:46 PM Nikita Malakhov <hukutoc@gmail.com> wrote:

Hi hackers!
Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.
Sorry, forgot to attach patch files. My fault.

Patchset consists of:
v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Oct 4, 2022 at 1:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.

Patchset consists of:
v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Oct 4, 2022 at 1:02 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.

Patchset consists of:
v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced but
reference TOAST is still unchanged;
v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi,
Meson build for the patchset failed, meson build files attached and README/Doc package
reworked with more detailed explanation of virtual function table along with other corrections.

On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!
Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,
sorry for the noise.

On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the current
master (from today). The third patch contains documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Again, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at
https://github.com/postgrespro/postgres/tree/toasterapi_clean

On Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
Hi hackers!

Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master
(15b4). Also providing patch with documentation package, and the second one contains large
README.toastapi file providing additional in-depth docs for developers.

Comments would be greatly appreciated.

Also, after checking patch sources I have a strong opinion that it needs some refactoring -
move all files related to TOAST implementation into new folder /backend/access/toast where
Generic (default) Toaster resides.

Patchset consists of:
v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;
v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package

On Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.

Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.

> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?

I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly

=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb

(where <install path>/bin is on the PATH, so we're using the right binaries).

Thanks,
--Jacob

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

--
Regards,

--
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 октября 2022 г., 02:13:23

Hi hackers!

Due to recent changes in postgres.h cfbot is failing again.

Here's rebased patch:

v22-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API is introduced but

reference TOAST is still left unchanged;

v22-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as Default Toaster;
v22-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

Please note that because the development goes on, the actual branch contains much more than is provided here.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

13 октября 2022 г., 10:31:45

Hi hackers,

Fixed warning that failed build in previous patchset.

Here's rebased patch:

v23-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API is introduced but

reference TOAST is still left unchanged;

v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as Default Toaster;
v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package

For TOAST API explanation please check /src/backend/access/README.toastapi

Actual GitHub branch resides at

https://github.com/postgrespro/postgres/tree/toasterapi_clean

Please note that because the development goes on, the actual branch contains much more than is provided here.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

21 октября 2022 г., 16:01:15

Hi Nikita,

> Here's rebased patch:
> v23-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API
isintroduced but
 
> reference TOAST is still left unchanged;
> v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as
DefaultToaster;
 
> v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Thanks for keeping the patch up to date.

As I recall one of the open questions was: how this feature is
supposed to work with table access methods? Could you please summarize
what the current consensus is in this respect?

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

22 октября 2022 г., 01:36:31

Hi!

Aleksander, we have had this in mind while developing this feature, and have checked it. Just a slight modification is needed

to make it work with Pluggable Storage (Access Methods) API.

On Fri, Oct 21, 2022 at 4:01 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Here's rebased patch:
> v23-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API is introduced but
> reference TOAST is still left unchanged;
> v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as Default Toaster;
> v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Thanks for keeping the patch up to date.

As I recall one of the open questions was: how this feature is
supposed to work with table access methods? Could you please summarize
what the current consensus is in this respect?

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

22 октября 2022 г., 11:58:10

Hi Nikita,

> Aleksander, we have had this in mind while developing this feature, and have checked it. Just a slight modification
isneeded
 
> to make it work with Pluggable Storage (Access Methods) API.

Could you please clarify this a little from the architectural point of view?

Let's say company A implements some specific TableAM (in-memory / the
one that uses undo logging / etc). Company B implements an alternative
TOAST mechanism.

How the TOAST extension is going to work without knowing any specifics
of the TableAM the user chooses for the given relation, and vice
versa? How one of the extensions is going to upgrade / downgrade
between the versions without knowing any implementation details of
another?

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

23 октября 2022 г., 00:00:12

Hi!

Aleksander, this is a good question.

If I understood you correctly, you mean that the alternative TOAST mechanism B is using a specific

Table AM A?

Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is

free to use any storage methods - we've tested it with some custom Toaster, because it is completely

hidden from the caller, and is not limited to Heap, though extensions' interdependencies is a very tricky

question, and surely not the one to be answered quickly.

Still, I have good news on this topic - I'm currently re-working Pluggable TOAST in a more OOP-correct

way, generalizing Table to Toaster relation from column attribute and reloptions with separate catalog table

describing Relation,Toaster and TOAST storage entities relations, with lazy TOAST Tables creation for

the Generic Toaster, and dropping the limits of 1 TOAST table per relation. In current implementation

Toaster OID and TOAST relation ID are stored as a part of Relation, which is not the best solution, and

leaves some Toaster's nuts and bolts open to AM that uses it, and we decided to hide this part into Toaster

too.

The next logical step is using Table AM API, if Table AM Routine is provided to Toaster, instead of direct

calls to Heap AM methods.

This was thought of in the following way:

Table AM Routine is passed to Toaster as a parameter, and direct Heap calls are replaced with the TAM

Routine calls. This is possible, but needs further investigation, because TOAST manipulations with data

require, as it is seen from the first dive into TAM API, some extension of this API.

I'll present the results of our research as soon as they're ready.

On Sat, Oct 22, 2022 at 11:58 AM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Aleksander, we have had this in mind while developing this feature, and have checked it. Just a slight modification is needed
> to make it work with Pluggable Storage (Access Methods) API.

Could you please clarify this a little from the architectural point of view?

Let's say company A implements some specific TableAM (in-memory / the
one that uses undo logging / etc). Company B implements an alternative
TOAST mechanism.

How the TOAST extension is going to work without knowing any specifics
of the TableAM the user chooses for the given relation, and vice
versa? How one of the extensions is going to upgrade / downgrade
between the versions without knowing any implementation details of
another?

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

23 октября 2022 г., 12:38:06

Hi Nikita,

> Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is
> free to use any storage methods

Don't you think that this is an arguable design decision? Basically
all we know about the underlying TableAM is that it stores tuples
_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
it even has any sort of pages, whether they are fixed in size or not,
whether it uses shared buffers, etc. It may not even require TOAST.
(Not to mention the fact that when you have N TOAST implementations
and M TableAM implementations now you have to run N x M compatibility
tests. And this doesn't account for different versions of Ns and Ms,
different platforms and different versions of PostgreSQL.)

I believe the proposed approach is architecturally broken from the beginning.

It looks like the idea should be actually turned inside out. I.e. what
would be nice to have is some sort of _framework_ that helps TableAM
authors to implement TOAST (alternatively, the rest of the TableAM
except for TOAST) if the TableAM is similar to the default one. In
other words the idea is not to implement alternative TOASTers that
will work with all possible TableAMs but rather to simplify the task
of implementing an alternative TableAM which is similar to the default
one except for TOAST. These TableAMs should reuse as much common code
as possible except for the parts where they differ.

Does it make sense?

Sorry, I realize this will probably imply a complete rewrite of the
patch. This is the reason why one should start proposing changes from
gathering the requirements, writing an RFC and run it through several
rounds of discussion.

[1]: https://www.postgresql.org/docs/current/tableam.html

--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

23 октября 2022 г., 23:38:13

Hi!

Aleksander,

>Don't you think that this is an arguable design decision? Basically
>all we know about the underlying TableAM is that it stores tuples
>_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
>it even has any sort of pages, whether they are fixed in size or not,
>whether it uses shared buffers, etc. It may not even require TOAST.
>(Not to mention the fact that when you have N TOAST implementations
>and M TableAM implementations now you have to run N x M compatibility
>tests. And this doesn't account for different versions of Ns and Ms,
>different platforms and different versions of PostgreSQL.)

>I believe the proposed approach is architecturally broken from the beginning.

Existing TOAST mechanics just works, but for certain types of data it does so

very poorly, and, let's face it, this mechanics has very strict limitations that limit

overall capabilities of DBMS, because TOAST was designed when today's

usual amounts of data were not the case - I mean tables with hundreds of

billions of rows, with sizes measured by hundreds of Gb and even by Terabytes.

But TOAST itself is good solution to problem of storing oversized attributes, and

though it has some limitations - it is unwise to just throw it away, better way is to

make it up-to-date by revising it, get rid of the most painful limitations and allow

to use different (custom) TOAST strategies for special cases.

The main idea of Pluggable TOAST is to extend TOAST capabilities by providing

common API allowing to uniformly use different strategies to TOAST different data.

With the acronym "TOAST" I mean that data would be stored externally to source

table, somewhere only its Toaster know where and how - it may be regular Heap

tables, Heap tables with different table structure, some other AM tables, files outside

of the database, even files on different storage systems. Pluggable TOAST allows

using advanced compression methods and complex operations on externally stored

data, like search without fully de-TOASTing data, etc.

Also, existing TOAST is a part of Heap AM and is restricted to use Heap only.

To make it extensible - we have to separate TOAST from Heap AM. Default TOAST

in Pluggable TOAST still uses Heap, but Heap knows nothing about TOAST. It fits

perfectly in OOP paradigms

>It looks like the idea should be actually turned inside out. I.e. what
>would be nice to have is some sort of _framework_ that helps TableAM
>authors to implement TOAST (alternatively, the rest of the TableAM
>except for TOAST) if the TableAM is similar to the default one. In
>other words the idea is not to implement alternative TOASTers that
>will work with all possible TableAMs but rather to simplify the task
>of implementing an alternative TableAM which is similar to the default
>one except for TOAST. These TableAMs should reuse as much common code
>as possible except for the parts where they differ.

To implement different TOAST strategies you must have an API to plug them in,

otherwise for each strategy you'd have to change the core. TOAST API allows to plug

in custom TOAST strategies just by adding contrib modules, once the API is merged

into the core. I have to make a point that different TOAST strategies do not have

to store data with other TAMs, they just could store these data in Heap but using

knowledge of internal data structure of workflow to store them in a more optimal

way - like fast and partially compressed and decompressed JSON, lots of large

chunks of binary data stored in the database (as you know, largeobjects are not

of much help with this) and so on.

Implementing another Table AM just to implement another TOAST strategy seems too

much, the TAM API is very heavy and complex, and you would have to add it as a contrib.

Lots of different TAMs would cause much more problems than lots of Toasters because

such a solution results in data incompatibility between installations with different TAMs

and some minor changes in custom TAM contrib could lead to losing all data stored with

this TAM, but with custom TOAST you (in the worst case) could lose just TOASTed data

and nothing else.

We have lots of requests from clients and tickets related to TOAST limitations and

extending Postgres this way - this growing need made us develop Pluggable TOAST.

On Sun, Oct 23, 2022 at 12:38 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is
> free to use any storage methods

Don't you think that this is an arguable design decision? Basically
all we know about the underlying TableAM is that it stores tuples
_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
it even has any sort of pages, whether they are fixed in size or not,
whether it uses shared buffers, etc. It may not even require TOAST.
(Not to mention the fact that when you have N TOAST implementations
and M TableAM implementations now you have to run N x M compatibility
tests. And this doesn't account for different versions of Ns and Ms,
different platforms and different versions of PostgreSQL.)

I believe the proposed approach is architecturally broken from the beginning.

It looks like the idea should be actually turned inside out. I.e. what
would be nice to have is some sort of _framework_ that helps TableAM
authors to implement TOAST (alternatively, the rest of the TableAM
except for TOAST) if the TableAM is similar to the default one. In
other words the idea is not to implement alternative TOASTers that
will work with all possible TableAMs but rather to simplify the task
of implementing an alternative TableAM which is similar to the default
one except for TOAST. These TableAMs should reuse as much common code
as possible except for the parts where they differ.

Does it make sense?

Sorry, I realize this will probably imply a complete rewrite of the
patch. This is the reason why one should start proposing changes from
gathering the requirements, writing an RFC and run it through several
rounds of discussion.

[1]: https://www.postgresql.org/docs/current/tableam.html

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

24 октября 2022 г., 12:10:18

Hi Nikita,

I don't argue with most of what you say. I am just pointing out the
reason why the chosen approach "N TOASTers x M TableAMs" will not
work:

> Don't you think that this is an arguable design decision? Basically
> all we know about the underlying TableAM is that it stores tuples
> _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> it even has any sort of pages, whether they are fixed in size or not,
> whether it uses shared buffers, etc. It may not even require TOAST.
> [...]

Also I completely agree with:

> Implementing another Table AM just to implement another TOAST strategy seems too
> much, the TAM API is very heavy and complex, and you would have to add it as a contrib.

This is what I meant above when talking about the framework for
simplifying this task:

> It looks like the idea should be actually turned inside out. I.e. what
> would be nice to have is some sort of _framework_ that helps TableAM
> authors to implement TOAST (alternatively, the rest of the TableAM
> except for TOAST) if the TableAM is similar to the default one.

From the user perspective it's much easier to think about one entity -
TableAM, and choosing from heapam_with_default_toast and
heapam_with_different_toast.

From the extension implementer point of view creating TableAMs is a
difficult task. This is what the framework should address. Ideally the
interface should be as simple as:

CreateParametrizedDefaultHeapAM(SomeTOASTSubstitutionObject, ...other
arguments, in the future...)

Where the extension author should be worried only about an alternative
TOAST implementation.

I think at some point such a framework may address at least one more
issue we have - an inability to change the page size on the table
level. As it was shown by Tomas Vondra [1] the default 8 KB page size
can be suboptimal depending on the load. So it would be nice if the
user could change it without rebuilding PostgreSQL. Naturally this is
out of scope of this particular patchset. I just wanted to point out
opportunities we have here.

[1]: https://www.postgresql.org/message-id/flat/b4861449-6c54-ccf8-e67c-c039228cdc6d%40enterprisedb.com

--
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

24 октября 2022 г., 14:16:07

Hi!

>I don't argue with most of what you say. I am just pointing out the
>reason why the chosen approach "N TOASTers x M TableAMs" will not
>work:

We assume that TAM used in custom Toaster works as it is should work,

and leave TAM internals to this TAM developer - say, we do not want to

change internals of Heap AM.

We don't want to create some kind of silver bullet. There are already existing

and widely-known (from production environments) problems with TOAST

mechanics, and we suggest not too complex way to solve them.

As I mentioned before, Pluggable TOAST does not change Heap AM, it is

not minded to change TAMs.

>This is what I meant above when talking about the framework for
>simplifying this task:

That's a kind of generalizing custom TOAST implementation. It is very

good intention, but keep in mind that different kinds of data require very

different approach to external storage - say, JSON TOAST works with

maps of keys and values, super binary object (experimental name) does

not work with internals of TOASTed data except searching. But, we thought

about that too and reusable code resides in toast_internals.c source - any

custom Toaster working with Heap could use it's insert, update and fetch

methods, but deal with data on it's own.

Even with the general framework there must be a common interface which

would be the entry point for those custom methods developed with the

framework. That's what the TOAST API is - just an interface that all custom

TOAST implementations use to have a common entry point from any TAM,

with syntax support to plug in custom TOAST implementations from the SQL.

No less, but no more.

Moreover, our patches show that even Generic (default) TOAST implementation

could still be left as-is, without necessity to route it via our API, though it is logically

wrong because common API is meant to be common for all TOAST implementations

without exceptions.

Have I answered your question? Please don't hesitate to point to any unclear

parts, I'd be glad to explain that.

The main idea in TOAST API is very elegant and light, and it is designed alike

to Pluggable Storage (Table AM API).

On Mon, Oct 24, 2022 at 12:10 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

I don't argue with most of what you say. I am just pointing out the
reason why the chosen approach "N TOASTers x M TableAMs" will not
work:

> Don't you think that this is an arguable design decision? Basically
> all we know about the underlying TableAM is that it stores tuples
> _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> it even has any sort of pages, whether they are fixed in size or not,
> whether it uses shared buffers, etc. It may not even require TOAST.
> [...]

Also I completely agree with:

> Implementing another Table AM just to implement another TOAST strategy seems too
> much, the TAM API is very heavy and complex, and you would have to add it as a contrib.

This is what I meant above when talking about the framework for
simplifying this task:

> It looks like the idea should be actually turned inside out. I.e. what
> would be nice to have is some sort of _framework_ that helps TableAM
> authors to implement TOAST (alternatively, the rest of the TableAM
> except for TOAST) if the TableAM is similar to the default one.

From the user perspective it's much easier to think about one entity -
TableAM, and choosing from heapam_with_default_toast and
heapam_with_different_toast.

From the extension implementer point of view creating TableAMs is a
difficult task. This is what the framework should address. Ideally the
interface should be as simple as:

CreateParametrizedDefaultHeapAM(SomeTOASTSubstitutionObject, ...other
arguments, in the future...)

Where the extension author should be worried only about an alternative
TOAST implementation.

I think at some point such a framework may address at least one more
issue we have - an inability to change the page size on the table
level. As it was shown by Tomas Vondra [1] the default 8 KB page size
can be suboptimal depending on the load. So it would be nice if the
user could change it without rebuilding PostgreSQL. Naturally this is
out of scope of this particular patchset. I just wanted to point out
opportunities we have here.

[1]: https://www.postgresql.org/message-id/flat/b4861449-6c54-ccf8-e67c-c039228cdc6d%40enterprisedb.com

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

24 октября 2022 г., 14:55:45

Hi Nikita,

> >I don't argue with most of what you say. I am just pointing out the
> >reason why the chosen approach "N TOASTers x M TableAMs" will not
> >work:
>
> We assume that TAM used in custom Toaster works as it is should work,
> and leave TAM internals to this TAM developer - say, we do not want to
> change internals of Heap AM.
>
> We don't want to create some kind of silver bullet.

This is exactly the point. In order to not to create a silver bullet,
TOASTers should be limited to a single TableAM. The one we know uses
pages of a known fixed size, the one that actually requires TOAST
because pages are relatively small, etc.

> That's what the TOAST API is - just an interface that all custom
> TOAST implementations use to have a common entry point from any TAM,
> [...]

I believe this is the source of misunderstanding. Note that not _any_
TableAM needs TOAST to begin with. As an example, if one chooses to
implement a column-organized TableAM that stores all text/bytea
attributes in a separate dictionary file this person doesn't need
TOAST and doesn't want to be constrained by the need of choosing one.

For this reason the "N TOASTers x M TableAMs" approach is
architecturally broken.

> keep in mind that different kinds of data require very
> different approach to external storage - say, JSON TOAST works with
> maps of keys and values, [...]

To clarify: is an ability to specify TOASTers for given columns and/or
types also part of the plan?

> Have I answered your question? Please don't hesitate to point to any unclear
> parts, I'd be glad to explain that.

No. To be honest, it looks like you are merely discarding most/any
feedback the community provided so far.

I really think that pluggable TOASTers would be a great feature.
However if the goal is to get it into the core I doubt that we are
going to make much progress with the current approach.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

24 октября 2022 г., 15:59:44

Hi,

>This is exactly the point. In order to not to create a silver bullet,
>TOASTers should be limited to a single TableAM. The one we know uses
>pages of a known fixed size, the one that actually requires TOAST
>because pages are relatively small, etc.

Currently all our TOAST implementations use Heap AM, except ones

that use external (truly external, i.e. files outside DB) storage. Using Table AM

Routine and routing AM methods calls via it is a topic for further discussion,

if Pluggable TOAST will be committed. And even then it would be an open issue.

>I believe this is the source of misunderstanding. Note that not _any_
>TableAM needs TOAST to begin with. As an example, if one chooses to
>implement a column-organized TableAM that stores all text/bytea
>attributes in a separate dictionary file this person doesn't need
>TOAST and doesn't want to be constrained by the need of choosing one.

>For this reason the "N TOASTers x M TableAMs" approach is
>architecturally broken.

TOAST implementation is not necessary for Table AM. And TOAST API is just

an optional open interface - SET TOASTER is an option for CREATE/ALTER

TABLE command. In previous discussion we haven't mentioned an approach

"N TOASTers x M TableAMs".

>To clarify: is an ability to specify TOASTers for given columns and/or
>types also part of the plan?

For table columns it is already supported by the syntax part of the TOAST API.

For data types we reserved the validation part of the API, but this support is still a

subject for discussion, although we think it will be very handy for DB users, like

we issue something like:

CREATE TYPE ... TOASTER=jsonb_toaster ... ;

ALTER TYPE JSONB SET TOASTER jsonb_toaster;

and do not have to set special toaster for jsonb column each time we create

or alter a table with it.

>No. To be honest, it looks like you are merely discarding most/any
>feedback the community provided so far.

Very sorry to read that. Almost all of the requests in this discussion have been taken

into account in patches, and the most serious one - I mean pg_attribute expansion

which was mentioned by Tom Lane and Robert Haas - is being fixed right now and

will be ready very soon.

>I really think that pluggable TOASTers would be a great feature.
>However if the goal is to get it into the core I doubt that we are
>going to make much progress with the current approach.

We hope we will. This feature is very demanded by end-users, and will be even more

as time goes by - current TOAST limitations and how they affect DBMS performance is

a serious drawback in comparison to competitors.

On Mon, Oct 24, 2022 at 2:55 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> >I don't argue with most of what you say. I am just pointing out the
> >reason why the chosen approach "N TOASTers x M TableAMs" will not
> >work:
>
> We assume that TAM used in custom Toaster works as it is should work,
> and leave TAM internals to this TAM developer - say, we do not want to
> change internals of Heap AM.
>
> We don't want to create some kind of silver bullet.

This is exactly the point. In order to not to create a silver bullet,
TOASTers should be limited to a single TableAM. The one we know uses
pages of a known fixed size, the one that actually requires TOAST
because pages are relatively small, etc.

> That's what the TOAST API is - just an interface that all custom
> TOAST implementations use to have a common entry point from any TAM,
> [...]

I believe this is the source of misunderstanding. Note that not _any_
TableAM needs TOAST to begin with. As an example, if one chooses to
implement a column-organized TableAM that stores all text/bytea
attributes in a separate dictionary file this person doesn't need
TOAST and doesn't want to be constrained by the need of choosing one.

For this reason the "N TOASTers x M TableAMs" approach is
architecturally broken.

> keep in mind that different kinds of data require very
> different approach to external storage - say, JSON TOAST works with
> maps of keys and values, [...]

To clarify: is an ability to specify TOASTers for given columns and/or
types also part of the plan?

> Have I answered your question? Please don't hesitate to point to any unclear
> parts, I'd be glad to explain that.

No. To be honest, it looks like you are merely discarding most/any
feedback the community provided so far.

I really think that pluggable TOASTers would be a great feature.
However if the goal is to get it into the core I doubt that we are
going to make much progress with the current approach.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

24 октября 2022 г., 16:53:20

Hi Nikita,

> Using Table AM Routine and routing AM methods calls via it is a topic for further discussion,
> if Pluggable TOAST will be committed. [...] And even then it would be an open issue.

From personal experience with the project I have serious doubts this
is going to happen. Before such invasive changes are going to be
accepted there should be a clear understanding of how exactly TOASTers
are supposed to be used. This should be part of the documentation in
the patchset. Additionally there should be an open-soruce or
source-available extension that actually demonstrates the benefits of
TOASTers with reproducible benchmarks (we didn't even get to that part
yet).

> TOAST implementation is not necessary for Table AM.

What other use cases for TOAST do you have in mind?

>> > Have I answered your question? Please don't hesitate to point to any unclear
>> > parts, I'd be glad to explain that.
>>
>> No. To be honest, it looks like you are merely discarding most/any
>> feedback the community provided so far.
>>
>> I really think that pluggable TOASTers would be a great feature.
>> However if the goal is to get it into the core I doubt that we are
>> going to make much progress with the current approach.

To clarify, the concern about "N TOASTers vs M TableAM" was expressed
by Robert Haas back in Jan 2022:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

This is the most important open question so far to my knowledge. It
was never addressed, it doesn't seem like there is a plan of doing so,
the suggested alternative approach was ignored, nor are there any
strong arguments that would defend this design choice and/or criticize
the alternative one (other than general words "don't worry we know
what we are doing").

This what I mean by the community feedback being discarded.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

24 октября 2022 г., 17:44:35

Hi!

>From personal experience with the project I have serious doubts this
>is going to happen. Before such invasive changes are going to be
>accepted there should be a clear understanding of how exactly TOASTers
>are supposed to be used. This should be part of the documentation in
>the patchset. Additionally there should be an open-soruce or
>source-available extension that actually demonstrates the benefits of
>TOASTers with reproducible benchmarks (we didn't even get to that part
>yet).

Actually, there's a documentation part in the patchset. Also, there is README file

explaining the API.

In addition, we have several custom TOAST implementations with some

results - they were published and presented on PgCon. I was asked to exclude

custom TOAST implementations and some further improvements for the first

iteration, that's why currently the patchset consists only of 3 patches - base

core changes, default TOAST implementation via TOAST API and documentation

package.

>What other use cases for TOAST do you have in mind?

The main use case is the same as for the TOAST mechanism - storing and retrieving

oversized data. But we expanded this case with some details -

- update TOASTed data (yes, current TOAST implementation cannot update stored

data - is marks whole TOASTED object as dead and stores new one);

- retrieve part of the stored data chunks without fully de-TOASTing stored data (even

with existing TOAST this will be painful if you have to get just a small part of the several

hundreds Mb sized object);

- be able to store objects of size larger than 1 Gb;

- store more than 4 Tb of TOASTed data for one table;

- optimize storage for fast search and retrieval of parts of TOASTed object - this is

must-have for effectively using JSON, PostgreSQL already is in catching-up position

in JSON performance field.

For all this cases we have test results that show improvements in storage and performance.

>To clarify, the concern about "N TOASTers vs M TableAM" was expressed
>by Robert Haas back in Jan 2022:

>> I agree ... but I'm also worried about what happens when we have
>> multiple table AMs. One can imagine a new table AM that is
>> specifically optimized for TOAST which can be used with an existing
>> heap table. One can imagine a new table AM for the main table that
>> wants to use something different for TOAST. So, I don't think it's
>> right to imagine that the choice of TOASTer depends solely on the
>> column data type. I'm not really sure how this should work exactly ...
>> but it needs careful thought.

>This is the most important open question so far to my knowledge. It
>was never addressed, it doesn't seem like there is a plan of doing so,
>the suggested alternative approach was ignored, nor are there any
>strong arguments that would defend this design choice and/or criticize
>the alternative one (other than general words "don't worry we know
>what we are doing").

>This what I mean by the community feedback being discarded.

Maybe there was some misunderstanding, I was new to this project and

company at that time - I was introduced to is in the middle of December

2021, but Theodor Sigaev gave an answer to Mr. Haas:

>Right. that's why we propose a validate method (may be, it's a wrong
>name, but I don't known better one) which accepts several arguments, one
>of which is table AM oid. If that method returns false then toaster
>isn't useful with current TAM, storage or/and compression kinds, etc.

And this is generalized and correct from the OOP POV mean to provide a

way to ensure that this concrete TOAST implementation is valid for Table AM

calling it.

On Mon, Oct 24, 2022 at 4:53 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Using Table AM Routine and routing AM methods calls via it is a topic for further discussion,
> if Pluggable TOAST will be committed. [...] And even then it would be an open issue.

From personal experience with the project I have serious doubts this
is going to happen. Before such invasive changes are going to be
accepted there should be a clear understanding of how exactly TOASTers
are supposed to be used. This should be part of the documentation in
the patchset. Additionally there should be an open-soruce or
source-available extension that actually demonstrates the benefits of
TOASTers with reproducible benchmarks (we didn't even get to that part
yet).

> TOAST implementation is not necessary for Table AM.

What other use cases for TOAST do you have in mind?

>> > Have I answered your question? Please don't hesitate to point to any unclear
>> > parts, I'd be glad to explain that.
>>
>> No. To be honest, it looks like you are merely discarding most/any
>> feedback the community provided so far.
>>
>> I really think that pluggable TOASTers would be a great feature.
>> However if the goal is to get it into the core I doubt that we are
>> going to make much progress with the current approach.

To clarify, the concern about "N TOASTers vs M TableAM" was expressed
by Robert Haas back in Jan 2022:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

This is the most important open question so far to my knowledge. It
was never addressed, it doesn't seem like there is a plan of doing so,
the suggested alternative approach was ignored, nor are there any
strong arguments that would defend this design choice and/or criticize
the alternative one (other than general words "don't worry we know
what we are doing").

This what I mean by the community feedback being discarded.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

24 октября 2022 г., 18:37:31

Hi Nikita,

> > > TOAST implementation is not necessary for Table AM.
>
> >What other use cases for TOAST do you have in mind?
>
> The main use case is the same as for the TOAST mechanism - storing and retrieving
> oversized data. But we expanded this case with some details -
> - update TOASTed data (yes, current TOAST implementation cannot update stored
> data - is marks whole TOASTED object as dead and stores new one);
> - retrieve part of the stored data chunks without fully de-TOASTing stored data (even
> with existing TOAST this will be painful if you have to get just a small part of the several
>  hundreds Mb sized object);
> - be able to store objects of size larger than 1 Gb;
> - store more than 4 Tb of TOASTed data for one table;
> - optimize storage for fast search and retrieval of parts of TOASTed object - this is
> must-have for effectively using JSON, PostgreSQL already is in catching-up position
> in JSON performance field.

I see. Actually most of this is what TableAM does. We just happened to
give this part of TableAM a separate name. The only exception is the
last case, when you create custom TOASTers for particular types and
potentially specify them for the given column.

All in all, this makes sense.

> Right. that's why we propose a validate method  (may be, it's a wrong
> name, but I don't known better one) which accepts several arguments, one
> of which is table AM oid. If that method returns false then toaster
> isn't useful with current TAM, storage or/and compression kinds, etc.

OK, I missed this message. So there was some misunderstanding after
all, sorry for this.

That's exactly what I wanted to know. It's much better than allowing
any TOASTer to run on top of any TableAM.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

24 октября 2022 г., 22:24:58

Hi,

Aleksander, thanks for the discussion! It seems to me that I have to add some parts

of it to API documentation, to clarify the details on API purpose and use-cases.

On Mon, Oct 24, 2022 at 6:37 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> > > TOAST implementation is not necessary for Table AM.
>
> >What other use cases for TOAST do you have in mind?
>
> The main use case is the same as for the TOAST mechanism - storing and retrieving
> oversized data. But we expanded this case with some details -
> - update TOASTed data (yes, current TOAST implementation cannot update stored
> data - is marks whole TOASTED object as dead and stores new one);
> - retrieve part of the stored data chunks without fully de-TOASTing stored data (even
> with existing TOAST this will be painful if you have to get just a small part of the several
> hundreds Mb sized object);
> - be able to store objects of size larger than 1 Gb;
> - store more than 4 Tb of TOASTed data for one table;
> - optimize storage for fast search and retrieval of parts of TOASTed object - this is
> must-have for effectively using JSON, PostgreSQL already is in catching-up position
> in JSON performance field.

I see. Actually most of this is what TableAM does. We just happened to
give this part of TableAM a separate name. The only exception is the
last case, when you create custom TOASTers for particular types and
potentially specify them for the given column.

All in all, this makes sense.

> Right. that's why we propose a validate method (may be, it's a wrong
> name, but I don't known better one) which accepts several arguments, one
> of which is table AM oid. If that method returns false then toaster
> isn't useful with current TAM, storage or/and compression kinds, etc.

OK, I missed this message. So there was some misunderstanding after
all, sorry for this.

That's exactly what I wanted to know. It's much better than allowing
any TOASTer to run on top of any TableAM.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

01 ноября 2022 г., 13:05:36

Hi Nikita,

> Aleksander, thanks for the discussion! It seems to me that I have to add some parts
> of it to API documentation, to clarify the details on API purpose and use-cases.

I'm in the process of testing and reviewing the v23 patchset. So far I
just wanted to report that there seems to be certain issues with the
formatting and/or trailing whitespaces in the patches:

```
$ git am <v23-0001-toaster-interface.patch
Applying: Pluggable TOAST API interface along with reference TOAST mechanics
.git/rebase-apply/patch:1122: indent with spaces.
       TsrRoutine *tsr = makeNode(TsrRoutine);
.git/rebase-apply/patch:1123: indent with spaces.
       PG_RETURN_POINTER(tsr);
.git/rebase-apply/patch:6172: new blank line at EOF.
+
warning: 3 lines add whitespace errors.

$ git am <v23-0002-toaster-default.patch
Applying: Default TOAST re-implemented using Toaster API.
.git/rebase-apply/patch:1879: indent with spaces.
    if (*value == old_value)
.git/rebase-apply/patch:1881: indent with spaces.
        return;
.git/rebase-apply/patch:3069: trailing whitespace.
 *                              CREATE TOASTER name HANDLER     handler_name
.git/rebase-apply/patch:3603: trailing whitespace.
fetch_toast_slice(Relation toastrel, Oid valueid,
.git/rebase-apply/patch:3654: trailing whitespace.
toast_fetch_toast_slice(Relation toastrel, Oid valueid,
warning: 5 lines add whitespace errors.

$ git am <v23-0003-toaster-docs.patch
Applying: Pluggable TOAST API documentation package
.git/rebase-apply/patch:388: tab in indent.
        TsrRoutine *tsrroutine = makeNode(TsrRoutine);
.git/rebase-apply/patch:389: tab in indent.
        tsrroutine->init = custom_toast_init;
.git/rebase-apply/patch:390: tab in indent.
        tsrroutine->toast = custom_toast;
.git/rebase-apply/patch:391: tab in indent.
        tsrroutine->detoast = custom_detoast;
.git/rebase-apply/patch:392: tab in indent.
        tsrroutine->deltoast = custom_delete_toast;
warning: squelched 12 whitespace errors
warning: 17 lines add whitespace errors.
```

Please don't forget to run `pgindent` before formatting the patches
with `git format-patch` next time.

I'm going to submit a more detailed code review soon.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

02 ноября 2022 г., 00:15:27

Hi Nikita,

> Please don't forget to run `pgindent` before formatting the patches
> with `git format-patch` next time.

There are also some compiler warnings, please see the attachment.

> I'm going to submit a more detailed code review soon.

-- 
Best regards,
Aleksander Alekseev

Вложения

warnings.txt

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

03 ноября 2022 г., 13:38:57

Hi Nikita,

> I'm going to submit a more detailed code review soon.

As promised, here is my feedback.

```
+      Toaster could be assigned to toastable column with expression
+      <literal>STORAGE external TOASTER
<replaceable>toaster_name</replaceable></literal>
```

1.1. Could you please clarify what is the motivation behind such a
verbose syntax?

```
+Please note that custom Toasters could be assigned only to External
+storage type.
```

1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
strategies as well. On top of that, why should custom TOASTers have
any knowledge of the default four-stage algorithm and the storage
strategies? If the storage strategy is actually ignored, it shouldn't
be used in the syntax.

```
+Toasters could use any storage, advanced compression, encryption, and
```

2. Although it's possible to implement some encryption in a TOASTer I
don't think the documentation should advertise this.

```
+typedef struct TsrRoutine
+{
+    NodeTag        type;
+
+    /* interface functions */
+    toast_init init;
+    toast_function toast;
+    update_toast_function update_toast;
+    copy_toast_function copy_toast;
+    detoast_function detoast;
+    del_toast_function deltoast;
+    get_vtable_function get_vtable;
+    toastervalidate_function toastervalidate;
+} TsrRoutine;
```

3.1. I believe we should rename this to something like `struct
ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
not a routine.
3.2. The names of the fields should be made consistent - e.g. if you
have update_toast and copy_toast then del_toast should be renamed to
delete_toast.
3.2. Additionally, in some parts of the path del_toast is used, while
in others - deltoast.

4. The user documentation should have clear answers on the following questions:
4.1. What will happen if the user tries to delete a TOASTer while
still having data that was TOASTed using this TOASTer? Or this is not
supported and the TOASTers should exist in the database indefinitely
after creation?
4.2. Is it possible to delete and/or alter the default TOASTer?
4.3. Please make sure the previously discussed clarification regarding
"N TOASTers vs M TableAMs" and the validation function is explicitly
present.

```
Toaster initialization. Initial TOAST table creation and other startup
preparations.
```

5.1. The documentation should clarify how many times init() is called
- is it done once for the TOASTer, once per relation, etc.
5.2. Why there is no free() method?

```
Updates TOASTed value. Returns new TOAST pointer.
```

6. It's not clear for what reason update_toast() will be executed to
begin with. This should be clarified in the documentation. How does it
differ from copy_toast()?

```
Validates Toaster for given data type, storage and compression options.
```

7. It should be explicitly stated when validate() is called and what
happens depending on the return value.

```
+Virtual Functions Table API Extension
```

8. IMO this section does a very poor job in explaining what this is
for and when I may want to use this; what particular problem are we
addressing here?

9. There are typos in the comments and the documentation,
s/Vaitual/Virtual/, s/vurtual/virtual/ etc. Also there are missing
articles. Please run your patches through a spellchecker.

I suggest we address this piece of feedback before proceeding further.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

03 ноября 2022 г., 14:33:47

Hi!

Thank you very much for the feedback.

Answering accordingly to questions/remarks order:

>I'm in the process of testing and reviewing the v23 patchset. So far I
>just wanted to report that there seems to be certain issues with the
>formatting and/or trailing whitespaces in the patches:

Accepted, will be done.

>There are also some compiler warnings, please see the attachment.

This too. There warnings showed up recently due to fresh changes in macros.

>1.1. Could you please clarify what is the motivation behind such a
>verbose syntax?

Toaster is set for the table column. Each TOASTable column could have

a different Toaster, so column option is the most obvious place to add it.

>1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
>strategies as well. On top of that, why should custom TOASTers have
>any knowledge of the default four-stage algorithm and the storage
>strategies? If the storage strategy is actually ignored, it shouldn't
>be used in the syntax.

EXTENDED storage strategy means that TOASTed value is compressed

before being TOASTed, so no knowledge of its internals could be of any

use. EXTERNAL strategy means that value is being TOASTed in original

form. Storage strategy is the thing internal to AM used, and TOAST

mechanics is not meant to interfere with it. Again, STORAGE EXTERNAL

explicitly shows that value will be stored out-of-line.

>2. Although it's possible to implement some encryption in a TOASTer I
>don't think the documentation should advertise this.

It is a good example of what could the Toaster be responsible for, because

we proposed moving compression into Toasters as a TOAST option -

for example, being set while initializing the Toaster.

>3.1. I believe we should rename this to something like `struct
>ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
>not a routine.

It was done similar to Table AM Routine (please check Pluggable

Storage API), along with some other decisions.

>3.2. The names of the fields should be made consistent - e.g. if you
>have update_toast and copy_toast then del_toast should be renamed to
>delete_toast.

>3.2. Additionally, in some parts of the path del_toast is used, while
>in others - deltoast.

Accepted, will be fixed in the next rebase.

>4. The user documentation should have clear answers on the following questions:
>4.1. What will happen if the user tries to delete a TOASTer while
>still having data that was TOASTed using this TOASTer? Or this is not
>supported and the TOASTers should exist in the database indefinitely
>after creation?
>4.2. Is it possible to delete and/or alter the default TOASTer?
>4.3. Please make sure the previously discussed clarification regarding
>"N TOASTers vs M TableAMs" and the validation function is explicitly
>present.

Thank you very much for checking the documentation package. These are

very good comments, I'll include these topics in the next patchset.

>5.1. The documentation should clarify how many times init() is called
>- is it done once for the TOASTer, once per relation, etc.
>5.2. Why there is no free() method?

These too. About free() method - Toasters are not meant to be deleted,

we mentioned this several times. They exist once they are created as long

as the DB itself. Have I answered your question?

>6. It's not clear for what reason update_toast() will be executed to
>begin with. This should be clarified in the documentation. How does it
>differ from copy_toast()?

It is not clear because current TOAST mechanics does not have UPDATE

functionality - it doesn't actually update TOASTed value, it marks this value

"dead" and inserts a new one. This is the cause of TOAST tables bloating

that is being complained about by many users. Update method is provided

for implementation of UPDATE operation.

>7. It should be explicitly stated when validate() is called and what
>happens depending on the return value.

Accepted, will correct this topic.

>8. IMO this section does a very poor job in explaining what this is
>for and when I may want to use this; what particular problem are we
>addressing here?

I already answered this question, maybe the answer was not very clear.

This is just an extension entry point, because for some more advanced

functionality existing pre-defined set of methods would be not enough, i.e.

special Toasters for complex datatypes like JSONb, that have complex

internal structure and may require additional ways to interact with it along

toast/detoast/update/delete.

Also, I'll check README and documentation for typos.

On Thu, Nov 3, 2022 at 1:39 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> I'm going to submit a more detailed code review soon.

As promised, here is my feedback.

```
+ Toaster could be assigned to toastable column with expression
+ <literal>STORAGE external TOASTER
<replaceable>toaster_name</replaceable></literal>
```

1.1. Could you please clarify what is the motivation behind such a
verbose syntax?

```
+Please note that custom Toasters could be assigned only to External
+storage type.
```

1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
strategies as well. On top of that, why should custom TOASTers have
any knowledge of the default four-stage algorithm and the storage
strategies? If the storage strategy is actually ignored, it shouldn't
be used in the syntax.

```
+Toasters could use any storage, advanced compression, encryption, and
```

2. Although it's possible to implement some encryption in a TOASTer I
don't think the documentation should advertise this.

```
+typedef struct TsrRoutine
+{
+ NodeTag type;
+
+ /* interface functions */
+ toast_init init;
+ toast_function toast;
+ update_toast_function update_toast;
+ copy_toast_function copy_toast;
+ detoast_function detoast;
+ del_toast_function deltoast;
+ get_vtable_function get_vtable;
+ toastervalidate_function toastervalidate;
+} TsrRoutine;
```

3.1. I believe we should rename this to something like `struct
ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
not a routine.
3.2. The names of the fields should be made consistent - e.g. if you
have update_toast and copy_toast then del_toast should be renamed to
delete_toast.
3.2. Additionally, in some parts of the path del_toast is used, while
in others - deltoast.

4. The user documentation should have clear answers on the following questions:
4.1. What will happen if the user tries to delete a TOASTer while
still having data that was TOASTed using this TOASTer? Or this is not
supported and the TOASTers should exist in the database indefinitely
after creation?
4.2. Is it possible to delete and/or alter the default TOASTer?
4.3. Please make sure the previously discussed clarification regarding
"N TOASTers vs M TableAMs" and the validation function is explicitly
present.

```
Toaster initialization. Initial TOAST table creation and other startup
preparations.
```

5.1. The documentation should clarify how many times init() is called
- is it done once for the TOASTer, once per relation, etc.
5.2. Why there is no free() method?

```
Updates TOASTed value. Returns new TOAST pointer.
```

6. It's not clear for what reason update_toast() will be executed to
begin with. This should be clarified in the documentation. How does it
differ from copy_toast()?

```
Validates Toaster for given data type, storage and compression options.
```

7. It should be explicitly stated when validate() is called and what
happens depending on the return value.

```
+Virtual Functions Table API Extension
```

8. IMO this section does a very poor job in explaining what this is
for and when I may want to use this; what particular problem are we
addressing here?

9. There are typos in the comments and the documentation,
s/Vaitual/Virtual/, s/vurtual/virtual/ etc. Also there are missing
articles. Please run your patches through a spellchecker.

I suggest we address this piece of feedback before proceeding further.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

03 ноября 2022 г., 16:26:38

Hi Nikita,

Please, avoid top-posting [1].

> Toaster is set for the table column. Each TOASTable column could have
> a different Toaster, so column option is the most obvious place to add it.

This is a major limitation. IMO the user should be able to set a
custom TOASTer for the entire table as well. Ideally - for the entire
database too. This could be implemented entirely on the syntax level,
the internals of the patch are not going to be affected.

> >1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
> >strategies as well. On top of that, why should custom TOASTers have
> >any knowledge of the default four-stage algorithm and the storage
> >strategies? If the storage strategy is actually ignored, it shouldn't
> >be used in the syntax.
>
> EXTENDED storage strategy means that TOASTed value is compressed
> before being TOASTed, so no knowledge of its internals could be of any
> use. EXTERNAL strategy means that value is being TOASTed in original
> form.  Storage strategy is the thing internal to AM used, and TOAST
> mechanics is not meant to interfere with it. Again, STORAGE EXTERNAL
> explicitly shows that value will be stored out-of-line.

Let me rephrase. Will the custom TOASTers work only for EXTERNAL
storage strategy or this is just a syntax?

> >2. Although it's possible to implement some encryption in a TOASTer I
> >don't think the documentation should advertise this.
>
> It is a good example of what could the Toaster be responsible for

No, encryption is an excellent example of what a TOASTer should NOT
do. If you are interested in encryption consider joining the "Moving
forward with TDE" thread [2].

> >3.1. I believe we should rename this to something like `struct
> >ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
> >not a routine.
>
> It was done similar to Table AM Routine (please check Pluggable
> Storage API), along with some other decisions.

OK, then maybe we shall keep the "Routine" part for consistency. I
still don't like the "Tsr" abbreviation though and find it confusing.

> It is not clear because current TOAST mechanics does not have UPDATE
> functionality - it doesn't actually update TOASTed value, it marks this value
> "dead" and inserts a new one. This is the cause of TOAST tables bloating
> that is being complained about by many users. Update method is provided
> for implementation of UPDATE operation.

But should we really distinguish INSERT and UPDATE cases on this API
level? It seems to me that TableAM just inserts new tuples. It's
TOASTers job to figure out whether similar values existed before and
should or shouldn't be reused. Additionally a particular TOASTer can
reuse old values between _different_ rows, potentially even from
different tables. Another reason why in practice there is little use
of knowing whether the data is INSERTed or UPDATEd.

> I already answered this question, maybe the answer was not very clear.
> This is just an extension entry point, because for some more advanced
> functionality existing pre-defined set of methods would be not enough, i.e.
> special Toasters for complex datatypes like JSONb, that have complex
> internal structure and may require additional ways to interact with it along
> toast/detoast/update/delete.

Maybe so, but it doesn't change the fact that the user documentation
should clearly describe the interface and its usage.

> These too. About free() method - Toasters are not meant to be deleted,
> we mentioned this several times. They exist once they are created as long
> as the DB itself. Have I answered your question?

Users should be able to DROP extension. I seriously doubt that the
patch is going to be accepted as long as it has this limitation.

[1]: https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics
[2]: https://www.postgresql.org/message-id/flat/CAOxo6XJh95xPOpvTxuP_kiGRs8eHcaNrmy3kkzWrzWxvyVkKkQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

03 ноября 2022 г., 17:30:02

Hi,

Setting TOAST for table and database is a subject for discussion. There is already default

Toaster. Also, there is not much sense in setting Jsonb Toaster as default even for table, do

not say database, because table could contain other TOASTable columns not of Json type.

To be able to set custom Toaster as default for table you have to make it work with ALL

TOASTable datatypes - which leads to lots and lots lines of code, complexity and difficulties

supporting such custom Toaster. Custom Toasters are meant to be rather small and have

specialty in some tricky datatypes or workflow.

Custom Toasters will work with Extended storage, but as I answered in previous email -

there is no much use of it, because they would deal with compressed data.

>No, encryption is an excellent example of what a TOASTer should NOT
>do. If you are interested in encryption consider joining the "Moving
>forward with TDE" thread [2].

I'm not working with encryption, so maybe it is really out of scope example. Anyway,

compression and dealing with data with known internal structure or some special

requirements lile geometric data in PostGIS - for example, custom PostGIS Toaster gives

considerable performance boost.

>But should we really distinguish INSERT and UPDATE cases on this API
>level? It seems to me that TableAM just inserts new tuples. It's
>TOASTers job to figure out whether similar values existed before and
>should or shouldn't be reused. Additionally a particular TOASTer can
>reuse old values between _different_ rows, potentially even from
>different tables. Another reason why in practice there is little use
>of knowing whether the data is INSERTed or UPDATEd.

For TOASTer you SHOULD distinguish insert and update operations, really. Because for

TOASTed data these operations affect many tuples, and AM does know which of them

were updated and which were not - that's very serious limitation of current TOAST, and

TOAST mechanics shoud care about updating affected tuples only instead of marking

whole record dead and inserting new one. This is also an argument for not using EXTENDED

storage mode - because with compressed data you do not have such choice, you should

drop the whole record.

Correctly implemented UPDATE for TOAST boosts performance and considerably

decreases size of TOAST tables along with WAL size. This is not a question, an UPDATE

operation for TOASTed data is a must - consider updating 1 Gb TOASTed record - with

current TOAST you would finish having 2 1 Gb records in a table, one of them dead, and

2 Gb in WAL. With update you would have the same 1 Gb record and only update diff in WAL.

>Users should be able to DROP extension. I seriously doubt that the
>patch is going to be accepted as long as it has this limitation.

There is a mention in documentation and previous discussion that this operation would lead

to loss of data TOASTed with this custom Toaster. It was stated as an issue and subject for

further duscucssion in previous emails.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

04 ноября 2022 г., 12:25:46

Hi Nikita,

> Setting TOAST for table and database is a subject for discussion. There is already default
> Toaster. Also, there is not much sense in setting Jsonb Toaster as default even for table, do
> not say database, because table could contain other TOASTable columns not of Json type.

True, but besides Jsonb Toaster one can implement a universal one as
well (your own example: encryption, or a Toaster that bypasses a 1 GB
value limitation). However we can probably keep this out of scope of
this particular patchset for now. As mentioned before this is going to
be just an additional syntax for the user convenience.

> Custom Toasters will work with Extended storage, but as I answered in previous email -
> there is no much use of it, because they would deal with compressed data.

Compression is actually a part of the four-stage TOAST algorithm. So
what you're doing is using the default TOAST most of the time, but if
the storage strategy is EXTERNAL and a custom TOASTer is specified
then you use a type-aware "TOASTer".

If the goal is to implement true "Pluggable TOASTers" then the TOAST
should be substituted entirely. If the goal is to implement type-aware
sub-TOASTers we should be honest about this and call it otherwise:
e.g. "Type-aware TOASTer" or perhaps "subTOASTer". Additionally in
this case there should be no validate() method since this is going to
work only with the default heapam that implements the default TOASTer.

So to clarify, the goal is to deliver true "Pluggable TOASTers" or
only "type-aware sub-TOASTers"?

> For TOASTer you SHOULD distinguish insert and update operations, really. Because for
> TOASTed data these operations affect many tuples, and AM does know which of them
> were updated and which were not - that's very serious limitation of current TOAST, and
> TOAST mechanics shoud care about updating affected tuples only instead of marking
> whole record dead and inserting new one. This is also an argument for not using EXTENDED
> storage mode - because with compressed data you do not have such choice, you should
> drop the whole record.

This may actually be a good point. I suggest reflecting it in the documentation.

> >Users should be able to DROP extension. I seriously doubt that the
> >patch is going to be accepted as long as it has this limitation.
>
> There is a mention in documentation and previous discussion that this operation would lead
> to loss of data TOASTed with this custom Toaster.

I don't see any reason why the semantics for Toasters should be any
different from user-defined types implemented in an extension. If
there are columns that use a given Toaster we should forbid DROPping
the extension. Otherwise "DROP extension" should succeed. No one says
that this should be a fast operation but it should be possible.

[1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

04 ноября 2022 г., 12:35:18

Hi again,

> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

I added a link but forgot to use it, d'oh!

Please note that if the goal is to merely implement "type-aware
sub-TOASTers" then compression dictionaries [1] arguably provide the
same value with MUCH less complexity.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

06 ноября 2022 г., 17:59:13

Hi,

Pluggable TOAST is provided as an interface to allow developers to plug

in custom TOAST mechanics. It does not determines would it be universal

Toaster or one data type, but syntax for universal Toaster is out of scope

for this patchset.

>True, but besides Jsonb Toaster one can implement a universal one as

>well (your own example: encryption, or a Toaster that bypasses a 1 GB
>value limitation). However we can probably keep this out of scope of
>this particular patchset for now. As mentioned before this is going to
>be just an additional syntax for the user convenience.

To transparently bypass the 1 Gb limit you have to increase size of data

VARLENA type is able to hold. This is out if scope for this patchset too,

but as I mentioned before, there are means to do this with Pluggable

TOAST using toast and detoast iterators.

>Compression is actually a part of the four-stage TOAST algorithm. So
>what you're doing is using the default TOAST most of the time, but if
>the storage strategy is EXTERNAL and a custom TOASTer is specified
>then you use a type-aware "TOASTer".

We provide several custom Toasters for particular types of data causing

a lot of problems in storage. The main idea behind Pluggable TOAST is

using special data-aware Toasters where it is needed.
Extended storage mode supports only 2 compression algorithms, though

there are more efficient ones for different types of data.

>If the goal is to implement true "Pluggable TOASTers" then the TOAST
>should be substituted entirely. If the goal is to implement type-aware
>sub-TOASTers we should be honest about this and call it otherwise:
>e.g. "Type-aware TOASTer" or perhaps "subTOASTer". Additionally in
>this case there should be no validate() method since this is going to
>work only with the default heapam that implements the default TOASTer.

>So to clarify, the goal is to deliver true "Pluggable TOASTers" or
>only "type-aware sub-TOASTers"?

Pluggable TOAST does not supposes complete substitution of existing

TOAST mechanics - this is out of scope for this patchset. It proposes

means to substitute it or plug in additional custom TOAST mechanics

for particular data types.

>I don't see any reason why the semantics for Toasters should be any
>different from user-defined types implemented in an extension. If
>there are columns that use a given Toaster we should forbid DROPping
>the extension. Otherwise "DROP extension" should succeed. No one says
>that this should be a fast operation but it should be possible.

I'm currently working on a revision of Pluggable TOAST that would make

dropping Toaster possible if there is no data TOASTed with it, along with

several other major changes. It will be available in this (I hope so) or the

following, if I won't make it in time, commitfest.

On Fri, Nov 4, 2022 at 12:35 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi again,

> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com

I added a link but forgot to use it, d'oh!

Please note that if the goal is to merely implement "type-aware
sub-TOASTers" then compression dictionaries [1] arguably provide the
same value with MUCH less complexity.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

07 ноября 2022 г., 13:35:18

Hi Nikita,

> Pluggable TOAST is provided as an interface to allow developers to plug
> in custom TOAST mechanics. It does not determines would it be universal
> Toaster or one data type, but syntax for universal Toaster is out of scope
> for this patchset.

If I understand correctly, what is going to happen - the same
interface TsrRoutine is going to be used for doing two things:

1) Implementing type-aware TOASTers as a special case for the default
TOAST algorithm when EXTERNAL storage strategy is used.
2) Implementing universal TOASTers from scratch that have nothing to
do with the default TOAST algorithm.

Assuming this is the case, using the same interface for doing two very
different things doesn't strike me as a great design decision. While
working on v24 you may want to rethink this.

Personally I believe that Pluggable TOASTers should support only case
#2. If there is a need of reusing parts of the default TOASTer,
corresponding pieces of code should be declared as non-static and
called from the pluggable TOASTers directly.

Alternatively we could have separate interfaces for case #1 and case
#2 but this IMO becomes rather complicated.

> I'm currently working on a revision of Pluggable TOAST that would make
> dropping Toaster possible if there is no data TOASTed with it, along with
> several other major changes. It will be available in this (I hope so) or the
> following, if I won't make it in time, commitfest.

Looking forward to v24!

This is a major change so I hope there will be more feedback from
other people on the mailing list.

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

15 декабря 2022 г., 23:37:15

Hi hackers!

I want to bump this thread and remind the community about Pluggable TOAST.

Overall Pluggable TOAST was revised to work without altering PG_ATTRIBUTE table

- no atttoaster column, allow to drop unused Toasters and some other improvements.

Sorry for the big delay, but it was a big piece of work to do, and the work is still going on.

Here are the main highlights:

1) No need to modify the PG_ATTRIBUTE table. We've introduced new catalog

table with a set of internal support functions that keeps all table-toaster relations:

(This is not final definition)

This approach allows us to keep all Toaster assignment history, as well as correctly storing

Toasters assigned to different columns of the relation, and even have separate TOAST

storage entities (these not necessary to be regular TOAST tables) for different columns.

When the table with the TOASTable column is created - a new row is inserted into

PG_TOASTREL with source table OID, Toaster OID, created TOAST entity OID, column

(attribute) index. Special field "version" is used to keep history of Toasters assigned to

the column - it is a counter which increases with each assignment, and the biggest version

is the current Toaster for the column. All assigned Toasters are automatically cached,

and when the value is TOASTed - first lookup is done in cache, and if there is no cached

Toaster it is searched in PG_TOASTREL and inserted in cache.

2) Attribute "reltoastrelid" was replaced with calls of PG_TOASTREL support functions.

This was done to allow each TOASTed column to be assigned with different Toaster

and have its individual TOAST table.

3) DROP TABLE command was modified to remove corresponding records from the

PG_TOASTREL - to allow dropping toasters that are out of use.

4) DROP TOASTER command was introduced. This command allows to drop unused

Toasters - the ones that do not have records in PG_TOASTREL. If the Toaster was

assigned to a column - it could not be dropped, because all data TOASTed with it will

be lost.

The branch is still in development so I it is too early for patch but here's link to the repo:

https://github.com/postgrespro/postgres/tree/toastapi_with_ctl

On Mon, Nov 7, 2022 at 1:35 PM Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Nikita,

> Pluggable TOAST is provided as an interface to allow developers to plug
> in custom TOAST mechanics. It does not determines would it be universal
> Toaster or one data type, but syntax for universal Toaster is out of scope
> for this patchset.

If I understand correctly, what is going to happen - the same
interface TsrRoutine is going to be used for doing two things:

1) Implementing type-aware TOASTers as a special case for the default
TOAST algorithm when EXTERNAL storage strategy is used.
2) Implementing universal TOASTers from scratch that have nothing to
do with the default TOAST algorithm.

Assuming this is the case, using the same interface for doing two very
different things doesn't strike me as a great design decision. While
working on v24 you may want to rethink this.

Personally I believe that Pluggable TOASTers should support only case
#2. If there is a need of reusing parts of the default TOASTer,
corresponding pieces of code should be declared as non-static and
called from the pluggable TOASTers directly.

Alternatively we could have separate interfaces for case #1 and case
#2 but this IMO becomes rather complicated.

> I'm currently working on a revision of Pluggable TOAST that would make
> dropping Toaster possible if there is no data TOASTed with it, along with
> several other major changes. It will be available in this (I hope so) or the
> following, if I won't make it in time, commitfest.

Looking forward to v24!

This is a major change so I hope there will be more feedback from
other people on the mailing list.

--
Best regards,
Aleksander Alekseev

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

27 декабря 2022 г., 00:01:54

Hi hackers!

Pluggable TOAST API with catalog control table PG_TOASTREL - pre-patch.

Pluggable TOAST - TOAST API rework - introduce PG_TOASTREL catalog
relation containing TOAST dependencies. NOTE: here is a pre-patch, not
a final version, just to introduce another approach to a Pluggable TOAST
idea, it needs some cleanup, tests rework and some improvements, so

the main
goal of this message is to introduce this different approach. This is the
last patch and it is installed on top of older TOAST API patches, so here
are 3 patches attached:

0001_toaster_interface_v24.patch.gz
This patch introduces new custom TOAST pointer, Pluggable TOAST API and
Toaster support functions - cache, lookup, and new attribute 'atttoaster'
in PG_ATTRIBUTE table which stores Toaster OID;

0002_toaster_default_v24.patch.gz
Here the default TOAST mechanics is routed via TOAST API, but still using
varatt_external TOAST Pointer - so this step does not change overall TOAST
mechanics unless you plug in some custom Toaster;

0003_pg_toastrel_table_v24.patch.gz
Here Pluggable TOAST is reworked not to modify PG_ATTRIBUTE, instead this
patch introduces new catalog table PG_TOASTREL with its support functions.

Motivation: PG_ATTRIBUTE is already the largest catalog table. We try
to avoid modification of existing catalog tables, and previous solution
had several problems:
1) New field in PG_ATTRIBUTE;
2) No opportunity to save all Toaster assignment history;
3) No opportunity to have multi-TOAST tables assigned to a relation or
an attribute;
4) Toaster cannot be dropped - to drop Toaster we need to scan all tables
with TOASTable columns.

Instead of extending PG_ATTRIBUTE with ATTTOASTER attribute, we decided
to store all Table-Toaster relations in a new catalog table PG_TOASTREL.
This cancels the necessity to modify catalog table PG_ATTRIBUTE, allows to store
full history of Toasters assignments, and allows to drop unused Toasters
from system.

Toasters are assigned to a table column. ALTER TABLE ... SET TOASTER command
creates a new row in PG_TOASTREL. To distinguish sequential assignments,
PG_TOASTREL has special attribute - 'version'. With each new assignment
its 'version' attribute is increased, and the row with the biggest 'version'
is the current Toaster for a column.

This approach allows to provide different behavior, even for a single table
we can have one TOAST table for the whole relation (as it is in current TOAST
mechanics), or we can have separate TOAST relation(s) for each TOASTable
column - this requires a slight modification if current approach. The latter
also allows simple invariant of column-oriented storage.

Also, this approach makes PG_ATTRIBUTE attribute RELTOASTRELID obsolete -
current mechanics allows only 1 TOAST table for relation, which limits
greatly TOAST capabilities - because all TOASTed columns are stored in this
table, which in its turn limits overall base relation capacity.

In future, this approach allows us to have a kind of near infinite TOAST
storage, with ability to store large values (larger than 512 Mbytes),
auto-creation of TOAST table only when the first value is actually TOASTed,
and much more.

The approach, along with the TOAST API itself, introduces the catalog table
PG_TOASTREL with a set of support functions.

PG_TOASTREL definition:

postgres@postgres=# \d+ pg_toastrel;
Table "pg_catalog.pg_toastrel"
Column | Type | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target | Description
-------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+-------------
oid | oid | | not null | | plain | | | |
toasteroid | oid | | not null | | plain | | | |
relid | oid | | not null | | plain | | | |
toastentid | oid | | not null | | plain | | | |
attnum | smallint | | not null | | plain | | | |
version | smallint | | not null | | plain | | | |
relname | name | | not null | | plain | | | |
toastentname | name | | not null | | plain | | | |
flag | "char" | | not null | | plain | | | |
toastoptions | "char" | | not null | | plain | | | |
Indexes:
"pg_toastrel_oid_index" PRIMARY KEY, btree (oid)
"pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum)
"pg_toastrel_rel_index" btree (relid, attnum)
"pg_toastrel_tsr_index" btree (toasteroid)
Access method: heap

(This is not a final definition)

Where:
oid - PG_TOASTREL record ID
toasteroid - Toaster OID from PG_TOASTER
relid - base relation OID
toastentid - TOAST entity OID (not necessary to be a table)
attnum - TOASTable attribute index in base relation
version - Toaster assignment version - sequence of assignments
relname - base relation name (optional)
toastentname - TOAST entity name (optional)
flag - special field to mark rows, currently only the value 'x' is used
to mark unused rows

PG_TOASTREL unique key consists of:
toasteroid, relid, attnum, version

All currently assigned Toasters are additionally stored in cache for
fast access. When new row is being TOASTed - Toaster, relation Oid,
TOAST relation Oid, column index are added into Toastrel Cache for fast
access.

Create table, change Toaster, change column type were changed to
add new rows in PG_TOASTREL, to use this table and cache instead
of altering pg_attribute with new column. For table creation from
scratch when no TOAST tables were created is used special condition
with version=0.

DROP TABLE drops rows in PG_TOASTREL for this table. This allows to -
DROP TOASTER command added. When no rows with the according Toaster are
present in PG_TOASTREL - it is considered unused and thus could be safely
dropped from the system.

Default toaster 'deftoaster' (reference TOAST mechanics) cannot be dropped.

Working branch:

https://github.com/postgrespro/postgres/tree/toastapi_with_ctl

Would be glad to get any proposals and objections.

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

vignesh C

Дата:

04 января 2023 г., 12:52:27

On Tue, 27 Dec 2022 at 02:32, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
>
>     Pluggable TOAST API with catalog control table PG_TOASTREL - pre-patch.
>
>     Pluggable TOAST - TOAST API rework - introduce PG_TOASTREL catalog
>     relation containing TOAST dependencies. NOTE: here is a pre-patch, not
>     a final version, just to introduce another approach to a Pluggable TOAST
>     idea, it needs some cleanup, tests rework and some improvements, so
> the main
>     goal of this message is to introduce this different approach. This is the
>     last patch and it is installed on top of older TOAST API patches, so here
>     are 3 patches attached:
>
>     0001_toaster_interface_v24.patch.gz
>     This patch introduces new custom TOAST pointer, Pluggable TOAST API and
>     Toaster support functions - cache, lookup, and new attribute 'atttoaster'
>     in PG_ATTRIBUTE table which stores Toaster OID;
>
>     0002_toaster_default_v24.patch.gz
>     Here the default TOAST mechanics is routed via TOAST API, but still using
>     varatt_external TOAST Pointer - so this step does not change overall TOAST
>     mechanics unless you plug in some custom Toaster;
>
>     0003_pg_toastrel_table_v24.patch.gz
>     Here Pluggable TOAST is reworked not to modify PG_ATTRIBUTE, instead this
>     patch introduces new catalog table PG_TOASTREL with its support functions.
>
>     Motivation: PG_ATTRIBUTE is already the largest catalog table. We try
>     to avoid modification of existing catalog tables, and previous solution
>     had several problems:
>     1) New field in PG_ATTRIBUTE;
>     2) No opportunity to save all Toaster assignment history;
>     3) No opportunity to have multi-TOAST tables assigned to a relation or
>     an attribute;
>     4) Toaster cannot be dropped - to drop Toaster we need to scan all tables
>     with TOASTable columns.
>
>     Instead of extending PG_ATTRIBUTE with ATTTOASTER attribute, we decided
>     to store all Table-Toaster relations in a new catalog table PG_TOASTREL.
>     This cancels the necessity to modify catalog table PG_ATTRIBUTE, allows to store
>     full history of Toasters assignments, and allows to drop unused Toasters
>     from system.
>
>     Toasters are assigned to a table column. ALTER TABLE ... SET TOASTER command
>     creates a new row in PG_TOASTREL. To distinguish sequential assignments,
>     PG_TOASTREL has special attribute - 'version'. With each new assignment
>     its 'version' attribute is increased, and the row with the biggest 'version'
>     is the current Toaster for a column.
>
>     This approach allows to provide different behavior, even for a single table
>     we can have one TOAST table for the whole relation (as it is in current TOAST
>     mechanics), or we can have separate TOAST relation(s) for each TOASTable
>     column - this requires a slight modification if current approach. The latter
>     also allows simple invariant of column-oriented storage.
>
>     Also, this approach makes PG_ATTRIBUTE attribute RELTOASTRELID obsolete -
>     current mechanics allows only 1 TOAST table for relation, which limits
>     greatly TOAST capabilities - because all TOASTed columns are stored in this
>     table, which in its turn limits overall base relation capacity.
>
>     In future, this approach allows us to have a kind of near infinite TOAST
>     storage, with ability to store large values (larger than 512 Mbytes),
>     auto-creation of TOAST table only when the first value is actually TOASTed,
>     and much more.
>
>     The approach, along with the TOAST API itself, introduces the catalog table
>     PG_TOASTREL with a set of support functions.
>
>     PG_TOASTREL definition:
>
>     postgres@postgres=# \d+ pg_toastrel;
>                                                 Table "pg_catalog.pg_toastrel"
>        Column    |   Type   | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target |
Description
>
-------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+-------------
>     oid          | oid      |           | not null |         | plain   |         |             |              |
>     toasteroid   | oid      |           | not null |         | plain   |         |             |              |
>     relid        | oid      |           | not null |         | plain   |         |             |              |
>     toastentid   | oid      |           | not null |         | plain   |         |             |              |
>     attnum       | smallint |           | not null |         | plain   |         |             |              |
>     version      | smallint |           | not null |         | plain   |         |             |              |
>     relname      | name     |           | not null |         | plain   |         |             |              |
>     toastentname | name     |           | not null |         | plain   |         |             |              |
>     flag         | "char"   |           | not null |         | plain   |         |             |              |
>     toastoptions | "char"   |           | not null |         | plain   |         |             |              |
>     Indexes:
>     "pg_toastrel_oid_index" PRIMARY KEY, btree (oid)
>     "pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum)
>     "pg_toastrel_rel_index" btree (relid, attnum)
>     "pg_toastrel_tsr_index" btree (toasteroid)
>     Access method: heap
>    (This is not a final definition)
>
>     Where:
>     oid - PG_TOASTREL record ID
>     toasteroid - Toaster OID from PG_TOASTER
>     relid - base relation OID
>     toastentid - TOAST entity OID (not necessary to be a table)
>     attnum - TOASTable attribute index in base relation
>     version - Toaster assignment version - sequence of assignments
>     relname - base relation name (optional)
>     toastentname - TOAST entity name (optional)
>     flag - special field to mark rows, currently only the value 'x' is used
>     to mark unused rows
>
>     PG_TOASTREL unique key consists of:
>     toasteroid, relid, attnum, version
>
>     All currently assigned Toasters are additionally stored in cache for
>     fast access. When new row is being TOASTed - Toaster, relation Oid,
>     TOAST relation Oid, column index are added into Toastrel Cache for fast
>     access.
>
>     Create table, change Toaster, change column type were changed to
>     add new rows in PG_TOASTREL, to use this table and cache instead
>     of altering pg_attribute with new column. For table creation from
>     scratch when no TOAST tables were created is used special condition
>     with version=0.
>
>     DROP TABLE drops rows in PG_TOASTREL for this table. This allows to -
>     DROP TOASTER command added. When no rows with the according Toaster are
>     present in PG_TOASTREL - it is considered unused and thus could be safely
>     dropped from the system.
>
>     Default toaster 'deftoaster' (reference TOAST mechanics) cannot be dropped.
>
> Working branch:
> https://github.com/postgrespro/postgres/tree/toastapi_with_ctl
>
> Would be glad to get any proposals and objections.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
patching file src/backend/utils/cache/syscache.c
=== Applying patches on top of PostgreSQL commit ID
33ab0a2a527e3af5beee3a98fc07201e555d6e45 ===
=== applying patch ./0001_toaster_interface_v24.patch
patching file contrib/test_decoding/expected/ddl.out
Hunk #2 FAILED at 874.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/utils/cache/syscache.c.rej

[1] - http://cfbot.cputube.org/patch_41_3490.log

Regards,
Vignesh

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

07 января 2023 г., 23:10:06

Hi!

Thank you for your attention.

I've rebased the patchset onto the latest master (from 07.01), but the second part is still

in pre-patch shape - it is working, but tests are failing due to changes in TOAST relations

logic - I haven't adapted 'em yet.

The patchset consists of 4 patches:

The first part consists of 2 patches and uses pg_attribute extension, this was an arguable

decision and it has serious downside, so I've decided to make another revision (the second part):

0001_toaster_interface_v25.patch - Pluggable TOAST API, reference TOAST left intact

0002_toaster_default_v25.patch - Reference TOAST is routed via TOAST API

The second part - TOAST API revision that does not extend pg_attribute, is more flexible

and allows a lot more extensibility to the TOAST functionality - instead of storing Toaster

OID in "atttoaster" attribute of pg_attribute - we use new special catalog table pg_toastrel,

which keeps all Toaster assignments history (pelase check my message from 27.12), and

allows to drop Toasters safely. This part is in pre-patch state, I've send it for the review

and feedback on the general approach:

0003_pg_toastrel_control_v25.patch - introduces pg_toastrel catalog relation, which stores

Toaster assignment logic;

0004_drop_toaster_v25.patch - extends SQL syntax with a safe DROP TOASTER command.

On Wed, Jan 4, 2023 at 12:52 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 27 Dec 2022 at 02:32, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
>
> Pluggable TOAST API with catalog control table PG_TOASTREL - pre-patch.
>
> Pluggable TOAST - TOAST API rework - introduce PG_TOASTREL catalog
> relation containing TOAST dependencies. NOTE: here is a pre-patch, not
> a final version, just to introduce another approach to a Pluggable TOAST
> idea, it needs some cleanup, tests rework and some improvements, so
> the main
> goal of this message is to introduce this different approach. This is the
> last patch and it is installed on top of older TOAST API patches, so here
> are 3 patches attached:
>
> 0001_toaster_interface_v24.patch.gz
> This patch introduces new custom TOAST pointer, Pluggable TOAST API and
> Toaster support functions - cache, lookup, and new attribute 'atttoaster'
> in PG_ATTRIBUTE table which stores Toaster OID;
>
> 0002_toaster_default_v24.patch.gz
> Here the default TOAST mechanics is routed via TOAST API, but still using
> varatt_external TOAST Pointer - so this step does not change overall TOAST
> mechanics unless you plug in some custom Toaster;
>
> 0003_pg_toastrel_table_v24.patch.gz
> Here Pluggable TOAST is reworked not to modify PG_ATTRIBUTE, instead this
> patch introduces new catalog table PG_TOASTREL with its support functions.
>
> Motivation: PG_ATTRIBUTE is already the largest catalog table. We try
> to avoid modification of existing catalog tables, and previous solution
> had several problems:
> 1) New field in PG_ATTRIBUTE;
> 2) No opportunity to save all Toaster assignment history;
> 3) No opportunity to have multi-TOAST tables assigned to a relation or
> an attribute;
> 4) Toaster cannot be dropped - to drop Toaster we need to scan all tables
> with TOASTable columns.
>
> Instead of extending PG_ATTRIBUTE with ATTTOASTER attribute, we decided
> to store all Table-Toaster relations in a new catalog table PG_TOASTREL.
> This cancels the necessity to modify catalog table PG_ATTRIBUTE, allows to store
> full history of Toasters assignments, and allows to drop unused Toasters
> from system.
>
> Toasters are assigned to a table column. ALTER TABLE ... SET TOASTER command
> creates a new row in PG_TOASTREL. To distinguish sequential assignments,
> PG_TOASTREL has special attribute - 'version'. With each new assignment
> its 'version' attribute is increased, and the row with the biggest 'version'
> is the current Toaster for a column.
>
> This approach allows to provide different behavior, even for a single table
> we can have one TOAST table for the whole relation (as it is in current TOAST
> mechanics), or we can have separate TOAST relation(s) for each TOASTable
> column - this requires a slight modification if current approach. The latter
> also allows simple invariant of column-oriented storage.
>
> Also, this approach makes PG_ATTRIBUTE attribute RELTOASTRELID obsolete -
> current mechanics allows only 1 TOAST table for relation, which limits
> greatly TOAST capabilities - because all TOASTed columns are stored in this
> table, which in its turn limits overall base relation capacity.
>
> In future, this approach allows us to have a kind of near infinite TOAST
> storage, with ability to store large values (larger than 512 Mbytes),
> auto-creation of TOAST table only when the first value is actually TOASTed,
> and much more.
>
> The approach, along with the TOAST API itself, introduces the catalog table
> PG_TOASTREL with a set of support functions.
>
> PG_TOASTREL definition:
>
> postgres@postgres=# \d+ pg_toastrel;
> Table "pg_catalog.pg_toastrel"
> Column | Type | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target | Description
> -------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+-------------
> oid | oid | | not null | | plain | | | |
> toasteroid | oid | | not null | | plain | | | |
> relid | oid | | not null | | plain | | | |
> toastentid | oid | | not null | | plain | | | |
> attnum | smallint | | not null | | plain | | | |
> version | smallint | | not null | | plain | | | |
> relname | name | | not null | | plain | | | |
> toastentname | name | | not null | | plain | | | |
> flag | "char" | | not null | | plain | | | |
> toastoptions | "char" | | not null | | plain | | | |
> Indexes:
> "pg_toastrel_oid_index" PRIMARY KEY, btree (oid)
> "pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum)
> "pg_toastrel_rel_index" btree (relid, attnum)
> "pg_toastrel_tsr_index" btree (toasteroid)
> Access method: heap
> (This is not a final definition)
>
> Where:
> oid - PG_TOASTREL record ID
> toasteroid - Toaster OID from PG_TOASTER
> relid - base relation OID
> toastentid - TOAST entity OID (not necessary to be a table)
> attnum - TOASTable attribute index in base relation
> version - Toaster assignment version - sequence of assignments
> relname - base relation name (optional)
> toastentname - TOAST entity name (optional)
> flag - special field to mark rows, currently only the value 'x' is used
> to mark unused rows
>
> PG_TOASTREL unique key consists of:
> toasteroid, relid, attnum, version
>
> All currently assigned Toasters are additionally stored in cache for
> fast access. When new row is being TOASTed - Toaster, relation Oid,
> TOAST relation Oid, column index are added into Toastrel Cache for fast
> access.
>
> Create table, change Toaster, change column type were changed to
> add new rows in PG_TOASTREL, to use this table and cache instead
> of altering pg_attribute with new column. For table creation from
> scratch when no TOAST tables were created is used special condition
> with version=0.
>
> DROP TABLE drops rows in PG_TOASTREL for this table. This allows to -
> DROP TOASTER command added. When no rows with the according Toaster are
> present in PG_TOASTREL - it is considered unused and thus could be safely
> dropped from the system.
>
> Default toaster 'deftoaster' (reference TOAST mechanics) cannot be dropped.
>
> Working branch:
> https://github.com/postgrespro/postgres/tree/toastapi_with_ctl
>
> Would be glad to get any proposals and objections.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
patching file src/backend/utils/cache/syscache.c
=== Applying patches on top of PostgreSQL commit ID
33ab0a2a527e3af5beee3a98fc07201e555d6e45 ===
=== applying patch ./0001_toaster_interface_v24.patch
patching file contrib/test_decoding/expected/ddl.out
Hunk #2 FAILED at 874.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/utils/cache/syscache.c.rej

[1] - http://cfbot.cputube.org/patch_41_3490.log

Regards,
Vignesh

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Вложения

Re: Pluggable toaster

От

vignesh C

Дата:

14 января 2023 г., 09:56:37

On Sun, 8 Jan 2023 at 01:40, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi!
>
> Thank you for your attention.
> I've rebased the patchset onto the latest master (from 07.01), but the second part is still
> in pre-patch shape - it is working, but tests are failing due to changes in TOAST relations
> logic - I haven't adapted 'em yet.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
c44f6334ca6ff6d242d9eb6742441bc4e1294067 ===
=== expanding ./0002_toaster_default_v25.patch.gz
=== expanding ./0001_toaster_interface_v25.patch.gz
=== expanding ./0004_drop_toaster_v25.patch.gz
=== expanding ./0003_pg_toastrel_control_v25.patch.gz
=== applying patch ./0001_toaster_interface_v25.patch
....
patching file src/include/postgres.h
Hunk #1 succeeded at 80 with fuzz 2 (offset 4 lines).
Hunk #2 FAILED at 148.
Hunk #3 FAILED at 315.
Hunk #4 FAILED at 344.
Hunk #5 FAILED at 359.
4 out of 5 hunks FAILED -- saving rejects to file src/include/postgres.h.rej

[1] - http://cfbot.cputube.org/patch_41_3490.log

Regards,
Vignesh

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

14 января 2023 г., 14:47:57

Hi!

Fails due to recent changes. Working on it.

On Sat, Jan 14, 2023 at 9:56 AM vignesh C <vignesh21@gmail.com> wrote:

On Sun, 8 Jan 2023 at 01:40, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi!
>
> Thank you for your attention.
> I've rebased the patchset onto the latest master (from 07.01), but the second part is still
> in pre-patch shape - it is working, but tests are failing due to changes in TOAST relations
> logic - I haven't adapted 'em yet.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
c44f6334ca6ff6d242d9eb6742441bc4e1294067 ===
=== expanding ./0002_toaster_default_v25.patch.gz
=== expanding ./0001_toaster_interface_v25.patch.gz
=== expanding ./0004_drop_toaster_v25.patch.gz
=== expanding ./0003_pg_toastrel_control_v25.patch.gz
=== applying patch ./0001_toaster_interface_v25.patch
....
patching file src/include/postgres.h
Hunk #1 succeeded at 80 with fuzz 2 (offset 4 lines).
Hunk #2 FAILED at 148.
Hunk #3 FAILED at 315.
Hunk #4 FAILED at 344.
Hunk #5 FAILED at 359.
4 out of 5 hunks FAILED -- saving rejects to file src/include/postgres.h.rej

[1] - http://cfbot.cputube.org/patch_41_3490.log

Regards,
Vignesh

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Alvaro Herrera

Дата:

03 февраля 2023 г., 12:56:58

On 2023-Jan-14, Nikita Malakhov wrote:

> Hi!
> Fails due to recent changes. Working on it.

Please see my response here
https://postgr.es/m/20230203095540.zutul5vmsbmantbm@alvherre.pgsql

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

06 февраля 2023 г., 00:10:50

Hi hackers!

In response to opinion in thread on Compresson dictionaries for JSONb [1]

>The approaches are completely different,

>but they seem to be trying to fix the same problem: the fact that the
>default TOAST stuff isn't good enough for JSONB.

The problem, actually, is that the default TOAST is often not good for

modern loads and amounts of data.Pluggable TOAST is based not only

on pure enthusiasm, but on demands and tickets from production

databases.The main demand is effective and transparent storage subsystem

for large values for some problematic types of data, which we already have,

with proven efficiency.

So we're really quite surprised that it has got so little feedback. We've got

some opinions on approach but there is no any general one on the approach

itself except doubts about the TOAST mechanism needs revision at all.

Currently we're busy revising the whole Pluggable TOAST API to make it

available as an extension and based on hooks to minimize changes in

the core. It will be available soon.

[1] https://postgr.es/m/20230203095540.zutul5vmsbmantbm@alvherre.pgsql

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Andres Freund

Дата:

06 февраля 2023 г., 01:33:13

Hi,

On 2023-02-06 00:10:50 +0300, Nikita Malakhov wrote:
> The problem, actually, is that the default TOAST is often not good for
> modern loads and amounts of data.Pluggable TOAST is based not only
> on pure enthusiasm, but on demands and tickets from production
> databases.

> The main demand is effective and transparent storage subsystem
> for large values for some problematic types of data, which we already have,
> with proven efficiency.

> So we're really quite surprised that it has got so little feedback. We've
> got
> some opinions on approach but there is no any general one on the approach
> itself except doubts about the TOAST mechanism needs revision at all.

The problem for me is that what you've been posting doesn't actually fix
any problem, but instead introduces lots of new code and complexity.


> Currently we're busy revising the whole Pluggable TOAST API to make it
> available as an extension and based on hooks to minimize changes in
> the core. It will be available soon.

I don't think we should accept that either. It still doesn't improve
anything about toast, it just allows you to do such improvements out of
core.

Greetings,

Andres Freund

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

06 февраля 2023 г., 13:17:30

Hi,

> > So we're really quite surprised that it has got so little feedback. We've
> > got
> > some opinions on approach but there is no any general one on the approach
> > itself except doubts about the TOAST mechanism needs revision at all.
>
> The problem for me is that what you've been posting doesn't actually fix
> any problem, but instead introduces lots of new code and complexity.

> > Currently we're busy revising the whole Pluggable TOAST API to make it
> > available as an extension and based on hooks to minimize changes in
> > the core. It will be available soon.
>
> I don't think we should accept that either. It still doesn't improve
> anything about toast, it just allows you to do such improvements out of
> core.

Agree. On top of that referencing non-reproducible benchmarks doesn't
help. There were some slides referenced in the thread but I couldn't
find exact steps to reproduce the benchmarks.

Your desire to improve the TOAST mechanism is much appreciated. I
believe we are all on the same side here, the one where people work
together to make PostgreSQL an even better DBMS.

However in order to achieve this firstly a consensus within the
community should be reached about how exactly we are going to do this.
Afterwards, all the code and benchmarks should be made publicly
available under a proper license so that anyone could explore and
reproduce them. Last but not least, the complexity should really be
taken into account. There are real people who are going to maintain
the code after (and if) it will be merged, and there are not so many
of them.

The problems I see are that the type-aware TOASTers skipped step (1)
right to the step (2) and doesn't seem to consider (3). Even after it
was explicitly pointed out that we should take a step back and return
to (1).

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Alvaro Herrera

Дата:

06 февраля 2023 г., 13:49:17

On 2023-Feb-06, Nikita Malakhov wrote:

> Currently we're busy revising the whole Pluggable TOAST API to make it
> available as an extension and based on hooks to minimize changes in
> the core. It will be available soon.

Hmm, I'm not sure why would PGDG want to accept such a thing.  I read
"minimize changes" as "open source Postgres can keep their crap
implementation and companies will offer good ones for a fee".  I'd
rather have something that can give users direct benefit -- not hide it
behind proprietary licenses forcing each company to implement their own
performant toaster.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

06 февраля 2023 г., 16:38:01

Hi!

Existing TOAST has several very painful drawbacks - lack of UPDATE

operation, bloating of TOAST tables, and limits that are implicitly implied

on base tables by their TOAST tables, so it is seems not fair to say that

Pluggable TOAST does not solve any problems but just introduces new

ones.

The main reason behind this decision is that keeping the first implementation

on the side of the vanilla (I mean rebasing it) over time is very difficult due

to the very invasive nature of this solution.

So we decided to reduce changes in the core to the minimum necessary

to make it available through the hooks, because the hooks part is very

lightweight and simple to keep rebasing onto the vanilla core. We plan

to keep this extension free with the PostgreSQL license, so any PostgreSQL

user could benefit from the TOAST on steroids, and sometimes in the

future it will be a much simpler task to integrate the Pluggable TOAST into

the vanilla, along with our advanced TOAST implementations which

we plan to keep under Open Source licenses too.

On Mon, Feb 6, 2023 at 1:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Feb-06, Nikita Malakhov wrote:

> Currently we're busy revising the whole Pluggable TOAST API to make it
> available as an extension and based on hooks to minimize changes in
> the core. It will be available soon.

Hmm, I'm not sure why would PGDG want to accept such a thing. I read
"minimize changes" as "open source Postgres can keep their crap
implementation and companies will offer good ones for a fee". I'd
rather have something that can give users direct benefit -- not hide it
behind proprietary licenses forcing each company to implement their own
performant toaster.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

06 февраля 2023 г., 17:38:34

Hi,

> The main reason behind this decision is that keeping the first implementation
> on the side of the vanilla (I mean rebasing it) over time is very difficult due
> to the very invasive nature of this solution.
>
> So we decided to reduce changes in the core to the minimum necessary
> to make it available through the hooks, because the hooks part is very
> lightweight and simple to keep rebasing onto the vanilla core. We plan
> to keep this extension free with the PostgreSQL license, so any PostgreSQL
> user could benefit from the TOAST on steroids, and sometimes in the
> future it will be a much simpler task to integrate the Pluggable TOAST into
> the vanilla, along with our advanced TOAST implementations which
> we plan to keep under Open Source licenses too.

That's great to hear. I'm looking forward to the link to the
corresponding GitHub repository. Please let us know when this effort
will be available for testing and benchmarking!

I would like to point out however that there were several other pieces
of feedback that could have been missed:

* No one wants to see this as an extension. This was my original
proposal (adding ZSON to /contrib/) and it was rejected. The community
explicitly wants this to be a core feature with its syntax,
autocompletion, documentation and stuff.
* The community wants the feature to have a simple implementation. You
said yourself that the idea of type-aware TOASTers is very invasive,
and I completely agree.
* People also want this to be simple from the user perspective, as
simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd];

At least this is my personal summary/impression from following the mailing list.

Anyhow since we are back to the stage where we discuss the RFC I
suggest continuing it in the compression dictionaries thread [1] since
we made noticeable progress there already.

[1]: https://postgr.es/m/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Andres Freund

Дата:

06 февраля 2023 г., 22:24:01

Hi,

On 2023-02-06 16:38:01 +0300, Nikita Malakhov wrote:
> So we decided to reduce changes in the core to the minimum necessary
> to make it available through the hooks, because the hooks part is very
> lightweight and simple to keep rebasing onto the vanilla core.

At least I don't think we should accept such hooks. I don't think I am alone
in that.

Greetings,

Andres Freund

Re: Pluggable toaster

От

Matthias van de Meent

Дата:

06 февраля 2023 г., 23:11:45

On Mon, 6 Feb 2023 at 20:24, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-02-06 16:38:01 +0300, Nikita Malakhov wrote:
> > So we decided to reduce changes in the core to the minimum necessary
> > to make it available through the hooks, because the hooks part is very
> > lightweight and simple to keep rebasing onto the vanilla core.
>
> At least I don't think we should accept such hooks. I don't think I am alone
> in that.

+1

Assuming type-aware TOASTing is the goal, I don't think hooks are the
right design to implement that.

-Matthias van de Meent

Re: Pluggable toaster

От

Matthias van de Meent

Дата:

06 февраля 2023 г., 23:14:54

On Mon, 6 Feb 2023 at 15:38, Aleksander Alekseev
<aleksander@timescale.com> wrote:
> I would like to point out however that there were several other pieces
> of feedback that could have been missed:
>
> * No one wants to see this as an extension. This was my original
> proposal (adding ZSON to /contrib/) and it was rejected. The community
> explicitly wants this to be a core feature with its syntax,
> autocompletion, documentation and stuff.

I believe that is a misrepresentation of the situation. ZSON had
(has?) several systemic issues and could not be accepted to /contrib/
in the way it was implemented, and it was commented that it would make
sense that the feature of compression assisted by dictionaries would
be implemented in core. Yet still, that feature is only closely
related to pluggable TOAST, but it is not the same as making TOAST
pluggable.

> * The community wants the feature to have a simple implementation. You
> said yourself that the idea of type-aware TOASTers is very invasive,
> and I completely agree.

I'm not sure that this is correct either. Compression (and TOAST) is
inherently complex, and I don't think we should reject improvements
because they are complex.
The problem that I see being raised here, is that there was little
discussion and no observed community consensus about the design of
this complex feature *before* this patch with high complexity was
provided.
The next action that was requested is to take a step back and decide
how we would want to implement type-aware TOASTing (and the associated
patch compression dictionaries) *before* we look into the type-aware
toasting.

> * People also want this to be simple from the user perspective, as
> simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd];

Could you provide a reference to this? I have yet to see a COMPRESSED
TABLE feature or syntax, let alone users asking for TOAST to be as
easily usable as that feature or syntax.

Kind regards,

Matthias van de Meent

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

06 февраля 2023 г., 23:21:45

Hi,

It'll be great to see more opinions on the approach as a whole.

>The problem that I see being raised here, is that there was little
>discussion and no observed community consensus about the design of
>this complex feature *before* this patch with high complexity was
>provided.
>The next action that was requested is to take a step back and decide
>how we would want to implement type-aware TOASTing (and the associated
>patch compression dictionaries) *before* we look into the type-aware
>toasting.

We decided to put this improvement as a patch because we thought

that the most complex and questionable part would be the TOAST

implementations (the Toasters) itself, and the Pluggable TOAST is

just a tool to make plugging different TOAST implementations clean

and simple.

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

07 февраля 2023 г., 13:05:45

Hi,

> I believe that is a misrepresentation of the situation. ZSON had
> (has?) several systemic issues and could not be accepted to /contrib/
> in the way it was implemented, and it was commented that it would make
> sense that the feature of compression assisted by dictionaries would
> be implemented in core. Yet still, that feature is only closely
> related to pluggable TOAST, but it is not the same as making TOAST
> pluggable.
>
> > * The community wants the feature to have a simple implementation. You
> > said yourself that the idea of type-aware TOASTers is very invasive,
> > and I completely agree.
>
> I'm not sure that this is correct either. Compression (and TOAST) is
> inherently complex, and I don't think we should reject improvements
> because they are complex.
> The problem that I see being raised here, is that there was little
> discussion and no observed community consensus about the design of
> this complex feature *before* this patch with high complexity was
> provided.

Strictly speaking there is no such thing as "community opinion". There
are different people, everyone has their own opinion. To make things
more interesting the opinions change with time.

I did my best to make a brief summary of 100+ messages from different
people in something like 4 threads. These are things that were
requested and/or no one disagrees with (at least no one said "no, put
all this out of the core! and make it complicated too!"). Focusing on
something (almost) no one disagrees with seems to be more productive
than arguing about something everyone disagrees with.

As I see it, the goal is not to be right, but rather to find a
consensus most of us will be not unhappy with.

> The next action that was requested is to take a step back and decide
> how we would want to implement type-aware TOASTing (and the associated
> patch compression dictionaries) *before* we look into the type-aware
> toasting.

Yes, I thought we already agreed to forget about type-aware TOASTing
and compression dictionaries, and are looking for a consensus now.

To clarify, I don't think that pluggable TOASTing is an absolutely bad
idea. We are just not discussing this particular idea anymore, at
least for now.

> > * People also want this to be simple from the user perspective, as
> > simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd];
>
> Could you provide a reference to this? I have yet to see a COMPRESSED
> TABLE feature or syntax, let alone users asking for TOAST to be as
> easily usable as that feature or syntax.

I was referring to the recent discussion of the new RFC. Please see
[1] and below.

[1]:
https://www.postgresql.org/message-id/flat/20230203095540.zutul5vmsbmantbm%40alvherre.pgsql#7cce6acef0cb7eb2490715ec9d835e74

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Pavel Borisov

Дата:

07 февраля 2023 г., 13:38:20

Hi, hackers!

Maybe I've read the thread too superficially, but for me, it seems
like more of a discussion on what TOAST should NOT be. Maybe someone
more in the topic could explain what is the consensus on what we
require and what we like to to have in a new TOAST?

For me, a good toast should be chunk-updatable, so that we don't need
to rewrite the whole TOAST and WAL-replicate the whole thing at every
small attribute modification. But obviously, it's just another
opinion.

Kind regards,
Pavel Borisov

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

14 июня 2023 г., 13:05:24

Hi!

I have a question for the community - why was this patch silently put to a "rejected" status?

Should there no be some kind of explanation?

During this discussion I got the impression that for some reason some members of the community

do not want the TOAST functionality, which has such drawbacks that make it really a curse for

in many ways very good DBMS, to be touched. We cannot get rid of it because of backwards

compatibility, so the best way is to make it more adaptable and extensible - this is what this thread

is about. We proposed our vision on how to extend the TOAST Postgres-way, like Pluggable

Storage some time before.

There are some very complex subjects left in this topic that really need a community' attention.

I've mentioned them above, but there was no feedback on them.

Pavel, we've already had an update implementation for TOAST. But it is a part of a Pluggable

TOAST and it hardly would be here without it. I've started another thread on extending the TOAST

pointer, maybe you would want to participate there [1].

We still would be grateful for feedback.

[1] Extending the TOAST Pointer

Regards,

Nikita Malakhov

Postgres Professional

https://postgrespro.ru/

Re: Pluggable toaster

От

Pavel Borisov

Дата:

14 июня 2023 г., 13:21:05

On Wed, 14 Jun 2023 at 14:05, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi!
>
> I have a question for the community - why was this patch silently put to a "rejected" status?
> Should there no be some kind of explanation?
>
> During this discussion I got the impression that for some reason some members of the community
> do not want the TOAST functionality, which has such drawbacks that make it really a curse for
> in many ways very good DBMS, to be touched. We cannot get rid of it because of backwards
> compatibility, so the best way is to make it more adaptable and extensible - this is what this thread
> is about. We proposed our vision on how to extend the TOAST Postgres-way, like Pluggable
> Storage some time before.
>
> There are some very complex subjects left in this topic that really need a community' attention.
> I've mentioned them above, but there was no feedback on them.
>
> Pavel, we've already had an update implementation for TOAST. But it is a part of a Pluggable
> TOAST and it hardly would be here without it. I've started another thread on extending the TOAST
> pointer, maybe you would want to participate there [1].
>
> We still would be grateful for feedback.
>
> [1] Extending the TOAST Pointer
I don't see a clear reason it's rejected, besides technically it's
Waiting on Author since January. If it's a mistake and the patch is
up-to-date you can set an appropriate status.

Regards,
Pavel Borisov,
Supabase.

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

14 июня 2023 г., 13:56:34

Hi,

> > I have a question for the community - why was this patch silently put to a "rejected" status?
> > Should there no be some kind of explanation?

I wouldn't say that it happened "silently" nor that the reason is so
mysterious [1].

[1]: https://www.postgresql.org/message-id/flat/CAM-w4HPjg7NwEWBtXn1empgAg3fqJHifHo_nhgqFWopiYaNxYg%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Nikita Malakhov

Дата:

16 июня 2023 г., 13:18:49

Hi,

Seems that I missed the thread mentioned above. I strongly disagree

with such statement, Pluggable TOAST could not be a part or Compression

Dictionaries thread because the TOAST improvement is a more general subject,

it involves much deeper and tricky changes in the core. And also is much more

promising in terms of performance and storage improvements.

We already have a lot of changes in Pluggable TOAST that were not committed

to the main GIT branch of this thread, so it seems that I have to merge them and

reopen it.

Regards,

Nikita Malakhov

Postgres Professional

The Russian Postgres Company

https://postgrespro.ru/

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

16 июня 2023 г., 16:20:38

Hi Nikita,

> We already have a lot of changes in Pluggable TOAST that were not committed
> to the main GIT branch of this thread, so it seems that I have to merge them and
> reopen it.

Pretty sure that reopening an already rejected patch that is competing
with compression dictionaries (which the rest of us are currently
focusing on) will achieve anything. Consider joining the compression
dictionaries effort instead [1]. During the discussion with the
community it ended up being a TOAST improvement after all. So we could
use your expertise in this area.

[1]: https://commitfest.postgresql.org/43/3626/

-- 
Best regards,
Aleksander Alekseev

Re: Pluggable toaster

От

Aleksander Alekseev

Дата:

16 июня 2023 г., 17:44:37

Hi,

> Pretty sure that reopening an already rejected patch that is competing
> with compression dictionaries (which the rest of us are currently
> focusing on) will achieve anything.

Ooops, I didn't mean to be sarcastic:

s/will achieve/will not achieve/

My apologies.

> Consider joining the compression
> dictionaries effort instead [1]. During the discussion with the
> community it ended up being a TOAST improvement after all. So we could
> use your expertise in this area.
>
> [1]: https://commitfest.postgresql.org/43/3626/

-- 
Best regards,
Aleksander Alekseev

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Pluggable toaster

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения