Обсуждение: Pluggable toaster
Hi! We are working on custom toaster for JSONB [1], because current TOAST is universal for any data type and because of that it has some disadvantages: - "one toast fits all" may be not the best solution for particular type or/and use cases - it doesn't know the internal structure of data type, so it cannot choose an optimal toast strategy - it can't share common parts between different rows and even versions of rows Modification of current toaster for all tasks and cases looks too complex, moreover, it will not works for custom data types. Postgres is an extensible database, why not to extent its extensibility even further, to have pluggable TOAST! We propose an idea to separate toaster from heap using toaster API similar to table AM API etc. Following patches are applicable over patch in [1] 1) 1_toaster_interface_v1.patch.gz https://github.com/postgrespro/postgres/tree/toaster_interface Introduces syntax for storage and formal toaster API. Adds column atttoaster to pg_attribute, by design this column should not be equal to invalid oid for any toastable datatype, ie it must have correct oid for any type (not column) with non-plain storage. Since toaster may support only particular datatype, core should check correctness of toaster set by toaster validate method. New commands could be found in src/test/regress/sql/toaster.sql On-disk toast pointer structure now has one more possible struct - varatt_custom with fixed header and variable tail which uses as a storage for custom toasters. Format of built-in toaster is kept to allow simple pg_upgrade logic. Since toaster for column could be changed during table's lifetime we had two options about toaster's drop operation: - if column's toaster has been changed, then we need to re-toast all values, which could be extremely expensive. In any case, functions/operators should be ready to work with values toasted by different toasters, although any toaster should execute simple toast/detoast operation, which allows any existing code to work with the new approach. Tracking dependency of toasters and rows looks as bad idea. - disallow drop toaster. We don't believe that there will be many toasters at the same time (number of AM isn't very high too and we don't believe that it will be changed significantly in the near future), so prohibition of dropping of toaster looks reasonable. In this patch set we choose second option. Toaster API includes get_vtable method, which is planned to access the custom toaster features which isn't covered by this API. The idea is, that toaster returns some structure with some values and/or pointers to toaster's methods and caller could use it for particular purposes, see patch 4). Kind of structure identified by magic number, which should be a first field in this structure. Also added contrib/dummy_toaster to simplify checking. psql/pg_dump are modified to support toaster object concept. 2) 2_toaster_default_v1.patch.gz https://github.com/postgrespro/postgres/tree/toaster_default Built-in toaster implemented (with some refactoring) uisng toaster API as generic (or default) toaster. dummy_toaster here is a minimal workable example, it saves value directly in toast pointer and fails if value is greater than 1kb. 3) 3_toaster_snapshot_v1.patch.gz https://github.com/postgrespro/postgres/tree/toaster_snapshot The patch implements technology to distinguish row's versions in toasted values to share common parts of toasted values between different versions of rows 4) 4_bytea_appendable_toaster_v1.patch.gz https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster Contrib module implements toaster for non-compressed bytea columns, which allows fast appending to existing bytea value. Appended tail stored directly in toaster pointer, if there is enough place to do it. Note: patch modifies byteacat() to support contrib toaster. Seems, it's looks ugly and contrib module should create new concatenation function. We are open for any questions, discussions, objections and advices. Thank you. Peoples behind: Oleg Bartunov Nikita Gluhov Nikita Malakhov Teodor Sigaev [1] https://www.postgresql.org/message-id/flat/de83407a-ae3d-a8e1-a788-920eb334f25b@sigaev.ru <https://www.postgresql.org/message-id/flat/de83407a-ae3d-a8e1-a788-920eb334f25b@sigaev.ru> -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Вложения
On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor@sigaev.ru> wrote: > We are working on custom toaster for JSONB [1], because current TOAST is > universal for any data type and because of that it has some disadvantages: > - "one toast fits all" may be not the best solution for particular > type or/and use cases > - it doesn't know the internal structure of data type, so it cannot > choose an optimal toast strategy > - it can't share common parts between different rows and even > versions of rows Agreed, Oleg has made some very clear analysis of the value of having a higher degree of control over toasting from within the datatype. In my understanding, we want to be able to 1. Access data from a toasted object one slice at a time, by using knowledge of the structure 2. If toasted data is updated, then update a minimum number of slices(s), without rewriting the existing slices 3. If toasted data is expanded, then allownew slices to be appended to the object without rewriting the existing slices > Modification of current toaster for all tasks and cases looks too > complex, moreover, it will not works for custom data types. Postgres > is an extensible database, why not to extent its extensibility even > further, to have pluggable TOAST! We propose an idea to separate > toaster from heap using toaster API similar to table AM API etc. > Following patches are applicable over patch in [1] ISTM that we would want the toast algorithm to be associated with the datatype, not the column? Can you explain your thinking? We already have Expanded toast format, in-memory, which was designed specifically to allow us to access sub-structure of the datatype in-memory. So I was expecting to see an Expanded, on-disk, toast format that roughly matched that concept, since Tom has already shown us the way. (varatt_expanded). This would be usable by both JSON and PostGIS. Some other thoughts: I imagine the data type might want to keep some kind of dictionary inside the main toast pointer, so we could make allowance for some optional datatype-specific private area in the toast pointer itself, allowing a mix of inline and out-of-line data, and/or a table of contents to the slices. I'm thinking could also tackle these things at the same time: * We want to expand TOAST to 64-bit pointers, so we can have more pointers in a table * We want to avoid putting the data length into the toast pointer, so we can allow the toasted data to be expanded without rewriting everything (to avoid O(N^2) cost) -- Simon Riggs http://www.EnterpriseDB.com/
Simon, thank you for your review.
I'll try to give a brief explanation on some topics you've mentioned.
My colleagues would correct me if I miss the point and provide some more details.
>Agreed, Oleg has made some very clear analysis of the value of having
>a higher degree of control over toasting from within the datatype.
Currently we see the biggest flaw in TOAST functionality is that it
does not provide any means for extension and modification except
modifying the core code itself. It is not possible to use any other
TOAST strategy except existing in the core, the same issue is with
assigning different TOAST methods to columns and datatypes.
The main point in this patch is actually to provide an open API and
syntax for creation of new Toasters as pluggable extensions, and
to make an existing (default) toaster to work via this API without
affecting its function. Also, the default toaster is strongly cross-tied
with Heap access, with somewhat unclear code relations (headers,
function locations and calls, etc.) that are not quite good logically
structured and ask to be straightened out.
>In my understanding, we want to be able to
>1. Access data from a toasted object one slice at a time, by using
>knowledge of the structure
>2. If toasted data is updated, then update a minimum number of
>slices(s), without rewriting the existing slices
>3. If toasted data is expanded, then allownew slices to be appended to
>the object without rewriting the existing slices
There are two main ideas behind Pluggable Toaster patch -
First - to provide an extensible API for all Postgres developers, to
be able to develop and plug in custom toasters as independent
extensions for different data types and columns, to use different
toast strategies, access and compression methods, and so on;
Second - to refactor current Toast functionality, to improve Toast
code structure and make it more logically structured and
understandable, to 'detach' default ('generic', as it is currently
named, or maybe the best naming for it to be 'heap') toaster from DBMS
core code, route it through new API and hide all existing internal
specific Toast functionality behind new API.
All the points you mentioned are made available for development by
this patch (and, actually, some are being developed - in the
bytea_appendable_toaster part of this patch or jSONb toaster by Nikita
Glukhov, he could provide much better explanation on this topic).
>> Modification of current toaster for all tasks and cases looks too
>> complex, moreover, it will not works for custom data types. Postgres
>> is an extensible database, why not to extent its extensibility even
>> further, to have pluggable TOAST! We propose an idea to separate
>> toaster from heap using toaster API similar to table AM API etc.
>> Following patches are applicable over patch in [1]
>ISTM that we would want the toast algorithm to be associated with the
>datatype, not the column?
>Can you explain your thinking?
This possibility is considered for future development.
>We already have Expanded toast format, in-memory, which was designed
>specifically to allow us to access sub-structure of the datatype
>in-memory. So I was expecting to see an Expanded, on-disk, toast
>format that roughly matched that concept, since Tom has already shown
>us the way. (varatt_expanded). This would be usable by both JSON and
>PostGIS.
The main disadvantage is that it does not suppose either usage of any
other toasting strategies, or compressions methods except plgz and
lz4.
>Some other thoughts:
>I imagine the data type might want to keep some kind of dictionary
>inside the main toast pointer, so we could make allowance for some
>optional datatype-specific private area in the toast pointer itself,
>allowing a mix of inline and out-of-line data, and/or a table of
>contents to the slices.
It is partly implemented in jSONb custom Toaster, as I mentioned
above, and also could be considered for future improvement of existing
Toaster as an extension.
>I'm thinking could also tackle these things at the same time:
>* We want to expand TOAST to 64-bit pointers, so we can have more
>pointers in a table
This issue is being discussed but not currently implemented, it was
considered as one of the possible future improvements.
>* We want to avoid putting the data length into the toast pointer, so
>we can allow the toasted data to be expanded without rewriting
>everything (to avoid O(N^2) cost)
May I correct you - actual relation is O(N), not O(N^2).
Currently data length is stored outside customized toaster data, in
the varatt_custom structure that is supposed to be used by all custom
(extended) toasters. Data that is specific to some custom Toasted will
be stored inside va_toasterdata structure.
Looking forward to your thoughts on our work.
--
Best regards,
Nikita A. Malakhov
On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor@sigaev.ru> wrote:
> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
> - "one toast fits all" may be not the best solution for particular
> type or/and use cases
> - it doesn't know the internal structure of data type, so it cannot
> choose an optimal toast strategy
> - it can't share common parts between different rows and even
> versions of rows
Agreed, Oleg has made some very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.
In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slices(s), without rewriting the existing slices
3. If toasted data is expanded, then allownew slices to be appended to
the object without rewriting the existing slices
> Modification of current toaster for all tasks and cases looks too
> complex, moreover, it will not works for custom data types. Postgres
> is an extensible database, why not to extent its extensibility even
> further, to have pluggable TOAST! We propose an idea to separate
> toaster from heap using toaster API similar to table AM API etc.
> Following patches are applicable over patch in [1]
ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?
We already have Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way. (varatt_expanded). This would be usable by both JSON and
PostGIS.
Some other thoughts:
I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.
I'm thinking could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)
--
Simon Riggs http://www.EnterpriseDB.com/
> In my understanding, we want to be able to > 1. Access data from a toasted object one slice at a time, by using > knowledge of the structure > 2. If toasted data is updated, then update a minimum number of > slices(s), without rewriting the existing slices > 3. If toasted data is expanded, then allownew slices to be appended to > the object without rewriting the existing slices There are more options: 1 share common parts between not only versions of row but between all rows in a column. Seems strange but examples: - urls often have a common prefix and so storing in a prefix tree (as SP-GiST does) allows significantly decrease storage size - the same for json - it's often use case with common part of its hierarchical structure - one more usecase for json. If json use only a few schemes (structure) it's possible to store in toast storage only values and don't store keys and structure 2 Current toast storage stores chunks in heap accesses method and to provide fast access by toast id it makes an index. Ideas: - store chunks directly in btree tree, pgsql's btree already has an INCLUDE columns, so, chunks and visibility data will be stored only in leaf pages. Obviously it reduces number of disk's access for "untoasting". - use another access method for chunk storage > ISTM that we would want the toast algorithm to be associated with the > datatype, not the column? > Can you explain your thinking? Hm. I'll try to explain my motivation. 1) Datatype could have more than one suitable toasters. For different usecases: fast retrieving, compact storage, fast update etc. As I told above, for jsonb there are several optimal strategies for toasting: for values with a few different structures, for close to hierarchical structures, for values with different parts by access mode (easy to imagine json with some keys used for search and some keys only for output to user) 2) Toaster could be designed to work with different data type. Suggested appendable toaster is designed to work with bytea but could work with text Looking on this point I have doubts where to store connection between toaster and datatype. If we add toasteroid to pg_type how to deal with several toaster for one datatype? (And we could want to has different toaster on one table!) If we add typoid to pg_toaster then how it will work with several datatypes? An idea to add a new many-to-many connection table seems workable but here there are another questions, such as will any toaster work with any table access method? To resolve this bundle of question we propose validate() method of toaster, which should be called during DDL operation, i.e. toaster is assigned to column or column's datatype is changed. More thought: Now postgres has two options for column: storage and compression and now we add toaster. For me it seems too redundantly. Seems, storage should be binary value: inplace (plain as now) and toastable. All other variation such as toast limit, compression enabling, compression kind should be an per-column option for toaster (that's why we suggest valid toaster oid for any column with varlena/toastable datatype). It looks like a good abstraction but we will have a problem with backward compatibility and I'm afraid I can't implement it very fast. > > We already have Expanded toast format, in-memory, which was designed > specifically to allow us to access sub-structure of the datatype > in-memory. So I was expecting to see an Expanded, on-disk, toast > format that roughly matched that concept, since Tom has already shown > us the way. (varatt_expanded). This would be usable by both JSON and > PostGIS. Hm, I don't understand. varatt_custom has variable-length tail which toaster could use it by any way, appandable toaster use it to store appended tail. > > > Some other thoughts: > > I imagine the data type might want to keep some kind of dictionary > inside the main toast pointer, so we could make allowance for some > optional datatype-specific private area in the toast pointer itself, > allowing a mix of inline and out-of-line data, and/or a table of > contents to the slices. > > I'm thinking could also tackle these things at the same time: > * We want to expand TOAST to 64-bit pointers, so we can have more > pointers in a table > * We want to avoid putting the data length into the toast pointer, so > we can allow the toasted data to be expanded without rewriting > everything (to avoid O(N^2) cost) Right -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
I'd like to ask your opinion for next small questions 1) May be, we could add toasteroid to pg_type to for default toaster for datatype. ALTER TYPE type SET DEFAULT TOASTER toaster; 2) The name of default toaster is deftoaster, which was choosen at night, may be heap_toaster is better? heap because default toaster stores chunks in heap table. Thank you! -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 1/14/22 19:41, Teodor Sigaev wrote: > >> In my understanding, we want to be able to >> 1. Access data from a toasted object one slice at a time, by using >> knowledge of the structure >> 2. If toasted data is updated, then update a minimum number of >> slices(s), without rewriting the existing slices >> 3. If toasted data is expanded, then allownew slices to be appended to >> the object without rewriting the existing slices > > There are more options: > 1 share common parts between not only versions of row but between all > rows in a column. Seems strange but examples: > - urls often have a common prefix and so storing in a prefix tree (as > SP-GiST does) allows significantly decrease storage size > - the same for json - it's often use case with common part of its > hierarchical structure > - one more usecase for json. If json use only a few schemes > (structure) it's possible to store in toast storage only values and > don't store keys and structure This sounds interesting, but very much like column compression, which was proposed some time ago. If we haven't made much progrees with that patch (AFAICS), what's the likelihood we'll succeed here, when it's combined with yet more complexity? Maybe doing that kind of compression in TOAST is somehow simpler, but I don't see it. > 2 Current toast storage stores chunks in heap accesses method and to > provide fast access by toast id it makes an index. Ideas: > - store chunks directly in btree tree, pgsql's btree already has an > INCLUDE columns, so, chunks and visibility data will be stored only > in leaf pages. Obviously it reduces number of disk's access for > "untoasting". > - use another access method for chunk storage > Maybe, but that probably requires more thought - e.g. btree requires the values to be less than 1/3 page, so I wonder how would that play with toasting of values. >> ISTM that we would want the toast algorithm to be associated with the >> datatype, not the column? >> Can you explain your thinking? > Hm. I'll try to explain my motivation. > 1) Datatype could have more than one suitable toasters. For different > usecases: fast retrieving, compact storage, fast update etc. As I > told above, for jsonb there are several optimal strategies for > toasting: for values with a few different structures, for close to > hierarchical structures, for values with different parts by access > mode (easy to imagine json with some keys used for search and some > keys only for output to user) > 2) Toaster could be designed to work with different data type. Suggested > appendable toaster is designed to work with bytea but could work with > text > > Looking on this point I have doubts where to store connection between > toaster and datatype. If we add toasteroid to pg_type how to deal with > several toaster for one datatype? (And we could want to has different > toaster on one table!) If we add typoid to pg_toaster then how it will > work with several datatypes? An idea to add a new many-to-many > connection table seems workable but here there are another questions, > such as will any toaster work with any table access method? > > To resolve this bundle of question we propose validate() method of > toaster, which should be called during DDL operation, i.e. toaster is > assigned to column or column's datatype is changed. > Seems you'd need a mapping table, to allow M:N mapping between types and toasters, linking it to all "compatible" types. It's not clear to me how would this work with custom data types, domains etc. Also, what happens to existing values when you change the toaster? What if the toasters don't use the same access method to store the chunks (heap vs. btree)? And so on. > More thought: > Now postgres has two options for column: storage and compression and now > we add toaster. For me it seems too redundantly. Seems, storage should > be binary value: inplace (plain as now) and toastable. All other > variation such as toast limit, compression enabling, compression kind > should be an per-column option for toaster (that's why we suggest valid > toaster oid for any column with varlena/toastable datatype). It looks > like a good abstraction but we will have a problem with backward > compatibility and I'm afraid I can't implement it very fast. > So you suggest we move all of this to toaster? I'd say -1 to that, because it makes it much harder to e.g. add custom compression method, etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>was proposed some time ago. If we haven't made much progrees with that
>patch (AFAICS), what's the likelihood we'll succeed here, when it's
>combined with yet more complexity?
The main concern is that this patch provides open API for Toast functionality
>Maybe doing that kind of compression in TOAST is somehow simpler, but I
>don't see it.
>toasters, linking it to all "compatible" types. It's not clear to me how
>would this work with custom data types, domains etc.
Any suitable [custom] Toaster could be plugged in for any table column,
>Also, what happens to existing values when you change the toaster? What
>if the toasters don't use the same access method to store the chunks
>(heap vs. btree)? And so on.
>because it makes it much harder to e.g. add custom compression method, etc.
On 1/14/22 19:41, Teodor Sigaev wrote:
>
>> In my understanding, we want to be able to
>> 1. Access data from a toasted object one slice at a time, by using
>> knowledge of the structure
>> 2. If toasted data is updated, then update a minimum number of
>> slices(s), without rewriting the existing slices
>> 3. If toasted data is expanded, then allownew slices to be appended to
>> the object without rewriting the existing slices
>
> There are more options:
> 1 share common parts between not only versions of row but between all
> rows in a column. Seems strange but examples:
> - urls often have a common prefix and so storing in a prefix tree (as
> SP-GiST does) allows significantly decrease storage size
> - the same for json - it's often use case with common part of its
> hierarchical structure
> - one more usecase for json. If json use only a few schemes
> (structure) it's possible to store in toast storage only values and
> don't store keys and structure
This sounds interesting, but very much like column compression, which
was proposed some time ago. If we haven't made much progrees with that
patch (AFAICS), what's the likelihood we'll succeed here, when it's
combined with yet more complexity?
Maybe doing that kind of compression in TOAST is somehow simpler, but I
don't see it.
> 2 Current toast storage stores chunks in heap accesses method and to
> provide fast access by toast id it makes an index. Ideas:
> - store chunks directly in btree tree, pgsql's btree already has an
> INCLUDE columns, so, chunks and visibility data will be stored only
> in leaf pages. Obviously it reduces number of disk's access for
> "untoasting".
> - use another access method for chunk storage
>
Maybe, but that probably requires more thought - e.g. btree requires the
values to be less than 1/3 page, so I wonder how would that play with
toasting of values.
>> ISTM that we would want the toast algorithm to be associated with the
>> datatype, not the column?
>> Can you explain your thinking?
> Hm. I'll try to explain my motivation.
> 1) Datatype could have more than one suitable toasters. For different
> usecases: fast retrieving, compact storage, fast update etc. As I
> told above, for jsonb there are several optimal strategies for
> toasting: for values with a few different structures, for close to
> hierarchical structures, for values with different parts by access
> mode (easy to imagine json with some keys used for search and some
> keys only for output to user)
> 2) Toaster could be designed to work with different data type. Suggested
> appendable toaster is designed to work with bytea but could work with
> text
>
> Looking on this point I have doubts where to store connection between
> toaster and datatype. If we add toasteroid to pg_type how to deal with
> several toaster for one datatype? (And we could want to has different
> toaster on one table!) If we add typoid to pg_toaster then how it will
> work with several datatypes? An idea to add a new many-to-many
> connection table seems workable but here there are another questions,
> such as will any toaster work with any table access method?
>
> To resolve this bundle of question we propose validate() method of
> toaster, which should be called during DDL operation, i.e. toaster is
> assigned to column or column's datatype is changed.
>
Seems you'd need a mapping table, to allow M:N mapping between types and
toasters, linking it to all "compatible" types. It's not clear to me how
would this work with custom data types, domains etc.
Also, what happens to existing values when you change the toaster? What
if the toasters don't use the same access method to store the chunks
(heap vs. btree)? And so on.
> More thought:
> Now postgres has two options for column: storage and compression and now
> we add toaster. For me it seems too redundantly. Seems, storage should
> be binary value: inplace (plain as now) and toastable. All other
> variation such as toast limit, compression enabling, compression kind
> should be an per-column option for toaster (that's why we suggest valid
> toaster oid for any column with varlena/toastable datatype). It looks
> like a good abstraction but we will have a problem with backward
> compatibility and I'm afraid I can't implement it very fast.
>
So you suggest we move all of this to toaster? I'd say -1 to that,
because it makes it much harder to e.g. add custom compression method, etc.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi! > Maybe doing that kind of compression in TOAST is somehow simpler, but I > don't see it. Seems, in ideal world, compression should be inside toaster. > >> 2 Current toast storage stores chunks in heap accesses method and to >> provide fast access by toast id it makes an index. Ideas: >> - store chunks directly in btree tree, pgsql's btree already has an >> INCLUDE columns, so, chunks and visibility data will be stored only >> in leaf pages. Obviously it reduces number of disk's access for >> "untoasting". >> - use another access method for chunk storage >> > > Maybe, but that probably requires more thought - e.g. btree requires the > values to be less than 1/3 page, so I wonder how would that play with > toasting of values. That's ok, because chunk size is 2000 bytes right now and its could be saved. > > Seems you'd need a mapping table, to allow M:N mapping between types and > toasters, linking it to all "compatible" types. It's not clear to me how > would this work with custom data types, domains etc. If toaster will look into internal structure then it should know type's binary format. So, new custom types have a little chance to work with old custom toaster. Default toaster works with any types. > > Also, what happens to existing values when you change the toaster? What > if the toasters don't use the same access method to store the chunks > (heap vs. btree)? And so on. vatatt_custom contains an oid of toaster and toaster is not allowed to delete (at least, in suggested patches). So, if column's toaster has been changed then old values will be detoasted by toaster pointed in varatt_custom structure, not in column definition. This is very similar to storage attribute works: we we alter storage attribute only new values will be stored with pointed storage type. > >> More thought: >> Now postgres has two options for column: storage and compression and >> now we add toaster. For me it seems too redundantly. Seems, storage >> should be binary value: inplace (plain as now) and toastable. All >> other variation such as toast limit, compression enabling, compression >> kind should be an per-column option for toaster (that's why we suggest >> valid toaster oid for any column with varlena/toastable datatype). It >> looks like a good abstraction but we will have a problem with backward >> compatibility and I'm afraid I can't implement it very fast. >> > > So you suggest we move all of this to toaster? I'd say -1 to that, > because it makes it much harder to e.g. add custom compression method, etc. Hmm, I suggested to leave only toaster at upper level. Compression kind could be chosen in toaster's options (not implemented yet) or even make an API interface to compression to make it configurable. Right now, module developer could not implement a module with new compression method and it is a disadvantage. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 1/18/22 15:56, Teodor Sigaev wrote: > Hi! > >> Maybe doing that kind of compression in TOAST is somehow simpler, but >> I don't see it. > Seems, in ideal world, compression should be inside toaster. > I'm not convinced that's universally true. Yes, I'm sure certain TOAST implementations would benefit from tighter control over compression, but does that imply compression and toast are redundant? I doubt that, because we compress non-toasted types too, for example. And layering has a value too, as makes it easier to replace the pieces. >> >>> 2 Current toast storage stores chunks in heap accesses method and to >>> provide fast access by toast id it makes an index. Ideas: >>> - store chunks directly in btree tree, pgsql's btree already has an >>> INCLUDE columns, so, chunks and visibility data will be stored only >>> in leaf pages. Obviously it reduces number of disk's access for >>> "untoasting". >>> - use another access method for chunk storage >>> >> >> Maybe, but that probably requires more thought - e.g. btree requires >> the values to be less than 1/3 page, so I wonder how would that play >> with toasting of values. > That's ok, because chunk size is 2000 bytes right now and its could be > saved. >> Perhaps. My main point is that we should not be making too many radical changes at once - it makes it much harder to actually get anything done. So yeah, doing TOAST through IOT might be interesting, but I'd leave that for a separate patch. > >> Seems you'd need a mapping table, to allow M:N mapping between types >> and toasters, linking it to all "compatible" types. It's not clear to >> me how would this work with custom data types, domains etc. > If toaster will look into internal structure then it should know type's > binary format. So, new custom types have a little chance to work with > old custom toaster. Default toaster works with any types. The question is what happens when you combine data type with a toaster that is not designed for that type. I mean, imagine you have a JSONB toaster and you set it for a bytea column. Naive implementation will just crash, because it'll try to process bytea as if it was JSONB. It seems better to prevent such incompatible combinations and restrict each toaster to just compatible data types, and the mapping table (linking toaster and data types) seems a way to do that. However, it seems toasters are either generic (agnostic to data types, treating everything as bytea) or specialized. I doubt any specialized toaster can reasonably support multiple data types, so maybe each toaster can have just one "compatible type" OID. If it's invalid, it'd be "generic" and otherwise it's useful for that type and types derived from it (e.g. domains). So you'd have the toaster OID in two places: pg_type.toaster_oid - default toaster for the type pg_attribute.toaster_oid - current toaster for this column and then you'd have pg_toaster.typid - type this toaster handles (or InvalidOid for generic) >> >> Also, what happens to existing values when you change the toaster? >> What if the toasters don't use the same access method to store the >> chunks (heap vs. btree)? And so on. > > vatatt_custom contains an oid of toaster and toaster is not allowed to > delete (at least, in suggested patches). So, if column's toaster has > been changed then old values will be detoasted by toaster pointed in > varatt_custom structure, not in column definition. This is very similar > to storage attribute works: we we alter storage attribute only new > values will be stored with pointed storage type. > IIRC we do this for compression methods, right? >> >>> More thought: >>> Now postgres has two options for column: storage and compression and >>> now we add toaster. For me it seems too redundantly. Seems, storage >>> should be binary value: inplace (plain as now) and toastable. All >>> other variation such as toast limit, compression enabling, >>> compression kind should be an per-column option for toaster (that's >>> why we suggest valid toaster oid for any column with >>> varlena/toastable datatype). It looks like a good abstraction but we >>> will have a problem with backward compatibility and I'm afraid I >>> can't implement it very fast. >>> >> >> So you suggest we move all of this to toaster? I'd say -1 to that, >> because it makes it much harder to e.g. add custom compression method, >> etc. > Hmm, I suggested to leave only toaster at upper level. Compression kind > could be chosen in toaster's options (not implemented yet) or even make > an API interface to compression to make it configurable. Right now, > module developer could not implement a module with new compression > method and it is a disadvantage. If you have to implement custom toaster to implement custom compression method, doesn't that make things more complex? You'd have to solve all the issues for custom compression methods and also all issues for custom toaster. Also, what if you want to just compress the column, not toast? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>implementations would benefit from tighter control over compression, but
>does that imply compression and toast are redundant? I doubt that,
>because we compress non-toasted types too, for example. And layering has
>a value too, as makes it easier to replace the pieces.
>changes at once - it makes it much harder to actually get anything done.
>So yeah, doing TOAST through IOT might be interesting, but I'd leave
>that for a separate patch.
>each toaster to just compatible data types, and the mapping table
>(linking toaster and data types) seems a way to do that.
>method, doesn't that make things more complex? You'd have to solve all
>the issues for custom compression methods and also all issues for custom
>toaster. Also, what if you want to just compress the column, not toast?
On 1/18/22 15:56, Teodor Sigaev wrote:
> Hi!
>
>> Maybe doing that kind of compression in TOAST is somehow simpler, but
>> I don't see it.
> Seems, in ideal world, compression should be inside toaster.
>
I'm not convinced that's universally true. Yes, I'm sure certain TOAST
implementations would benefit from tighter control over compression, but
does that imply compression and toast are redundant? I doubt that,
because we compress non-toasted types too, for example. And layering has
a value too, as makes it easier to replace the pieces.
>>
>>> 2 Current toast storage stores chunks in heap accesses method and to
>>> provide fast access by toast id it makes an index. Ideas:
>>> - store chunks directly in btree tree, pgsql's btree already has an
>>> INCLUDE columns, so, chunks and visibility data will be stored only
>>> in leaf pages. Obviously it reduces number of disk's access for
>>> "untoasting".
>>> - use another access method for chunk storage
>>>
>>
>> Maybe, but that probably requires more thought - e.g. btree requires
>> the values to be less than 1/3 page, so I wonder how would that play
>> with toasting of values.
> That's ok, because chunk size is 2000 bytes right now and its could be
> saved.
>>
Perhaps. My main point is that we should not be making too many radical
changes at once - it makes it much harder to actually get anything done.
So yeah, doing TOAST through IOT might be interesting, but I'd leave
that for a separate patch.
>
>> Seems you'd need a mapping table, to allow M:N mapping between types
>> and toasters, linking it to all "compatible" types. It's not clear to
>> me how would this work with custom data types, domains etc.
> If toaster will look into internal structure then it should know type's
> binary format. So, new custom types have a little chance to work with
> old custom toaster. Default toaster works with any types.
The question is what happens when you combine data type with a toaster
that is not designed for that type. I mean, imagine you have a JSONB
toaster and you set it for a bytea column. Naive implementation will
just crash, because it'll try to process bytea as if it was JSONB.
It seems better to prevent such incompatible combinations and restrict
each toaster to just compatible data types, and the mapping table
(linking toaster and data types) seems a way to do that.
However, it seems toasters are either generic (agnostic to data types,
treating everything as bytea) or specialized. I doubt any specialized
toaster can reasonably support multiple data types, so maybe each
toaster can have just one "compatible type" OID. If it's invalid, it'd
be "generic" and otherwise it's useful for that type and types derived
from it (e.g. domains).
So you'd have the toaster OID in two places:
pg_type.toaster_oid - default toaster for the type
pg_attribute.toaster_oid - current toaster for this column
and then you'd have
pg_toaster.typid - type this toaster handles (or InvalidOid for generic)
>>
>> Also, what happens to existing values when you change the toaster?
>> What if the toasters don't use the same access method to store the
>> chunks (heap vs. btree)? And so on.
>
> vatatt_custom contains an oid of toaster and toaster is not allowed to
> delete (at least, in suggested patches). So, if column's toaster has
> been changed then old values will be detoasted by toaster pointed in
> varatt_custom structure, not in column definition. This is very similar
> to storage attribute works: we we alter storage attribute only new
> values will be stored with pointed storage type.
>
IIRC we do this for compression methods, right?
>>
>>> More thought:
>>> Now postgres has two options for column: storage and compression and
>>> now we add toaster. For me it seems too redundantly. Seems, storage
>>> should be binary value: inplace (plain as now) and toastable. All
>>> other variation such as toast limit, compression enabling,
>>> compression kind should be an per-column option for toaster (that's
>>> why we suggest valid toaster oid for any column with
>>> varlena/toastable datatype). It looks like a good abstraction but we
>>> will have a problem with backward compatibility and I'm afraid I
>>> can't implement it very fast.
>>>
>>
>> So you suggest we move all of this to toaster? I'd say -1 to that,
>> because it makes it much harder to e.g. add custom compression method,
>> etc.
> Hmm, I suggested to leave only toaster at upper level. Compression kind
> could be chosen in toaster's options (not implemented yet) or even make
> an API interface to compression to make it configurable. Right now,
> module developer could not implement a module with new compression
> method and it is a disadvantage.
If you have to implement custom toaster to implement custom compression
method, doesn't that make things more complex? You'd have to solve all
the issues for custom compression methods and also all issues for custom
toaster. Also, what if you want to just compress the column, not toast?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 30, 2021 at 11:40 AM Teodor Sigaev <teodor@sigaev.ru> wrote: > We are working on custom toaster for JSONB [1], because current TOAST is > universal for any data type and because of that it has some disadvantages: > - "one toast fits all" may be not the best solution for particular > type or/and use cases > - it doesn't know the internal structure of data type, so it cannot > choose an optimal toast strategy > - it can't share common parts between different rows and even > versions of rows I agree ... but I'm also worried about what happens when we have multiple table AMs. One can imagine a new table AM that is specifically optimized for TOAST which can be used with an existing heap table. One can imagine a new table AM for the main table that wants to use something different for TOAST. So, I don't think it's right to imagine that the choice of TOASTer depends solely on the column data type. I'm not really sure how this should work exactly ... but it needs careful thought. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Dec 30, 2021 at 11:40 AM Teodor Sigaev <teodor@sigaev.ru> wrote:
> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
> - "one toast fits all" may be not the best solution for particular
> type or/and use cases
> - it doesn't know the internal structure of data type, so it cannot
> choose an optimal toast strategy
> - it can't share common parts between different rows and even
> versions of rows
I agree ... but I'm also worried about what happens when we have
multiple table AMs. One can imagine a new table AM that is
specifically optimized for TOAST which can be used with an existing
heap table. One can imagine a new table AM for the main table that
wants to use something different for TOAST. So, I don't think it's
right to imagine that the choice of TOASTer depends solely on the
column data type. I'm not really sure how this should work exactly ...
but it needs careful thought.
--
Robert Haas
EDB: http://www.enterprisedb.com
> I agree ... but I'm also worried about what happens when we have > multiple table AMs. One can imagine a new table AM that is > specifically optimized for TOAST which can be used with an existing > heap table. One can imagine a new table AM for the main table that > wants to use something different for TOAST. So, I don't think it's > right to imagine that the choice of TOASTer depends solely on the > column data type. I'm not really sure how this should work exactly ... > but it needs careful thought. Right. that's why we propose a validate method (may be, it's a wrong name, but I don't known better one) which accepts several arguments, one of which is table AM oid. If that method returns false then toaster isn't useful with current TAM, storage or/and compression kinds, etc. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.
Right. that's why we propose a validate method (may be, it's a wrong
name, but I don't known better one) which accepts several arguments, one
of which is table AM oid. If that method returns false then toaster
isn't useful with current TAM, storage or/and compression kinds, etc.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Вложения
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Mar 11 13:47:26 2022 -0500
https://github.com/postgrespro/postgres/tree/toaster_interface
Introduces syntax for storage and formal toaster API.
2) 2_toaster_default_v2.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_default
Built-in toaster implemented (with some refactoring) uisng toaster API
as generic (or default) toaster.
3) 3_toaster_snapshot_v2.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements technology to distinguish row's versions in toasted
values to share common parts of toasted values between different
versions of rows
4) 4_bytea_appendable_toaster_v2.patch.gz
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
Contrib module implements toaster for non-compressed bytea columns,
which allows fast appending to existing bytea value.
> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.
Right. that's why we propose a validate method (may be, it's a wrong
name, but I don't known better one) which accepts several arguments, one
of which is table AM oid. If that method returns false then toaster
isn't useful with current TAM, storage or/and compression kinds, etc.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Вложения
Hi, On 2022-03-22 02:31:21 +0300, Nikita Malakhov wrote: > Hi Hackers, > Because of 3 months have passed since Pluggable Toaster presentation and a > lot of > commits were pushed into v15 master - we would like to re-introduce > this patch > rebased onto actual master. Last commit being used - > commit 641f3dffcdf1c7378cfb94c98b6642793181d6db (origin/master) > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Fri Mar 11 13:47:26 2022 -0500 It currently fails to apply: http://cfbot.cputube.org/patch_37_3490.log Given the size of the patch, and the degree of review it has gotten so far, it seems not realistically a fit for 15, but is marked as such. Think it should be moved to the next CF. Marked as waiting-on-author for now. Greetings, Andres Freund
Hi,
On 2022-03-22 02:31:21 +0300, Nikita Malakhov wrote:
> Hi Hackers,
> Because of 3 months have passed since Pluggable Toaster presentation and a
> lot of
> commits were pushed into v15 master - we would like to re-introduce
> this patch
> rebased onto actual master. Last commit being used -
> commit 641f3dffcdf1c7378cfb94c98b6642793181d6db (origin/master)
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Fri Mar 11 13:47:26 2022 -0500
It currently fails to apply: http://cfbot.cputube.org/patch_37_3490.log
Given the size of the patch, and the degree of review it has gotten so far, it
seems not realistically a fit for 15, but is marked as such.
Think it should be moved to the next CF. Marked as waiting-on-author for now.
Greetings,
Andres Freund
Вложения
It looks like it's still not actually building on some compilers. I see a bunch of warnings as well as an error: [03:53:24.660] dummy_toaster.c:97:2: error: void function 'dummyDelete' should not return a value [-Wreturn-type] Also the "publication" regression test needs to be adjusted as it includes \d+ output which has changed to include the toaster.
It looks like it's still not actually building on some compilers. I
see a bunch of warnings as well as an error:
[03:53:24.660] dummy_toaster.c:97:2: error: void function
'dummyDelete' should not return a value [-Wreturn-type]
Also the "publication" regression test needs to be adjusted as it
includes \d+ output which has changed to include the toaster.
Вложения
Hm. It compiles but it's failing regression tests: diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out --- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out 2022-04-02 16:02:47.874360253 +0000 +++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out 2022-04-02 16:07:20.878047769 +0000 @@ -20,186 +20,7 @@ .... +server closed the connection unexpectedly + This probably means the server terminated abnormally + before or while processing the request. +connection to server was lost I think this will require some real debugging, so I'm marking this Waiting on Author.
Hi, On April 2, 2022 6:20:36 PM PDT, Greg Stark <stark@mit.edu> wrote: >Hm. It compiles but it's failing regression tests: > >diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out >/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out >--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out >2022-04-02 16:02:47.874360253 +0000 >+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out >2022-04-02 16:07:20.878047769 +0000 >@@ -20,186 +20,7 @@ >.... >+server closed the connection unexpectedly >+ This probably means the server terminated abnormally >+ before or while processing the request. >+connection to server was lost >I think this will require some real debugging, so I'm marking this >Waiting on Author. Yes, dumps core (just like in several previous runs): https://cirrus-ci.com/task/4710272324599808?logs=cores#L44 Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi,
On April 2, 2022 6:20:36 PM PDT, Greg Stark <stark@mit.edu> wrote:
>Hm. It compiles but it's failing regression tests:
>
>diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>2022-04-02 16:02:47.874360253 +0000
>+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>2022-04-02 16:07:20.878047769 +0000
>@@ -20,186 +20,7 @@
>....
>+server closed the connection unexpectedly
>+ This probably means the server terminated abnormally
>+ before or while processing the request.
>+connection to server was lost
>I think this will require some real debugging, so I'm marking this
>Waiting on Author.
Yes, dumps core (just like in several previous runs):
https://cirrus-ci.com/task/4710272324599808?logs=cores#L44
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hi,
On April 2, 2022 6:20:36 PM PDT, Greg Stark <stark@mit.edu> wrote:
>Hm. It compiles but it's failing regression tests:
>
>diff -U3 /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>/tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>--- /tmp/cirrus-ci-build/contrib/dummy_toaster/expected/dummy_toaster.out
>2022-04-02 16:02:47.874360253 +0000
>+++ /tmp/cirrus-ci-build/contrib/dummy_toaster/results/dummy_toaster.out
>2022-04-02 16:07:20.878047769 +0000
>@@ -20,186 +20,7 @@
>....
>+server closed the connection unexpectedly
>+ This probably means the server terminated abnormally
>+ before or while processing the request.
>+connection to server was lost
>I think this will require some real debugging, so I'm marking this
>Waiting on Author.
Yes, dumps core (just like in several previous runs):
https://cirrus-ci.com/task/4710272324599808?logs=cores#L44
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Вложения
On Mon, Apr 4, 2022 at 12:13 PM Nikita Malakhov <hukutoc@gmail.com> wrote: > I'm really sorry. Messed up some code while merging rebased branches with previous (v1) > patches issued in December and haven't noticed that seg fault because of corrupted code > while running check-world. > I've fixed messed code in Dummy Toaster package and checked twice - all's correct now, > patches are applied correctly and tests are clean. > Thank you for pointing out the error and for your patience! Hi, This patch set doesn't seem anywhere close to committable to me. For example: - It apparently adds a new command called CREATE TOASTER but that command doesn't seem to be documented anywhere. - contrib/dummy_toaster is added in patch 1 with a no implementation and code comments that say "Bloom index utilities" and then those comments are fixed and an implementation is added in later patches. - What is supposedly patch 1 is actually 32 patch files concatenated together. It doesn't apply properly either with 'patch' or 'git am' so I don't understand what we would even do with this. - None of these patches have appropriate commit messages. - Many of these patches add 'XXX' comments or errors or other obviously unfinished bits of code. Some of this may be cleaned up by later patches but it's hard to tell because right now one can't even apply the patch set properly. Even if that were possible, it's the job of the person submitting the patch to organize the patch into independent, separately committable chunks that are self-contained and have good comments and good commit messages for each one. I don't think based on the status of this patch set that it's even possible to provide useful feedback on the design at this point, never mind getting anything committed. -- Robert Haas EDB: http://www.enterprisedb.com
Thanks for the review Robert. I think that gives some pretty actionable advice on how to improve the patch and it doesn't seem likely to get much more in this cycle. I'll mark the patch Returned with Feedback. Hope to see it come back with improvements in the next release.
On Mon, Apr 4, 2022 at 12:13 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> I'm really sorry. Messed up some code while merging rebased branches with previous (v1)
> patches issued in December and haven't noticed that seg fault because of corrupted code
> while running check-world.
> I've fixed messed code in Dummy Toaster package and checked twice - all's correct now,
> patches are applied correctly and tests are clean.
> Thank you for pointing out the error and for your patience!
Hi,
This patch set doesn't seem anywhere close to committable to me. For example:
- It apparently adds a new command called CREATE TOASTER but that
command doesn't seem to be documented anywhere.
- contrib/dummy_toaster is added in patch 1 with a no implementation
and code comments that say "Bloom index utilities" and then those
comments are fixed and an implementation is added in later patches.
- What is supposedly patch 1 is actually 32 patch files concatenated
together. It doesn't apply properly either with 'patch' or 'git am' so
I don't understand what we would even do with this.
- None of these patches have appropriate commit messages.
- Many of these patches add 'XXX' comments or errors or other
obviously unfinished bits of code. Some of this may be cleaned up by
later patches but it's hard to tell because right now one can't even
apply the patch set properly. Even if that were possible, it's the job
of the person submitting the patch to organize the patch into
independent, separately committable chunks that are self-contained and
have good comments and good commit messages for each one.
I don't think based on the status of this patch set that it's even
possible to provide useful feedback on the design at this point, never
mind getting anything committed.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote: > - Is 'git apply' not a valid way to apply such patches? I have found that it never works. This case is no exception: [rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch /Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace. toasterapi.o /Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace. { /Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace. * CREATE TOASTER name HANDLER handler_name /Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace. * va_toasterdata could contain varatt_external structure for old Toast /Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace. SELECT attnum, attname, atttypid, attstorage, tsrname error: patch failed: src/backend/commands/tablecmds.c:42 error: src/backend/commands/tablecmds.c: patch does not apply error: patch failed: src/backend/commands/tablecmds.c:943 error: src/backend/commands/tablecmds.c: patch does not apply error: patch failed: src/backend/commands/tablecmds.c:973 error: src/backend/commands/tablecmds.c: patch does not apply error: patch failed: src/backend/commands/tablecmds.c:44 error: src/backend/commands/tablecmds.c: patch does not apply I would really encourage you to use 'git format-patch' to generate a stack of patches. But there is no point in reposting 30+ patches that haven't been properly refactored into separate chunks. You need to maintain a branch, periodically rebased over master, with some probably-small number of patches on it, each of which is a logically independent patch with its own commit message, its own clear purpose, etc. And then generate patches to post from there using 'git format-patch'. Look into using 'git rebase -i --autosquash' and 'git commit --fixup' to maintain the branch, if you're not already familiar with those things. Also, it is a really good idea when you post the patch set to include in the email a clear description of the overall purpose of the patch set and what each patch does toward that goal. e.g. "The overall goal of this patch set is to support faster-than-light travel. Currently, PostgreSQL does not know anything about the speed of light, so 0001 adds some code for speed-of-light detection. Building on this, 0002 adds general support for disabling physical laws of the universe. Then, 0003 makes use of this support to disable specifically the speed of light." Perhaps you want a little more text than that for each patch, depending on the situation, but this gives you the idea, I hope. -- Robert Haas EDB: http://www.enterprisedb.com
https://github.com/postgrespro/postgres/tree/toaster_interface
Introduces syntax for storage and formal toaster API. Adds column
atttoaster to pg_attribute, by design this column should not be equal to
invalid oid for any toastable datatype, ie it must have correct oid for
any type (not column) with non-plain storage. Since toaster may support
only particular datatype, core should check correctness of toaster set
by toaster validate method. New commands could be found in
src/test/regress/sql/toaster.sql. Also includes modification of pg_dump.
2) 2_toaster_default
https://github.com/postgrespro/postgres/tree/toaster_default
Built-in toaster implemented (with some refactoring) using toaster API
as generic (or default) toaster. dummy_toaster here is a minimal
workable example, it saves value directly in toast pointer and fails if
value is greater than 1kb.
3) 3_toaster_snapshot
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements technology to distinguish row's versions in toasted
values to share common parts of toasted values between different
versions of rows
4) 4_bytea_appendable_toaster
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
Contrib module implements toaster for non-compressed bytea columns,
which allows fast appending to existing bytea value. Appended tail
stored directly in toaster pointer, if there is enough space to do it.
Working on refactoring according to your recommendations.
On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?
I have found that it never works. This case is no exception:
[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
* CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
* va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply
I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.
Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?
I have found that it never works. This case is no exception:
[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
* CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
* va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply
I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.
Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.
--
Robert Haas
EDB: http://www.enterprisedb.com
Вложения
- 0005_bytea_appendable_toaster_v5.patch.gz
- 0004_toaster_snapshot_v5.patch.gz
- 0001_create_table_storage_v3.patch.gz
- 0003_toaster_default_v5.patch.gz
- 0002_toaster_interface_v6.patch.gz
- 0006_toasterapi_docs_v1.patch.gz
- 0008_fix_toast_tuple_externalize.patch.gz
- 0007_fix_alignment_of_custom_toast_pointers.patch.gz
Hi,I reworked previous patch set according to recommendations. Patchesare generated by format-patch and applied by git am. Patches are based onmaster from 03.11. Also, now we've got clean branch with incremental commitswhich could be easily rebased onto a fresh master.Currently, there are 8 patches:1) 0001_create_table_storage_v3.patch - SET STORAGE option for CREATETABLE command by Teodor Sigaev which is required by all the following functionality;2) 0002_toaster_interface_v6.patch - Toaster API (SQL syntax for toasters + API)with Dummy toaster as an example of how this API should be used, but with defaulttoaster left 'as-is';3) 0003_toaster_default_v5.patch - default (regular) toaster is implementedvia new API;4) 0004_toaster_snapshot_v5.patch - refactoring of default toaster and supportof versioned toasted rows;5) 0005_bytea_appendable_toaster_v5.patch - bytea toaster by Nikita GlukhovCustom toaster for bytea data with support of appending (instead of rewriting)stored data;6) 0006_toasterapi_docs_v1.patch - brief documentation on Toaster API in Pg docs;7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer'salignment required by bytea toaster by Nikita Glukhov;8) 0008_fix_toast_tuple_externalize.patch - fixes toast_tuple_externalize functionnot to call toast if old data is the same as new one.I would be grateful for feedback on the reworked patch set.On Mon, Apr 4, 2022 at 11:18 PM Robert Haas <robertmhaas@gmail.com> wrote:On Mon, Apr 4, 2022 at 4:05 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> - Is 'git apply' not a valid way to apply such patches?
I have found that it never works. This case is no exception:
[rhaas pgsql]$ git apply ~/Downloads/1_toaster_interface_v4.patch
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:253: trailing whitespace.
toasterapi.o
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1276: trailing whitespace.
{
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:1294: trailing whitespace.
* CREATE TOASTER name HANDLER handler_name
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:2261: trailing whitespace.
* va_toasterdata could contain varatt_external structure for old Toast
/Users/rhaas/Downloads/1_toaster_interface_v4.patch:3047: trailing whitespace.
SELECT attnum, attname, atttypid, attstorage, tsrname
error: patch failed: src/backend/commands/tablecmds.c:42
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:943
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:973
error: src/backend/commands/tablecmds.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:44
error: src/backend/commands/tablecmds.c: patch does not apply
I would really encourage you to use 'git format-patch' to generate a
stack of patches. But there is no point in reposting 30+ patches that
haven't been properly refactored into separate chunks. You need to
maintain a branch, periodically rebased over master, with some
probably-small number of patches on it, each of which is a logically
independent patch with its own commit message, its own clear purpose,
etc. And then generate patches to post from there using 'git
format-patch'. Look into using 'git rebase -i --autosquash' and 'git
commit --fixup' to maintain the branch, if you're not already familiar
with those things.
Also, it is a really good idea when you post the patch set to include
in the email a clear description of the overall purpose of the patch
set and what each patch does toward that goal. e.g. "The overall goal
of this patch set is to support faster-than-light travel. Currently,
PostgreSQL does not know anything about the speed of light, so 0001
adds some code for speed-of-light detection. Building on this, 0002
adds general support for disabling physical laws of the universe.
Then, 0003 makes use of this support to disable specifically the speed
of light." Perhaps you want a little more text than that for each
patch, depending on the situation, but this gives you the idea, I
hope.
--
Robert Haas
EDB: http://www.enterprisedb.com--
Вложения
Hi hackers, > For a pluggable toaster - in previous patch set part 7 patch file contains invalid string. > Fixup (v2 file should used instead of previous) patch: > 7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's > alignment required by bytea toaster by Nikita Glukhov; I finished digesting the thread and the referred presentations per Matthias (cc:'ed) suggestion in [1] discussion. Although the patchset got a fair amount of push-back above, I prefer to stay open minded and invest some of my time into this effort as a tester/reviewer during the July CF. Even if the patchset will not make it entirely to the core, some of its parts can be useful. Unfortunately, I didn't manage to find something that can be applied and tested. cfbot is currently not happy with the patchset. 0001_create_table_storage_v3.patch doesn't apply to the current origin/master manually either: ``` error: patch failed: src/backend/parser/gram.y:2318 error: src/backend/parser/gram.y: patch does not apply ``` Any chance we can see a rebased patchset for the July CF? [1]: https://commitfest.postgresql.org/38/3626/ -- Best regards, Aleksander Alekseev
Hi hackers,
> For a pluggable toaster - in previous patch set part 7 patch file contains invalid string.
> Fixup (v2 file should used instead of previous) patch:
> 7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom toast pointer's
> alignment required by bytea toaster by Nikita Glukhov;
I finished digesting the thread and the referred presentations per
Matthias (cc:'ed) suggestion in [1] discussion. Although the patchset
got a fair amount of push-back above, I prefer to stay open minded and
invest some of my time into this effort as a tester/reviewer during
the July CF. Even if the patchset will not make it entirely to the
core, some of its parts can be useful.
Unfortunately, I didn't manage to find something that can be applied
and tested. cfbot is currently not happy with the patchset.
0001_create_table_storage_v3.patch doesn't apply to the current
origin/master manually either:
```
error: patch failed: src/backend/parser/gram.y:2318
error: src/backend/parser/gram.y: patch does not apply
```
Any chance we can see a rebased patchset for the July CF?
[1]: https://commitfest.postgresql.org/38/3626/
--
Best regards,
Aleksander Alekseev
Hi Nikita, > We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF. > Thank you for your patience. Just to clarify, does it include the dependent "CREATE TABLE ( .. STORAGE .. )" patch [1]? I was considering changing the patch according to the feedback it got, but if you are already working on this I'm not going to interfere. [1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru -- Best regards, Aleksander Alekseev
Hi Nikita,
> We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF.
> Thank you for your patience.
Just to clarify, does it include the dependent "CREATE TABLE ( ..
STORAGE .. )" patch [1]? I was considering changing the patch
according to the feedback it got, but if you are already working on
this I'm not going to interfere.
[1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev
Hi,Alexander, thank you for your feedback and willingness to help. You can send a suggested fixup in this thread, I'll check the issue you've mentioned.Best regards,Nikita MalakhovOn Thu, Jun 23, 2022 at 4:38 PM Aleksander Alekseev <aleksander@timescale.com> wrote:Hi Nikita,
> We're currently working on rebase along other TOAST improvements, hope to do it in time for July CF.
> Thank you for your patience.
Just to clarify, does it include the dependent "CREATE TABLE ( ..
STORAGE .. )" patch [1]? I was considering changing the patch
according to the feedback it got, but if you are already working on
this I'm not going to interfere.
[1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev
Вложения
- 0005_bytea_appendable_toaster_v6.patch.gz
- 0003_toaster_default_v6.patch.gz
- 0002_toaster_interface_v7.patch.gz
- 0001_create_table_storage_v4.patch.gz
- 0004_toaster_snapshot_v6.patch.gz
- 0006_toasterapi_docs_v1.patch.gz
- 0008_fix_toast_tuple_externalize_v2.patch.gz
- 0006_toasterapi_docs_v2.patch.gz
- 0009_bytea_contrib_and_varlena_v1.patch.gz
- 0007_fix_alignment_of_custom_toast_pointers_v2.patch.gz
Rebased onto 15 REL BETA 2
Hi Nikita, > Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06). Thanks for the rebased patchset. This is a huge effort though. I suggest splitting it into several CF entries as we previously did with other patches (64-bit XIDs to name one, which BTW is arguably much simpler, but still we had to split it). This will simplify the review, limit the scope of the discussion and simplify passing the CI. Cfbot is already not happy with the patchset. 0001 - is already in a separate thread [1], that's good. I suggest marking it in the patch description for clarity. 0002, 0003 - I suggest focusing on these two in this thread and keep the rest of the changes for later discussion. Please submit 0004, 0005... next time, when we finish with 0001-0003. The order of proposed changes IMO is wrong. 0002 should refactor the default TOASTer in a manner similar to a pluggable one. Nothing should change from the user's perspective. If you do benchmarks, I suggest not to reference the previous talks. I familiarized myself with all the related talks linked before (took me some time...) and found them useless for the discussion since they don't provide exact steps to reproduce. Please provide exact scripts and benchmarks results for 0002 in this thread. 0003 should add an interface that allows replacing the default TOASTer with an alternative one. There is no need for contrib/dummy_toaster similarly as there is no contrib/ for a dummy TableAM. The provided example doesn't do much anyway since all the heavy lifting should be done in the callbacks implementation. For test purposes please use src/test/regress/. [1]: https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru -- Best regards, Aleksander Alekseev
Hi Nikita,
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).
Thanks for the rebased patchset.
This is a huge effort though. I suggest splitting it into several CF
entries as we previously did with other patches (64-bit XIDs to name
one, which BTW is arguably much simpler, but still we had to split
it). This will simplify the review, limit the scope of the discussion
and simplify passing the CI. Cfbot is already not happy with the
patchset.
0001 - is already in a separate thread [1], that's good. I suggest
marking it in the patch description for clarity.
0002, 0003 - I suggest focusing on these two in this thread and keep
the rest of the changes for later discussion. Please submit 0004,
0005... next time, when we finish with 0001-0003.
The order of proposed changes IMO is wrong.
0002 should refactor the default TOASTer in a manner similar to a
pluggable one. Nothing should change from the user's perspective. If
you do benchmarks, I suggest not to reference the previous talks. I
familiarized myself with all the related talks linked before (took me
some time...) and found them useless for the discussion since they
don't provide exact steps to reproduce. Please provide exact scripts
and benchmarks results for 0002 in this thread.
0003 should add an interface that allows replacing the default TOASTer
with an alternative one. There is no need for contrib/dummy_toaster
similarly as there is no contrib/ for a dummy TableAM. The provided
example doesn't do much anyway since all the heavy lifting should be
done in the callbacks implementation. For test purposes please use
src/test/regress/.
[1]: https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote: > > Hi hackers! > Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06). Thanks! > Just to remind: > With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internalsand/or workflow and workload > of data being TOASTed makes Custom Toasters much more efficient in performance and storage. The new toast API doesn't seem to be very well documented, nor are the new features. Could you include a README or extend the comments on how this is expected to work, and/or how you expect people to use (the result of) `get_vtable`? > Patch set consists of 9 incremental patches: > [...] > 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER...) > and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);) This patch 0002 seems to include changes to log files (!) that don't exist in current HEAD, but at the same time are not created by patch 0001. Could you please check and sanitize your patches to ensure that the changes are actually accurate? Like Robert Haas mentioned earlier[0], please create a branch in a git repository that has a commit containing the changes for each patch, and then use git format-patch to generate a single patchset, one that shares a single version number. Keeping track of what patches are needed to test this CF entry is already quite difficult due to the amount of patches and their packaging (I'm having troubles managing these seperate .patch.gz), and the different version tags definitely don't help in finding the correct set of patches to apply once downloaded. Kind regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).
Thanks!
> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.
The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?
> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)
This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?
Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.
Kind regards,
Matthias van de Meent
[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;
Hi!We have branch with incremental commits worm where patches were generated with format-patch -I'll clean up commits from garbage files asap, sorry, haven't noticed them while moving changes.Best regards,Nikita MalakhovOn Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).
Thanks!
> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.
The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?
> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)
This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?
Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.
Kind regards,
Matthias van de Meent
[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
Вложения
- 0001_create_table_storage_v5.patch.gz
- 0004_toaster_snapshot_v7.patch.gz
- 0003_toaster_default_v7.patch.gz
- 0002_toaster_interface_v8.patch.gz
- 0005_bytea_appendable_toaster_v7.patch.gz
- 0006_toasterapi_docs_v3.patch.gz
- 0008_fix_toast_tuple_externalize_v3.patch.gz
- 0007_fix_alignment_of_custom_toast_pointers_v3.patch.gz
AS 'MODULE_PATHNAME'
LANGUAGE C;
CREATE TOASTER custom_toaster HANDLER custom_toaster_handler;
select * from pg_toaster;
oid | tsrname | tsrhandler
-------+----------------+-------------------------
9864 | deftoaster | default_toaster_handler
32772 | custom_toaster | custom_toaster_handler
CREATE TABLE tst1 (
c1 text STORAGE plain,
c2 text STORAGE external TOASTER custom_toaster,
id int4
);
ALTER TABLE tst1 ALTER COLUMN c1 SET TOASTER custom_toaster;
=# \d+ tst1
Column | Type | Collation | Nullable | Default | Storage | Toaster |...
--------+---------+-----------+----------+---------+----------+----------------+...
c1 | text | | | | plain | deftoaster |...
c2 | text | | | | external | custom_toaster |...
id | integer | | | | plain | |...
Access method: heap
Hi hackers!According to previous requests the patch branch was cleaned up from garbage, logs, etc. All conflicts' resolutions were mergedinto patch commits where they appear, branch was rebased to present one commit for one patch. The branch was actualized,and a fresh patch set was generated.What we propose in short:We suggest a way to make TOAST pluggable as Storage (in a way like Pluggable Access Methods) - detached TOASTmechanics from Heap AM, and made it an independent pluggable and extensible part with our freshly developed TOAST API.With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internalsand/or workflow and workload of data being TOASTed makes Custom Toasters much more efficient in performance and storage.We keep backwards compatibility and default TOAST mechanics works as it worked previously, working silently with anyToastable datatype(and TOASTed values and tables from previous versions, no changes in this) and set as default Toaster is not stated otherwise,but through our TOAST API.We've already presented out work at HighLoad, PgCon and PgConf conferences, you can find materials hereTesting scripts used in talks are a bit scarce and have a lot of manual handling, so it is another bit of work to bunch them intopatch set, please be patient, I'll try to make it ASAP.We have ready to plug in extension Toasters- bytea appendable toaster for bytea datatype (impressive speedup with bytea append operation) is included in this patch set;- JSONB toaster for JSONB (very cool performance improvements when dealing with TOASTed JSONB) will be provided later;- Prototype Toasters (in development) for PostGIS (much faster then default with geometric data), large binary objects(like pg_largeobject, but much, much larger, and without existing large object limitations), and currently we're checking defaultToaster implementation without using Indexes (direct access by TIDs, up to 3 times faster than default on smaller values,less storage due to absence of index tree).Patch set consists of 8 incremental patches:0001_create_table_storage_v5.patch - SQL syntax fix for CREATE TABLE clause, processing SET STORAGE... correctly;This patch is already discussed in a separate thread;0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support for versioned Toast rows;0005_bytea_appendable_toaster_v7.patch - contrib module bytea_appendable_toaster - special Toaster for bytea datatype with customized append operation;0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes custom toast pointer'salignment required by bytea toaster by Nikita Glukhov;0008_fix_toast_tuple_externalize_v3.patch - fixes toast_tuple_externalize functionnot to call toast if old data is the same as new one.The example of usage the TOAST API:CREATE EXTENSION bytea_toaster;CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;This patch set opens the following issues:1) With TOAST independent of AM it is used by it makes sense to move compression from AM into Toaster and make Compression one of Toaster's options.Actually, Toasters allow to use any compression methods independently of AM;2) Implement default Toaster without using Indexes (currently in development)?3) Allows different, SQL-accessed large objects of almost infinite size IN DATABASE, unlike current large_object functionality and does not limit their quantity;4) Several already developed Toasters show impressive results for datatypes they were designed for.On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi!We have branch with incremental commits worm where patches were generated with format-patch -I'll clean up commits from garbage files asap, sorry, haven't noticed them while moving changes.Best regards,Nikita MalakhovOn Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).
Thanks!
> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in performance and storage.
The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?
> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE EXTERNAL TOASTER bytea_toaster);)
This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?
Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.
Kind regards,
Matthias van de Meent
[0] https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
Вложения
>>similarly as there is no contrib/ for a dummy TableAM. The provided
>>example doesn't do much anyway since all the heavy lifting should be
>>done in the callbacks implementation.
Hi Nikita,
> Here is the patch set rebased onto current master (15 rel beta 2 with commit from 29.06).
Thanks for the rebased patchset.
This is a huge effort though. I suggest splitting it into several CF
entries as we previously did with other patches (64-bit XIDs to name
one, which BTW is arguably much simpler, but still we had to split
it). This will simplify the review, limit the scope of the discussion
and simplify passing the CI. Cfbot is already not happy with the
patchset.
0001 - is already in a separate thread [1], that's good. I suggest
marking it in the patch description for clarity.
0002, 0003 - I suggest focusing on these two in this thread and keep
the rest of the changes for later discussion. Please submit 0004,
0005... next time, when we finish with 0001-0003.
The order of proposed changes IMO is wrong.
0002 should refactor the default TOASTer in a manner similar to a
pluggable one. Nothing should change from the user's perspective. If
you do benchmarks, I suggest not to reference the previous talks. I
familiarized myself with all the related talks linked before (took me
some time...) and found them useless for the discussion since they
don't provide exact steps to reproduce. Please provide exact scripts
and benchmarks results for 0002 in this thread.
0003 should add an interface that allows replacing the default TOASTer
with an alternative one. There is no need for contrib/dummy_toaster
similarly as there is no contrib/ for a dummy TableAM. The provided
example doesn't do much anyway since all the heavy lifting should be
done in the callbacks implementation. For test purposes please use
src/test/regress/.
[1]: https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev
Hi Nikita, > I've reworked the patch set according to recommendations of Aleksander Alekseev, Robert Haas > and Matthias van de Meent, and decided, as it was recommended earlier, include only the most > important part in the first set. Also, I've added a large README on Pluggable TOAST to sources, > I'll be grateful for feedback on this README file. Also, some minor fixes were included in patches. Many thanks for accounting for the previous feedback. I will take a look somewhere early next week. > 0001 [...] This patch is presented by Teodor Sigaev and is already discussed and marked as ready for commit in a separate thread; FYI it was merged, see 784cedda [1]. Also, you forgot the attachments :) [1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda -- Best regards, Aleksander Alekseev
On Wed, 20 Jul 2022 at 11:16, Nikita Malakhov <hukutoc@gmail.com> wrote: > > Hi hackers! Hi, Please don't top-post here. See https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics. > We really need your feedback on the last patchset update! This is feedback on the latest version that was shared on the mailing list here [0]. Your mail from today didn't seem to have an attachment, and I haven't checked the git repository for changes. 0001: Create Table Storage: LGTM --- 0002: Toaster API interface > tablecmds.c > SetIndexStorageProperties(Relation rel, Relation attrelation, > AttrNumber attnum, > bool setstorage, char newstorage, > bool setcompression, char newcompression, > + bool settoaster, Oid toasterOid, > LOCKMODE lockmode) Indexes cannot index toasted values, so why would the toaster oid be interesting for index storage properties? > static List *MergeAttributes(List *schema, List *supers, char relpersistence, > - bool is_partition, List **supconstr); > + bool is_partition, List **supconstr, > + Oid accessMethodId); > toasterapi.h: > +SearchTsrCache(Oid toasterOid) > ... > + for_each_from(lc, ToasterCache, 0) > + { > + entry = (ToasterCacheEntry*)lfirst(lc); > + > + if (entry->toasterOid == toasterOid) > + { > + /* remove entry from list, it will be added in a head of list below */ > + foreach_delete_current(ToasterCache, lc); > + goto out; > + } > + } Moving toasters seems quite expensive when compared to just index checks. When you have many toasters, but only a few hot ones, this currently will move the "cold" toasters around a lot. Why not use a stack instead (or alternatively, a 'zipper' (or similar) data structure), where the hottest toaster is on top, so that we avoid larger memmove calls? > postgres.h > +/* varatt_custom uses 16bit aligment */ To the best of my knowledge varlena-pointers are unaligned; and this is affirmed by the comment immediately under varattrib_1b_e. Assuming alignment to 16 bits should/would be incorrect in some of the cases. This is handled for normal varatt_external values by memcpy-ing the value into local (aligned) fields before using them, but that doesn't seem to be the case for varatt_custom? --- 0003: (re-implement default toaster using toaster interface) I see significant changes to the dummy toaster (created in 0002), could those be moved to patch 0002 in the next iteration? detoast.c and detoast.h are effectively empty after this patch (only imports and commented-out code remain), please fully remove them instead - that saves on patch diff size. With the new API, I'm getting confused about the location of the various toast_* functions. They are spread around in various files that have no clear distinction on why it is (still) located there: some functions are moved to access/toast/*, while others are moved around in catalog/toasting.c, access/common/toast_internals.c and access/table/toast_helper.c. > detoast.c / tableam.h According to a quick search, all core usage of table_relation_fetch_toast_slice is removed. Shouldn't we remove that tableAM API (+ heapam implementation) instead of updating and maintaining it? Same question for table_relation_toast_am - I'm not sure that it remains the correct way of dealing with toast. > toast_helper.c toast_delete_external_datum: Please clean up code that was commented out from the patches, it detracts from the readability of a patch. > toast_internals.c This seems like a bit of a mess, considering the lack of Can't we split this up into a heaptoast (or whatever we're going to call the default toaster) and actual toast internals? It seems to me that > generic_toaster.c Could you align name styles in this new file? It has both camelCase and snake_case for function names. > toasting.c I'm not entirely sure that we should retain catalog/toasting.c if we are going to depend on the custom toaster API. Shouldn't the creation of toast tables be delegated to the toaster? > + * toast_get_valid_index > + * > + * Get OID of valid index associated to given toast relation. A toast > + * relation can have only one valid index at the same time. Although this is code being moved around, the comment is demonstrably false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can leave a toast relation with 2 valid indexes. --- 0004: refactoring and optimization of default toaster 0005: bytea appendable toaster I dind't review these yet. --- 0006: docs Seems like a good start, but I'm not sure that we need the struct definition in the docs. I think the BRIN extensibility docs [1] are a better example on what I think the documentation for this API should look like. --- 0007: fix alignment of custom toast pointers This is not a valid fix for the alignment requirement for custom toast pointers. You now leave one byte empty if you are not already aligned, which for on-disk toast pointers means that we're dealing with a 4-byte aligned value, which is not the case because this is a 2-byte-aligned value. Regardless, this should be merged into 0002, not remain a seperate patch. --- 0008: fix tuple externalization Should be merged into the relevant patch as well, not as a separate patch. --- > Currently I'm working on an update to the default Toaster (some internal optimizations, not affecting functionality) > and readme files explaining Pluggable TOAST. That would be greatly appreciated - 0006 does not cover why we need vtable, nor how it's expected to be used in type-aware code. Kind regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/CAN-LCVNkU%2Bkdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg%40mail.gmail.com [1] https://www.postgresql.org/docs/15/brin-extensibility.html
>interesting for index storage properties?
>checks. When you have many toasters, but only a few hot ones, this
>currently will move the "cold" toasters around a lot. Why not use a
>stack instead (or alternatively, a 'zipper' (or similar) data
>structure), where the hottest toaster is on top, so that we avoid
>larger memmove calls?
>is affirmed by the comment immediately under varattrib_1b_e. Assuming
>alignment to 16 bits should/would be incorrect in some of the cases.
>This is handled for normal varatt_external values by memcpy-ing the
>value into local (aligned) fields before using them, but that doesn't
>seem to be the case for varatt_custom?
>I see significant changes to the dummy toaster (created in 0002),
>could those be moved to patch 0002 in the next iteration?
>imports and commented-out code remain), please fully remove them
>instead - that saves on patch diff size.
>false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
>leave a toast relation with 2 valid indexes.
On Wed, 20 Jul 2022 at 11:16, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
Hi,
Please don't top-post here. See
https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics.
> We really need your feedback on the last patchset update!
This is feedback on the latest version that was shared on the mailing
list here [0]. Your mail from today didn't seem to have an attachment,
and I haven't checked the git repository for changes.
0001: Create Table Storage:
LGTM
---
0002: Toaster API interface
> tablecmds.c
> SetIndexStorageProperties(Relation rel, Relation attrelation,
> AttrNumber attnum,
> bool setstorage, char newstorage,
> bool setcompression, char newcompression,
> + bool settoaster, Oid toasterOid,
> LOCKMODE lockmode)
Indexes cannot index toasted values, so why would the toaster oid be
interesting for index storage properties?
> static List *MergeAttributes(List *schema, List *supers, char relpersistence,
> - bool is_partition, List **supconstr);
> + bool is_partition, List **supconstr,
> + Oid accessMethodId);
> toasterapi.h:
> +SearchTsrCache(Oid toasterOid)
> ...
> + for_each_from(lc, ToasterCache, 0)
> + {
> + entry = (ToasterCacheEntry*)lfirst(lc);
> +
> + if (entry->toasterOid == toasterOid)
> + {
> + /* remove entry from list, it will be added in a head of list below */
> + foreach_delete_current(ToasterCache, lc);
> + goto out;
> + }
> + }
Moving toasters seems quite expensive when compared to just index
checks. When you have many toasters, but only a few hot ones, this
currently will move the "cold" toasters around a lot. Why not use a
stack instead (or alternatively, a 'zipper' (or similar) data
structure), where the hottest toaster is on top, so that we avoid
larger memmove calls?
> postgres.h
> +/* varatt_custom uses 16bit aligment */
To the best of my knowledge varlena-pointers are unaligned; and this
is affirmed by the comment immediately under varattrib_1b_e. Assuming
alignment to 16 bits should/would be incorrect in some of the cases.
This is handled for normal varatt_external values by memcpy-ing the
value into local (aligned) fields before using them, but that doesn't
seem to be the case for varatt_custom?
---
0003: (re-implement default toaster using toaster interface)
I see significant changes to the dummy toaster (created in 0002),
could those be moved to patch 0002 in the next iteration?
detoast.c and detoast.h are effectively empty after this patch (only
imports and commented-out code remain), please fully remove them
instead - that saves on patch diff size.
With the new API, I'm getting confused about the location of the
various toast_* functions. They are spread around in various files
that have no clear distinction on why it is (still) located there:
some functions are moved to access/toast/*, while others are moved
around in catalog/toasting.c, access/common/toast_internals.c and
access/table/toast_helper.c.
> detoast.c / tableam.h
According to a quick search, all core usage of
table_relation_fetch_toast_slice is removed. Shouldn't we remove that
tableAM API (+ heapam implementation) instead of updating and
maintaining it? Same question for table_relation_toast_am - I'm not
sure that it remains the correct way of dealing with toast.
> toast_helper.c
toast_delete_external_datum:
Please clean up code that was commented out from the patches, it
detracts from the readability of a patch.
> toast_internals.c
This seems like a bit of a mess, considering the lack of
Can't we split this up into a heaptoast (or whatever we're going to
call the default toaster) and actual toast internals? It seems to me
that
> generic_toaster.c
Could you align name styles in this new file? It has both camelCase
and snake_case for function names.
> toasting.c
I'm not entirely sure that we should retain catalog/toasting.c if we
are going to depend on the custom toaster API. Shouldn't the creation
of toast tables be delegated to the toaster?
> + * toast_get_valid_index
> + *
> + * Get OID of valid index associated to given toast relation. A toast
> + * relation can have only one valid index at the same time.
Although this is code being moved around, the comment is demonstrably
false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
leave a toast relation with 2 valid indexes.
---
0004: refactoring and optimization of default toaster
0005: bytea appendable toaster
I dind't review these yet.
---
0006: docs
Seems like a good start, but I'm not sure that we need the struct
definition in the docs. I think the BRIN extensibility docs [1] are a
better example on what I think the documentation for this API should
look like.
---
0007: fix alignment of custom toast pointers
This is not a valid fix for the alignment requirement for custom toast
pointers. You now leave one byte empty if you are not already aligned,
which for on-disk toast pointers means that we're dealing with a
4-byte aligned value, which is not the case because this is a
2-byte-aligned value.
Regardless, this should be merged into 0002, not remain a seperate patch.
---
0008: fix tuple externalization
Should be merged into the relevant patch as well, not as a separate patch.
---
> Currently I'm working on an update to the default Toaster (some internal optimizations, not affecting functionality)
> and readme files explaining Pluggable TOAST.
That would be greatly appreciated - 0006 does not cover why we need
vtable, nor how it's expected to be used in type-aware code.
Kind regards,
Matthias van de Meent
[0] https://www.postgresql.org/message-id/CAN-LCVNkU%2Bkdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg%40mail.gmail.com
[1] https://www.postgresql.org/docs/15/brin-extensibility.html
Вложения
Hi hackers!Matthias, thank you very much for your feedback!Sorry, I forgot to attach files.Attaching here, but they are for the commit tagged "15beta2", I am currentlyrebasing this branch onto the actual master and will provide rebased version,with some corrections according to your feedback, in a day or two.>Indexes cannot index toasted values, so why would the toaster oid be
>interesting for index storage properties?Here Teodor might correct me. Toast tables are indexed, and Heap TOASTmechanics accesses Toasted tuples by index, isn't it the case?>Moving toasters seems quite expensive when compared to just index
>checks. When you have many toasters, but only a few hot ones, this
>currently will move the "cold" toasters around a lot. Why not use a
>stack instead (or alternatively, a 'zipper' (or similar) data
>structure), where the hottest toaster is on top, so that we avoid
>larger memmove calls?This is a reasonable remark, we'll consider it for the next iteration. Our reasonis that we think there won't be a lot of custom Toasters, most likely less thena dozen, for the most complex/heavy datatypes so we haven't consideredthese expenses.>To the best of my knowledge varlena-pointers are unaligned; and this
>is affirmed by the comment immediately under varattrib_1b_e. Assuming
>alignment to 16 bits should/would be incorrect in some of the cases.
>This is handled for normal varatt_external values by memcpy-ing the
>value into local (aligned) fields before using them, but that doesn't
>seem to be the case for varatt_custom?Alignment correction seemed reasonable for us because structures areanyway aligned in memory, so when we use 1 and 2-byte fields alongwith 4-byte it may create a lot of padding. Am I wrong? Also, correctalignment somewhat simplifies access to structure fields.>0003: (re-implement default toaster using toaster interface)
>I see significant changes to the dummy toaster (created in 0002),
>could those be moved to patch 0002 in the next iteration?Will do.>detoast.c and detoast.h are effectively empty after this patch (only
>imports and commented-out code remain), please fully remove them
>instead - that saves on patch diff size.Will do.About the location of toast_ functions: these functions are part of HeapTOAST mechanics, and they were scattered among other Heap internalssources. I've tried to gather them and put them in more straight order, butthis work is not fully finished yet and will take some time. Working on it.I'll check if table_relation_fetch_toast_slice could be removed, thanks forthe remark.toast_helper - done, will be provided in rebased version.toast_internals - this one is an internal part of TOAST implemented inHeap AM, but I'll try to straighten it out as much as I could.naming conventions in some sources - done, will be provided in rebasedpatch set.>Shouldn't the creation of toast tables be delegated to the toaster?Yes, you're right, and actually, it is. I'll check that and correct in rebasedversion.>Although this is code being moved around, the comment is demonstrably
>false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
>leave a toast relation with 2 valid indexes.This code is quite old, we've not changed it but thanks for the remark,I'll check it more carefully.Small fixes are already merged into larger patches in attached files. Also,I appreciate your feedback on documentation - if you would have an opportunityplease check README provided in 0003. I've took your comments on documentationinto account and will include corrections according to them into rebased patch.As Aleksander recommended, I've shortened the patch set and left only the mostimportant part - API and re-implemented default Toast. All bells and whistles are notof so much importance and could be sent later after the API itself will be straightenedout and commited.Thank you very much!
Вложения
Hi Nikita, Thanks for an update! > 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would, > as recommended, remove Dummy toaster and provide it as an extension), and default Toaster > was left as-is (reference implementation). > > 0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API, > so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality. > > 0004_toaster_snapshot_v8 continues refactoring and has some important changes (added > into README.toastapi new part related TOAST API extensibility - the virtual functions table). This numbering is confusing. Please use a command like: ``` git format-patch origin/master -v 42 ``` This will produce a patchset with a consistent naming like: ``` v42-0001-foo-bar.patch v42-0002-baz-qux.patch ... etc ... ``` Also cfbot [1] will know in which order to apply them. > GIT branch with this patch resides here: > https://github.com/postgrespro/postgres/tree/toasterapi_clean Unfortunately the three patches in question from this branch don't pass `make check`. Please update src/test/regress/expected/publication.out and make sure the patchset passes the rest of the tests at least on one platform before submitting. Personally I have a little set of scripts for this [2]. The following commands should pass: ``` # quick check ./quick-build.sh && ./single-install.sh && make installcheck # full check ./full-build.sh && ./single-install.sh && make installcheck-world ``` Finally, please update the commit messages. Each commit message should include a brief description (one line) , a detailed description (the body), and also the list of the authors, the reviewers and a link to the discussion. Please use [3] as a template. [1]: http://cfbot.cputube.org/ [2]: https://github.com/afiskon/pgscripts/ [3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3 -- Best regards, Aleksander Alekseev
Hi Nikita,
Thanks for an update!
> 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would,
> as recommended, remove Dummy toaster and provide it as an extension), and default Toaster
> was left as-is (reference implementation).
>
> 0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API,
> so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality.
>
> 0004_toaster_snapshot_v8 continues refactoring and has some important changes (added
> into README.toastapi new part related TOAST API extensibility - the virtual functions table).
This numbering is confusing. Please use a command like:
```
git format-patch origin/master -v 42
```
This will produce a patchset with a consistent naming like:
```
v42-0001-foo-bar.patch
v42-0002-baz-qux.patch
... etc ...
```
Also cfbot [1] will know in which order to apply them.
> GIT branch with this patch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean
Unfortunately the three patches in question from this branch don't
pass `make check`. Please update
src/test/regress/expected/publication.out and make sure the patchset
passes the rest of the tests at least on one platform before
submitting.
Personally I have a little set of scripts for this [2]. The following
commands should pass:
```
# quick check
./quick-build.sh && ./single-install.sh && make installcheck
# full check
./full-build.sh && ./single-install.sh && make installcheck-world
```
Finally, please update the commit messages. Each commit message should
include a brief description (one line) , a detailed description (the
body), and also the list of the authors, the reviewers and a link to
the discussion. Please use [3] as a template.
[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3
--
Best regards,
Aleksander Alekseev
>...
>Also cfbot [1] will know in which order to apply them.
>Unfortunately the three patches in question from this branch don't
>pass `make check`. Please update
>src/test/regress/expected/publication.out and make sure the patchset
>passes the rest of the tests at least on one platform before
>submitting.
>include a brief description (one line) , a detailed description (the
>body), and also the list of the authors, the reviewers and a link to
>the discussion. Please use [3] as a template.
v11-0004-toaster-snapshot.patch - supports row versioning for TOASTed values
Hi hackers!Aleksander, thanks for the remark, seems that we've missed recent change - the pubicationtest does not have the new column 'Toaster'. Will send a corrected patch tomorrow. Also, thanksfor the patch name note, I've changed it as you suggested.I'm on vacation, so I read emails not very often and answers take some time, sorry.On Tue, Jul 26, 2022 at 11:23 AM Aleksander Alekseev <aleksander@timescale.com> wrote:Hi Nikita,
Thanks for an update!
> 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an example (but I would,
> as recommended, remove Dummy toaster and provide it as an extension), and default Toaster
> was left as-is (reference implementation).
>
> 0003_toaster_default_v9 implements reference TOAST as Default Toaster via TOAST API,
> so Heap AM calls Toast only via API, and does not have direct calls to Toast functionality.
>
> 0004_toaster_snapshot_v8 continues refactoring and has some important changes (added
> into README.toastapi new part related TOAST API extensibility - the virtual functions table).
This numbering is confusing. Please use a command like:
```
git format-patch origin/master -v 42
```
This will produce a patchset with a consistent naming like:
```
v42-0001-foo-bar.patch
v42-0002-baz-qux.patch
... etc ...
```
Also cfbot [1] will know in which order to apply them.
> GIT branch with this patch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean
Unfortunately the three patches in question from this branch don't
pass `make check`. Please update
src/test/regress/expected/publication.out and make sure the patchset
passes the rest of the tests at least on one platform before
submitting.
Personally I have a little set of scripts for this [2]. The following
commands should pass:
```
# quick check
./quick-build.sh && ./single-install.sh && make installcheck
# full check
./full-build.sh && ./single-install.sh && make installcheck-world
```
Finally, please update the commit messages. Each commit message should
include a brief description (one line) , a detailed description (the
body), and also the list of the authors, the reviewers and a link to
the discussion. Please use [3] as a template.
[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3
--
Best regards,
Aleksander Alekseev--
Вложения
Hi, On 2022-08-02 09:15:12 +0300, Nikita Malakhov wrote: > Attach includes: > v11-0002-toaster-interface.patch - contains TOAST API with default Toaster > left as-is (reference implementation) and Dummy toaster as an example > (will be removed later as a part of refactoring?). > > v11-0003-toaster-default.patch - implements reference TOAST as Default > Toaster > via TOAST API, so Heap AM calls Toast only via API, and does not have direct > calls to Toast functionality. > > v11-0004-toaster-snapshot.patch - supports row versioning for TOASTed values > and some refactoring. I'm a bit confused by the patch numbering - why isn't there a patch 0001 and 0005? I think 0002 needs to be split further - the relevant part isn't that it introduces the "dummy toaster" module, it's a large change doing lots of things, the addition of the contrib module is irrelevant comparatively. As is the patches unfortunately are close to unreviewable. Lots of code gets moved around in one patch, then again in the next patch, then again in the next. Unfortunately, scanning through these patches, it seems this is a lot of complexity, with a (for me) comensurate benefit. There's a lot of more general improvements to toasting and the json type that we can do, that are a lot less complex than this. > From 6b35d6091248e120d2361cf0a806dbfb161421cf Mon Sep 17 00:00:00 2001 > From: Nikita Malakhov <n.malakhov@postgrespro.ru> > Date: Tue, 12 Apr 2022 18:37:21 +0300 > Subject: [PATCH] Pluggable TOAST API interface with dummy_toaster contrib > module > > Pluggable TOAST API is introduced with implemented contrib example > module. > Pluggable TOAST API consists of 4 parts: > 1) SQL syntax supports manipulations with toasters - CREATE TABLE ... > (column type STORAGE storage_type TOASTER toaster), ALTER TABLE ALTER > COLUMN column SET TOASTER toaster and Toaster definition. > TOAST API requires earlier patch with CREATE TABLE SET STORAGE clause; > New column atttoaster is added to pg_attribute. > Toaster drop is not allowed for not to lose already toasted data; > 2) New VARATT_CUSTOM data structure with fixed header and variable > tail to store custom toasted data, with according macros set; That's adding overhead to every toast interaction, independent of any new infrastructure being used. > 4) Dummy toaster implemented via new TOAST API to be used as sample. > In this patch regular (default) TOAST function is left as-is and not > yet implemented via new API. > TOAST API syntax and code explanation provided in additional docs patch. I'd make this a separate commit. > @@ -445,6 +447,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2) > return false; > if (attr1->attstorage != attr2->attstorage) > return false; > + if (attr1->atttoaster != attr2->atttoaster) > + return false; So we're increasing pg_attribute - often already the largest catalog table in a database. Am I just missing something, or is atttoaster not actually used in this patch? So most of the contrib module added is unreachable code? > +/* > + * Toasters is very often called so syscache lookup and TsrRoutine allocation are > + * expensive and we need to cache them. Ugh. > + * We believe what there are only a few toasters and there is high chance that > + * only one or only two of them are heavy used, so most used toasters should be > + * found as easy as possible. So, let us use a simple list, in future it could > + * be changed to other structure. For now it will be stored in TopCacheContext > + * and never destroed in backend life cycle - toasters are never deleted. > + */ That seems not great. > +typedef struct ToasterCacheEntry > +{ > + Oid toasterOid; > + TsrRoutine *routine; > +} ToasterCacheEntry; > + > +static List *ToasterCache = NIL; > + > +/* > + * SearchTsrCache - get cached toaster routine, emits an error if toaster > + * doesn't exist > + */ > +TsrRoutine* > +SearchTsrCache(Oid toasterOid) > +{ > + ListCell *lc; > + ToasterCacheEntry *entry; > + MemoryContext ctx; > + > + if (list_length(ToasterCache) > 0) > + { > + /* fast path */ > + entry = (ToasterCacheEntry*)linitial(ToasterCache); > + if (entry->toasterOid == toasterOid) > + return entry->routine; > + } > + > + /* didn't find in first position */ > + ctx = MemoryContextSwitchTo(CacheMemoryContext); > + > + for_each_from(lc, ToasterCache, 0) > + { > + entry = (ToasterCacheEntry*)lfirst(lc); > + > + if (entry->toasterOid == toasterOid) > + { > + /* remove entry from list, it will be added in a head of list below */ > + foreach_delete_current(ToasterCache, lc); That needs to move later list elements! > + goto out; > + } > + } > + > + /* did not find entry, make a new one */ > + entry = palloc(sizeof(*entry)); > + > + entry->toasterOid = toasterOid; > + entry->routine = GetTsrRoutineByOid(toasterOid, false); > + > +out: > + ToasterCache = lcons(entry, ToasterCache); That move the whole list around! On a cache hit. Tthis would likely already be slower than syscache. > diff --git a/contrib/dummy_toaster/dummy_toaster.c b/contrib/dummy_toaster/dummy_toaster.c > index 0d261f6042..02f49052b7 100644 > --- a/contrib/dummy_toaster/dummy_toaster.c > +++ b/contrib/dummy_toaster/dummy_toaster.c So this is just changing around the code added in the prior commit. Why was it then included before? > +++ b/src/include/access/generic_toaster.h > +HeapTuple > +heap_toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup, > + int options); The generic toast API has heap_* in its name? > From 4112cd70b05dda39020d576050a98ca3cdcf2860 Mon Sep 17 00:00:00 2001 > From: Nikita Malakhov <n.malakhov@postgrespro.ru> > Date: Tue, 12 Apr 2022 22:57:21 +0300 > Subject: [PATCH] Versioned rows in TOASTed values for Default Toaster support > > Original TOAST mechanics does not support rows versioning > for TOASTed values. > Toaster snapshot - refactored generic toaster, implements > rows versions check in toasted values to share common parts > of toasted values between different versions of rows. This misses explaining *WHY* this is changed. > diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c > deleted file mode 100644 > index aff8042166..0000000000 > --- a/src/backend/access/common/detoast.c > +++ /dev/null These patches really move things around in a largely random way. > -static bool toastrel_valueid_exists(Relation toastrel, Oid valueid); > -static bool toastid_valueid_exists(Oid toastrelid, Oid valueid); > - > +static void > +toast_extract_chunk_fields(Relation toastrel, TupleDesc toasttupDesc, > + Oid valueid, HeapTuple ttup, int32 *seqno, > + char **chunkdata, int *chunksize); > + > +static void > +toast_write_slice(Relation toastrel, Relation *toastidxs, > + int num_indexes, int validIndex, > + Oid valueid, int32 value_size, int32 slice_offset, > + int32 slice_length, char *slice_data, > + int options, > + void *chunk_header, int chunk_header_size, > + ToastChunkVisibilityCheck visibility_check, > + void *visibility_cxt); What do all these changes have to do with "Versioned rows in TOASTed values for Default Toaster support"? > +static void * > +toast_fetch_old_chunk(Relation toastrel, SysScanDesc toastscan, Oid valueid, > + int32 expected_chunk_seq, int32 last_old_chunk_seq, > + ToastChunkVisibilityCheck visibility_check, > + void *visibility_cxt, > + int32 *p_old_chunk_size, ItemPointer old_tid) > +{ > + for (;;) > + { > + HeapTuple old_toasttup; > + char *old_chunk_data; > + int32 old_chunk_seq; > + int32 old_chunk_data_size; > + > + old_toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection); > + > + if (old_toasttup) > + { > + /* Skip aborted chunks */ > + if (!HeapTupleHeaderXminCommitted(old_toasttup->t_data)) > + { > + TransactionId xmin = HeapTupleHeaderGetXmin(old_toasttup->t_data); > + > + Assert(!HeapTupleHeaderXminInvalid(old_toasttup->t_data)); > + > + if (TransactionIdDidAbort(xmin)) > + continue; > + } Why is there visibility logic in quite random places? Also, it's not "legal" to call TransactionIdDidAbort() without having checked TransactionIdIsInProgress() first. And what does this this have to do with snapshots - it's pretty clearly not implementing snapshot logic. Greetings, Andres Freund
>0005?
Sorry for the confusion, my fault. The first patch (CREATE TABLE .. SET STORAGE)
>I think 0002 needs to be split further - the relevant part isn't that it
>introduces the "dummy toaster" module, it's a large change doing lots of
>things, the addition of the contrib module is irrelevant comparatively.
>As is the patches unfortunately are close to unreviewable. Lots of code gets
>moved around in one patch, then again in the next patch, then again in the
>next.
>Unfortunately, scanning through these patches, it seems this is a lot of
>complexity, with a (for me) comensurate benefit. There's a lot of more general
>improvements to toasting and the json type that we can do, that are a lot less
>complex than this.
>> tail to store custom toasted data, with according macros set;
>That's adding overhead to every toast interaction, independent of any new
>infrastructure being used.
>Am I just missing something, or is atttoaster not actually used in this patch?
>So most of the contrib module added is unreachable code?
>slower than syscache.
Hi,
On 2022-08-02 09:15:12 +0300, Nikita Malakhov wrote:
> Attach includes:
> v11-0002-toaster-interface.patch - contains TOAST API with default Toaster
> left as-is (reference implementation) and Dummy toaster as an example
> (will be removed later as a part of refactoring?).
>
> v11-0003-toaster-default.patch - implements reference TOAST as Default
> Toaster
> via TOAST API, so Heap AM calls Toast only via API, and does not have direct
> calls to Toast functionality.
>
> v11-0004-toaster-snapshot.patch - supports row versioning for TOASTed values
> and some refactoring.
I'm a bit confused by the patch numbering - why isn't there a patch 0001 and
0005?
I think 0002 needs to be split further - the relevant part isn't that it
introduces the "dummy toaster" module, it's a large change doing lots of
things, the addition of the contrib module is irrelevant comparatively.
As is the patches unfortunately are close to unreviewable. Lots of code gets
moved around in one patch, then again in the next patch, then again in the
next.
Unfortunately, scanning through these patches, it seems this is a lot of
complexity, with a (for me) comensurate benefit. There's a lot of more general
improvements to toasting and the json type that we can do, that are a lot less
complex than this.
> From 6b35d6091248e120d2361cf0a806dbfb161421cf Mon Sep 17 00:00:00 2001
> From: Nikita Malakhov <n.malakhov@postgrespro.ru>
> Date: Tue, 12 Apr 2022 18:37:21 +0300
> Subject: [PATCH] Pluggable TOAST API interface with dummy_toaster contrib
> module
>
> Pluggable TOAST API is introduced with implemented contrib example
> module.
> Pluggable TOAST API consists of 4 parts:
> 1) SQL syntax supports manipulations with toasters - CREATE TABLE ...
> (column type STORAGE storage_type TOASTER toaster), ALTER TABLE ALTER
> COLUMN column SET TOASTER toaster and Toaster definition.
> TOAST API requires earlier patch with CREATE TABLE SET STORAGE clause;
> New column atttoaster is added to pg_attribute.
> Toaster drop is not allowed for not to lose already toasted data;
> 2) New VARATT_CUSTOM data structure with fixed header and variable
> tail to store custom toasted data, with according macros set;
That's adding overhead to every toast interaction, independent of any new
infrastructure being used.
> 4) Dummy toaster implemented via new TOAST API to be used as sample.
> In this patch regular (default) TOAST function is left as-is and not
> yet implemented via new API.
> TOAST API syntax and code explanation provided in additional docs patch.
I'd make this a separate commit.
> @@ -445,6 +447,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
> return false;
> if (attr1->attstorage != attr2->attstorage)
> return false;
> + if (attr1->atttoaster != attr2->atttoaster)
> + return false;
So we're increasing pg_attribute - often already the largest catalog table in
a database.
Am I just missing something, or is atttoaster not actually used in this patch?
So most of the contrib module added is unreachable code?
> +/*
> + * Toasters is very often called so syscache lookup and TsrRoutine allocation are
> + * expensive and we need to cache them.
Ugh.
> + * We believe what there are only a few toasters and there is high chance that
> + * only one or only two of them are heavy used, so most used toasters should be
> + * found as easy as possible. So, let us use a simple list, in future it could
> + * be changed to other structure. For now it will be stored in TopCacheContext
> + * and never destroed in backend life cycle - toasters are never deleted.
> + */
That seems not great.
> +typedef struct ToasterCacheEntry
> +{
> + Oid toasterOid;
> + TsrRoutine *routine;
> +} ToasterCacheEntry;
> +
> +static List *ToasterCache = NIL;
> +
> +/*
> + * SearchTsrCache - get cached toaster routine, emits an error if toaster
> + * doesn't exist
> + */
> +TsrRoutine*
> +SearchTsrCache(Oid toasterOid)
> +{
> + ListCell *lc;
> + ToasterCacheEntry *entry;
> + MemoryContext ctx;
> +
> + if (list_length(ToasterCache) > 0)
> + {
> + /* fast path */
> + entry = (ToasterCacheEntry*)linitial(ToasterCache);
> + if (entry->toasterOid == toasterOid)
> + return entry->routine;
> + }
> +
> + /* didn't find in first position */
> + ctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
> + for_each_from(lc, ToasterCache, 0)
> + {
> + entry = (ToasterCacheEntry*)lfirst(lc);
> +
> + if (entry->toasterOid == toasterOid)
> + {
> + /* remove entry from list, it will be added in a head of list below */
> + foreach_delete_current(ToasterCache, lc);
That needs to move later list elements!
> + goto out;
> + }
> + }
> +
> + /* did not find entry, make a new one */
> + entry = palloc(sizeof(*entry));
> +
> + entry->toasterOid = toasterOid;
> + entry->routine = GetTsrRoutineByOid(toasterOid, false);
> +
> +out:
> + ToasterCache = lcons(entry, ToasterCache);
That move the whole list around! On a cache hit. Tthis would likely already be
slower than syscache.
> diff --git a/contrib/dummy_toaster/dummy_toaster.c b/contrib/dummy_toaster/dummy_toaster.c
> index 0d261f6042..02f49052b7 100644
> --- a/contrib/dummy_toaster/dummy_toaster.c
> +++ b/contrib/dummy_toaster/dummy_toaster.c
So this is just changing around the code added in the prior commit. Why was it
then included before?
> +++ b/src/include/access/generic_toaster.h
> +HeapTuple
> +heap_toast_insert_or_update(Relation rel, HeapTuple newtup, HeapTuple oldtup,
> + int options);
The generic toast API has heap_* in its name?
> From 4112cd70b05dda39020d576050a98ca3cdcf2860 Mon Sep 17 00:00:00 2001
> From: Nikita Malakhov <n.malakhov@postgrespro.ru>
> Date: Tue, 12 Apr 2022 22:57:21 +0300
> Subject: [PATCH] Versioned rows in TOASTed values for Default Toaster support
>
> Original TOAST mechanics does not support rows versioning
> for TOASTed values.
> Toaster snapshot - refactored generic toaster, implements
> rows versions check in toasted values to share common parts
> of toasted values between different versions of rows.
This misses explaining *WHY* this is changed.
> diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
> deleted file mode 100644
> index aff8042166..0000000000
> --- a/src/backend/access/common/detoast.c
> +++ /dev/null
These patches really move things around in a largely random way.
> -static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
> -static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
> -
> +static void
> +toast_extract_chunk_fields(Relation toastrel, TupleDesc toasttupDesc,
> + Oid valueid, HeapTuple ttup, int32 *seqno,
> + char **chunkdata, int *chunksize);
> +
> +static void
> +toast_write_slice(Relation toastrel, Relation *toastidxs,
> + int num_indexes, int validIndex,
> + Oid valueid, int32 value_size, int32 slice_offset,
> + int32 slice_length, char *slice_data,
> + int options,
> + void *chunk_header, int chunk_header_size,
> + ToastChunkVisibilityCheck visibility_check,
> + void *visibility_cxt);
What do all these changes have to do with "Versioned rows in TOASTed
values for Default Toaster support"?
> +static void *
> +toast_fetch_old_chunk(Relation toastrel, SysScanDesc toastscan, Oid valueid,
> + int32 expected_chunk_seq, int32 last_old_chunk_seq,
> + ToastChunkVisibilityCheck visibility_check,
> + void *visibility_cxt,
> + int32 *p_old_chunk_size, ItemPointer old_tid)
> +{
> + for (;;)
> + {
> + HeapTuple old_toasttup;
> + char *old_chunk_data;
> + int32 old_chunk_seq;
> + int32 old_chunk_data_size;
> +
> + old_toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection);
> +
> + if (old_toasttup)
> + {
> + /* Skip aborted chunks */
> + if (!HeapTupleHeaderXminCommitted(old_toasttup->t_data))
> + {
> + TransactionId xmin = HeapTupleHeaderGetXmin(old_toasttup->t_data);
> +
> + Assert(!HeapTupleHeaderXminInvalid(old_toasttup->t_data));
> +
> + if (TransactionIdDidAbort(xmin))
> + continue;
> + }
Why is there visibility logic in quite random places? Also, it's not "legal"
to call TransactionIdDidAbort() without having checked
TransactionIdIsInProgress() first. And what does this this have to do with
snapshots - it's pretty clearly not implementing snapshot logic.
Greetings,
Andres Freund
Вложения
Hi Nikita, > I've decided to leave only the first 2 patches for review and send less significant > changes after the main patch will be straightened out. > So, here is > v13-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST > mechanics left as-is. > v13-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API. > [...] Great! Thank you. Unfortunately the patchset still seems to have difficulties passing the CI checks (see http://cfbot.cputube.org/ ). Any chance we may see a version rebased to the current `master` branch for the September CF? -- Best regards, Aleksander Alekseev
Hi Nikita,
> I've decided to leave only the first 2 patches for review and send less significant
> changes after the main patch will be straightened out.
> So, here is
> v13-0001-toaster-interface.patch - main TOAST API patch, with reference TOAST
> mechanics left as-is.
> v13-0002-toaster-default.patch - reference TOAST re-implemented via TOAST API.
> [...]
Great! Thank you.
Unfortunately the patchset still seems to have difficulties passing
the CI checks (see http://cfbot.cputube.org/ ). Any chance we may see
a version rebased to the current `master` branch for the September CF?
--
Best regards,
Aleksander Alekseev
Вложения
On Wed, Aug 24, 2022 at 2:59 AM Nikita Malakhov <hukutoc@gmail.com> wrote: > I've rebased actual branch onto the latest master and re-created patches. Checked with git am, > all applied correctly. Please check the attached patches. > Rebased branch resides here: > https://github.com/postgrespro/postgres/tree/toasterapi_clean I tried installing and using the dummy_toaster that's provided with the gitlink. Upgrade of that cluster fails with the following message: pg_restore: creating TOASTER "dummy_toaster" pg_restore: while PROCESSING TOC: pg_restore: from TOC entry 2044; 9861 16390 TOASTER dummy_toaster (no owner) pg_restore: error: could not execute query: ERROR: unrecognized or unsupported class OID: 9861 Command was: CREATE TOASTER "dummy_toaster" HANDLER "public"."dummy_toaster_handler"; I was looking through the thread for a more in-depth description of the "vtable" concept, but I didn't see one. It looks like it's just an arbitrary extension point, and any new additions would require surgery on whatever function needs the particular magic provided by the toaster. E.g. your bytea-append toaster extension in the gitlink, which still has to modify byteacat() in varlena.c to implement a very specific optimization, and then declares its support for that hardcoded optimization in the extension. I'm skeptical that this would remain coherent as it grows. The patch claims the vtable API is "powerful", which... I suppose it is, if you get to make arbitrary modifications to the core whenever you implement it. Did you already have thoughts about which operations would belong under that umbrella? What would the procedure be for adding functionality to that API? What happens if a toaster wants to implement two magic performance optimizations instead of one? --Jacob
On Wed, Aug 24, 2022 at 2:59 AM Nikita Malakhov <hukutoc@gmail.com> wrote:
> I've rebased actual branch onto the latest master and re-created patches. Checked with git am,
> all applied correctly. Please check the attached patches.
> Rebased branch resides here:
> https://github.com/postgrespro/postgres/tree/toasterapi_clean
I tried installing and using the dummy_toaster that's provided with
the gitlink. Upgrade of that cluster fails with the following message:
pg_restore: creating TOASTER "dummy_toaster"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2044; 9861 16390 TOASTER dummy_toaster (no owner)
pg_restore: error: could not execute query: ERROR: unrecognized
or unsupported class OID: 9861
Command was: CREATE TOASTER "dummy_toaster" HANDLER
"public"."dummy_toaster_handler";
I was looking through the thread for a more in-depth description of
the "vtable" concept, but I didn't see one. It looks like it's just an
arbitrary extension point, and any new additions would require surgery
on whatever function needs the particular magic provided by the
toaster. E.g. your bytea-append toaster extension in the gitlink,
which still has to modify byteacat() in varlena.c to implement a very
specific optimization, and then declares its support for that
hardcoded optimization in the extension.
I'm skeptical that this would remain coherent as it grows. The patch
claims the vtable API is "powerful", which... I suppose it is, if you
get to make arbitrary modifications to the core whenever you implement
it. Did you already have thoughts about which operations would belong
under that umbrella? What would the procedure be for adding
functionality to that API? What happens if a toaster wants to
implement two magic performance optimizations instead of one?
--Jacob
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote: > It would be more clear for complex data types like JSONB, where developers could > need some additional functionality to work with internal representation of data type, > and its full potential is revealed in our JSONB toaster extension. The JSONB toaster > is still in development but we plan to make it available soon. Okay. It'll be good to have that, because as it is now it's hard to see the whole picture. > On installing dummy_toaster contrib: I've just checked it by making a patch from commit > and applying onto my clone of master and 2 patches provided in previous email without > any errors and sll checks passed - applying with git am, configure with debug, cassert, > depend and enable-tap-tests flags and run checks. > Please advice what would cause such a behavior? I don't think the default pg_upgrade tests will upgrade contrib objects (there are instructions in src/bin/pg_upgrade/TESTING that cover manual dumps, if you prefer that method). My manual steps were roughly =# CREATE EXTENSION dummy_toaster; =# CREATE TABLE test (t TEXT STORAGE external TOASTER dummy_toaster_handler); =# \q $ initdb -D newdb $ pg_ctl -D olddb stop $ pg_upgrade -b <install path>/bin -B <install path>/bin -d ./olddb -D ./newdb (where <install path>/bin is on the PATH, so we're using the right binaries). Thanks, --Jacob
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation package
On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob
Вложения
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob--
Вложения
Hi hackers!Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the currentmaster (from today). The third patch contains documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Again, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob----
Вложения
Hi hackers!Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,sorry for the noise.On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the currentmaster (from today). The third patch contains documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Again, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob------
Вложения
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Hi,Meson build for the patchset failed, meson build files attached and README/Doc packagereworked with more detailed explanation of virtual function table along with other corrections.On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,sorry for the noise.On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the currentmaster (from today). The third patch contains documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Again, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob--------
Вложения
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Hi hackers!Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.Patchset consists of:v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced butreference TOAST is still unchanged;v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi,Meson build for the patchset failed, meson build files attached and README/Doc packagereworked with more detailed explanation of virtual function table along with other corrections.On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,sorry for the noise.On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the currentmaster (from today). The third patch contains documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Again, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob----------
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Hi hackers!Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.Patchset consists of:v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced butreference TOAST is still unchanged;v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Tue, Oct 4, 2022 at 1:02 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.Patchset consists of:v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced butreference TOAST is still unchanged;v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi,Meson build for the patchset failed, meson build files attached and README/Doc packagereworked with more detailed explanation of virtual function table along with other corrections.On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,sorry for the noise.On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the currentmaster (from today). The third patch contains documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Again, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob------------
Вложения
v21-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Hi hackers!Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.Sorry, forgot to attach patch files. My fault.Patchset consists of:v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced butreference TOAST is still unchanged;v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Tue, Oct 4, 2022 at 1:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Now cfbot is happy, but there were warnings due to recent changes in PointerGetDatum function, so here's corrected patchset.Patchset consists of:v20-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced butreference TOAST is still unchanged;v20-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v20-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Tue, Oct 4, 2022 at 1:02 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot failed in meson build with previous patchsets, so I've rebased them onto the latest master and added necessary meson build info.Patchset consists of:v19-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics - new API is introduced butreference TOAST is still unchanged;v19-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API - reference TOAST is re-implemented via new API;
v19-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Tue, Sep 27, 2022 at 12:26 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi,Meson build for the patchset failed, meson build files attached and README/Doc packagereworked with more detailed explanation of virtual function table along with other corrections.On Sun, Sep 25, 2022 at 1:41 AM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Last patchset has an invalid patch file - v16-0003-toaster-docs.patch. Here's corrected patchset,sorry for the noise.On Sat, Sep 24, 2022 at 3:50 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is still not happy with the patchset, so I'm attaching a rebased one, rebased onto the currentmaster (from today). The third patch contains documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Again, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v16-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v16-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v16-0003-toaster-docs.patch - Pluggable TOAST API documentation packageActual GitHub branch resides atOn Fri, Sep 23, 2022 at 10:54 PM Nikita Malakhov <hukutoc@gmail.com> wrote:Hi hackers!Cfbot is not happy with previous patchset, so I'm attaching new one, rebased onto current master(15b4). Also providing patch with documentation package, and the second one contains largeREADME.toastapi file providing additional in-depth docs for developers.Comments would be greatly appreciated.Also, after checking patch sources I have a strong opinion that it needs some refactoring -move all files related to TOAST implementation into new folder /backend/access/toast whereGeneric (default) Toaster resides.Patchset consists of:v15-0001-toaster-interface.patch - Pluggable TOAST API interface along with reference TOAST mechanics;v15-0002-toaster-default.patch - Default TOAST re-implemented using Toaster API;
v15-0003-toaster-docs.patch - Pluggable TOAST API documentation packageOn Tue, Sep 13, 2022 at 7:50 PM Jacob Champion <jchampion@timescale.com> wrote:On Mon, Sep 12, 2022 at 11:45 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> It would be more clear for complex data types like JSONB, where developers could
> need some additional functionality to work with internal representation of data type,
> and its full potential is revealed in our JSONB toaster extension. The JSONB toaster
> is still in development but we plan to make it available soon.
Okay. It'll be good to have that, because as it is now it's hard to
see the whole picture.
> On installing dummy_toaster contrib: I've just checked it by making a patch from commit
> and applying onto my clone of master and 2 patches provided in previous email without
> any errors and sll checks passed - applying with git am, configure with debug, cassert,
> depend and enable-tap-tests flags and run checks.
> Please advice what would cause such a behavior?
I don't think the default pg_upgrade tests will upgrade contrib
objects (there are instructions in src/bin/pg_upgrade/TESTING that
cover manual dumps, if you prefer that method). My manual steps were
roughly
=# CREATE EXTENSION dummy_toaster;
=# CREATE TABLE test (t TEXT
STORAGE external
TOASTER dummy_toaster_handler);
=# \q
$ initdb -D newdb
$ pg_ctl -D olddb stop
$ pg_upgrade -b <install path>/bin -B <install path>/bin -d
./olddb -D ./newdb
(where <install path>/bin is on the PATH, so we're using the right binaries).
Thanks,
--Jacob--------------
Вложения
v22-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Вложения
v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Вложения
Hi Nikita, > Here's rebased patch: > v23-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API isintroduced but > reference TOAST is still left unchanged; > v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as DefaultToaster; > v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package Thanks for keeping the patch up to date. As I recall one of the open questions was: how this feature is supposed to work with table access methods? Could you please summarize what the current consensus is in this respect? -- Best regards, Aleksander Alekseev
Hi Nikita,
> Here's rebased patch:
> v23-0001-toaster-interface.patch - Pluggable TOAST API interface with reference (original) TOAST mechanics - new API is introduced but
> reference TOAST is still left unchanged;
> v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented using TOAST API and is plugged in as Default Toaster;
> v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package
Thanks for keeping the patch up to date.
As I recall one of the open questions was: how this feature is
supposed to work with table access methods? Could you please summarize
what the current consensus is in this respect?
--
Best regards,
Aleksander Alekseev
Hi Nikita, > Aleksander, we have had this in mind while developing this feature, and have checked it. Just a slight modification isneeded > to make it work with Pluggable Storage (Access Methods) API. Could you please clarify this a little from the architectural point of view? Let's say company A implements some specific TableAM (in-memory / the one that uses undo logging / etc). Company B implements an alternative TOAST mechanism. How the TOAST extension is going to work without knowing any specifics of the TableAM the user chooses for the given relation, and vice versa? How one of the extensions is going to upgrade / downgrade between the versions without knowing any implementation details of another? -- Best regards, Aleksander Alekseev
Hi Nikita,
> Aleksander, we have had this in mind while developing this feature, and have checked it. Just a slight modification is needed
> to make it work with Pluggable Storage (Access Methods) API.
Could you please clarify this a little from the architectural point of view?
Let's say company A implements some specific TableAM (in-memory / the
one that uses undo logging / etc). Company B implements an alternative
TOAST mechanism.
How the TOAST extension is going to work without knowing any specifics
of the TableAM the user chooses for the given relation, and vice
versa? How one of the extensions is going to upgrade / downgrade
between the versions without knowing any implementation details of
another?
--
Best regards,
Aleksander Alekseev
Hi Nikita, > Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is > free to use any storage methods Don't you think that this is an arguable design decision? Basically all we know about the underlying TableAM is that it stores tuples _somehow_ and that tuples have TIDs [1]. That's it. We don't know if it even has any sort of pages, whether they are fixed in size or not, whether it uses shared buffers, etc. It may not even require TOAST. (Not to mention the fact that when you have N TOAST implementations and M TableAM implementations now you have to run N x M compatibility tests. And this doesn't account for different versions of Ns and Ms, different platforms and different versions of PostgreSQL.) I believe the proposed approach is architecturally broken from the beginning. It looks like the idea should be actually turned inside out. I.e. what would be nice to have is some sort of _framework_ that helps TableAM authors to implement TOAST (alternatively, the rest of the TableAM except for TOAST) if the TableAM is similar to the default one. In other words the idea is not to implement alternative TOASTers that will work with all possible TableAMs but rather to simplify the task of implementing an alternative TableAM which is similar to the default one except for TOAST. These TableAMs should reuse as much common code as possible except for the parts where they differ. Does it make sense? Sorry, I realize this will probably imply a complete rewrite of the patch. This is the reason why one should start proposing changes from gathering the requirements, writing an RFC and run it through several rounds of discussion. [1]: https://www.postgresql.org/docs/current/tableam.html -- Best regards, Aleksander Alekseev
>all we know about the underlying TableAM is that it stores tuples
>_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
>it even has any sort of pages, whether they are fixed in size or not,
>whether it uses shared buffers, etc. It may not even require TOAST.
>(Not to mention the fact that when you have N TOAST implementations
>and M TableAM implementations now you have to run N x M compatibility
>tests. And this doesn't account for different versions of Ns and Ms,
>different platforms and different versions of PostgreSQL.)
>I believe the proposed approach is architecturally broken from the beginning.
>would be nice to have is some sort of _framework_ that helps TableAM
>authors to implement TOAST (alternatively, the rest of the TableAM
>except for TOAST) if the TableAM is similar to the default one. In
>other words the idea is not to implement alternative TOASTers that
>will work with all possible TableAMs but rather to simplify the task
>of implementing an alternative TableAM which is similar to the default
>one except for TOAST. These TableAMs should reuse as much common code
>as possible except for the parts where they differ.
Hi Nikita,
> Pluggable TOAST API was designed with storage flexibility in mind, and Custom TOAST mechanics is
> free to use any storage methods
Don't you think that this is an arguable design decision? Basically
all we know about the underlying TableAM is that it stores tuples
_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
it even has any sort of pages, whether they are fixed in size or not,
whether it uses shared buffers, etc. It may not even require TOAST.
(Not to mention the fact that when you have N TOAST implementations
and M TableAM implementations now you have to run N x M compatibility
tests. And this doesn't account for different versions of Ns and Ms,
different platforms and different versions of PostgreSQL.)
I believe the proposed approach is architecturally broken from the beginning.
It looks like the idea should be actually turned inside out. I.e. what
would be nice to have is some sort of _framework_ that helps TableAM
authors to implement TOAST (alternatively, the rest of the TableAM
except for TOAST) if the TableAM is similar to the default one. In
other words the idea is not to implement alternative TOASTers that
will work with all possible TableAMs but rather to simplify the task
of implementing an alternative TableAM which is similar to the default
one except for TOAST. These TableAMs should reuse as much common code
as possible except for the parts where they differ.
Does it make sense?
Sorry, I realize this will probably imply a complete rewrite of the
patch. This is the reason why one should start proposing changes from
gathering the requirements, writing an RFC and run it through several
rounds of discussion.
[1]: https://www.postgresql.org/docs/current/tableam.html
--
Best regards,
Aleksander Alekseev
Hi Nikita, I don't argue with most of what you say. I am just pointing out the reason why the chosen approach "N TOASTers x M TableAMs" will not work: > Don't you think that this is an arguable design decision? Basically > all we know about the underlying TableAM is that it stores tuples > _somehow_ and that tuples have TIDs [1]. That's it. We don't know if > it even has any sort of pages, whether they are fixed in size or not, > whether it uses shared buffers, etc. It may not even require TOAST. > [...] Also I completely agree with: > Implementing another Table AM just to implement another TOAST strategy seems too > much, the TAM API is very heavy and complex, and you would have to add it as a contrib. This is what I meant above when talking about the framework for simplifying this task: > It looks like the idea should be actually turned inside out. I.e. what > would be nice to have is some sort of _framework_ that helps TableAM > authors to implement TOAST (alternatively, the rest of the TableAM > except for TOAST) if the TableAM is similar to the default one. From the user perspective it's much easier to think about one entity - TableAM, and choosing from heapam_with_default_toast and heapam_with_different_toast. From the extension implementer point of view creating TableAMs is a difficult task. This is what the framework should address. Ideally the interface should be as simple as: CreateParametrizedDefaultHeapAM(SomeTOASTSubstitutionObject, ...other arguments, in the future...) Where the extension author should be worried only about an alternative TOAST implementation. I think at some point such a framework may address at least one more issue we have - an inability to change the page size on the table level. As it was shown by Tomas Vondra [1] the default 8 KB page size can be suboptimal depending on the load. So it would be nice if the user could change it without rebuilding PostgreSQL. Naturally this is out of scope of this particular patchset. I just wanted to point out opportunities we have here. [1]: https://www.postgresql.org/message-id/flat/b4861449-6c54-ccf8-e67c-c039228cdc6d%40enterprisedb.com -- Best regards, Aleksander Alekseev
>reason why the chosen approach "N TOASTers x M TableAMs" will not
>work:
>simplifying this task:
Hi Nikita,
I don't argue with most of what you say. I am just pointing out the
reason why the chosen approach "N TOASTers x M TableAMs" will not
work:
> Don't you think that this is an arguable design decision? Basically
> all we know about the underlying TableAM is that it stores tuples
> _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> it even has any sort of pages, whether they are fixed in size or not,
> whether it uses shared buffers, etc. It may not even require TOAST.
> [...]
Also I completely agree with:
> Implementing another Table AM just to implement another TOAST strategy seems too
> much, the TAM API is very heavy and complex, and you would have to add it as a contrib.
This is what I meant above when talking about the framework for
simplifying this task:
> It looks like the idea should be actually turned inside out. I.e. what
> would be nice to have is some sort of _framework_ that helps TableAM
> authors to implement TOAST (alternatively, the rest of the TableAM
> except for TOAST) if the TableAM is similar to the default one.
From the user perspective it's much easier to think about one entity -
TableAM, and choosing from heapam_with_default_toast and
heapam_with_different_toast.
From the extension implementer point of view creating TableAMs is a
difficult task. This is what the framework should address. Ideally the
interface should be as simple as:
CreateParametrizedDefaultHeapAM(SomeTOASTSubstitutionObject, ...other
arguments, in the future...)
Where the extension author should be worried only about an alternative
TOAST implementation.
I think at some point such a framework may address at least one more
issue we have - an inability to change the page size on the table
level. As it was shown by Tomas Vondra [1] the default 8 KB page size
can be suboptimal depending on the load. So it would be nice if the
user could change it without rebuilding PostgreSQL. Naturally this is
out of scope of this particular patchset. I just wanted to point out
opportunities we have here.
[1]: https://www.postgresql.org/message-id/flat/b4861449-6c54-ccf8-e67c-c039228cdc6d%40enterprisedb.com
--
Best regards,
Aleksander Alekseev
Hi Nikita, > >I don't argue with most of what you say. I am just pointing out the > >reason why the chosen approach "N TOASTers x M TableAMs" will not > >work: > > We assume that TAM used in custom Toaster works as it is should work, > and leave TAM internals to this TAM developer - say, we do not want to > change internals of Heap AM. > > We don't want to create some kind of silver bullet. This is exactly the point. In order to not to create a silver bullet, TOASTers should be limited to a single TableAM. The one we know uses pages of a known fixed size, the one that actually requires TOAST because pages are relatively small, etc. > That's what the TOAST API is - just an interface that all custom > TOAST implementations use to have a common entry point from any TAM, > [...] I believe this is the source of misunderstanding. Note that not _any_ TableAM needs TOAST to begin with. As an example, if one chooses to implement a column-organized TableAM that stores all text/bytea attributes in a separate dictionary file this person doesn't need TOAST and doesn't want to be constrained by the need of choosing one. For this reason the "N TOASTers x M TableAMs" approach is architecturally broken. > keep in mind that different kinds of data require very > different approach to external storage - say, JSON TOAST works with > maps of keys and values, [...] To clarify: is an ability to specify TOASTers for given columns and/or types also part of the plan? > Have I answered your question? Please don't hesitate to point to any unclear > parts, I'd be glad to explain that. No. To be honest, it looks like you are merely discarding most/any feedback the community provided so far. I really think that pluggable TOASTers would be a great feature. However if the goal is to get it into the core I doubt that we are going to make much progress with the current approach. -- Best regards, Aleksander Alekseev
>TOASTers should be limited to a single TableAM. The one we know uses
>pages of a known fixed size, the one that actually requires TOAST
>because pages are relatively small, etc.
>TableAM needs TOAST to begin with. As an example, if one chooses to
>implement a column-organized TableAM that stores all text/bytea
>attributes in a separate dictionary file this person doesn't need
>TOAST and doesn't want to be constrained by the need of choosing one.
>For this reason the "N TOASTers x M TableAMs" approach is
>architecturally broken.
>types also part of the plan?
>feedback the community provided so far.
Very sorry to read that. Almost all of the requests in this discussion have been taken
>I really think that pluggable TOASTers would be a great feature.
>However if the goal is to get it into the core I doubt that we are
>going to make much progress with the current approach.
Hi Nikita,
> >I don't argue with most of what you say. I am just pointing out the
> >reason why the chosen approach "N TOASTers x M TableAMs" will not
> >work:
>
> We assume that TAM used in custom Toaster works as it is should work,
> and leave TAM internals to this TAM developer - say, we do not want to
> change internals of Heap AM.
>
> We don't want to create some kind of silver bullet.
This is exactly the point. In order to not to create a silver bullet,
TOASTers should be limited to a single TableAM. The one we know uses
pages of a known fixed size, the one that actually requires TOAST
because pages are relatively small, etc.
> That's what the TOAST API is - just an interface that all custom
> TOAST implementations use to have a common entry point from any TAM,
> [...]
I believe this is the source of misunderstanding. Note that not _any_
TableAM needs TOAST to begin with. As an example, if one chooses to
implement a column-organized TableAM that stores all text/bytea
attributes in a separate dictionary file this person doesn't need
TOAST and doesn't want to be constrained by the need of choosing one.
For this reason the "N TOASTers x M TableAMs" approach is
architecturally broken.
> keep in mind that different kinds of data require very
> different approach to external storage - say, JSON TOAST works with
> maps of keys and values, [...]
To clarify: is an ability to specify TOASTers for given columns and/or
types also part of the plan?
> Have I answered your question? Please don't hesitate to point to any unclear
> parts, I'd be glad to explain that.
No. To be honest, it looks like you are merely discarding most/any
feedback the community provided so far.
I really think that pluggable TOASTers would be a great feature.
However if the goal is to get it into the core I doubt that we are
going to make much progress with the current approach.
--
Best regards,
Aleksander Alekseev
Hi Nikita, > Using Table AM Routine and routing AM methods calls via it is a topic for further discussion, > if Pluggable TOAST will be committed. [...] And even then it would be an open issue. From personal experience with the project I have serious doubts this is going to happen. Before such invasive changes are going to be accepted there should be a clear understanding of how exactly TOASTers are supposed to be used. This should be part of the documentation in the patchset. Additionally there should be an open-soruce or source-available extension that actually demonstrates the benefits of TOASTers with reproducible benchmarks (we didn't even get to that part yet). > TOAST implementation is not necessary for Table AM. What other use cases for TOAST do you have in mind? >> > Have I answered your question? Please don't hesitate to point to any unclear >> > parts, I'd be glad to explain that. >> >> No. To be honest, it looks like you are merely discarding most/any >> feedback the community provided so far. >> >> I really think that pluggable TOASTers would be a great feature. >> However if the goal is to get it into the core I doubt that we are >> going to make much progress with the current approach. To clarify, the concern about "N TOASTers vs M TableAM" was expressed by Robert Haas back in Jan 2022: > I agree ... but I'm also worried about what happens when we have > multiple table AMs. One can imagine a new table AM that is > specifically optimized for TOAST which can be used with an existing > heap table. One can imagine a new table AM for the main table that > wants to use something different for TOAST. So, I don't think it's > right to imagine that the choice of TOASTer depends solely on the > column data type. I'm not really sure how this should work exactly ... > but it needs careful thought. This is the most important open question so far to my knowledge. It was never addressed, it doesn't seem like there is a plan of doing so, the suggested alternative approach was ignored, nor are there any strong arguments that would defend this design choice and/or criticize the alternative one (other than general words "don't worry we know what we are doing"). This what I mean by the community feedback being discarded. -- Best regards, Aleksander Alekseev
>is going to happen. Before such invasive changes are going to be
>accepted there should be a clear understanding of how exactly TOASTers
>are supposed to be used. This should be part of the documentation in
>the patchset. Additionally there should be an open-soruce or
>source-available extension that actually demonstrates the benefits of
>TOASTers with reproducible benchmarks (we didn't even get to that part
>yet).
>by Robert Haas back in Jan 2022:
>> I agree ... but I'm also worried about what happens when we have
>> multiple table AMs. One can imagine a new table AM that is
>> specifically optimized for TOAST which can be used with an existing
>> heap table. One can imagine a new table AM for the main table that
>> wants to use something different for TOAST. So, I don't think it's
>> right to imagine that the choice of TOASTer depends solely on the
>> column data type. I'm not really sure how this should work exactly ...
>> but it needs careful thought.
>This is the most important open question so far to my knowledge. It
>was never addressed, it doesn't seem like there is a plan of doing so,
>the suggested alternative approach was ignored, nor are there any
>strong arguments that would defend this design choice and/or criticize
>the alternative one (other than general words "don't worry we know
>what we are doing").
>This what I mean by the community feedback being discarded.
>name, but I don't known better one) which accepts several arguments, one
>of which is table AM oid. If that method returns false then toaster
>isn't useful with current TAM, storage or/and compression kinds, etc.
Hi Nikita,
> Using Table AM Routine and routing AM methods calls via it is a topic for further discussion,
> if Pluggable TOAST will be committed. [...] And even then it would be an open issue.
From personal experience with the project I have serious doubts this
is going to happen. Before such invasive changes are going to be
accepted there should be a clear understanding of how exactly TOASTers
are supposed to be used. This should be part of the documentation in
the patchset. Additionally there should be an open-soruce or
source-available extension that actually demonstrates the benefits of
TOASTers with reproducible benchmarks (we didn't even get to that part
yet).
> TOAST implementation is not necessary for Table AM.
What other use cases for TOAST do you have in mind?
>> > Have I answered your question? Please don't hesitate to point to any unclear
>> > parts, I'd be glad to explain that.
>>
>> No. To be honest, it looks like you are merely discarding most/any
>> feedback the community provided so far.
>>
>> I really think that pluggable TOASTers would be a great feature.
>> However if the goal is to get it into the core I doubt that we are
>> going to make much progress with the current approach.
To clarify, the concern about "N TOASTers vs M TableAM" was expressed
by Robert Haas back in Jan 2022:
> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.
This is the most important open question so far to my knowledge. It
was never addressed, it doesn't seem like there is a plan of doing so,
the suggested alternative approach was ignored, nor are there any
strong arguments that would defend this design choice and/or criticize
the alternative one (other than general words "don't worry we know
what we are doing").
This what I mean by the community feedback being discarded.
--
Best regards,
Aleksander Alekseev
Hi Nikita, > > > TOAST implementation is not necessary for Table AM. > > >What other use cases for TOAST do you have in mind? > > The main use case is the same as for the TOAST mechanism - storing and retrieving > oversized data. But we expanded this case with some details - > - update TOASTed data (yes, current TOAST implementation cannot update stored > data - is marks whole TOASTED object as dead and stores new one); > - retrieve part of the stored data chunks without fully de-TOASTing stored data (even > with existing TOAST this will be painful if you have to get just a small part of the several > hundreds Mb sized object); > - be able to store objects of size larger than 1 Gb; > - store more than 4 Tb of TOASTed data for one table; > - optimize storage for fast search and retrieval of parts of TOASTed object - this is > must-have for effectively using JSON, PostgreSQL already is in catching-up position > in JSON performance field. I see. Actually most of this is what TableAM does. We just happened to give this part of TableAM a separate name. The only exception is the last case, when you create custom TOASTers for particular types and potentially specify them for the given column. All in all, this makes sense. > Right. that's why we propose a validate method (may be, it's a wrong > name, but I don't known better one) which accepts several arguments, one > of which is table AM oid. If that method returns false then toaster > isn't useful with current TAM, storage or/and compression kinds, etc. OK, I missed this message. So there was some misunderstanding after all, sorry for this. That's exactly what I wanted to know. It's much better than allowing any TOASTer to run on top of any TableAM. -- Best regards, Aleksander Alekseev
Hi Nikita,
> > > TOAST implementation is not necessary for Table AM.
>
> >What other use cases for TOAST do you have in mind?
>
> The main use case is the same as for the TOAST mechanism - storing and retrieving
> oversized data. But we expanded this case with some details -
> - update TOASTed data (yes, current TOAST implementation cannot update stored
> data - is marks whole TOASTED object as dead and stores new one);
> - retrieve part of the stored data chunks without fully de-TOASTing stored data (even
> with existing TOAST this will be painful if you have to get just a small part of the several
> hundreds Mb sized object);
> - be able to store objects of size larger than 1 Gb;
> - store more than 4 Tb of TOASTed data for one table;
> - optimize storage for fast search and retrieval of parts of TOASTed object - this is
> must-have for effectively using JSON, PostgreSQL already is in catching-up position
> in JSON performance field.
I see. Actually most of this is what TableAM does. We just happened to
give this part of TableAM a separate name. The only exception is the
last case, when you create custom TOASTers for particular types and
potentially specify them for the given column.
All in all, this makes sense.
> Right. that's why we propose a validate method (may be, it's a wrong
> name, but I don't known better one) which accepts several arguments, one
> of which is table AM oid. If that method returns false then toaster
> isn't useful with current TAM, storage or/and compression kinds, etc.
OK, I missed this message. So there was some misunderstanding after
all, sorry for this.
That's exactly what I wanted to know. It's much better than allowing
any TOASTer to run on top of any TableAM.
--
Best regards,
Aleksander Alekseev
Hi Nikita, > Aleksander, thanks for the discussion! It seems to me that I have to add some parts > of it to API documentation, to clarify the details on API purpose and use-cases. I'm in the process of testing and reviewing the v23 patchset. So far I just wanted to report that there seems to be certain issues with the formatting and/or trailing whitespaces in the patches: ``` $ git am <v23-0001-toaster-interface.patch Applying: Pluggable TOAST API interface along with reference TOAST mechanics .git/rebase-apply/patch:1122: indent with spaces. TsrRoutine *tsr = makeNode(TsrRoutine); .git/rebase-apply/patch:1123: indent with spaces. PG_RETURN_POINTER(tsr); .git/rebase-apply/patch:6172: new blank line at EOF. + warning: 3 lines add whitespace errors. $ git am <v23-0002-toaster-default.patch Applying: Default TOAST re-implemented using Toaster API. .git/rebase-apply/patch:1879: indent with spaces. if (*value == old_value) .git/rebase-apply/patch:1881: indent with spaces. return; .git/rebase-apply/patch:3069: trailing whitespace. * CREATE TOASTER name HANDLER handler_name .git/rebase-apply/patch:3603: trailing whitespace. fetch_toast_slice(Relation toastrel, Oid valueid, .git/rebase-apply/patch:3654: trailing whitespace. toast_fetch_toast_slice(Relation toastrel, Oid valueid, warning: 5 lines add whitespace errors. $ git am <v23-0003-toaster-docs.patch Applying: Pluggable TOAST API documentation package .git/rebase-apply/patch:388: tab in indent. TsrRoutine *tsrroutine = makeNode(TsrRoutine); .git/rebase-apply/patch:389: tab in indent. tsrroutine->init = custom_toast_init; .git/rebase-apply/patch:390: tab in indent. tsrroutine->toast = custom_toast; .git/rebase-apply/patch:391: tab in indent. tsrroutine->detoast = custom_detoast; .git/rebase-apply/patch:392: tab in indent. tsrroutine->deltoast = custom_delete_toast; warning: squelched 12 whitespace errors warning: 17 lines add whitespace errors. ``` Please don't forget to run `pgindent` before formatting the patches with `git format-patch` next time. I'm going to submit a more detailed code review soon. -- Best regards, Aleksander Alekseev
Hi Nikita, > Please don't forget to run `pgindent` before formatting the patches > with `git format-patch` next time. There are also some compiler warnings, please see the attachment. > I'm going to submit a more detailed code review soon. -- Best regards, Aleksander Alekseev
Вложения
Hi Nikita, > I'm going to submit a more detailed code review soon. As promised, here is my feedback. ``` + Toaster could be assigned to toastable column with expression + <literal>STORAGE external TOASTER <replaceable>toaster_name</replaceable></literal> ``` 1.1. Could you please clarify what is the motivation behind such a verbose syntax? ``` +Please note that custom Toasters could be assigned only to External +storage type. ``` 1.2. That's odd. TOAST should work for EXTENDED and MAIN storage strategies as well. On top of that, why should custom TOASTers have any knowledge of the default four-stage algorithm and the storage strategies? If the storage strategy is actually ignored, it shouldn't be used in the syntax. ``` +Toasters could use any storage, advanced compression, encryption, and ``` 2. Although it's possible to implement some encryption in a TOASTer I don't think the documentation should advertise this. ``` +typedef struct TsrRoutine +{ + NodeTag type; + + /* interface functions */ + toast_init init; + toast_function toast; + update_toast_function update_toast; + copy_toast_function copy_toast; + detoast_function detoast; + del_toast_function deltoast; + get_vtable_function get_vtable; + toastervalidate_function toastervalidate; +} TsrRoutine; ``` 3.1. I believe we should rename this to something like `struct ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is not a routine. 3.2. The names of the fields should be made consistent - e.g. if you have update_toast and copy_toast then del_toast should be renamed to delete_toast. 3.2. Additionally, in some parts of the path del_toast is used, while in others - deltoast. 4. The user documentation should have clear answers on the following questions: 4.1. What will happen if the user tries to delete a TOASTer while still having data that was TOASTed using this TOASTer? Or this is not supported and the TOASTers should exist in the database indefinitely after creation? 4.2. Is it possible to delete and/or alter the default TOASTer? 4.3. Please make sure the previously discussed clarification regarding "N TOASTers vs M TableAMs" and the validation function is explicitly present. ``` Toaster initialization. Initial TOAST table creation and other startup preparations. ``` 5.1. The documentation should clarify how many times init() is called - is it done once for the TOASTer, once per relation, etc. 5.2. Why there is no free() method? ``` Updates TOASTed value. Returns new TOAST pointer. ``` 6. It's not clear for what reason update_toast() will be executed to begin with. This should be clarified in the documentation. How does it differ from copy_toast()? ``` Validates Toaster for given data type, storage and compression options. ``` 7. It should be explicitly stated when validate() is called and what happens depending on the return value. ``` +Virtual Functions Table API Extension ``` 8. IMO this section does a very poor job in explaining what this is for and when I may want to use this; what particular problem are we addressing here? 9. There are typos in the comments and the documentation, s/Vaitual/Virtual/, s/vurtual/virtual/ etc. Also there are missing articles. Please run your patches through a spellchecker. I suggest we address this piece of feedback before proceeding further. -- Best regards, Aleksander Alekseev
>just wanted to report that there seems to be certain issues with the
>formatting and/or trailing whitespaces in the patches:
>verbose syntax?
>strategies as well. On top of that, why should custom TOASTers have
>any knowledge of the default four-stage algorithm and the storage
>strategies? If the storage strategy is actually ignored, it shouldn't
>be used in the syntax.
>don't think the documentation should advertise this.
>ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
>not a routine.
>have update_toast and copy_toast then del_toast should be renamed to
>delete_toast.
>in others - deltoast.
>4.1. What will happen if the user tries to delete a TOASTer while
>still having data that was TOASTed using this TOASTer? Or this is not
>supported and the TOASTers should exist in the database indefinitely
>after creation?
>4.2. Is it possible to delete and/or alter the default TOASTer?
>4.3. Please make sure the previously discussed clarification regarding
>"N TOASTers vs M TableAMs" and the validation function is explicitly
>present.
>- is it done once for the TOASTer, once per relation, etc.
>5.2. Why there is no free() method?
>begin with. This should be clarified in the documentation. How does it
>differ from copy_toast()?
>happens depending on the return value.
>for and when I may want to use this; what particular problem are we
>addressing here?
Hi Nikita,
> I'm going to submit a more detailed code review soon.
As promised, here is my feedback.
```
+ Toaster could be assigned to toastable column with expression
+ <literal>STORAGE external TOASTER
<replaceable>toaster_name</replaceable></literal>
```
1.1. Could you please clarify what is the motivation behind such a
verbose syntax?
```
+Please note that custom Toasters could be assigned only to External
+storage type.
```
1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
strategies as well. On top of that, why should custom TOASTers have
any knowledge of the default four-stage algorithm and the storage
strategies? If the storage strategy is actually ignored, it shouldn't
be used in the syntax.
```
+Toasters could use any storage, advanced compression, encryption, and
```
2. Although it's possible to implement some encryption in a TOASTer I
don't think the documentation should advertise this.
```
+typedef struct TsrRoutine
+{
+ NodeTag type;
+
+ /* interface functions */
+ toast_init init;
+ toast_function toast;
+ update_toast_function update_toast;
+ copy_toast_function copy_toast;
+ detoast_function detoast;
+ del_toast_function deltoast;
+ get_vtable_function get_vtable;
+ toastervalidate_function toastervalidate;
+} TsrRoutine;
```
3.1. I believe we should rename this to something like `struct
ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
not a routine.
3.2. The names of the fields should be made consistent - e.g. if you
have update_toast and copy_toast then del_toast should be renamed to
delete_toast.
3.2. Additionally, in some parts of the path del_toast is used, while
in others - deltoast.
4. The user documentation should have clear answers on the following questions:
4.1. What will happen if the user tries to delete a TOASTer while
still having data that was TOASTed using this TOASTer? Or this is not
supported and the TOASTers should exist in the database indefinitely
after creation?
4.2. Is it possible to delete and/or alter the default TOASTer?
4.3. Please make sure the previously discussed clarification regarding
"N TOASTers vs M TableAMs" and the validation function is explicitly
present.
```
Toaster initialization. Initial TOAST table creation and other startup
preparations.
```
5.1. The documentation should clarify how many times init() is called
- is it done once for the TOASTer, once per relation, etc.
5.2. Why there is no free() method?
```
Updates TOASTed value. Returns new TOAST pointer.
```
6. It's not clear for what reason update_toast() will be executed to
begin with. This should be clarified in the documentation. How does it
differ from copy_toast()?
```
Validates Toaster for given data type, storage and compression options.
```
7. It should be explicitly stated when validate() is called and what
happens depending on the return value.
```
+Virtual Functions Table API Extension
```
8. IMO this section does a very poor job in explaining what this is
for and when I may want to use this; what particular problem are we
addressing here?
9. There are typos in the comments and the documentation,
s/Vaitual/Virtual/, s/vurtual/virtual/ etc. Also there are missing
articles. Please run your patches through a spellchecker.
I suggest we address this piece of feedback before proceeding further.
--
Best regards,
Aleksander Alekseev
Hi Nikita, Please, avoid top-posting [1]. > Toaster is set for the table column. Each TOASTable column could have > a different Toaster, so column option is the most obvious place to add it. This is a major limitation. IMO the user should be able to set a custom TOASTer for the entire table as well. Ideally - for the entire database too. This could be implemented entirely on the syntax level, the internals of the patch are not going to be affected. > >1.2. That's odd. TOAST should work for EXTENDED and MAIN storage > >strategies as well. On top of that, why should custom TOASTers have > >any knowledge of the default four-stage algorithm and the storage > >strategies? If the storage strategy is actually ignored, it shouldn't > >be used in the syntax. > > EXTENDED storage strategy means that TOASTed value is compressed > before being TOASTed, so no knowledge of its internals could be of any > use. EXTERNAL strategy means that value is being TOASTed in original > form. Storage strategy is the thing internal to AM used, and TOAST > mechanics is not meant to interfere with it. Again, STORAGE EXTERNAL > explicitly shows that value will be stored out-of-line. Let me rephrase. Will the custom TOASTers work only for EXTERNAL storage strategy or this is just a syntax? > >2. Although it's possible to implement some encryption in a TOASTer I > >don't think the documentation should advertise this. > > It is a good example of what could the Toaster be responsible for No, encryption is an excellent example of what a TOASTer should NOT do. If you are interested in encryption consider joining the "Moving forward with TDE" thread [2]. > >3.1. I believe we should rename this to something like `struct > >ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is > >not a routine. > > It was done similar to Table AM Routine (please check Pluggable > Storage API), along with some other decisions. OK, then maybe we shall keep the "Routine" part for consistency. I still don't like the "Tsr" abbreviation though and find it confusing. > It is not clear because current TOAST mechanics does not have UPDATE > functionality - it doesn't actually update TOASTed value, it marks this value > "dead" and inserts a new one. This is the cause of TOAST tables bloating > that is being complained about by many users. Update method is provided > for implementation of UPDATE operation. But should we really distinguish INSERT and UPDATE cases on this API level? It seems to me that TableAM just inserts new tuples. It's TOASTers job to figure out whether similar values existed before and should or shouldn't be reused. Additionally a particular TOASTer can reuse old values between _different_ rows, potentially even from different tables. Another reason why in practice there is little use of knowing whether the data is INSERTed or UPDATEd. > I already answered this question, maybe the answer was not very clear. > This is just an extension entry point, because for some more advanced > functionality existing pre-defined set of methods would be not enough, i.e. > special Toasters for complex datatypes like JSONb, that have complex > internal structure and may require additional ways to interact with it along > toast/detoast/update/delete. Maybe so, but it doesn't change the fact that the user documentation should clearly describe the interface and its usage. > These too. About free() method - Toasters are not meant to be deleted, > we mentioned this several times. They exist once they are created as long > as the DB itself. Have I answered your question? Users should be able to DROP extension. I seriously doubt that the patch is going to be accepted as long as it has this limitation. [1]: https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics [2]: https://www.postgresql.org/message-id/flat/CAOxo6XJh95xPOpvTxuP_kiGRs8eHcaNrmy3kkzWrzWxvyVkKkQ%40mail.gmail.com -- Best regards, Aleksander Alekseev
>do. If you are interested in encryption consider joining the "Moving
>forward with TDE" thread [2].
>level? It seems to me that TableAM just inserts new tuples. It's
>TOASTers job to figure out whether similar values existed before and
>should or shouldn't be reused. Additionally a particular TOASTer can
>reuse old values between _different_ rows, potentially even from
>different tables. Another reason why in practice there is little use
>of knowing whether the data is INSERTed or UPDATEd.
For TOASTer you SHOULD distinguish insert and update operations, really. Because for
>Users should be able to DROP extension. I seriously doubt that the
>patch is going to be accepted as long as it has this limitation.
Hi Nikita, > Setting TOAST for table and database is a subject for discussion. There is already default > Toaster. Also, there is not much sense in setting Jsonb Toaster as default even for table, do > not say database, because table could contain other TOASTable columns not of Json type. True, but besides Jsonb Toaster one can implement a universal one as well (your own example: encryption, or a Toaster that bypasses a 1 GB value limitation). However we can probably keep this out of scope of this particular patchset for now. As mentioned before this is going to be just an additional syntax for the user convenience. > Custom Toasters will work with Extended storage, but as I answered in previous email - > there is no much use of it, because they would deal with compressed data. Compression is actually a part of the four-stage TOAST algorithm. So what you're doing is using the default TOAST most of the time, but if the storage strategy is EXTERNAL and a custom TOASTer is specified then you use a type-aware "TOASTer". If the goal is to implement true "Pluggable TOASTers" then the TOAST should be substituted entirely. If the goal is to implement type-aware sub-TOASTers we should be honest about this and call it otherwise: e.g. "Type-aware TOASTer" or perhaps "subTOASTer". Additionally in this case there should be no validate() method since this is going to work only with the default heapam that implements the default TOASTer. So to clarify, the goal is to deliver true "Pluggable TOASTers" or only "type-aware sub-TOASTers"? > For TOASTer you SHOULD distinguish insert and update operations, really. Because for > TOASTed data these operations affect many tuples, and AM does know which of them > were updated and which were not - that's very serious limitation of current TOAST, and > TOAST mechanics shoud care about updating affected tuples only instead of marking > whole record dead and inserting new one. This is also an argument for not using EXTENDED > storage mode - because with compressed data you do not have such choice, you should > drop the whole record. This may actually be a good point. I suggest reflecting it in the documentation. > >Users should be able to DROP extension. I seriously doubt that the > >patch is going to be accepted as long as it has this limitation. > > There is a mention in documentation and previous discussion that this operation would lead > to loss of data TOASTed with this custom Toaster. I don't see any reason why the semantics for Toasters should be any different from user-defined types implemented in an extension. If there are columns that use a given Toaster we should forbid DROPping the extension. Otherwise "DROP extension" should succeed. No one says that this should be a fast operation but it should be possible. [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com -- Best regards, Aleksander Alekseev
Hi again, > [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com I added a link but forgot to use it, d'oh! Please note that if the goal is to merely implement "type-aware sub-TOASTers" then compression dictionaries [1] arguably provide the same value with MUCH less complexity. -- Best regards, Aleksander Alekseev
>value limitation). However we can probably keep this out of scope of
>this particular patchset for now. As mentioned before this is going to
>be just an additional syntax for the user convenience.
To transparently bypass the 1 Gb limit you have to increase size of data
>Compression is actually a part of the four-stage TOAST algorithm. So
>what you're doing is using the default TOAST most of the time, but if
>the storage strategy is EXTERNAL and a custom TOASTer is specified
>then you use a type-aware "TOASTer".
We provide several custom Toasters for particular types of data causing
Extended storage mode supports only 2 compression algorithms, though
>If the goal is to implement true "Pluggable TOASTers" then the TOAST
>should be substituted entirely. If the goal is to implement type-aware
>sub-TOASTers we should be honest about this and call it otherwise:
>e.g. "Type-aware TOASTer" or perhaps "subTOASTer". Additionally in
>this case there should be no validate() method since this is going to
>work only with the default heapam that implements the default TOASTer.
>So to clarify, the goal is to deliver true "Pluggable TOASTers" or
>only "type-aware sub-TOASTers"?
Pluggable TOAST does not supposes complete substitution of existing
>I don't see any reason why the semantics for Toasters should be any
>different from user-defined types implemented in an extension. If
>there are columns that use a given Toaster we should forbid DROPping
>the extension. Otherwise "DROP extension" should succeed. No one says
>that this should be a fast operation but it should be possible.
Hi again,
> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com
I added a link but forgot to use it, d'oh!
Please note that if the goal is to merely implement "type-aware
sub-TOASTers" then compression dictionaries [1] arguably provide the
same value with MUCH less complexity.
--
Best regards,
Aleksander Alekseev
Hi Nikita, > Pluggable TOAST is provided as an interface to allow developers to plug > in custom TOAST mechanics. It does not determines would it be universal > Toaster or one data type, but syntax for universal Toaster is out of scope > for this patchset. If I understand correctly, what is going to happen - the same interface TsrRoutine is going to be used for doing two things: 1) Implementing type-aware TOASTers as a special case for the default TOAST algorithm when EXTERNAL storage strategy is used. 2) Implementing universal TOASTers from scratch that have nothing to do with the default TOAST algorithm. Assuming this is the case, using the same interface for doing two very different things doesn't strike me as a great design decision. While working on v24 you may want to rethink this. Personally I believe that Pluggable TOASTers should support only case #2. If there is a need of reusing parts of the default TOASTer, corresponding pieces of code should be declared as non-static and called from the pluggable TOASTers directly. Alternatively we could have separate interfaces for case #1 and case #2 but this IMO becomes rather complicated. > I'm currently working on a revision of Pluggable TOAST that would make > dropping Toaster possible if there is no data TOASTed with it, along with > several other major changes. It will be available in this (I hope so) or the > following, if I won't make it in time, commitfest. Looking forward to v24! This is a major change so I hope there will be more feedback from other people on the mailing list. -- Best regards, Aleksander Alekseev
Table "pg_catalog.pg_toastrel"
Column | Type | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target | Description
--------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+-------------
oid | oid | | not null | | plain | | | |
toasteroid | oid | | not null | | plain | | | |
relid | oid | | not null | | plain | | | |
toastentid | oid | | not null | | plain | | | |
attnum | smallint | | not null | | plain | | | |
version | smallint | | not null | | plain | | | |
relname | name | | not null | | plain | | | |
toastentname | name | | not null | | plain | | | |
flag | "char" | | not null | | plain | | | |
toastoptions | "char" | | not null | | plain | | | |
Indexes:
"pg_toastrel_oid_index" PRIMARY KEY, btree (oid)
"pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum)
"pg_toastrel_rel_index" btree (relid, attnum)
"pg_toastrel_tsr_index" btree (toasteroid)
Access method: heap
Hi Nikita,
> Pluggable TOAST is provided as an interface to allow developers to plug
> in custom TOAST mechanics. It does not determines would it be universal
> Toaster or one data type, but syntax for universal Toaster is out of scope
> for this patchset.
If I understand correctly, what is going to happen - the same
interface TsrRoutine is going to be used for doing two things:
1) Implementing type-aware TOASTers as a special case for the default
TOAST algorithm when EXTERNAL storage strategy is used.
2) Implementing universal TOASTers from scratch that have nothing to
do with the default TOAST algorithm.
Assuming this is the case, using the same interface for doing two very
different things doesn't strike me as a great design decision. While
working on v24 you may want to rethink this.
Personally I believe that Pluggable TOASTers should support only case
#2. If there is a need of reusing parts of the default TOASTer,
corresponding pieces of code should be declared as non-static and
called from the pluggable TOASTers directly.
Alternatively we could have separate interfaces for case #1 and case
#2 but this IMO becomes rather complicated.
> I'm currently working on a revision of Pluggable TOAST that would make
> dropping Toaster possible if there is no data TOASTed with it, along with
> several other major changes. It will be available in this (I hope so) or the
> following, if I won't make it in time, commitfest.
Looking forward to v24!
This is a major change so I hope there will be more feedback from
other people on the mailing list.
--
Best regards,
Aleksander Alekseev
Pluggable TOAST - TOAST API rework - introduce PG_TOASTREL catalog
relation containing TOAST dependencies. NOTE: here is a pre-patch, not
a final version, just to introduce another approach to a Pluggable TOAST
idea, it needs some cleanup, tests rework and some improvements, so
goal of this message is to introduce this different approach. This is the
last patch and it is installed on top of older TOAST API patches, so here
are 3 patches attached:
0001_toaster_interface_v24.patch.gz
This patch introduces new custom TOAST pointer, Pluggable TOAST API and
Toaster support functions - cache, lookup, and new attribute 'atttoaster'
in PG_ATTRIBUTE table which stores Toaster OID;
0002_toaster_default_v24.patch.gz
Here the default TOAST mechanics is routed via TOAST API, but still using
varatt_external TOAST Pointer - so this step does not change overall TOAST
mechanics unless you plug in some custom Toaster;
0003_pg_toastrel_table_v24.patch.gz
Here Pluggable TOAST is reworked not to modify PG_ATTRIBUTE, instead this
patch introduces new catalog table PG_TOASTREL with its support functions.
Motivation: PG_ATTRIBUTE is already the largest catalog table. We try
to avoid modification of existing catalog tables, and previous solution
had several problems:
1) New field in PG_ATTRIBUTE;
2) No opportunity to save all Toaster assignment history;
3) No opportunity to have multi-TOAST tables assigned to a relation or
an attribute;
4) Toaster cannot be dropped - to drop Toaster we need to scan all tables
with TOASTable columns.
Instead of extending PG_ATTRIBUTE with ATTTOASTER attribute, we decided
to store all Table-Toaster relations in a new catalog table PG_TOASTREL.
This cancels the necessity to modify catalog table PG_ATTRIBUTE, allows to store
full history of Toasters assignments, and allows to drop unused Toasters
from system.
Toasters are assigned to a table column. ALTER TABLE ... SET TOASTER command
creates a new row in PG_TOASTREL. To distinguish sequential assignments,
PG_TOASTREL has special attribute - 'version'. With each new assignment
its 'version' attribute is increased, and the row with the biggest 'version'
is the current Toaster for a column.
This approach allows to provide different behavior, even for a single table
we can have one TOAST table for the whole relation (as it is in current TOAST
mechanics), or we can have separate TOAST relation(s) for each TOASTable
column - this requires a slight modification if current approach. The latter
also allows simple invariant of column-oriented storage.
Also, this approach makes PG_ATTRIBUTE attribute RELTOASTRELID obsolete -
current mechanics allows only 1 TOAST table for relation, which limits
greatly TOAST capabilities - because all TOASTed columns are stored in this
table, which in its turn limits overall base relation capacity.
In future, this approach allows us to have a kind of near infinite TOAST
storage, with ability to store large values (larger than 512 Mbytes),
auto-creation of TOAST table only when the first value is actually TOASTed,
and much more.
The approach, along with the TOAST API itself, introduces the catalog table
PG_TOASTREL with a set of support functions.
PG_TOASTREL definition:
postgres@postgres=# \d+ pg_toastrel;
Table "pg_catalog.pg_toastrel"
Column | Type | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target | Description
-------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+-------------
oid | oid | | not null | | plain | | | |
toasteroid | oid | | not null | | plain | | | |
relid | oid | | not null | | plain | | | |
toastentid | oid | | not null | | plain | | | |
attnum | smallint | | not null | | plain | | | |
version | smallint | | not null | | plain | | | |
relname | name | | not null | | plain | | | |
toastentname | name | | not null | | plain | | | |
flag | "char" | | not null | | plain | | | |
toastoptions | "char" | | not null | | plain | | | |
Indexes:
"pg_toastrel_oid_index" PRIMARY KEY, btree (oid)
"pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum)
"pg_toastrel_rel_index" btree (relid, attnum)
"pg_toastrel_tsr_index" btree (toasteroid)
Access method: heap
Where:
oid - PG_TOASTREL record ID
toasteroid - Toaster OID from PG_TOASTER
relid - base relation OID
toastentid - TOAST entity OID (not necessary to be a table)
attnum - TOASTable attribute index in base relation
version - Toaster assignment version - sequence of assignments
relname - base relation name (optional)
toastentname - TOAST entity name (optional)
flag - special field to mark rows, currently only the value 'x' is used
to mark unused rows
PG_TOASTREL unique key consists of:
toasteroid, relid, attnum, version
All currently assigned Toasters are additionally stored in cache for
fast access. When new row is being TOASTed - Toaster, relation Oid,
TOAST relation Oid, column index are added into Toastrel Cache for fast
access.
Create table, change Toaster, change column type were changed to
add new rows in PG_TOASTREL, to use this table and cache instead
of altering pg_attribute with new column. For table creation from
scratch when no TOAST tables were created is used special condition
with version=0.
DROP TABLE drops rows in PG_TOASTREL for this table. This allows to -
DROP TOASTER command added. When no rows with the according Toaster are
present in PG_TOASTREL - it is considered unused and thus could be safely
dropped from the system.
Default toaster 'deftoaster' (reference TOAST mechanics) cannot be dropped.
Вложения
On Tue, 27 Dec 2022 at 02:32, Nikita Malakhov <hukutoc@gmail.com> wrote: > > Hi hackers! > > Pluggable TOAST API with catalog control table PG_TOASTREL - pre-patch. > > Pluggable TOAST - TOAST API rework - introduce PG_TOASTREL catalog > relation containing TOAST dependencies. NOTE: here is a pre-patch, not > a final version, just to introduce another approach to a Pluggable TOAST > idea, it needs some cleanup, tests rework and some improvements, so > the main > goal of this message is to introduce this different approach. This is the > last patch and it is installed on top of older TOAST API patches, so here > are 3 patches attached: > > 0001_toaster_interface_v24.patch.gz > This patch introduces new custom TOAST pointer, Pluggable TOAST API and > Toaster support functions - cache, lookup, and new attribute 'atttoaster' > in PG_ATTRIBUTE table which stores Toaster OID; > > 0002_toaster_default_v24.patch.gz > Here the default TOAST mechanics is routed via TOAST API, but still using > varatt_external TOAST Pointer - so this step does not change overall TOAST > mechanics unless you plug in some custom Toaster; > > 0003_pg_toastrel_table_v24.patch.gz > Here Pluggable TOAST is reworked not to modify PG_ATTRIBUTE, instead this > patch introduces new catalog table PG_TOASTREL with its support functions. > > Motivation: PG_ATTRIBUTE is already the largest catalog table. We try > to avoid modification of existing catalog tables, and previous solution > had several problems: > 1) New field in PG_ATTRIBUTE; > 2) No opportunity to save all Toaster assignment history; > 3) No opportunity to have multi-TOAST tables assigned to a relation or > an attribute; > 4) Toaster cannot be dropped - to drop Toaster we need to scan all tables > with TOASTable columns. > > Instead of extending PG_ATTRIBUTE with ATTTOASTER attribute, we decided > to store all Table-Toaster relations in a new catalog table PG_TOASTREL. > This cancels the necessity to modify catalog table PG_ATTRIBUTE, allows to store > full history of Toasters assignments, and allows to drop unused Toasters > from system. > > Toasters are assigned to a table column. ALTER TABLE ... SET TOASTER command > creates a new row in PG_TOASTREL. To distinguish sequential assignments, > PG_TOASTREL has special attribute - 'version'. With each new assignment > its 'version' attribute is increased, and the row with the biggest 'version' > is the current Toaster for a column. > > This approach allows to provide different behavior, even for a single table > we can have one TOAST table for the whole relation (as it is in current TOAST > mechanics), or we can have separate TOAST relation(s) for each TOASTable > column - this requires a slight modification if current approach. The latter > also allows simple invariant of column-oriented storage. > > Also, this approach makes PG_ATTRIBUTE attribute RELTOASTRELID obsolete - > current mechanics allows only 1 TOAST table for relation, which limits > greatly TOAST capabilities - because all TOASTed columns are stored in this > table, which in its turn limits overall base relation capacity. > > In future, this approach allows us to have a kind of near infinite TOAST > storage, with ability to store large values (larger than 512 Mbytes), > auto-creation of TOAST table only when the first value is actually TOASTed, > and much more. > > The approach, along with the TOAST API itself, introduces the catalog table > PG_TOASTREL with a set of support functions. > > PG_TOASTREL definition: > > postgres@postgres=# \d+ pg_toastrel; > Table "pg_catalog.pg_toastrel" > Column | Type | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target | Description > -------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+------------- > oid | oid | | not null | | plain | | | | > toasteroid | oid | | not null | | plain | | | | > relid | oid | | not null | | plain | | | | > toastentid | oid | | not null | | plain | | | | > attnum | smallint | | not null | | plain | | | | > version | smallint | | not null | | plain | | | | > relname | name | | not null | | plain | | | | > toastentname | name | | not null | | plain | | | | > flag | "char" | | not null | | plain | | | | > toastoptions | "char" | | not null | | plain | | | | > Indexes: > "pg_toastrel_oid_index" PRIMARY KEY, btree (oid) > "pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum) > "pg_toastrel_rel_index" btree (relid, attnum) > "pg_toastrel_tsr_index" btree (toasteroid) > Access method: heap > (This is not a final definition) > > Where: > oid - PG_TOASTREL record ID > toasteroid - Toaster OID from PG_TOASTER > relid - base relation OID > toastentid - TOAST entity OID (not necessary to be a table) > attnum - TOASTable attribute index in base relation > version - Toaster assignment version - sequence of assignments > relname - base relation name (optional) > toastentname - TOAST entity name (optional) > flag - special field to mark rows, currently only the value 'x' is used > to mark unused rows > > PG_TOASTREL unique key consists of: > toasteroid, relid, attnum, version > > All currently assigned Toasters are additionally stored in cache for > fast access. When new row is being TOASTed - Toaster, relation Oid, > TOAST relation Oid, column index are added into Toastrel Cache for fast > access. > > Create table, change Toaster, change column type were changed to > add new rows in PG_TOASTREL, to use this table and cache instead > of altering pg_attribute with new column. For table creation from > scratch when no TOAST tables were created is used special condition > with version=0. > > DROP TABLE drops rows in PG_TOASTREL for this table. This allows to - > DROP TOASTER command added. When no rows with the according Toaster are > present in PG_TOASTREL - it is considered unused and thus could be safely > dropped from the system. > > Default toaster 'deftoaster' (reference TOAST mechanics) cannot be dropped. > > Working branch: > https://github.com/postgrespro/postgres/tree/toastapi_with_ctl > > Would be glad to get any proposals and objections. The patch does not apply on top of HEAD as in [1], please post a rebased patch: patching file src/backend/utils/cache/syscache.c === Applying patches on top of PostgreSQL commit ID 33ab0a2a527e3af5beee3a98fc07201e555d6e45 === === applying patch ./0001_toaster_interface_v24.patch patching file contrib/test_decoding/expected/ddl.out Hunk #2 FAILED at 874. 1 out of 2 hunks FAILED -- saving rejects to file src/backend/utils/cache/syscache.c.rej [1] - http://cfbot.cputube.org/patch_41_3490.log Regards, Vignesh
On Tue, 27 Dec 2022 at 02:32, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi hackers!
>
> Pluggable TOAST API with catalog control table PG_TOASTREL - pre-patch.
>
> Pluggable TOAST - TOAST API rework - introduce PG_TOASTREL catalog
> relation containing TOAST dependencies. NOTE: here is a pre-patch, not
> a final version, just to introduce another approach to a Pluggable TOAST
> idea, it needs some cleanup, tests rework and some improvements, so
> the main
> goal of this message is to introduce this different approach. This is the
> last patch and it is installed on top of older TOAST API patches, so here
> are 3 patches attached:
>
> 0001_toaster_interface_v24.patch.gz
> This patch introduces new custom TOAST pointer, Pluggable TOAST API and
> Toaster support functions - cache, lookup, and new attribute 'atttoaster'
> in PG_ATTRIBUTE table which stores Toaster OID;
>
> 0002_toaster_default_v24.patch.gz
> Here the default TOAST mechanics is routed via TOAST API, but still using
> varatt_external TOAST Pointer - so this step does not change overall TOAST
> mechanics unless you plug in some custom Toaster;
>
> 0003_pg_toastrel_table_v24.patch.gz
> Here Pluggable TOAST is reworked not to modify PG_ATTRIBUTE, instead this
> patch introduces new catalog table PG_TOASTREL with its support functions.
>
> Motivation: PG_ATTRIBUTE is already the largest catalog table. We try
> to avoid modification of existing catalog tables, and previous solution
> had several problems:
> 1) New field in PG_ATTRIBUTE;
> 2) No opportunity to save all Toaster assignment history;
> 3) No opportunity to have multi-TOAST tables assigned to a relation or
> an attribute;
> 4) Toaster cannot be dropped - to drop Toaster we need to scan all tables
> with TOASTable columns.
>
> Instead of extending PG_ATTRIBUTE with ATTTOASTER attribute, we decided
> to store all Table-Toaster relations in a new catalog table PG_TOASTREL.
> This cancels the necessity to modify catalog table PG_ATTRIBUTE, allows to store
> full history of Toasters assignments, and allows to drop unused Toasters
> from system.
>
> Toasters are assigned to a table column. ALTER TABLE ... SET TOASTER command
> creates a new row in PG_TOASTREL. To distinguish sequential assignments,
> PG_TOASTREL has special attribute - 'version'. With each new assignment
> its 'version' attribute is increased, and the row with the biggest 'version'
> is the current Toaster for a column.
>
> This approach allows to provide different behavior, even for a single table
> we can have one TOAST table for the whole relation (as it is in current TOAST
> mechanics), or we can have separate TOAST relation(s) for each TOASTable
> column - this requires a slight modification if current approach. The latter
> also allows simple invariant of column-oriented storage.
>
> Also, this approach makes PG_ATTRIBUTE attribute RELTOASTRELID obsolete -
> current mechanics allows only 1 TOAST table for relation, which limits
> greatly TOAST capabilities - because all TOASTed columns are stored in this
> table, which in its turn limits overall base relation capacity.
>
> In future, this approach allows us to have a kind of near infinite TOAST
> storage, with ability to store large values (larger than 512 Mbytes),
> auto-creation of TOAST table only when the first value is actually TOASTed,
> and much more.
>
> The approach, along with the TOAST API itself, introduces the catalog table
> PG_TOASTREL with a set of support functions.
>
> PG_TOASTREL definition:
>
> postgres@postgres=# \d+ pg_toastrel;
> Table "pg_catalog.pg_toastrel"
> Column | Type | Collation | Nullable | Default | Storage | Toaster | Compression | Stats target | Description
> -------------+----------+-----------+----------+---------+---------+---------+-------------+--------------+-------------
> oid | oid | | not null | | plain | | | |
> toasteroid | oid | | not null | | plain | | | |
> relid | oid | | not null | | plain | | | |
> toastentid | oid | | not null | | plain | | | |
> attnum | smallint | | not null | | plain | | | |
> version | smallint | | not null | | plain | | | |
> relname | name | | not null | | plain | | | |
> toastentname | name | | not null | | plain | | | |
> flag | "char" | | not null | | plain | | | |
> toastoptions | "char" | | not null | | plain | | | |
> Indexes:
> "pg_toastrel_oid_index" PRIMARY KEY, btree (oid)
> "pg_toastrel_name_index" UNIQUE CONSTRAINT, btree (toasteroid, relid, version, attnum)
> "pg_toastrel_rel_index" btree (relid, attnum)
> "pg_toastrel_tsr_index" btree (toasteroid)
> Access method: heap
> (This is not a final definition)
>
> Where:
> oid - PG_TOASTREL record ID
> toasteroid - Toaster OID from PG_TOASTER
> relid - base relation OID
> toastentid - TOAST entity OID (not necessary to be a table)
> attnum - TOASTable attribute index in base relation
> version - Toaster assignment version - sequence of assignments
> relname - base relation name (optional)
> toastentname - TOAST entity name (optional)
> flag - special field to mark rows, currently only the value 'x' is used
> to mark unused rows
>
> PG_TOASTREL unique key consists of:
> toasteroid, relid, attnum, version
>
> All currently assigned Toasters are additionally stored in cache for
> fast access. When new row is being TOASTed - Toaster, relation Oid,
> TOAST relation Oid, column index are added into Toastrel Cache for fast
> access.
>
> Create table, change Toaster, change column type were changed to
> add new rows in PG_TOASTREL, to use this table and cache instead
> of altering pg_attribute with new column. For table creation from
> scratch when no TOAST tables were created is used special condition
> with version=0.
>
> DROP TABLE drops rows in PG_TOASTREL for this table. This allows to -
> DROP TOASTER command added. When no rows with the according Toaster are
> present in PG_TOASTREL - it is considered unused and thus could be safely
> dropped from the system.
>
> Default toaster 'deftoaster' (reference TOAST mechanics) cannot be dropped.
>
> Working branch:
> https://github.com/postgrespro/postgres/tree/toastapi_with_ctl
>
> Would be glad to get any proposals and objections.
The patch does not apply on top of HEAD as in [1], please post a rebased patch:
patching file src/backend/utils/cache/syscache.c
=== Applying patches on top of PostgreSQL commit ID
33ab0a2a527e3af5beee3a98fc07201e555d6e45 ===
=== applying patch ./0001_toaster_interface_v24.patch
patching file contrib/test_decoding/expected/ddl.out
Hunk #2 FAILED at 874.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/utils/cache/syscache.c.rej
[1] - http://cfbot.cputube.org/patch_41_3490.log
Regards,
Vignesh
Вложения
On Sun, 8 Jan 2023 at 01:40, Nikita Malakhov <hukutoc@gmail.com> wrote: > > Hi! > > Thank you for your attention. > I've rebased the patchset onto the latest master (from 07.01), but the second part is still > in pre-patch shape - it is working, but tests are failing due to changes in TOAST relations > logic - I haven't adapted 'em yet. The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID c44f6334ca6ff6d242d9eb6742441bc4e1294067 === === expanding ./0002_toaster_default_v25.patch.gz === expanding ./0001_toaster_interface_v25.patch.gz === expanding ./0004_drop_toaster_v25.patch.gz === expanding ./0003_pg_toastrel_control_v25.patch.gz === applying patch ./0001_toaster_interface_v25.patch .... patching file src/include/postgres.h Hunk #1 succeeded at 80 with fuzz 2 (offset 4 lines). Hunk #2 FAILED at 148. Hunk #3 FAILED at 315. Hunk #4 FAILED at 344. Hunk #5 FAILED at 359. 4 out of 5 hunks FAILED -- saving rejects to file src/include/postgres.h.rej [1] - http://cfbot.cputube.org/patch_41_3490.log Regards, Vignesh
On Sun, 8 Jan 2023 at 01:40, Nikita Malakhov <hukutoc@gmail.com> wrote:
>
> Hi!
>
> Thank you for your attention.
> I've rebased the patchset onto the latest master (from 07.01), but the second part is still
> in pre-patch shape - it is working, but tests are failing due to changes in TOAST relations
> logic - I haven't adapted 'em yet.
The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
c44f6334ca6ff6d242d9eb6742441bc4e1294067 ===
=== expanding ./0002_toaster_default_v25.patch.gz
=== expanding ./0001_toaster_interface_v25.patch.gz
=== expanding ./0004_drop_toaster_v25.patch.gz
=== expanding ./0003_pg_toastrel_control_v25.patch.gz
=== applying patch ./0001_toaster_interface_v25.patch
....
patching file src/include/postgres.h
Hunk #1 succeeded at 80 with fuzz 2 (offset 4 lines).
Hunk #2 FAILED at 148.
Hunk #3 FAILED at 315.
Hunk #4 FAILED at 344.
Hunk #5 FAILED at 359.
4 out of 5 hunks FAILED -- saving rejects to file src/include/postgres.h.rej
[1] - http://cfbot.cputube.org/patch_41_3490.log
Regards,
Vignesh
On 2023-Jan-14, Nikita Malakhov wrote: > Hi! > Fails due to recent changes. Working on it. Please see my response here https://postgr.es/m/20230203095540.zutul5vmsbmantbm@alvherre.pgsql -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
>but they seem to be trying to fix the same problem: the fact that the
>default TOAST stuff isn't good enough for JSONB.
The problem, actually, is that the default TOAST is often not good for
modern loads and amounts of data.Pluggable TOAST is based not only
on pure enthusiasm, but on demands and tickets from production
databases.The main demand is effective and transparent storage subsystem
for large values for some problematic types of data, which we already have,
with proven efficiency.
So we're really quite surprised that it has got so little feedback. We've got
some opinions on approach but there is no any general one on the approach
itself except doubts about the TOAST mechanism needs revision at all.
Currently we're busy revising the whole Pluggable TOAST API to make it
available as an extension and based on hooks to minimize changes in
the core. It will be available soon.
Hi, On 2023-02-06 00:10:50 +0300, Nikita Malakhov wrote: > The problem, actually, is that the default TOAST is often not good for > modern loads and amounts of data.Pluggable TOAST is based not only > on pure enthusiasm, but on demands and tickets from production > databases. > The main demand is effective and transparent storage subsystem > for large values for some problematic types of data, which we already have, > with proven efficiency. > So we're really quite surprised that it has got so little feedback. We've > got > some opinions on approach but there is no any general one on the approach > itself except doubts about the TOAST mechanism needs revision at all. The problem for me is that what you've been posting doesn't actually fix any problem, but instead introduces lots of new code and complexity. > Currently we're busy revising the whole Pluggable TOAST API to make it > available as an extension and based on hooks to minimize changes in > the core. It will be available soon. I don't think we should accept that either. It still doesn't improve anything about toast, it just allows you to do such improvements out of core. Greetings, Andres Freund
Hi, > > So we're really quite surprised that it has got so little feedback. We've > > got > > some opinions on approach but there is no any general one on the approach > > itself except doubts about the TOAST mechanism needs revision at all. > > The problem for me is that what you've been posting doesn't actually fix > any problem, but instead introduces lots of new code and complexity. > > Currently we're busy revising the whole Pluggable TOAST API to make it > > available as an extension and based on hooks to minimize changes in > > the core. It will be available soon. > > I don't think we should accept that either. It still doesn't improve > anything about toast, it just allows you to do such improvements out of > core. Agree. On top of that referencing non-reproducible benchmarks doesn't help. There were some slides referenced in the thread but I couldn't find exact steps to reproduce the benchmarks. Your desire to improve the TOAST mechanism is much appreciated. I believe we are all on the same side here, the one where people work together to make PostgreSQL an even better DBMS. However in order to achieve this firstly a consensus within the community should be reached about how exactly we are going to do this. Afterwards, all the code and benchmarks should be made publicly available under a proper license so that anyone could explore and reproduce them. Last but not least, the complexity should really be taken into account. There are real people who are going to maintain the code after (and if) it will be merged, and there are not so many of them. The problems I see are that the type-aware TOASTers skipped step (1) right to the step (2) and doesn't seem to consider (3). Even after it was explicitly pointed out that we should take a step back and return to (1). -- Best regards, Aleksander Alekseev
On 2023-Feb-06, Nikita Malakhov wrote: > Currently we're busy revising the whole Pluggable TOAST API to make it > available as an extension and based on hooks to minimize changes in > the core. It will be available soon. Hmm, I'm not sure why would PGDG want to accept such a thing. I read "minimize changes" as "open source Postgres can keep their crap implementation and companies will offer good ones for a fee". I'd rather have something that can give users direct benefit -- not hide it behind proprietary licenses forcing each company to implement their own performant toaster. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On 2023-Feb-06, Nikita Malakhov wrote:
> Currently we're busy revising the whole Pluggable TOAST API to make it
> available as an extension and based on hooks to minimize changes in
> the core. It will be available soon.
Hmm, I'm not sure why would PGDG want to accept such a thing. I read
"minimize changes" as "open source Postgres can keep their crap
implementation and companies will offer good ones for a fee". I'd
rather have something that can give users direct benefit -- not hide it
behind proprietary licenses forcing each company to implement their own
performant toaster.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Hi, > The main reason behind this decision is that keeping the first implementation > on the side of the vanilla (I mean rebasing it) over time is very difficult due > to the very invasive nature of this solution. > > So we decided to reduce changes in the core to the minimum necessary > to make it available through the hooks, because the hooks part is very > lightweight and simple to keep rebasing onto the vanilla core. We plan > to keep this extension free with the PostgreSQL license, so any PostgreSQL > user could benefit from the TOAST on steroids, and sometimes in the > future it will be a much simpler task to integrate the Pluggable TOAST into > the vanilla, along with our advanced TOAST implementations which > we plan to keep under Open Source licenses too. That's great to hear. I'm looking forward to the link to the corresponding GitHub repository. Please let us know when this effort will be available for testing and benchmarking! I would like to point out however that there were several other pieces of feedback that could have been missed: * No one wants to see this as an extension. This was my original proposal (adding ZSON to /contrib/) and it was rejected. The community explicitly wants this to be a core feature with its syntax, autocompletion, documentation and stuff. * The community wants the feature to have a simple implementation. You said yourself that the idea of type-aware TOASTers is very invasive, and I completely agree. * People also want this to be simple from the user perspective, as simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd]; At least this is my personal summary/impression from following the mailing list. Anyhow since we are back to the stage where we discuss the RFC I suggest continuing it in the compression dictionaries thread [1] since we made noticeable progress there already. [1]: https://postgr.es/m/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com -- Best regards, Aleksander Alekseev
Hi, On 2023-02-06 16:38:01 +0300, Nikita Malakhov wrote: > So we decided to reduce changes in the core to the minimum necessary > to make it available through the hooks, because the hooks part is very > lightweight and simple to keep rebasing onto the vanilla core. At least I don't think we should accept such hooks. I don't think I am alone in that. Greetings, Andres Freund
On Mon, 6 Feb 2023 at 20:24, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-02-06 16:38:01 +0300, Nikita Malakhov wrote: > > So we decided to reduce changes in the core to the minimum necessary > > to make it available through the hooks, because the hooks part is very > > lightweight and simple to keep rebasing onto the vanilla core. > > At least I don't think we should accept such hooks. I don't think I am alone > in that. +1 Assuming type-aware TOASTing is the goal, I don't think hooks are the right design to implement that. -Matthias van de Meent
On Mon, 6 Feb 2023 at 15:38, Aleksander Alekseev <aleksander@timescale.com> wrote: > I would like to point out however that there were several other pieces > of feedback that could have been missed: > > * No one wants to see this as an extension. This was my original > proposal (adding ZSON to /contrib/) and it was rejected. The community > explicitly wants this to be a core feature with its syntax, > autocompletion, documentation and stuff. I believe that is a misrepresentation of the situation. ZSON had (has?) several systemic issues and could not be accepted to /contrib/ in the way it was implemented, and it was commented that it would make sense that the feature of compression assisted by dictionaries would be implemented in core. Yet still, that feature is only closely related to pluggable TOAST, but it is not the same as making TOAST pluggable. > * The community wants the feature to have a simple implementation. You > said yourself that the idea of type-aware TOASTers is very invasive, > and I completely agree. I'm not sure that this is correct either. Compression (and TOAST) is inherently complex, and I don't think we should reject improvements because they are complex. The problem that I see being raised here, is that there was little discussion and no observed community consensus about the design of this complex feature *before* this patch with high complexity was provided. The next action that was requested is to take a step back and decide how we would want to implement type-aware TOASTing (and the associated patch compression dictionaries) *before* we look into the type-aware toasting. > * People also want this to be simple from the user perspective, as > simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd]; Could you provide a reference to this? I have yet to see a COMPRESSED TABLE feature or syntax, let alone users asking for TOAST to be as easily usable as that feature or syntax. Kind regards, Matthias van de Meent
>discussion and no observed community consensus about the design of
>this complex feature *before* this patch with high complexity was
>provided.
>The next action that was requested is to take a step back and decide
>how we would want to implement type-aware TOASTing (and the associated
>patch compression dictionaries) *before* we look into the type-aware
>toasting.
Hi, > I believe that is a misrepresentation of the situation. ZSON had > (has?) several systemic issues and could not be accepted to /contrib/ > in the way it was implemented, and it was commented that it would make > sense that the feature of compression assisted by dictionaries would > be implemented in core. Yet still, that feature is only closely > related to pluggable TOAST, but it is not the same as making TOAST > pluggable. > > > * The community wants the feature to have a simple implementation. You > > said yourself that the idea of type-aware TOASTers is very invasive, > > and I completely agree. > > I'm not sure that this is correct either. Compression (and TOAST) is > inherently complex, and I don't think we should reject improvements > because they are complex. > The problem that I see being raised here, is that there was little > discussion and no observed community consensus about the design of > this complex feature *before* this patch with high complexity was > provided. Strictly speaking there is no such thing as "community opinion". There are different people, everyone has their own opinion. To make things more interesting the opinions change with time. I did my best to make a brief summary of 100+ messages from different people in something like 4 threads. These are things that were requested and/or no one disagrees with (at least no one said "no, put all this out of the core! and make it complicated too!"). Focusing on something (almost) no one disagrees with seems to be more productive than arguing about something everyone disagrees with. As I see it, the goal is not to be right, but rather to find a consensus most of us will be not unhappy with. > The next action that was requested is to take a step back and decide > how we would want to implement type-aware TOASTing (and the associated > patch compression dictionaries) *before* we look into the type-aware > toasting. Yes, I thought we already agreed to forget about type-aware TOASTing and compression dictionaries, and are looking for a consensus now. To clarify, I don't think that pluggable TOASTing is an absolutely bad idea. We are just not discussing this particular idea anymore, at least for now. > > * People also want this to be simple from the user perspective, as > > simple as just CREATE COMPRESSED TABLE ... [USING lz4|zstd]; > > Could you provide a reference to this? I have yet to see a COMPRESSED > TABLE feature or syntax, let alone users asking for TOAST to be as > easily usable as that feature or syntax. I was referring to the recent discussion of the new RFC. Please see [1] and below. [1]: https://www.postgresql.org/message-id/flat/20230203095540.zutul5vmsbmantbm%40alvherre.pgsql#7cce6acef0cb7eb2490715ec9d835e74 -- Best regards, Aleksander Alekseev
Hi, hackers! Maybe I've read the thread too superficially, but for me, it seems like more of a discussion on what TOAST should NOT be. Maybe someone more in the topic could explain what is the consensus on what we require and what we like to to have in a new TOAST? For me, a good toast should be chunk-updatable, so that we don't need to rewrite the whole TOAST and WAL-replicate the whole thing at every small attribute modification. But obviously, it's just another opinion. Kind regards, Pavel Borisov
On Wed, 14 Jun 2023 at 14:05, Nikita Malakhov <hukutoc@gmail.com> wrote: > > Hi! > > I have a question for the community - why was this patch silently put to a "rejected" status? > Should there no be some kind of explanation? > > During this discussion I got the impression that for some reason some members of the community > do not want the TOAST functionality, which has such drawbacks that make it really a curse for > in many ways very good DBMS, to be touched. We cannot get rid of it because of backwards > compatibility, so the best way is to make it more adaptable and extensible - this is what this thread > is about. We proposed our vision on how to extend the TOAST Postgres-way, like Pluggable > Storage some time before. > > There are some very complex subjects left in this topic that really need a community' attention. > I've mentioned them above, but there was no feedback on them. > > Pavel, we've already had an update implementation for TOAST. But it is a part of a Pluggable > TOAST and it hardly would be here without it. I've started another thread on extending the TOAST > pointer, maybe you would want to participate there [1]. > > We still would be grateful for feedback. > > [1] Extending the TOAST Pointer I don't see a clear reason it's rejected, besides technically it's Waiting on Author since January. If it's a mistake and the patch is up-to-date you can set an appropriate status. Regards, Pavel Borisov, Supabase.
Hi, > > I have a question for the community - why was this patch silently put to a "rejected" status? > > Should there no be some kind of explanation? I wouldn't say that it happened "silently" nor that the reason is so mysterious [1]. [1]: https://www.postgresql.org/message-id/flat/CAM-w4HPjg7NwEWBtXn1empgAg3fqJHifHo_nhgqFWopiYaNxYg%40mail.gmail.com -- Best regards, Aleksander Alekseev
Hi Nikita, > We already have a lot of changes in Pluggable TOAST that were not committed > to the main GIT branch of this thread, so it seems that I have to merge them and > reopen it. Pretty sure that reopening an already rejected patch that is competing with compression dictionaries (which the rest of us are currently focusing on) will achieve anything. Consider joining the compression dictionaries effort instead [1]. During the discussion with the community it ended up being a TOAST improvement after all. So we could use your expertise in this area. [1]: https://commitfest.postgresql.org/43/3626/ -- Best regards, Aleksander Alekseev
Hi, > Pretty sure that reopening an already rejected patch that is competing > with compression dictionaries (which the rest of us are currently > focusing on) will achieve anything. Ooops, I didn't mean to be sarcastic: s/will achieve/will not achieve/ My apologies. > Consider joining the compression > dictionaries effort instead [1]. During the discussion with the > community it ended up being a TOAST improvement after all. So we could > use your expertise in this area. > > [1]: https://commitfest.postgresql.org/43/3626/ -- Best regards, Aleksander Alekseev