Обсуждение: Per-table storage parameters for TableAM/IndexAM extensions

Поиск
Список
Период
Сортировка

Per-table storage parameters for TableAM/IndexAM extensions

От
Sadhuprasad Patro
Дата:
Hi,

Currently all the storage options for a table are very much specific
to the heap but a different AM might need some user defined AM
specific parameters to help tune the AM. So here is a patch which
provides an AM level routine so that instead of getting parameters
validated using “heap_reloptions” it will call the registered AM
routine.

e.g:
-- create a new access method and table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER <new_tableam_handler>;

CREATE TABLE mytest (a int) USING myam ;

--a new parameter is to set storage parameter for only myam as below
ALTER TABLE mytest(squargle_size = '100MB');

The user-defined parameters will have meaning only for the "myam",
otherwise error will be thrown. Our relcache already allows the
AM-specific cache to be stored for each relation.

Open Question: When a user changes AM, then what should be the
behavior for not supported storage options? Should we drop the options
and go with only system storage options?
Or throw an error, in which case the user has to clean the added parameters.

Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com/

Вложения

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Dilip Kumar
Дата:
On Wed, Dec 29, 2021 at 10:38 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
Hi,

Currently all the storage options for a table are very much specific
to the heap but a different AM might need some user defined AM
specific parameters to help tune the AM. So here is a patch which
provides an AM level routine so that instead of getting parameters
validated using “heap_reloptions” it will call the registered AM
routine.

+1 for the idea.
 

e.g:
-- create a new access method and table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER <new_tableam_handler>;

CREATE TABLE mytest (a int) USING myam ;

--a new parameter is to set storage parameter for only myam as below
ALTER TABLE mytest(squargle_size = '100MB');

This will work for CREATE TABLE as well I guess as normal relation storage parameter works now right?


The user-defined parameters will have meaning only for the "myam",
otherwise error will be thrown. Our relcache already allows the
AM-specific cache to be stored for each relation.

Open Question: When a user changes AM, then what should be the
behavior for not supported storage options? Should we drop the options
and go with only system storage options?
Or throw an error, in which case the user has to clean the added parameters.

IMHO, if the user is changing the access method for the table then it should be fine to throw an error if there are some parameters which are not supported by the new AM.  So that user can take a calculative call and first remove those storage options before changing the AM.

I have a few comments on the patch, mostly cosmetics.

1.
+ Assert(routine->taboptions != NULL);

Why AM is not allowed to register the NULL function, if NULL is registered that means the AM
does not support any of the storage parameters.
2.
@@ -1358,6 +1358,7 @@ untransformRelOptions(Datum options)
  return result;
 }
 
+
 /*
  * Extract and parse reloptions from a pg_class tuple.
  *

Unwanted hunk (added extra line)

3.
+ * Parse options for heaps, views and toast tables. This is
+ * implementation of relOptions for access method heapam.
  */

Better to say access method heap instead of heapam.
4.
+ * Parse options for tables.
+ *
+ * taboptions tables AM's option parser function
+ *      reloptions options as text[] datum
+ *      validate error flag

Function header comment formatting is not proper, it also has uneven spacing between words.
5.
-extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc)
+extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc,
+ reloptions_function taboptions)

Indentation is not proper, run pgindent on this.

5.
>Currently all the storage options for a table are very much specific to the heap but a different AM might need some user defined AM specific parameters to help tune the AM. So here is a patch which provides an AM level routine so that instead of getting >parameters validated using “heap_reloptions” it will call the registered AM routine.

Wrap these long commit message lines at 80 characters.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Rushabh Lathia
Дата:


On Wed, Dec 29, 2021 at 10:38 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
Hi,

Currently all the storage options for a table are very much specific
to the heap but a different AM might need some user defined AM
specific parameters to help tune the AM. So here is a patch which
provides an AM level routine so that instead of getting parameters
validated using “heap_reloptions” it will call the registered AM
routine.


This is a good idea. +1.

e.g:
-- create a new access method and table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER <new_tableam_handler>;

CREATE TABLE mytest (a int) USING myam ;

--a new parameter is to set storage parameter for only myam as below
ALTER TABLE mytest(squargle_size = '100MB');

I syntax here is,  ALTER TABLE <table_name> SET ( attribute_option = value );


The user-defined parameters will have meaning only for the "myam",
otherwise error will be thrown. Our relcache already allows the
AM-specific cache to be stored for each relation.

Open Question: When a user changes AM, then what should be the
behavior for not supported storage options? Should we drop the options
and go with only system storage options?
Or throw an error, in which case the user has to clean the added parameters.

I think throwing an error makes more sense, so that the user can clean that. 

Here are a few quick cosmetic review comments:

1)
@@ -1372,7 +1373,8 @@ untransformRelOptions(Datum options)
  */
 bytea *
 extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
-  amoptions_function amoptions)
+ amoptions_function amoptions,
+ reloptions_function taboptions)

Indentation has been changed and needs to be fixed.

2)

  case RELKIND_MATVIEW:
            options = table_reloptions(taboptions, classForm->relkind, datum, false);
break;

Going beyond line limit.

3)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 9befe01..6324d7e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2581,6 +2581,7 @@ static const TableAmRoutine heapam_methods = {
  .index_build_range_scan = heapam_index_build_range_scan,
  .index_validate_scan = heapam_index_validate_scan,
 
+ .taboptions = heap_reloptions,

Instead of taboptions can name this as relation_options to be in sink with other members. 

4) 

@@ -2427,7 +2428,7 @@ do_autovacuum(void)
  */
  MemoryContextSwitchTo(AutovacMemCxt);
  tab = table_recheck_autovac(relid, table_toast_map, pg_class_desc,
- effective_multixact_freeze_max_age);
+ classRel->rd_tableam->taboptions, effective_multixact_freeze_max_age);
  if (tab == NULL)

Split the another added parameter to function in the next line.
 
5)

Overall patch has many indentation issues, I would suggest running the
pgindent to fix those.



Regards
Rushabh Lathia

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Jelte Fennema
Дата:
Big +1, this is a great addition!

I think it would be very useful if there were some tests for this new
feature. Something similar to the tests for storage parameters for
index AMs in src/test/modules/dummy_index_am.

Apart from that I think the documentation for table storage parameters
needs to be updated in doc/src/sgml/ref/create_table.sgml. It now
needs to indicate that these parameters are different for each table
access method. Similar to this paragraph in the create index storage
parameter section of the docs:

> Each index method has its own set of allowed storage parameters.
> The B-tree, hash, GiST and SP-GiST index methods all accept this
> parameter

Jelte



Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Sadhuprasad Patro
Дата:
On Mon, Jan 17, 2022 at 4:47 PM Jelte Fennema <postgres@jeltef.nl> wrote:
>
> Big +1, this is a great addition!
>
> I think it would be very useful if there were some tests for this new
> feature. Something similar to the tests for storage parameters for
> index AMs in src/test/modules/dummy_index_am.
>
Sure, I will refer to the index AM test and add the test cases needed.

> Apart from that I think the documentation for table storage parameters
> needs to be updated in doc/src/sgml/ref/create_table.sgml. It now
> needs to indicate that these parameters are different for each table
> access method. Similar to this paragraph in the create index storage
> parameter section of the docs:

Sure, I will add the documentation part for this.

As of now, I have fixed the comments from Dilip & Rushabh and have
done some more changes after internal testing and review. Please find
the latest patch attached.

Thanks & Regards
SadhuPrasad
www.EnterpriseDB.com

Вложения

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Jeff Davis
Дата:
On Tue, 2022-01-18 at 22:44 +0530, Sadhuprasad Patro wrote:
> As of now, I have fixed the comments from Dilip & Rushabh and have
> done some more changes after internal testing and review. Please find
> the latest patch attached.

Hi,

Thank you for working on this! Some questions/comments:

At a high level, it seems there are some options that are common to all
tables, regardless of the AM. For instance, the vacuum/autovacuum
options. (Even if the AM doesn't require vacuum, then it needs to at
least be able to communicate that somehow.) I think parallel_workers
and user_catalog_table also fit into this category. That means we need
all of StdRdOptions to be the same, with the possible exception of
toast_tuple_target and/or fillfactor.

The current patch just leaves it up to the AM to return a bytea that
can be cast to StdRdOptions, which seems like a fragile API.

That makes me think that what we really want is to have *extra* options
for a table AM, not an entirely custom set. Do you agree?

If so, I suggest you refactor so that if validation doesn't recognize a
parameter, it calls a table AM method to validate it, and lets it in if
validation succeeds. That way all the stuff around StdRdOptions is
unchanged. When the table AM needs the parameter value, it can parse
pg_class.reloptions for itself and save it in rd_amcache.

Regards,
    Jeff Davis





Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Robert Haas
Дата:
On Wed, Dec 29, 2021 at 12:08 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
> Open Question: When a user changes AM, then what should be the
> behavior for not supported storage options? Should we drop the options
> and go with only system storage options?
> Or throw an error, in which case the user has to clean the added parameters.

A few people have endorsed the error behavior here but I foresee problems.

Imagine that I am using the "foo" tableam with "compression=lots" and
I want to switch to the "bar" AM which does not support that option.
If I remove the "compression=lots" option using a separate command,
the "foo" table AM may rewrite my whole table and decompress
everything. Then when I convert to the "bar" AM it's going to have to
be rewritten again. That's painful. I clearly need some way to switch
AMs without having to rewrite the table twice.

It's also interesting to consider the other direction. If I am
switching from "bar" to "foo" I would really like to be able to add
the "compression=lots" option at the same time I make the switch.
There needs to be some syntax for that.

One way to solve the first of these problem is to silently drop
unsupported options. Maybe a better way is to have syntax that allows
you to specify options to be added and removed at the time you switch
AMs e.g.:

ALTER TABLE mytab SET ACCESS METHOD bar OPTIONS (DROP compression);
ALTER TABLE mytab SET ACCESS METHOD foo OPTIONS (ADD compression 'lots');

I don't like that particular syntax a ton personally but it does match
what we already use for ALTER SERVER. Unfortunately it's wildly
inconsistent with what we do for ALTER TABLE. Another idea might be
something like:

ALTER TABLE mytab SET ACCESS METHOD bar RESET compression;
ALTER TABLE mytab SET ACCESS METHOD foo SET compression = 'lots';

You'd need to be able to do multiple things with one command e.g.

ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
preferred_fruit = 'banana';

Regardless of the details, I don't think it's viable to just say,
well, rewrite the table multiple times if that's what it takes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Dilip Kumar
Дата:
On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:

Imagine that I am using the "foo" tableam with "compression=lots" and
I want to switch to the "bar" AM which does not support that option.
If I remove the "compression=lots" option using a separate command,
the "foo" table AM may rewrite my whole table and decompress
everything. Then when I convert to the "bar" AM it's going to have to
be rewritten again. That's painful. I clearly need some way to switch
AMs without having to rewrite the table twice.

I agree with you, if we force users to drop the option as a separate command then we will have to rewrite the table twice.
 
It's also interesting to consider the other direction. If I am
switching from "bar" to "foo" I would really like to be able to add
the "compression=lots" option at the same time I make the switch.
There needs to be some syntax for that.

One way to solve the first of these problem is to silently drop
unsupported options. Maybe a better way is to have syntax that allows
you to specify options to be added and removed at the time you switch
AMs e.g.:

+1


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Sadhuprasad Patro
Дата:
> On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>
>> Imagine that I am using the "foo" tableam with "compression=lots" and
>> I want to switch to the "bar" AM which does not support that option.
>> If I remove the "compression=lots" option using a separate command,
>> the "foo" table AM may rewrite my whole table and decompress
>> everything. Then when I convert to the "bar" AM it's going to have to
>> be rewritten again. That's painful. I clearly need some way to switch
>> AMs without having to rewrite the table twice.
>
Agreed. Better to avoid multiple rewrites here. Thank you for figuring out this.


> You'd need to be able to do multiple things with one command e.g.

> ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> preferred_fruit = 'banana';

+1
Silently dropping some options is not right and it may confuse users
too. So I would like to go
for the command you have suggested, where the user should be able to
SET & RESET multiple
options in a single command for an object.

Thanks & Regards
SadhuPrasad
http://www.enterprisedb.com



Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Simon Riggs
Дата:
On Thu, 17 Feb 2022 at 17:55, Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
>
> > On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >>
> >> Imagine that I am using the "foo" tableam with "compression=lots" and
> >> I want to switch to the "bar" AM which does not support that option.
> >> If I remove the "compression=lots" option using a separate command,
> >> the "foo" table AM may rewrite my whole table and decompress
> >> everything. Then when I convert to the "bar" AM it's going to have to
> >> be rewritten again. That's painful. I clearly need some way to switch
> >> AMs without having to rewrite the table twice.
> >
> Agreed. Better to avoid multiple rewrites here. Thank you for figuring out this.
>
>
> > You'd need to be able to do multiple things with one command e.g.
>
> > ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> > preferred_fruit = 'banana';
>
> +1
> Silently dropping some options is not right and it may confuse users
> too. So I would like to go
> for the command you have suggested, where the user should be able to
> SET & RESET multiple
> options in a single command for an object.

I prefer ADD/DROP to SET/RESET. The latter doesn't convey the meaning
accurately to me.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Sadhuprasad Patro
Дата:
On Fri, Feb 18, 2022 at 10:48 PM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 17 Feb 2022 at 17:55, Sadhuprasad Patro <b.sadhu@gmail.com> wrote:
> >
> > > On Sat, Feb 12, 2022 at 2:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >>
> > >>
> > >> Imagine that I am using the "foo" tableam with "compression=lots" and
> > >> I want to switch to the "bar" AM which does not support that option.
> > >> If I remove the "compression=lots" option using a separate command,
> > >> the "foo" table AM may rewrite my whole table and decompress
> > >> everything. Then when I convert to the "bar" AM it's going to have to
> > >> be rewritten again. That's painful. I clearly need some way to switch
> > >> AMs without having to rewrite the table twice.
> > >
> > Agreed. Better to avoid multiple rewrites here. Thank you for figuring out this.
> >
> >
> > > You'd need to be able to do multiple things with one command e.g.
> >
> > > ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET
> > > preferred_fruit = 'banana';
> >
> > +1
> > Silently dropping some options is not right and it may confuse users
> > too. So I would like to go
> > for the command you have suggested, where the user should be able to
> > SET & RESET multiple
> > options in a single command for an object.
>
> I prefer ADD/DROP to SET/RESET. The latter doesn't convey the meaning
> accurately to me.

I have added a dummy test module for table AM and did the document
change in the latest patch attached...
The Next plan is to provide users to change the AM storage parameters
swiftly through a single command. I will work on the same and give
another version.

As of now I will go with the ADD/DROP keywords for "ALTER TABLE" command.

Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com

Вложения

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Andres Freund
Дата:
Hi,

On 2022-02-24 12:26:08 +0530, Sadhuprasad Patro wrote:
> I have added a dummy test module for table AM and did the document
> change in the latest patch attached...

The test module doesn't build on windows, unfortunately... Looks like you need
to add PGDLLIMPORT to a few variables:
[01:26:18.539] c:\cirrus\src\test\modules\dummy_table_am\dummy_table_am.c(488): warning C4700: uninitialized local
variable'rel' used [c:\cirrus\dummy_table_am.vcxproj]
 
[01:26:18.539] dummy_table_am.obj : error LNK2001: unresolved external symbol synchronize_seqscans
[c:\cirrus\dummy_table_am.vcxproj]
[01:26:18.539] .\Debug\dummy_table_am\dummy_table_am.dll : fatal error LNK1120: 1 unresolved externals
[c:\cirrus\dummy_table_am.vcxproj]
[01:26:18.539]     1 Warning(s)
[01:26:18.539]     2 Error(s)

https://cirrus-ci.com/task/5067519584108544?logs=build#L2085

Marked the CF entry as waiting-on-author.

Greetings,

Andres



Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Sadhuprasad Patro
Дата:
On Tue, Mar 22, 2022 at 7:24 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-02-24 12:26:08 +0530, Sadhuprasad Patro wrote:
> > I have added a dummy test module for table AM and did the document
> > change in the latest patch attached...
>
> The test module doesn't build on windows, unfortunately... Looks like you need
> to add PGDLLIMPORT to a few variables:
> [01:26:18.539] c:\cirrus\src\test\modules\dummy_table_am\dummy_table_am.c(488): warning C4700: uninitialized local
variable'rel' used [c:\cirrus\dummy_table_am.vcxproj]
 
> [01:26:18.539] dummy_table_am.obj : error LNK2001: unresolved external symbol synchronize_seqscans
[c:\cirrus\dummy_table_am.vcxproj]
> [01:26:18.539] .\Debug\dummy_table_am\dummy_table_am.dll : fatal error LNK1120: 1 unresolved externals
[c:\cirrus\dummy_table_am.vcxproj]
> [01:26:18.539]     1 Warning(s)
> [01:26:18.539]     2 Error(s)
>
> https://cirrus-ci.com/task/5067519584108544?logs=build#L2085
>
> Marked the CF entry as waiting-on-author.

HI,
Thank you for the feedback Andres. I will take care of the same.

As of now attached is a new patch on this to support the addition of
new option parameters or drop the old parameters through ALTER TABLE
command.
Need some more testing on this, which is currently in progress.
Providing the patch to get early feedback in case of any major
comments...

New Command:
ALTER TABLE name SET ACCESS METHOD amname [ OPTIONS ( ADD | DROP
option 'value' [, ... ] ) ];


Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com

Вложения

Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Greg Stark
Дата:
This patch still has code warnings on the cfbot and I don't think
they're platform-specific:

[00:28:28.236] gram.y: In function ‘base_yyparse’:
[00:28:28.236] gram.y:2851:58: error: passing argument 2 of
‘makeDefElemExtended’ from incompatible pointer type
[-Werror=incompatible-pointer-types]
[00:28:28.236] 2851 | $$ = makeDefElemExtended(NULL, $2, NULL,
DEFELEM_DROP, @2);
[00:28:28.236] | ~~~~~~~~~^~~~~~~
[00:28:28.236] | |
[00:28:28.236] | DefElem *
[00:28:28.236] In file included from gram.y:58:
[00:28:28.236] ../../../src/include/nodes/makefuncs.h:102:60: note:
expected ‘char *’ but argument is of type ‘DefElem *’
[00:28:28.236] 102 | extern DefElem *makeDefElemExtended(char
*nameSpace, char *name, Node *arg,
[00:28:28.236] | ~~~~~~^~~~

I gather the patch is still a WIP and ideally we would want to give
feedback on patches in CFs when the author's are looking for it but
this is the last week before feature freeze and the main focus is on
committable patches. I'll move it to next CF.



Re: Per-table storage parameters for TableAM/IndexAM extensions

От
Jacob Champion
Дата:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3495/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob