Обсуждение: Inheriting table AMs for partitioned tables
Hi, In the pluggable storage patch [1], one thing that I'm wondering about is how exactly to inherit the storage AM across partitions. I think that's potentially worthy of a discussion with a wider audience than I'd get in that thread. It seems also related to the recent discussion in [2] Consider (excerpted from the tests): CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; SET default_table_access_method = 'heap'; CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); SET default_table_access_method = 'heap2'; CREATE TABLE tableam_parted_b_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('b'); CREATE TABLE tableam_parted_c_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('c') USING heap; CREATE TABLE tableam_parted_d_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('d') USING heap2; It seems pretty clear that tableam_parted_heap2, tableam_parted_d_heap2 would be stored via heap2, and tableam_parted_c_heap2 via heap. But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't quite as clear. I think it'd both be sensible for new partitions to inherit the AM from the root, but it'd also be sensible to use the current default. Out of laziness (it's how it works rn) I'm inclined to to go with using the current default, but I'd be curious if others disagree. Greetings, Andres Freund [1] https://postgr.es/m/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de [2] https://www.postgresql.org/message-id/201902041630.gpadougzab7v%40alvherre.pgsql
On Tue, Mar 5, 2019 at 5:17 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > In the pluggable storage patch [1], one thing that I'm wondering about > is how exactly to inherit the storage AM across partitions. I think > that's potentially worthy of a discussion with a wider audience than I'd > get in that thread. It seems also related to the recent discussion in [2] > > Consider (excerpted from the tests): > > CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; > > SET default_table_access_method = 'heap'; > CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); > > SET default_table_access_method = 'heap2'; > CREATE TABLE tableam_parted_b_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('b'); > > CREATE TABLE tableam_parted_c_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('c') USING heap; > CREATE TABLE tableam_parted_d_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('d') USING heap2; > > It seems pretty clear that tableam_parted_heap2, tableam_parted_d_heap2 > would be stored via heap2, and tableam_parted_c_heap2 via heap. > > But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't > quite as clear. I think it'd both be sensible for new partitions to > inherit the AM from the root, but it'd also be sensible to use the > current default. > Yeah, we can go either way. > Out of laziness (it's how it works rn) I'm inclined to to go with using > the current default, but I'd be curious if others disagree. > I think using the current default should be okay as that will be the behavior for non-partitioned tables as well. However, if people have good reasons to go other way, then that is fine too. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2019/03/05 8:47, Andres Freund wrote: > Hi, > > In the pluggable storage patch [1], one thing that I'm wondering about > is how exactly to inherit the storage AM across partitions. I think > that's potentially worthy of a discussion with a wider audience than I'd > get in that thread. It seems also related to the recent discussion in [2] > > Consider (excerpted from the tests): > > CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; > > SET default_table_access_method = 'heap'; > CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); > > SET default_table_access_method = 'heap2'; > CREATE TABLE tableam_parted_b_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('b'); > > CREATE TABLE tableam_parted_c_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('c') USING heap; > CREATE TABLE tableam_parted_d_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('d') USING heap2; > > It seems pretty clear that tableam_parted_heap2, tableam_parted_d_heap2 > would be stored via heap2, and tableam_parted_c_heap2 via heap. > > But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't > quite as clear. I think it'd both be sensible for new partitions to > inherit the AM from the root, but it'd also be sensible to use the > current default. Given that many people expected this behavior to be the sane one in other cases that came up, +1 to go this way. Thanks, Amit
On Tue, 5 Mar 2019 at 12:47, Andres Freund <andres@anarazel.de> wrote: > CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; > > SET default_table_access_method = 'heap'; > CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); > But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't > quite as clear. I think it'd both be sensible for new partitions to > inherit the AM from the root, but it'd also be sensible to use the > current default. I'd suggest it's made to work the same way as ca4103025dfe26 made tablespaces work. i.e. if they specify the storage type when creating the partition, then always use that, unless they mention otherwise. If nothing was mentioned when they created the partition, then use default_table_access_method. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, 5 Mar 2019 at 16:01, David Rowley <david.rowley@2ndquadrant.com> wrote: > I'd suggest it's made to work the same way as ca4103025dfe26 made > tablespaces work. i.e. if they specify the storage type when creating > the partition, then always use that, unless they mention otherwise. If > nothing was mentioned when they created the partition, then use > default_table_access_method. "when creating the partition" should read "when creating the partitioned table" -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2019/03/05 11:59, Amit Langote wrote: > On 2019/03/05 8:47, Andres Freund wrote: >> Hi, >> >> In the pluggable storage patch [1], one thing that I'm wondering about >> is how exactly to inherit the storage AM across partitions. I think >> that's potentially worthy of a discussion with a wider audience than I'd >> get in that thread. It seems also related to the recent discussion in [2] >> >> Consider (excerpted from the tests): >> >> CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; >> >> SET default_table_access_method = 'heap'; >> CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); >> >> SET default_table_access_method = 'heap2'; >> CREATE TABLE tableam_parted_b_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('b'); >> >> CREATE TABLE tableam_parted_c_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('c') USING heap; >> CREATE TABLE tableam_parted_d_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('d') USING heap2; >> >> It seems pretty clear that tableam_parted_heap2, tableam_parted_d_heap2 >> would be stored via heap2, and tableam_parted_c_heap2 via heap. >> >> But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't >> quite as clear. I think it'd both be sensible for new partitions to >> inherit the AM from the root, but it'd also be sensible to use the >> current default. > > Given that many people expected this behavior to be the sane one in other > cases that came up, +1 to go this way. Reading my own reply again, it may not be clear what I was +1'ing. I meant to vote for the behavior that David described in his reply. Thanks, Amit
On Tue, Mar 5, 2019 at 10:47 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
In the pluggable storage patch [1], one thing that I'm wondering about
is how exactly to inherit the storage AM across partitions. I think
that's potentially worthy of a discussion with a wider audience than I'd
get in that thread. It seems also related to the recent discussion in [2]
Consider (excerpted from the tests):
CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2;
SET default_table_access_method = 'heap';
CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a');
SET default_table_access_method = 'heap2';
CREATE TABLE tableam_parted_b_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('b');
CREATE TABLE tableam_parted_c_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('c') USING heap;
CREATE TABLE tableam_parted_d_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('d') USING heap2;
It seems pretty clear that tableam_parted_heap2, tableam_parted_d_heap2
would be stored via heap2, and tableam_parted_c_heap2 via heap.
But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't
quite as clear. I think it'd both be sensible for new partitions to
inherit the AM from the root, but it'd also be sensible to use the
current default.
Out of laziness (it's how it works rn) I'm inclined to to go with using
the current default, but I'd be curious if others disagree.
As other said that, I also agree to go with default_table_access_method to be
preferred if not explicitly specified the access method during the table creation.
access method of a table later once it is created, currently there is no option.
currently there are no other alternative table access methods that are available
for the user to switch, but definitely it may be required later.
I will provide a patch to alter the access method of a table for v13.
Regards,
Haribabu Kommi
Fujitsu Australia
Hi, On 2019-03-05 16:01:50 +1300, David Rowley wrote: > On Tue, 5 Mar 2019 at 12:47, Andres Freund <andres@anarazel.de> wrote: > > CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; > > > > SET default_table_access_method = 'heap'; > > CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); > > > > But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't > > quite as clear. I think it'd both be sensible for new partitions to > > inherit the AM from the root, but it'd also be sensible to use the > > current default. > > I'd suggest it's made to work the same way as ca4103025dfe26 made > tablespaces work. Hm, is that actually correct? Because as far as I can tell that doesn't have the necessary pg_dump code to make this behaviour persistent: CREATE TABLESPACE frak LOCATION '/tmp/frak'; CREATE TABLE test_tablespace (a text, b int) PARTITION BY list (a) TABLESPACE frak ; CREATE TABLE test_tablespace_1 PARTITION OF test_tablespace FOR VALUES in ('a'); CREATE TABLE test_tablespace_2 PARTITION OF test_tablespace FOR VALUES in ('b') TABLESPACE pg_default; CREATE TABLE test_tablespace_3 PARTITION OF test_tablespace FOR VALUES in ('c') TABLESPACE frak; SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE 'test_tablespace%' ORDER BY 1; ┌───────────────────┬─────────┬───────────────┐ │ relname │ relkind │ reltablespace │ ├───────────────────┼─────────┼───────────────┤ │ test_tablespace │ p │ 16384 │ │ test_tablespace_1 │ r │ 16384 │ │ test_tablespace_2 │ r │ 0 │ │ test_tablespace_3 │ r │ 16384 │ └───────────────────┴─────────┴───────────────┘ (4 rows) but a dump outputs (abbreviated) SET default_tablespace = frak; CREATE TABLE public.test_tablespace ( a text, b integer ) PARTITION BY LIST (a); CREATE TABLE public.test_tablespace_1 PARTITION OF public.test_tablespace FOR VALUES IN ('a'); SET default_tablespace = ''; CREATE TABLE public.test_tablespace_2 PARTITION OF public.test_tablespace FOR VALUES IN ('b'); SET default_tablespace = frak; CREATE TABLE public.test_tablespace_3 PARTITION OF public.test_tablespace FOR VALUES IN ('c'); which restores to: postgres[32125][1]=# SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE 'test_tablespace%' ORDER BY1; ┌───────────────────┬─────────┬───────────────┐ │ relname │ relkind │ reltablespace │ ├───────────────────┼─────────┼───────────────┤ │ test_tablespace │ p │ 16384 │ │ test_tablespace_1 │ r │ 16384 │ │ test_tablespace_2 │ r │ 16384 │ │ test_tablespace_3 │ r │ 16384 │ └───────────────────┴─────────┴───────────────┘ (4 rows) because public.test_tablespace_2 assumes it's ought to inherit the tablespace from the partitioned table. I also find it far from clear that: <listitem> <para> The <replaceable class="parameter">tablespace_name</replaceable> is the name of the tablespace in which the new table is to be created. If not specified, <xref linkend="guc-default-tablespace"/> is consulted, or <xref linkend="guc-temp-tablespaces"/> if the table is temporary. For partitioned tables, since no storage is required for the table itself, the tablespace specified here only serves to mark the default tablespace for any newly created partitions when no other tablespace is explicitly specified. </para> </listitem> is handled correctly. The above says that the *specified* tablespaces - which seems to exclude the default tablespace - is what's going to determine what partitions use as their default tablespace. But in fact that's not true, the partitioned table's pg_class.retablespace is set to what default_tablespaces was at the time of the creation. > i.e. if they specify the storage type when creating > the partition, then always use that, unless they mention otherwise. If > nothing was mentioned when they created the partition, then use > default_table_access_method. Hm. That'd be doable, but given the above ambiguities I'm not convinced that's the best approach. As far as I can see that'd require: 1) At relation creation, for partitioned tables only, do not take default_table_access_method into account. 2) At partition creation, if the AM is not specified and if the partitioned table's relam is 0, use the default_table_access_method. 3) At pg_dump, for partitioned tables only, explicitly emit a USING ... rather than use the method of manipulating default_table_access_method. As far as I can tell, the necessary steps are also what'd need to be done to actually implement the described behaviour for TABLESPACE (with s/default_table_access_method/default_tablespace/ and s/USING/TABLESPACE of course). Greetings, Andres Freund
On 2019-03-04 22:08:04 -0800, Andres Freund wrote: > Hi, > > On 2019-03-05 16:01:50 +1300, David Rowley wrote: > > On Tue, 5 Mar 2019 at 12:47, Andres Freund <andres@anarazel.de> wrote: > > > CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) USING heap2; > > > > > > SET default_table_access_method = 'heap'; > > > CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR VALUES IN ('a'); > > > > > > > But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't > > > quite as clear. I think it'd both be sensible for new partitions to > > > inherit the AM from the root, but it'd also be sensible to use the > > > current default. > > > > I'd suggest it's made to work the same way as ca4103025dfe26 made > > tablespaces work. > > Hm, is that actually correct? Because as far as I can tell that doesn't > have the necessary pg_dump code to make this behaviour persistent: > > CREATE TABLESPACE frak LOCATION '/tmp/frak'; > CREATE TABLE test_tablespace (a text, b int) PARTITION BY list (a) TABLESPACE frak ; > CREATE TABLE test_tablespace_1 PARTITION OF test_tablespace FOR VALUES in ('a'); > CREATE TABLE test_tablespace_2 PARTITION OF test_tablespace FOR VALUES in ('b') TABLESPACE pg_default; > CREATE TABLE test_tablespace_3 PARTITION OF test_tablespace FOR VALUES in ('c') TABLESPACE frak; > > SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE 'test_tablespace%' ORDER BY 1; > ┌───────────────────┬─────────┬───────────────┐ > │ relname │ relkind │ reltablespace │ > ├───────────────────┼─────────┼───────────────┤ > │ test_tablespace │ p │ 16384 │ > │ test_tablespace_1 │ r │ 16384 │ > │ test_tablespace_2 │ r │ 0 │ > │ test_tablespace_3 │ r │ 16384 │ > └───────────────────┴─────────┴───────────────┘ > (4 rows) > > but a dump outputs (abbreviated) > > SET default_tablespace = frak; > CREATE TABLE public.test_tablespace ( > a text, > b integer > ) > PARTITION BY LIST (a); > CREATE TABLE public.test_tablespace_1 PARTITION OF public.test_tablespace > FOR VALUES IN ('a'); > SET default_tablespace = ''; > CREATE TABLE public.test_tablespace_2 PARTITION OF public.test_tablespace > FOR VALUES IN ('b'); > SET default_tablespace = frak; > CREATE TABLE public.test_tablespace_3 PARTITION OF public.test_tablespace > FOR VALUES IN ('c'); > > which restores to: > > postgres[32125][1]=# SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE 'test_tablespace%' ORDER BY1; > ┌───────────────────┬─────────┬───────────────┐ > │ relname │ relkind │ reltablespace │ > ├───────────────────┼─────────┼───────────────┤ > │ test_tablespace │ p │ 16384 │ > │ test_tablespace_1 │ r │ 16384 │ > │ test_tablespace_2 │ r │ 16384 │ > │ test_tablespace_3 │ r │ 16384 │ > └───────────────────┴─────────┴───────────────┘ > (4 rows) > > because public.test_tablespace_2 assumes it's ought to inherit the > tablespace from the partitioned table. > > > I also find it far from clear that: > <listitem> > <para> > The <replaceable class="parameter">tablespace_name</replaceable> is the name > of the tablespace in which the new table is to be created. > If not specified, > <xref linkend="guc-default-tablespace"/> is consulted, or > <xref linkend="guc-temp-tablespaces"/> if the table is temporary. For > partitioned tables, since no storage is required for the table itself, > the tablespace specified here only serves to mark the default tablespace > for any newly created partitions when no other tablespace is explicitly > specified. > </para> > </listitem> > is handled correctly. The above says that the *specified* tablespaces - > which seems to exclude the default tablespace - is what's going to > determine what partitions use as their default tablespace. But in fact > that's not true, the partitioned table's pg_class.retablespace is set to > what default_tablespaces was at the time of the creation. > > > > i.e. if they specify the storage type when creating > > the partition, then always use that, unless they mention otherwise. If > > nothing was mentioned when they created the partition, then use > > default_table_access_method. > > Hm. That'd be doable, but given the above ambiguities I'm not convinced > that's the best approach. As far as I can see that'd require: > > 1) At relation creation, for partitioned tables only, do not take > default_table_access_method into account. > > 2) At partition creation, if the AM is not specified and if the > partitioned table's relam is 0, use the default_table_access_method. > > 3) At pg_dump, for partitioned tables only, explicitly emit a USING > ... rather than use the method of manipulating default_table_access_method. > > As far as I can tell, the necessary steps are also what'd need to be > done to actually implement the described behaviour for TABLESPACE (with > s/default_table_access_method/default_tablespace/ and s/USING/TABLESPACE > of course). Based on this mail I'm currently planning to simply forbid specifying USING for partitioned tables. Then we can argue about this later. Greetings, Andres Freund
On Tue, Mar 5, 2019 at 12:59 PM Andres Freund <andres@anarazel.de> wrote: > Based on this mail I'm currently planning to simply forbid specifying > USING for partitioned tables. Then we can argue about this later. +1. I actually think that might be the right thing in the long-term, but it undeniably avoids committing to any particular decision in the short term, which seems good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 5 Mar 2019 at 19:08, Andres Freund <andres@anarazel.de> wrote: > > On 2019-03-05 16:01:50 +1300, David Rowley wrote: > > I'd suggest it's made to work the same way as ca4103025dfe26 made > > tablespaces work. > > Hm, is that actually correct? Because as far as I can tell that doesn't > have the necessary pg_dump code to make this behaviour persistent: > > CREATE TABLESPACE frak LOCATION '/tmp/frak'; > CREATE TABLE test_tablespace (a text, b int) PARTITION BY list (a) TABLESPACE frak ; > CREATE TABLE test_tablespace_1 PARTITION OF test_tablespace FOR VALUES in ('a'); > CREATE TABLE test_tablespace_2 PARTITION OF test_tablespace FOR VALUES in ('b') TABLESPACE pg_default; > CREATE TABLE test_tablespace_3 PARTITION OF test_tablespace FOR VALUES in ('c') TABLESPACE frak; > > SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE 'test_tablespace%' ORDER BY 1; > ┌───────────────────┬─────────┬───────────────┐ > │ relname │ relkind │ reltablespace │ > ├───────────────────┼─────────┼───────────────┤ > │ test_tablespace │ p │ 16384 │ > │ test_tablespace_1 │ r │ 16384 │ > │ test_tablespace_2 │ r │ 0 │ > │ test_tablespace_3 │ r │ 16384 │ > └───────────────────┴─────────┴───────────────┘ [pg_dump/pg_restore] > ┌───────────────────┬─────────┬───────────────┐ > │ relname │ relkind │ reltablespace │ > ├───────────────────┼─────────┼───────────────┤ > │ test_tablespace │ p │ 16384 │ > │ test_tablespace_1 │ r │ 16384 │ > │ test_tablespace_2 │ r │ 16384 │ > │ test_tablespace_3 │ r │ 16384 │ > └───────────────────┴─────────┴───────────────┘ frak... that's a bit busted. I can't instantly think of a fix, but I see the same problem does not seem to exist for partition indexes, so that's a relief since that's already in PG11. I'll take this up on another thread once I have something good to report. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, 6 Mar 2019 at 07:19, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Mar 5, 2019 at 12:59 PM Andres Freund <andres@anarazel.de> wrote: > > Based on this mail I'm currently planning to simply forbid specifying > > USING for partitioned tables. Then we can argue about this later. > > +1. I actually think that might be the right thing in the long-term, > but it undeniably avoids committing to any particular decision in the > short term, which seems good. I've not really been following the storage am patch, but given that a partition's TABLESPACE is inherited from its partitioned table, I'd find it pretty surprising that USING wouldn't do the same. They're both storage options, so I think having them behave differently is going to cause some confusion. I think the patch I just submitted to [1] should make it pretty easy to make this work the same as TABLESPACE does. [1] https://www.postgresql.org/message-id/CAKJS1f_iyBpAuYBPQv_GGeME%3Dg9Rpr8yWjCaYV4E685yQ1uzkw%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services