Re: pg_dump is broken for partition tablespaces

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: pg_dump is broken for partition tablespaces
Дата
Msg-id 20190423222633.GA8364@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: pg_dump is broken for partition tablespaces  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: pg_dump is broken for partition tablespaces  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
Thanks for taking the time to think through this.

On 2019-Apr-22, Robert Haas wrote:

> If we know what the feature is supposed to do, then it should be
> possible to look at each relevant piece of code and decides whether it
> implements the specification correctly or not.  But if we don't have a
> specification -- that is, we don't know precisely what the feature is
> supposed to do -- then we'll just end up whacking the behavior around
> trying to get some combination that makes sense, and the whole effort
> is probably doomed.

Clearly this is now the crux of the issue ... the specification was
unclear, as I cobbled it together without thinking about the overall
implications, taking pieces of advice from various posts and my own
ideas of how it should work.

I am trying to split things in a way that makes the most sense to offer
the best possible combination of functionality.  I think there are three
steps to getting this done:

1. for pg10 and pg11, make pg_dump work correctly.  I think having
pg_dump use CREATE TABLE PARTITION OF is not correct when the partition
has a mismatching column definition.  So my proposal is to back-patch
David's pg_dump patch to pg10 and pg11.  Thread:
https://postgr.es/m/20190423185007.GA27954@alvherre.pgsql
Note that this changes pg_dump behavior in back branches.

2. for pg12, try to keep as much of the tablespace inheritance
functionality as possible (more below on how this works), without
running into weird cases.  I think if we just forbid the case of
the tablespace being defined to the database tablespace, all the
weird cases disappear from the code.

3. For pg13, we can try to put back the functionality of the database
tablespace as default for a partition.  You suggest using the value 1,
and Tom suggests adding a new predefined row in pg_tablespace that has
the meaning of "whatever the default tablespace is for the current
database".  I have a slight preference towards having the additional
magical row(s).

Maybe people would like to see #3 put it pg12.  I don't oppose that, but
it seems to be new functionality development; RMT would have to exempt
this from feature freeze.

Robert wrote:

> - I don't have any idea how to salvage things in v11, where the
> catalog representation is irredeemably ambiguous.  We'd probably have
> to be satisfied with fairly goofy behavior in v11.

This behavior is in use only for indexes in pg11, and does not allow
setting the database tablespace anyway (it just gets set back to 0 if
you do that), so I don't think there's anything *too* goofy about it.
(But whatever we do, pg11 behavior for tables would differ from pg12;
and it would differ for indexes too, depending on how we handle the
default_tablespace thing.)

> - It's not enough to have a good catalog representation.  You also
> have to be able to recreate those catalog states.  I think you
> probably need a way to set the reltablespace to whatever value you
> need it to have without changing the tables under it.  Like ALTER
> TABLE ONLY blah {TABLESPACE some_tablespace_name | NO TABLESPACE} or
> some such thing.  That exact idea may be wrong, but I think we are not
> going to have much luck getting to a happy place without something of
> this sort.

David proposed ALTER TABLE .. TABLESPACE DEFAULT as a way to put back
the database tablespace behavior (and that's implemented in my previous
patch), which seems good to me.  I'm not sure I like "NO TABLESPACE"
better than DEFAULT, but if others do, I'm not really wedded to it.


Now I discuss how tablespace inheritance works.  To define the
tablespace of a regular table, there's a simple algorithm:

 a1. if there's a TABLESPACE clause, use that.
 b1. otherwise, if there's a default_tablespace, use that.
 c1. otherwise, use the database tablespace.
 d1. if we end up with the database tablespace, overwrite with 0.

If creating a partition, there is the additional rule that parent's
tablespace overrides default_tablespace:

 a2. if there's a TABLESPACE clause, use that.
 b2. otherwise, if the parent has a tablespace, use that.
 c2. otherwise, if there's a default_tablespace, use that.
 d2. otherwise, use the database tablespace.
 e2. if we end up with the database tablespace, overwrite with 0.

Finally, there's the question of what defines the parent's tablespace.
I think the best algorithm to determine the tablespace when creating a
partitioned relation is the same as for partitions:

 a3. if there's a TABLESPACE clause, use that.
 b3. otherwise, if the parent has a tablespace, use that.  (We have this
      because partitions might be partitioned.)
 c3. otherwise, if there's default_tablespace, use that.
 d3. otherwise, use the database tablespace.
 e3. if we end up with the database tablespace, overwrite with 0.

Tracing through the sets of rules, they are all alike -- the only
difference is that regular tables lack the rule about parent's
tablespace.

Some notes on this:

1. We have rule c3, that is, we still honor default_tablespace when
   creating partitioned tables.  Andres said, upthread, that we
   shouldn't do that -- if the tablespace isn't *explicitly* mentioned,
   he said, the partitions have no business landing on that tablespace.
   https://postgr.es/m/20190306223741.lolaaimhkkp4kict@alap3.anarazel.de
   However I think it would be more inconsistent to not have that than
   to have it, for the case when you create a partitioned partition.
   I tried to implement things the way he suggests and ended up with
   more warts.
   Would it be surprising for users that all subsequent partitions are
   created in that partition?  Perhaps, but in Tom's words, "there are
   many things [in this area] that are surprising"; we should be sure to
   display the situation clearly, and if they don't like it, they can
   simply do ALTER TABLE SET TABLESPACE and things are good again.

   default_tablespace does not seem to be a particularly popular
   feature; I suspect pg_dump is by far its biggest user, and we make it
   not fall afoul of this problem with patch #1, so it never creates any
   partitions anyway.

2. we have rule b2 ahead of c2; however Tom said users are unlikely to
   remember which rule comes first.  I don't care much for that
   argument; surely they can just read the docs.  This is not the first
   not-completely-intuitive thing for a DBA, so.

3. If you really want a partitioned rel to end up in the database
   tablespace overriding default_tablespace, you have two options:
   i) reset default_tablespace ahead of time (SET LOCAL), or
   ii) use ALTER TABLE ... TABLESPACE afterwards.
   Maybe we should have a iii) "CREATE TABLE ... TABLESPACE DEFAULT"
   clause that does it, but I'm not sure it's necessary.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: TM format can mix encodings in to_char()
Следующее
От: David Rowley
Дата:
Сообщение: Re: Why does ExecComputeStoredGenerated() form a heap tuple