Re: ALTER TYPE 2: skip already-provable no-work rewrites

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: ALTER TYPE 2: skip already-provable no-work rewrites
Дата
Msg-id 20110211180808.GB30425@tornado.leadboat.com
обсуждение исходный текст
Ответ на Re: ALTER TYPE 2: skip already-provable no-work rewrites  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: ALTER TYPE 2: skip already-provable no-work rewrites  (Robert Haas <robertmhaas@gmail.com>)
Re: ALTER TYPE 2: skip already-provable no-work rewrites  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi Robert,

On Fri, Feb 11, 2011 at 10:27:11AM -0500, Robert Haas wrote:
> On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch <noah@leadboat.com> wrote:
> > Done as attached. ?This preserves compatibility with our current handling of
> > composite type dependencies. ?The rest you've seen before.
> 
> OK, I finally got back to this.  Sorry for the delay.  I still feel
> that it's possible to accomplish the same goal with less complexity.
> See what you think of the attached.

It's a nice, clean patch.  However, it achieves that by targeting a smaller
goal.  Specifically, it omits:

1) Test cases and DEBUG messages sufficient to facilitate them
2) Skip rewrite for T -> constraint-free domain over T
3) Downgrade rewrite to scan for T -> constrained domain over T

> Upon examination, it appeared to me that trying to make the
> AlteredTableInfo contain an enum indicating whether we were doing a
> scan, a rewrite, or nothing was requiring more code change than I
> could really justify.

Offhand, I count 12 changed lines to introduce the enum.  There may be other
good reasons not to do it, but surely that wasn't it?

> What I ended up doing is replacing the 'bool
> new_changedoids' member with a 'bool rewrite' member.  I'm pretty sure
> this is safe.  The only reason we ever tested the new_changeoids
> member was to see whether we needed to do a rewrite; it wasn't used
> for anything else.  The new member gets set either when we need to do
> a rewrite, or when a column type changes in a fashion that requires a
> rewrite.  It's pretty easy to verify that this doesn't result in any
> behavior change, except for one corner case: currently, if you add or
> remove OIDs to/from a table and in the same ALTER TABLE command add a
> constraint that involves creating an index, such as a primary key, the
> new primary key index will get built, thrown away, and rebuilt a
> second time.  With this change, we just build it once, which seems
> like an improvement, even though the case is vanishingly unlikely to
> ever happen in practice.

This is fairly similar to the design in my first patch.  The name was different
(new_bits), and that patch had an extra bool for scan-only cases.  I won't
object to moving back to this, but I did find that your enum suggestion worked
out significantly better.

Even supposing we push off all scan-only cases to another patch, it would be
good to have the tablecmds.c-internal representation of that in mind.  No sense
in simplifying a 12-line change to an 8-line change, only to redo it next patch.

> I also have to say that after playing with a little bit, I find the
> new debugging messages you added to be quite invaluable for
> understanding what's really going on.  The old output told you
> basically nothing useful, even if you cranked it all the way up to
> DEBUG3.  This is much better.

Thanks.  I added them solely to make automated testing possible, but they did
turn out to have user value.  I'll certainly make use of them when rewriting an
inheritance tree of tables or a large table with many indexes.


Hunk-specific comments follow, largely concerning documentation.  I really,
really don't want to get mired in a discussion of exact documentation text.  In
fact I think I'd rather hunt crocodiles barefoot, armed with nothing but ALTER
TABLE ... DISABLE TRIGGER ALL than have such a discussion.  But I'll articulate
the rationale behind my original constructions, in case they include facts you
did not consider when rewriting them.

> diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
> index 9f02674..e3ceea8 100644
> --- a/doc/src/sgml/ref/alter_table.sgml
> +++ b/doc/src/sgml/ref/alter_table.sgml
> @@ -766,9 +766,13 @@ ALTER TABLE <replaceable class="PARAMETER">name</replaceable>
>     <para>
>      Adding a column with a non-null default or changing the type of an
>      existing column will require the entire table and indexes to be rewritten.
> -    This might take a significant amount of time for a large table; and it will
> -    temporarily require double the disk space.  Adding or removing a system
> -    <literal>oid</> column likewise requires rewriting the entire table.
> +    As an exception, if the old type and new types are binary compatible and

In the documentation for CREATE CAST, we define "binary compatible" using "Two
types that are binary coercible both ways are also referred to as binary
compatible."  This feature does not require binary compatibility, merely a
one-way binary coercion.  Generally, though, I like where you're going.  My
version was accurate but obtuse.

> +    the <literal>USING</> clause does not change the column contents, the

This is probably fine, but note that "USING col || ''" does not change the
column contents, yet we won't optimize it.

> +    table rewrite will be skipped, but the index rebuild is still required.

This wrongly suggests that the rebuild mentioned earlier in the paragraph,
affecting all indexes, will take place.  It's a more-limited rebuild.

> +    Adding or removing a system <literal>oid</> column likewise requires
> +    rewriting the entire table.  Table and/or index rebuilds may take a
> +    significant amount of time for a large table; and will temporarily require
> +    as much as double the disk space.
>     </para>
>  
>     <para>
> @@ -797,9 +801,9 @@ ALTER TABLE <replaceable class="PARAMETER">name</replaceable>
>     <para>
>      To force an immediate rewrite of the table, you can use
>      <link linkend="SQL-VACUUM">VACUUM FULL</>, <xref linkend="SQL-CLUSTER">
> -    or one of the forms of ALTER TABLE that forces a rewrite, such as
> -    SET DATA TYPE.  This results in no semantically-visible change in the
> -    table, but gets rid of no-longer-useful data.
> +    or one of the forms of ALTER TABLE that forces a rewrite.  This results in
> +    no semantically-visible change in the table, but gets rid of
> +    no-longer-useful data.

This presents three options without any indication of how to choose between
them.  A user wanting just a rewrite should look no further than VACUUM FULL.
We should document that ALTER TABLE can rewrite, both for general user planning
and to dispel thoughts of doing both a VACUUM FULL and an ALTER TABLE in short
succession.  However, it doesn't help to advertise ALTER TABLE or CLUSTER as
rewrite tools per se.

Before 9.0, the situation was entirely different.  ALTER TABLE was far and away
*the best* rewrite tool.  That's why I specifically documented the fact that
pre-9.0 users should transition to VACUUM FULL.

>     </para>
>  
>     <para>
> diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
> index 9e60121..452ced6 100644
> --- a/src/backend/catalog/index.c
> +++ b/src/backend/catalog/index.c
> @@ -1685,6 +1685,11 @@ index_build(Relation heapRelation,
>      procedure = indexRelation->rd_am->ambuild;
>      Assert(RegProcedureIsValid(procedure));
>  
> +    ereport(DEBUG1,
> +            (errmsg("building index \"%s\" on table \"%s\"",
> +                    RelationGetRelationName(indexRelation),
> +                    RelationGetRelationName(heapRelation))));
> +
>  
>      /*
>       * Switch to the table owner's userid, so that any index functions are run
>       * as that user.  Also lock down security-restricted operations and
> diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
> index e4f352c..6b83647 100644
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -71,6 +71,7 @@
>  #include "storage/smgr.h"
>  #include "utils/acl.h"
>  #include "utils/builtins.h"
> +#include "utils/datum.h"

This header is not needed in your version, is it?

>  #include "utils/fmgroids.h"
>  #include "utils/inval.h"
>  #include "utils/lsyscache.h"
> @@ -141,7 +142,7 @@ typedef struct AlteredTableInfo
>      List       *constraints;    /* List of NewConstraint */
>      List       *newvals;        /* List of NewColumnValue */
>      bool        new_notnull;    /* T if we added new NOT NULL constraints */
> -    bool        new_changeoids; /* T if we added/dropped the OID column */
> +    bool        rewrite;        /* T if we a rewrite is forced */
>      Oid            newTableSpace;    /* new tablespace; 0 means no change */
>      /* Objects to rebuild after completing ALTER TYPE operations */
>      List       *changedConstraintOids;    /* OIDs of constraints to rebuild */
> @@ -336,6 +337,7 @@ static void ATPrepAlterColumnType(List **wqueue,
>                        AlteredTableInfo *tab, Relation rel,
>                        bool recurse, bool recursing,
>                        AlterTableCmd *cmd, LOCKMODE lockmode);
> +static bool ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno);

Moving this from parse_coerce.c to tablecmds.c does make sense.

Thanks,
nm


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "David E. Wheeler"
Дата:
Сообщение: Re: Careful PL/Perl Release Not Required
Следующее
От: "Erik Rijkers"
Дата:
Сообщение: Re: Range Types: << >> -|- ops vs empty range