Обсуждение: CLUSTER, REINDEX and VACUUM on batch ops

Поиск
Список
Период
Сортировка

CLUSTER, REINDEX and VACUUM on batch ops

От
François Beausoleil
Дата:
Hi all!

I have a partitioned table with millions of rows per weekly partition. I am adding new fields, with null values and no
defaultvalues to ensure I had a reasonable migration time. All downstream code knows how to work with null fields. 

Presently, I'm migrating each partition individually to add NOT NULL, set a default value and update the table to have
correctvalues. Essentially, I'm doing this: 

ALTER TABLE parent ADD COLUMN new_field int; -- adds the field to all child tables - runs quickly

-- the bulk of the data transfer
for each partition in partitions:
  BEGIN;
  UPDATE partition SET new_field = 0;
  ALTER TABLE partition
      ALTER COLUMN new_field SET NOT NULL
    , ALTER COLUMN new_field SET DEFAULT 0;
  COMMIT;

  CLUSTER partition USING partition_pkey;
  REINDEX TABLE partition;
  VACUUM ANALYZE partition;
done

After I've clustered the table, must I reindex and vacuum as well? It is unclear to me if clustering a table reindexes
ornot: the docs at http://www.postgresql.org/docs/current/static/sql-cluster.html are silent on the matter, but do
mentionthat an ANALYZE is in order. 

Thanks!
François Beausoleil
Вложения

Re: CLUSTER, REINDEX and VACUUM on batch ops

От
Sergey Konoplev
Дата:
On Wed, Apr 24, 2013 at 8:49 AM, François Beausoleil
<francois@teksol.info> wrote:
> Presently, I'm migrating each partition individually to add NOT NULL, set a default value and update the table to
havecorrect values. Essentially, I'm doing this: 
>
> ALTER TABLE parent ADD COLUMN new_field int; -- adds the field to all child tables - runs quickly
>
> -- the bulk of the data transfer
> for each partition in partitions:
>   BEGIN;
>   UPDATE partition SET new_field = 0;
>   ALTER TABLE partition
>       ALTER COLUMN new_field SET NOT NULL
>     , ALTER COLUMN new_field SET DEFAULT 0;
>   COMMIT;
>
>   CLUSTER partition USING partition_pkey;
>   REINDEX TABLE partition;
>   VACUUM ANALYZE partition;
> done
>
> After I've clustered the table, must I reindex and vacuum as well? It is unclear to me if clustering a table
reindexesor not: the docs at http://www.postgresql.org/docs/current/static/sql-cluster.html are silent on the matter,
butdo mention that an ANALYZE is in order. 

CLUSTER does full table rewrite including all its indexes so REINDEX
is not required after it. It is mentioned in the docs implicitly:

<<
When an index scan is used, a temporary copy of the table is created
that contains the table data in the index order. Temporary copies of
each index on the table are created as well. Therefore, you need free
space on disk at least equal to the sum of the table size and the
index sizes.

When a sequential scan and sort is used, a temporary sort file is also
created, so that the peak temporary space requirement is as much as
double the table size, plus the index sizes.
>>

BTW, you do not need to do the ALTERs for each partition. What you
need to do is:

1. add the column to the parent,
2. set the default constraint on the column of the parent,
3. update the column in partitions to the value,
4. set the not null constraint on the parent.

It will be better from the point of view of inheritance as the new
column will be fully inherited from the parent rather then partially
overloaded in partitions.

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com