Обсуждение: Adding a non-null column without noticeable downtime
Hi all, I'm sure this has been answered somewhere, but I was not able to find anything in the list archives. I'm conceptually trying to do ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False; without taking any noticeable downtime. I know I can divide the query up like so: ALTER TABLE "foo" ADD COLUMN "bar" boolean; UPDATE foo SET bar = False; -- Done in batches ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False; ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL; The first 3 queries shouldn't impact other concurrent queries on the system. My question is about the sequential scan that occurs when setting the column NOT NULL. Will that sequential scan block other inserts or selects on the table? If so, can it be sped up by using an index (which would be created concurrently)? Thanks, Zev
To be clear, this is with PostgreSQL 9.1. Also, if there is some other way of doing this, I'd be interested in other methodologies as well. Zev On 02/24/2014 10:41 PM, Zev Benjamin wrote: > Hi all, > > I'm sure this has been answered somewhere, but I was not able to find > anything in the list archives. > > I'm conceptually trying to do > ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False; > > without taking any noticeable downtime. I know I can divide the query > up like so: > > ALTER TABLE "foo" ADD COLUMN "bar" boolean; > UPDATE foo SET bar = False; -- Done in batches > ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False; > ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL; > > The first 3 queries shouldn't impact other concurrent queries on the > system. My question is about the sequential scan that occurs when > setting the column NOT NULL. Will that sequential scan block other > inserts or selects on the table? If so, can it be sped up by using an > index (which would be created concurrently)? > > > Thanks, > Zev > >
I think index should help.
Why don't you try it out and check the explain plan of it?
If you are planning to break it down as below:
1. ALTER TABLE "foo" ADD COLUMN "bar" boolean;
2. UPDATE foo SET bar = False; -- Done in batches
3. ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
4. ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
2. UPDATE foo SET bar = False; -- Done in batches
3. ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
4. ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
I would suggest on interchanging operation 2 and 3 in sequence
EXPLAIN does not appear to work on ALTER TABLE statements:
=> EXPLAIN ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
ERROR: syntax error at or near "ALTER"
LINE 1: explain ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
^
Zev
On 02/25/2014 01:56 PM, Sameer Kumar wrote:
> I think index should help.
> Why don't you try it out and check the explain plan of it?
> If you are planning to break it down as below:
> 1. ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> 2. UPDATE foo SET bar = False; -- Done in batches
> 3. ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> 4. ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
>
>
> I would suggest on interchanging operation 2 and 3 in sequence
On Mon, Feb 24, 2014 at 7:41 PM, Zev Benjamin
<zev-pgsql@strangersgate.com> wrote:
[...]
> ALTER TABLE "foo" ADD COLUMN "bar" boolean;
> UPDATE foo SET bar = False; -- Done in batches
> ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False;
> ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL;
You should set default before performing updates, otherwise new rows
will be with nulls in this column. The template sniplet for your case
is below.
ALTER TABLE foo ADD bar boolean;
ALTER TABLE foo ALTER bar SET DEFAULT false;
CREATE INDEX CONCURRENTLY foo_migration_tmp
ON foo (id) WHERE bar IS NULL;
/*
PSQL=/usr/local/bin/psql
total_updated=0
updated=1
time (
while [ $updated -gt 0 ]; do
updated=$(($PSQL -X Game2 <<EOF
UPDATE foo SET bar = false
WHERE id IN (
SELECT id FROM foo
WHERE bar IS NULL LIMIT 100);
EOF
) | cut -d ' ' -f 2)
(( total_updated+=updated ))
echo -ne "\r$total_updated"
done
) 2>&1
*/
DROP INDEX foo_migration_tmp;
ANALYZE foo;
ALTER TABLE foo ALTER bar SET NOT NULL;
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA
http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
On 02/25/2014 04:41 AM, Zev Benjamin wrote: > I'm conceptually trying to do > ALTER TABLE "foo" ADD COLUMN "bar" boolean NOT NULL DEFAULT False; > > without taking any noticeable downtime. I know I can divide the query > up like so: > > ALTER TABLE "foo" ADD COLUMN "bar" boolean; > UPDATE foo SET bar = False; -- Done in batches > ALTER TABLE "foo" ALTER COLUMN "bar" SET DEFAULT False; > ALTER TABLE "foo" ALTER COLUMN "bar" SET NOT NULL; You need to set the default before doing the update. Also, make sure the update is in its own transaction. > The first 3 queries shouldn't impact other concurrent queries on the > system. My question is about the sequential scan that occurs when > setting the column NOT NULL. Will that sequential scan block other > inserts or selects on the table? Yes, because ALTER TABLE will have taken an AccessExclusiveLock. > If so, can it be sped up by using an index (which would be created > concurrently)? Unfortunately not. -- Vik